Commit 7e5b9db: Auto. Make Doomgrad HF Review on 16 January
1 parent: 5f5bdab
Showing 8 changed files with 3,058 additions and 184 deletions.
@@ -0,0 +1,322 @@
<!DOCTYPE html>
<html lang="en">
<head>
    <script async src="https://www.googletagmanager.com/gtag/js?id=G-C1CRWDNJ1J"></script>
    <script>
        window.dataLayer = window.dataLayer || [];
        function gtag(){dataLayer.push(arguments);}
        gtag('js', new Date());
        gtag('config', 'G-C1CRWDNJ1J');
    </script>
    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
    <link href="https://fonts.googleapis.com/css2?family=Noto+Sans+SC:wght@300;400&display=swap" rel="stylesheet">
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Chinese reading task about ML</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            background-color: #f4f4f9;
            color: #333;
            margin: 0;
            padding: 20px;
        }
        .container {
            max-width: 800px;
            margin: 0 auto;
            background-color: #fff;
            padding: 20px;
            border-radius: 8px;
            box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
        }
        h1 {
            color: #0056b3;
            text-align: center;
        }
        p {
            line-height: 1.6;
        }
        .zh-text {
            font-size: 1.3em;
            font-family: 'Noto Sans SC';
            font-weight: 300;
            margin: 0 0 5px 0;
        }
        .pinyin {
            padding-top: 5px;
            padding-bottom: 5px;
            font-style: italic;
            color: #888;
        }
        table {
            width: 100%;
            border-collapse: collapse;
            margin-top: 20px;
        }
        th, td {
            padding: 12px;
            border: 1px solid #ddd;
            text-align: left;
        }
        th {
            background-color: #0056b3;
            color: #fff;
        }
        td {
            background-color: #f9f9f9;
        }
        td.zh {
            font-family: 'Noto Sans SC';
            font-size: 1.2em;
            font-weight: 400;
        }
    </style>
</head>
<body>
    <div class="container">
        <h1>MiniMax-01: Scaling Foundation Models with Lightning Attention</h1>
        <div>
            <p class='zh-text'>1. 我们介绍了 MiniMax-01 系列,包括 MiniMax-Text-01 和 MiniMax-VL-01。</p>
            <p class='zh-text'>2. 这些模型在处理长上下文方面具有卓越能力。</p>
            <p class='zh-text'>3. 核心在于闪电注意力和其高效扩展。</p>
            <p class='zh-text'>4. 我们将其与混合专家模型(MoE)集成,创建了一个具有 32 个专家和 4560 亿总参数的模型。</p>
            <p class='zh-text'>5. 我们开发了优化的并行策略和高效的计算通信重叠技术。</p>
            <p class='zh-text'>6. 这使我们能够在数百亿参数的模型上进行高效训练和推理。</p>
            <p class='zh-text'>7. MiniMax-Text-01 的上下文窗口在训练期间可达到 100 万个标记,并在推理期间扩展到 400 万个标记。</p>
            <p class='zh-text'>8. MiniMax-VL-01 通过使用 5120 亿视觉语言标记进行持续训练。</p>
            <p class='zh-text'>9. 实验表明,我们的模型在标准和内部基准上的性能与 GPT-4o 和 Claude-3.5-Sonnet 相当,同时提供 20-32 倍的上下文窗口。</p>
            <p class='zh-text'>10. 我们在 https://github.com/MiniMax-AI 公开发布了 MiniMax-01。</p>
        </div>
<div class="pinyin"> | ||
<p>1. Wǒmen jièshào le MiniMax-01 xìliè, bāokuò MiniMax-Text-01 hé MiniMax-VL-01</p> | ||
<p>2. Zhèxiē móxíng zài chǔlǐ cháng shàngxìawén fāngmiàn jùyǒu zhuóyuè nénglì</p> | ||
<p>3. Héxīn zàiyú shǎndiǎn zhùyìlì hé qí gāoxiào kuòzhǎn</p> | ||
<p>4. Wǒmen jiāng qí yǔ hùn hé zhuānjiā móxíng (MoE) jíchéng, chuàngjiàn le yīgè jùyǒu 32 gè zhuānjiā hé 4560 yì zǒng cānshù de móxíng</p> | ||
<p>5. Wǒmen kāifā le yōuhuà de bìngxíng cèlüè hé gāoxiào de jìsuàn tōngxìn zhòngdié jìshù</p> | ||
<p>6. Zhè shǐ wǒmen nénggòu zài shùbǎiyì cānshù de móxíng shàng jìnxíng gāoxiào xùnliàn hé tuìlǐ</p> | ||
<p>7. MiniMax-Text-01 de shàngxìawén chuāngkǒu zài xùnliàn qījiān kě dádào 100 wàn gè biāojì, bìng zài tuìlǐ qījiān kuòzhǎn dào 400 wàn gè biāojì</p> | ||
<p>8. MiniMax-VL-01 tōngguò shǐyòng 5120 yì shìjué yǔyán biāojì jìnxíng chíxù xùnliàn</p> | ||
<p>9. Shìyàn biǎomíng, wǒmen de móxíng zài biāozhǔn hé nèibù jīzhǔn shàng de xiàonénglì yǔ GPT-4o hé Claude-3</p> | ||
<p>10. 5-Sonnet xiāngdāng, tóngshí tígōng 20-32 bèi de shàngxìawén chuāngkǒu</p> | ||
<p>11. Wǒmen zài https://github</p> | ||
<p>12. com/MiniMax-AI gōngkāi fābù le MiniMax-01</p> | ||
</div> | ||
        <div>
            <p>1. We introduced the MiniMax-01 series, including MiniMax-Text-01 and MiniMax-VL-01.</p>
            <p>2. These models offer outstanding capability in handling long contexts.</p>
            <p>3. At their core are lightning attention and its efficient scaling.</p>
            <p>4. We integrated it with a Mixture of Experts (MoE) model, creating a model with 32 experts and 456 billion total parameters.</p>
            <p>5. We developed optimized parallel strategies and efficient computation-communication overlap techniques.</p>
            <p>6. This enables us to perform efficient training and inference on models with hundreds of billions of parameters.</p>
            <p>7. The context window of MiniMax-Text-01 can reach up to 1 million tokens during training and extends to 4 million tokens during inference.</p>
            <p>8. MiniMax-VL-01 undergoes continued training with 512 billion vision-language tokens.</p>
            <p>9. Experiments show that our models perform comparably to GPT-4o and Claude-3.5-Sonnet on standard and internal benchmarks while providing a 20-32 times longer context window.</p>
            <p>10. We have made MiniMax-01 publicly available at https://github.com/MiniMax-AI.</p>
        </div>
        <h2>Vocabulary</h2>
        <table>
            <thead>
                <tr>
                    <th>Word</th>
                    <th>Pinyin</th>
                    <th>Translation</th>
                </tr>
            </thead>
            <tbody>
                <tr><td class="zh">介绍</td><td>jiè shào</td><td>introduce</td></tr>
                <tr><td class="zh">系列</td><td>xì liè</td><td>series</td></tr>
                <tr><td class="zh">模型</td><td>mó xíng</td><td>model</td></tr>
                <tr><td class="zh">处理</td><td>chǔ lǐ</td><td>process</td></tr>
                <tr><td class="zh">上下文</td><td>shàng xià wén</td><td>context</td></tr>
                <tr><td class="zh">卓越</td><td>zhuó yuè</td><td>outstanding</td></tr>
                <tr><td class="zh">能力</td><td>néng lì</td><td>ability</td></tr>
                <tr><td class="zh">核心</td><td>hé xīn</td><td>core</td></tr>
                <tr><td class="zh">闪电</td><td>shǎn diàn</td><td>lightning</td></tr>
                <tr><td class="zh">注意力</td><td>zhù yì lì</td><td>attention</td></tr>
                <tr><td class="zh">高效</td><td>gāo xiào</td><td>efficient</td></tr>
                <tr><td class="zh">扩展</td><td>kuò zhǎn</td><td>expand</td></tr>
                <tr><td class="zh">混合</td><td>hùn hé</td><td>hybrid</td></tr>
                <tr><td class="zh">专家</td><td>zhuān jiā</td><td>expert</td></tr>
                <tr><td class="zh">集成</td><td>jí chéng</td><td>integrate</td></tr>
                <tr><td class="zh">并行</td><td>bìng xíng</td><td>parallel</td></tr>
                <tr><td class="zh">策略</td><td>cè lüè</td><td>strategy</td></tr>
                <tr><td class="zh">通信</td><td>tōng xìn</td><td>communication</td></tr>
                <tr><td class="zh">重叠</td><td>chóng dié</td><td>overlap</td></tr>
                <tr><td class="zh">技术</td><td>jì shù</td><td>technology</td></tr>
                <tr><td class="zh">训练</td><td>xùn liàn</td><td>train</td></tr>
                <tr><td class="zh">推理</td><td>tuī lǐ</td><td>inference</td></tr>
                <tr><td class="zh">窗口</td><td>chuāng kǒu</td><td>window</td></tr>
                <tr><td class="zh">标记</td><td>biāo jì</td><td>token</td></tr>
                <tr><td class="zh">视觉</td><td>shì jué</td><td>visual</td></tr>
                <tr><td class="zh">语言</td><td>yǔ yán</td><td>language</td></tr>
                <tr><td class="zh">持续</td><td>chí xù</td><td>continuous</td></tr>
                <tr><td class="zh">实验</td><td>shí yàn</td><td>experiment</td></tr>
                <tr><td class="zh">性能</td><td>xìng néng</td><td>performance</td></tr>
                <tr><td class="zh">基准</td><td>jī zhǔn</td><td>benchmark</td></tr>
                <tr><td class="zh">公开</td><td>gōng kāi</td><td>public</td></tr>
                <tr><td class="zh">发布</td><td>fā bù</td><td>release</td></tr>
            </tbody>
        </table>
    </div>
</body>
</html>
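The page above summarizes the MiniMax-01 abstract, whose central technique is lightning attention, a linear-attention variant that keeps cost linear in sequence length. As a rough illustration of that idea only, here is a minimal NumPy sketch of generic non-causal linear attention; the function name linear_attention, the ELU+1 feature map phi, and the eps constant are illustrative assumptions, not MiniMax's actual kernel, which is tiled, causal, and hybridized with softmax attention in the released model.

import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Non-causal linear attention: O(n * d^2) cost instead of O(n^2 * d).

    q, k, v: (seq_len, d) arrays. phi is a positive feature map (ELU + 1);
    block-wise tiling and causal masking used by the real kernel are omitted.
    """
    def phi(x):
        return np.where(x > 0, x + 1.0, np.exp(x))

    qf, kf = phi(q), phi(k)
    kv = kf.T @ v              # (d, d): keys/values summarized once
    z = kf.sum(axis=0)         # (d,):  normalizer over keys
    return (qf @ kv) / (qf @ z[:, None] + eps)

# Toy usage: output shape matches the input sequence.
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 4))
k = rng.normal(size=(8, 4))
v = rng.normal(size=(8, 4))
print(linear_attention(q, k, v).shape)   # (8, 4)

Because the key-value summary kf.T @ v is computed once, sequence length enters the cost only linearly, which is the property the abstract leans on for 1-million-token training and 4-million-token inference contexts.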