
Commit

Auto. Make Doomgrad HF Review on 16 January
actions-user committed Jan 16, 2025
1 parent 5f5bdab commit 7e5b9db
Showing 8 changed files with 3,058 additions and 184 deletions.
322 changes: 322 additions & 0 deletions d/2025-01-15_zh_reading_task.html
@@ -0,0 +1,322 @@

<!DOCTYPE html>
<html lang="en">
<head>
<script async src="https://www.googletagmanager.com/gtag/js?id=G-C1CRWDNJ1J"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-C1CRWDNJ1J');
</script>
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Noto+Sans+SC:wght@300;400&display=swap" rel="stylesheet">
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Chinese reading task about ML</title>
<style>
body {
font-family: Arial, sans-serif;
background-color: #f4f4f9;
color: #333;
margin: 0;
padding: 20px;
}
.container {
max-width: 800px;
margin: 0 auto;
background-color: #fff;
padding: 20px;
border-radius: 8px;
box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
}
h1 {
color: #0056b3;
text-align: center;
}
p {
line-height: 1.6;
}
.zh-text {
font-size: 1.3em;
font-family: 'Noto Sans SC';
font-weight: 300;
margin: 0 0 5px 0;
}
.pinyin {
padding-top: 5px;
padding-bottom: 5px;
font-style: italic;
color: #888;
}
table {
width: 100%;
border-collapse: collapse;
margin-top: 20px;
}
th, td {
padding: 12px;
border: 1px solid #ddd;
text-align: left;
}
th {
background-color: #0056b3;
color: #fff;
}
td {
background-color: #f9f9f9;
}
td.zh {
font-family: 'Noto Sans SC';
font-size: 1.2em;
font-weight: 400;
}
</style>
</head>
<body>
<div class="container">
<h1>MiniMax-01: Scaling Foundation Models with Lightning Attention</h1>
<div><p class='zh-text'>1. 我们介绍了 MiniMax-01 系列,包括 MiniMax-Text-01 和 MiniMax-VL-01。</p>
<p class='zh-text'>2. 这些模型在处理长上下文方面具有卓越能力。</p>
<p class='zh-text'>3. 核心在于闪电注意力和其高效扩展。</p>
<p class='zh-text'>4. 我们将其与混合专家模型(MoE)集成,创建了一个具有 32 个专家和 4560 亿总参数的模型。</p>
<p class='zh-text'>5. 我们开发了优化的并行策略和高效的计算通信重叠技术。</p>
<p class='zh-text'>6. 这使我们能够在数百亿参数的模型上进行高效训练和推理。</p>
<p class='zh-text'>7. MiniMax-Text-01 的上下文窗口在训练期间可达到 100 万个标记,并在推理期间扩展到 400 万个标记。</p>
<p class='zh-text'>8. MiniMax-VL-01 通过使用 5120 亿视觉语言标记进行持续训练。</p>
<p class='zh-text'>9. 实验表明,我们的模型在标准和内部基准上的性能与 GPT-4o 和 Claude-3.5-Sonnet 相当,同时提供 20-32 倍的上下文窗口。</p>
<p class='zh-text'>10. 我们在 https://github.com/MiniMax-AI 公开发布了 MiniMax-01。</p></div>
<div class="pinyin">
<p>1. Wǒmen jièshào le MiniMax-01 xìliè, bāokuò MiniMax-Text-01 hé MiniMax-VL-01</p>
<p>2. Zhèxiē móxíng zài chǔlǐ cháng shàngxiàwén fāngmiàn jùyǒu zhuóyuè nénglì</p>
<p>3. Héxīn zàiyú shǎndiàn zhùyìlì hé qí gāoxiào kuòzhǎn</p>
<p>4. Wǒmen jiāng qí yǔ hùnhé zhuānjiā móxíng (MoE) jíchéng, chuàngjiàn le yīgè jùyǒu 32 gè zhuānjiā hé 4560 yì zǒng cānshù de móxíng</p>
<p>5. Wǒmen kāifā le yōuhuà de bìngxíng cèlüè hé gāoxiào de jìsuàn tōngxìn chóngdié jìshù</p>
<p>6. Zhè shǐ wǒmen nénggòu zài shùbǎiyì cānshù de móxíng shàng jìnxíng gāoxiào xùnliàn hé tuīlǐ</p>
<p>7. MiniMax-Text-01 de shàngxiàwén chuāngkǒu zài xùnliàn qījiān kě dádào 100 wàn gè biāojì, bìng zài tuīlǐ qījiān kuòzhǎn dào 400 wàn gè biāojì</p>
<p>8. MiniMax-VL-01 tōngguò shǐyòng 5120 yì shìjué yǔyán biāojì jìnxíng chíxù xùnliàn</p>
<p>9. Shíyàn biǎomíng, wǒmen de móxíng zài biāozhǔn hé nèibù jīzhǔn shàng de xìngnéng yǔ GPT-4o hé Claude-3.5-Sonnet xiāngdāng, tóngshí tígōng 20-32 bèi de shàngxiàwén chuāngkǒu</p>
<p>10. Wǒmen zài https://github.com/MiniMax-AI gōngkāi fābù le MiniMax-01</p>
</div>
<div><p>1. We introduce the MiniMax-01 series, including MiniMax-Text-01 and MiniMax-VL-01.</p>
<p>2. These models have outstanding capability in handling long contexts.</p>
<p>3. The core lies in lightning attention and its efficient scaling.</p>
<p>4. We integrate it with Mixture of Experts (MoE), creating a model with 32 experts and 456 billion total parameters.</p>
<p>5. We developed optimized parallel strategies and efficient computation-communication overlap techniques.</p>
<p>6. This enables us to train and run inference efficiently on models with hundreds of billions of parameters.</p>
<p>7. The context window of MiniMax-Text-01 can reach 1 million tokens during training and extends to 4 million tokens during inference.</p>
<p>8. MiniMax-VL-01 is continually trained on 512 billion vision-language tokens.</p>
<p>9. Experiments show that our models perform on par with GPT-4o and Claude-3.5-Sonnet on standard and internal benchmarks while providing a 20-32 times larger context window.</p>
<p>10. We have publicly released MiniMax-01 at https://github.com/MiniMax-AI.</p></div>
<h2>Vocabulary</h2>
<table>
<thead>
<tr>
<th>Word</th>
<th>Pinyin</th>
<th>Translation</th>
</tr>
</thead>
<tbody>

<tr>
<td class="zh">介绍</td>
<td>jiè shào</td>
<td>introduce</td>
</tr>

<tr>
<td class="zh">系列</td>
<td>xì liè</td>
<td>series</td>
</tr>

<tr>
<td class="zh">模型</td>
<td>mó xíng</td>
<td>model</td>
</tr>

<tr>
<td class="zh">处理</td>
<td>chǔ lǐ</td>
<td>process</td>
</tr>

<tr>
<td class="zh">上下文</td>
<td>shàng xià wén</td>
<td>context</td>
</tr>

<tr>
<td class="zh">卓越</td>
<td>zhuó yuè</td>
<td>outstanding</td>
</tr>

<tr>
<td class="zh">能力</td>
<td>néng lì</td>
<td>ability</td>
</tr>

<tr>
<td class="zh">核心</td>
<td>hé xīn</td>
<td>core</td>
</tr>

<tr>
<td class="zh">闪电</td>
<td>shǎn diàn</td>
<td>lightning</td>
</tr>

<tr>
<td class="zh">注意力</td>
<td>zhù yì lì</td>
<td>attention</td>
</tr>

<tr>
<td class="zh">高效</td>
<td>gāo xiào</td>
<td>efficient</td>
</tr>

<tr>
<td class="zh">扩展</td>
<td>kuò zhǎn</td>
<td>expand</td>
</tr>

<tr>
<td class="zh">混合</td>
<td>hùn hé</td>
<td>mix; mixture</td>
</tr>

<tr>
<td class="zh">专家</td>
<td>zhuān jiā</td>
<td>expert</td>
</tr>

<tr>
<td class="zh">集成</td>
<td>jí chéng</td>
<td>integrate</td>
</tr>

<tr>
<td class="zh">并行</td>
<td>bìng xíng</td>
<td>parallel</td>
</tr>

<tr>
<td class="zh">策略</td>
<td>cè lüè</td>
<td>strategy</td>
</tr>

<tr>
<td class="zh">通信</td>
<td>tōng xìn</td>
<td>communication</td>
</tr>

<tr>
<td class="zh">重叠</td>
<td>chóng dié</td>
<td>overlap</td>
</tr>

<tr>
<td class="zh">技术</td>
<td>jì shù</td>
<td>technology</td>
</tr>

<tr>
<td class="zh">训练</td>
<td>xùn liàn</td>
<td>train</td>
</tr>

<tr>
<td class="zh">推理</td>
<td>tuī lǐ</td>
<td>inference</td>
</tr>

<tr>
<td class="zh">窗口</td>
<td>chuāng kǒu</td>
<td>window</td>
</tr>

<tr>
<td class="zh">标记</td>
<td>biāo jì</td>
<td>token</td>
</tr>

<tr>
<td class="zh">视觉</td>
<td>shì jué</td>
<td>visual</td>
</tr>

<tr>
<td class="zh">语言</td>
<td>yǔ yán</td>
<td>language</td>
</tr>

<tr>
<td class="zh">持续</td>
<td>chí xù</td>
<td>continuous</td>
</tr>

<tr>
<td class="zh">实验</td>
<td>shí yàn</td>
<td>experiment</td>
</tr>

<tr>
<td class="zh">性能</td>
<td>xìng néng</td>
<td>performance</td>
</tr>

<tr>
<td class="zh">基准</td>
<td>jī zhǔn</td>
<td>benchmark</td>
</tr>

<tr>
<td class="zh">公开</td>
<td>gōng kāi</td>
<td>public</td>
</tr>

<tr>
<td class="zh">发布</td>
<td>fā bù</td>
<td>release</td>
</tr>

</tbody>
</table>
</div>
</body>
</html>
