
Commit

Auto. Make Doomgrad HF Review on 15 January
actions-user committed Jan 15, 2025
1 parent 388ecda commit 5f5bdab
Showing 7 changed files with 241 additions and 241 deletions.
8 changes: 4 additions & 4 deletions d/2025-01-15.html

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions d/2025-01-15.json
@@ -4,17 +4,17 @@
     "en": "January 15",
     "zh": "1月15日"
   },
-  "time_utc": "2025-01-15 21:09",
+  "time_utc": "2025-01-15 22:08",
   "weekday": 2,
-  "issue_id": 1690,
+  "issue_id": 1691,
   "home_page_url": "https://huggingface.co/papers",
   "papers": [
     {
       "id": "https://huggingface.co/papers/2501.08313",
       "title": "MiniMax-01: Scaling Foundation Models with Lightning Attention",
       "url": "https://huggingface.co/papers/2501.08313",
       "abstract": "We introduce MiniMax-01 series, including MiniMax-Text-01 and MiniMax-VL-01, which are comparable to top-tier models while offering superior capabilities in processing longer contexts. The core lies in lightning attention and its efficient scaling. To maximize computational capacity, we integrate it with Mixture of Experts (MoE), creating a model with 32 experts and 456 billion total parameters, of which 45.9 billion are activated for each token. We develop an optimized parallel strategy and highly efficient computation-communication overlap techniques for MoE and lightning attention. This approach enables us to conduct efficient training and inference on models with hundreds of billions of parameters across contexts spanning millions of tokens. The context window of MiniMax-Text-01 can reach up to 1 million tokens during training and extrapolate to 4 million tokens during inference at an affordable cost. Our vision-language model, MiniMax-VL-01 is built through continued training with 512 billion vision-language tokens. Experiments on both standard and in-house benchmarks show that our models match the performance of state-of-the-art models like GPT-4o and Claude-3.5-Sonnet while offering 20-32 times longer context window. We publicly release MiniMax-01 at https://github.com/MiniMax-AI.",
-      "score": 190,
+      "score": 192,
       "issue_id": 1672,
       "pub_date": "2025-01-14",
       "pub_date_card": {
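For context, these daily JSON files carry the paper feed whose upvote scores change between runs, which is why every automated commit rewrites them. A minimal sketch of reading the feed, assuming only the fields visible in the diff above (papers, score, title, url):

import json

# Load the daily review file; its layout follows the diff shown above.
with open("d/2025-01-15.json", encoding="utf-8") as f:
    review = json.load(f)

# Rank papers by their Hugging Face upvote score (e.g. 190 -> 192 for MiniMax-01).
top = sorted(review["papers"], key=lambda p: p["score"], reverse=True)
for paper in top[:5]:
    print(f'{paper["score"]:>4}  {paper["title"]}  ({paper["url"]})')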
108 changes: 54 additions & 54 deletions hf_papers.json

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions index.html

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions log.txt
@@ -1,3 +1,3 @@
-[15.01.2025 22:09] Read previous papers.
-[15.01.2025 22:09] Generating top page (month).
-[15.01.2025 22:09] Writing top page (month).
+[15.01.2025 23:09] Read previous papers.
+[15.01.2025 23:09] Generating top page (month).
+[15.01.2025 23:09] Writing top page (month).
338 changes: 169 additions & 169 deletions logs/2025-01-15_last_log.txt

Large diffs are not rendered by default.

8 changes: 4 additions & 4 deletions m/2025-01.html

Large diffs are not rendered by default.
