From d914ca4fc36cbf72e2abecb2355099ab88476af6 Mon Sep 17 00:00:00 2001 From: GitHub Actions Date: Thu, 30 Jan 2025 00:44:13 +0000 Subject: [PATCH] Auto. Make Doomgrad HF Review on 30 January --- d/2025-01-29_zh_reading_task.html | 180 ++++ d/2025-01-30.html | 1270 +++++++++++++++++++++++++++++ d/2025-01-30.json | 513 ++++++++++++ hf_papers.json | 130 +-- index.html | 22 +- log.txt | 6 +- logs/2025-01-30_last_log.txt | 90 ++ m/2025-01.html | 6 +- 8 files changed, 2135 insertions(+), 82 deletions(-) create mode 100644 d/2025-01-29_zh_reading_task.html create mode 100644 d/2025-01-30.html create mode 100644 d/2025-01-30.json create mode 100644 logs/2025-01-30_last_log.txt diff --git a/d/2025-01-29_zh_reading_task.html b/d/2025-01-29_zh_reading_task.html new file mode 100644 index 000000000..ffc6f226f --- /dev/null +++ b/d/2025-01-29_zh_reading_task.html @@ -0,0 +1,180 @@ + + + + + + + + + + + Chinese reading task about ML + + + +
+

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

+

1. 这篇文章比较了监督微调(SFT)和强化学习(RL)在基础模型上的作用。

+

2. 研究发现,RL在文本和视觉任务上都表现出更好的泛化能力。

+

3. SFT倾向于记住训练数据,而RL能够处理未见过的变体。

+

4. RL还提高了模型的视觉识别能力。

+

5. 然而,SFT对于RL的有效训练仍然不可或缺。

+
+

1. 这篇文章比较了监督微调(SFT)和强化学习(RL)在基础模型上的作用。研究发现,RL在文本和视觉任务上都表现出更好的泛化能力。SFT倾向于记住训练数据,而RL能够处理未见过的变体。RL还提高了模型的视觉识别能力。然而,SFT对于RL的有效训练仍然不可或缺。

Zhè piān wénzhāng bǐjiào le jiàndū wēitiáo (SFT) hé qiáng huà xuéxí (RL) zài jīchǔ móxíng shàng de zuòyòng

+

2. Yánjiū fāxiàn, RL zài wénběn hé shìjué rènwù shàng dōu biǎoxiàn chū gèng hǎo de fànhuà nénglì

+

3. SFT qīngxiàng yú jìzhù xùnliàn shùjù, ér RL nénggòu chǔlǐ wèi jiànguò de biàntǐ

+

4. RL hái tígāo le móxíng de shìjué shíbié nénglì

+

5. Rán'ér, SFT duìyú RL de yǒuxiào xùnliàn réngrán bùkě huòquē

+
+

1. This article compares the roles of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) on base models.

+

2. The study found that RL demonstrates better generalization capabilities in both textual and visual tasks.

+

3. SFT tends to memorize training data, while RL can handle unseen variants.

+

4. RL also enhances the model's visual recognition capabilities.

+

5. However, SFT remains indispensable for effective RL training.

+

Vocabulary

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Word | Pinyin | Translation
监督 | jiàn dū | supervised
微调 | wēi tiáo | fine-tuning
强化学习 | qiáng huà xué xí | reinforcement learning
基础模型 | jī chǔ mó xíng | foundational model
作用 | zuò yòng | effect
泛化 | fàn huà | generalization
倾向于 | qīng xiàng yú | tend to
未见过 | wèi jiàn guò | unseen
变体 | biàn tǐ | variant
视觉识别 | shì jué shí bié | visual recognition
不可或缺 | bù kě huò quē | indispensable
+
+ + + \ No newline at end of file diff --git a/d/2025-01-30.html b/d/2025-01-30.html new file mode 100644 index 000000000..b5910dbb7 --- /dev/null +++ b/d/2025-01-30.html @@ -0,0 +1,1270 @@ + + + + + + + + HF. 8 papers. January 29. + + + + + + + +
+
+

🔺

hf daily

+

29 января | 8 papers

+
+
+ +
+
+ +
+
+
+ +
+
+ + +
+
+
+
+
+ 🏷️ Фильтр + + + +
+
+
+ + +
+
+
+ 🧹 + +
+
+ +
+
+ + + + + \ No newline at end of file diff --git a/d/2025-01-30.json b/d/2025-01-30.json new file mode 100644 index 000000000..86b937e9f --- /dev/null +++ b/d/2025-01-30.json @@ -0,0 +1,513 @@ +{ + "date": { + "ru": "29 января", + "en": "January 29", + "zh": "1月29日" + }, + "time_utc": "2025-01-29 23:09", + "weekday": 2, + "issue_id": 1937, + "home_page_url": "https://huggingface.co/papers", + "papers": [ + { + "id": "https://huggingface.co/papers/2501.17161", + "title": "SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training", + "url": "https://huggingface.co/papers/2501.17161", + "abstract": "Supervised fine-tuning (SFT) and reinforcement learning (RL) are widely used post-training techniques for foundation models. However, their roles in enhancing model generalization capabilities remain unclear. This paper studies the difference between SFT and RL on generalization and memorization, focusing on text-based rule variants and visual variants. We introduce GeneralPoints, an arithmetic reasoning card game, and adopt V-IRL, a real-world navigation environment, to assess how models trained with SFT and RL generalize to unseen variants in both textual and visual domains. We show that RL, especially when trained with an outcome-based reward, generalizes across both rule-based textual and visual variants. SFT, in contrast, tends to memorize training data and struggles to generalize out-of-distribution scenarios. Further analysis reveals that RL improves the model's underlying visual recognition capabilities, contributing to its enhanced generalization in the visual domain. Despite RL's superior generalization, we show that SFT remains essential for effective RL training; SFT stabilizes the model's output format, enabling subsequent RL to achieve its performance gains. These findings demonstrates the capability of RL for acquiring generalizable knowledge in complex, multi-modal tasks.", + "score": 28, + "issue_id": 1920, + "pub_date": "2025-01-28", + "pub_date_card": { + "ru": "28 января", + "en": "January 28", + "zh": "1月28日" + }, + "hash": "ce9300709a3cdc7a", + "authors": [ + "Tianzhe Chu", + "Yuexiang Zhai", + "Jihan Yang", + "Shengbang Tong", + "Saining Xie", + "Dale Schuurmans", + "Quoc V. Le", + "Sergey Levine", + "Yi Ma" + ], + "affiliations": [ + "Google DeepMind", + "HKU", + "NYU", + "UC Berkeley" + ], + "pdf_title_img": "assets/pdf/title_img/2501.17161.jpg", + "data": { + "categories": [ + "#reasoning", + "#training", + "#optimization", + "#rl", + "#multimodal", + "#games" + ], + "emoji": "🧠", + "ru": { + "title": "RL превосходит SFT в обобщении для мультимодальных задач", + "desc": "Это исследование сравнивает методы дообучения языковых моделей: обучение с учителем (SFT) и обучение с подкреплением (RL). Авторы анализируют способность моделей к обобщению на новые текстовые и визуальные варианты задач. Результаты показывают, что RL лучше обобщается на новые ситуации, особенно при использовании награды, основанной на результате. SFT, напротив, склонно к запоминанию обучающих данных и хуже справляется с обобщением." + }, + "en": { + "title": "Unlocking Generalization: RL Outshines SFT in Multi-Modal Tasks", + "desc": "This paper investigates how supervised fine-tuning (SFT) and reinforcement learning (RL) affect the generalization abilities of foundation models. It highlights that while SFT often leads to memorization of training data, RL, particularly with outcome-based rewards, enhances generalization across unseen textual and visual variants. 
The study introduces GeneralPoints, a reasoning game, and V-IRL, a navigation environment, to evaluate model performance. The results indicate that RL not only improves generalization but also strengthens visual recognition, although SFT is still crucial for stabilizing the model before RL training." + }, + "zh": { + "title": "强化学习提升模型泛化能力的研究", + "desc": "这篇论文研究了监督微调(SFT)和强化学习(RL)在基础模型中的作用,特别是在提高模型的泛化能力方面。研究表明,RL在处理文本和视觉变体时,能够更好地泛化,而SFT则倾向于记忆训练数据,难以应对未见过的情况。通过引入算术推理卡牌游戏GeneralPoints和真实世界导航环境V-IRL,作者评估了这两种方法的效果。尽管RL在泛化能力上表现优越,但SFT仍然对有效的RL训练至关重要,因为它稳定了模型的输出格式。" + } + } + }, + { + "id": "https://huggingface.co/papers/2501.17116", + "title": "Optimizing Large Language Model Training Using FP4 Quantization", + "url": "https://huggingface.co/papers/2501.17116", + "abstract": "The growing computational demands of training large language models (LLMs) necessitate more efficient methods. Quantized training presents a promising solution by enabling low-bit arithmetic operations to reduce these costs. While FP8 precision has demonstrated feasibility, leveraging FP4 remains a challenge due to significant quantization errors and limited representational capacity. This work introduces the first FP4 training framework for LLMs, addressing these challenges with two key innovations: a differentiable quantization estimator for precise weight updates and an outlier clamping and compensation strategy to prevent activation collapse. To ensure stability, the framework integrates a mixed-precision training scheme and vector-wise quantization. Experimental results demonstrate that our FP4 framework achieves accuracy comparable to BF16 and FP8, with minimal degradation, scaling effectively to 13B-parameter LLMs trained on up to 100B tokens. With the emergence of next-generation hardware supporting FP4, our framework sets a foundation for efficient ultra-low precision training.", + "score": 13, + "issue_id": 1920, + "pub_date": "2025-01-28", + "pub_date_card": { + "ru": "28 января", + "en": "January 28", + "zh": "1月28日" + }, + "hash": "9ce85dc91aee17fc", + "authors": [ + "Ruizhe Wang", + "Yeyun Gong", + "Xiao Liu", + "Guoshuai Zhao", + "Ziyue Yang", + "Baining Guo", + "Zhengjun Zha", + "Peng Cheng" + ], + "affiliations": [ + "Microsoft Research Asia", + "Microsoft SIGMA Team", + "University of Science and Technology of China" + ], + "pdf_title_img": "assets/pdf/title_img/2501.17116.jpg", + "data": { + "categories": [ + "#optimization", + "#training", + "#inference" + ], + "emoji": "🔢", + "ru": { + "title": "FP4: Революция в эффективности обучения языковых моделей", + "desc": "Статья представляет первую систему обучения больших языковых моделей (LLM) с использованием 4-битной точности с плавающей запятой (FP4). Авторы разработали дифференцируемый оценщик квантования для точного обновления весов и стратегию ограничения и компенсации выбросов для предотвращения коллапса активаций. Система включает схему обучения со смешанной точностью и векторное квантование для обеспечения стабильности. Экспериментальные результаты показывают, что FP4-обучение достигает точности, сравнимой с BF16 и FP8, эффективно масштабируясь до LLM с 13 млрд параметров." + }, + "en": { + "title": "Efficient Training of Large Language Models with FP4 Precision", + "desc": "This paper addresses the high computational costs associated with training large language models (LLMs) by introducing a novel FP4 training framework. 
The framework utilizes quantized training techniques, specifically focusing on low-bit arithmetic to enhance efficiency while maintaining model accuracy. Key innovations include a differentiable quantization estimator for better weight updates and a strategy to manage outliers, which helps prevent activation collapse. Experimental results show that this FP4 approach achieves performance similar to higher precision formats like BF16 and FP8, making it suitable for large-scale LLMs." + }, + "zh": { + "title": "FP4训练框架:高效的超低精度训练新方案", + "desc": "随着大型语言模型(LLMs)训练对计算资源的需求不断增加,寻找更高效的方法变得尤为重要。量化训练通过允许低位数算术运算来降低这些成本,展现出良好的前景。尽管FP8精度已被证明可行,但FP4的应用仍面临显著的量化误差和有限的表示能力。本文提出了首个FP4训练框架,通过可微分量化估计器和异常值钳制与补偿策略,解决了这些挑战,并在稳定性方面结合了混合精度训练方案和向量级量化。" + } + } + }, + { + "id": "https://huggingface.co/papers/2501.16975", + "title": "Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling", + "url": "https://huggingface.co/papers/2501.16975", + "abstract": "Tokenization is a fundamental component of large language models (LLMs), yet its influence on model scaling and performance is not fully explored. In this paper, we introduce Over-Tokenized Transformers, a novel framework that decouples input and output vocabularies to improve language modeling performance. Specifically, our approach scales up input vocabularies to leverage multi-gram tokens. Through extensive experiments, we uncover a log-linear relationship between input vocabulary size and training loss, demonstrating that larger input vocabularies consistently enhance model performance, regardless of model size. Using a large input vocabulary, we achieve performance comparable to double-sized baselines with no additional cost. Our findings highlight the importance of tokenization in scaling laws and provide practical insight for tokenizer design, paving the way for more efficient and powerful LLMs.", + "score": 10, + "issue_id": 1920, + "pub_date": "2025-01-28", + "pub_date_card": { + "ru": "28 января", + "en": "January 28", + "zh": "1月28日" + }, + "hash": "27930c2f5d17471e", + "authors": [ + "Hongzhi Huang", + "Defa Zhu", + "Banggu Wu", + "Yutao Zeng", + "Ya Wang", + "Qiyang Min", + "Xun Zhou" + ], + "affiliations": [ + "Seed-Foundation-Model Team, Bytedance" + ], + "pdf_title_img": "assets/pdf/title_img/2501.16975.jpg", + "data": { + "categories": [ + "#optimization", + "#training", + "#architecture" + ], + "emoji": "🔤", + "ru": { + "title": "Больше токенов - выше эффективность: новый взгляд на масштабирование языковых моделей", + "desc": "Статья представляет новый подход к токенизации в больших языковых моделях, называемый Over-Tokenized Transformers. Авторы предлагают разделить входной и выходной словари, увеличивая размер входного словаря для использования мультиграммных токенов. Исследование выявило логарифмически-линейную зависимость между размером входного словаря и потерями при обучении. Результаты показывают, что увеличение входного словаря consistently улучшает производительность модели независимо от её размера." + }, + "en": { + "title": "Unlocking Performance: The Power of Over-Tokenization in Language Models", + "desc": "This paper presents a new approach called Over-Tokenized Transformers, which focuses on improving the tokenization process in large language models (LLMs). By separating the input and output vocabularies, the authors demonstrate that increasing the input vocabulary size can significantly reduce training loss and enhance model performance. 
Their experiments reveal a consistent log-linear relationship between the size of the input vocabulary and the model's effectiveness, showing that larger vocabularies lead to better results without increasing computational costs. This research emphasizes the critical role of tokenization in the scaling of LLMs and offers valuable insights for designing more efficient tokenizers." + }, + "zh": { + "title": "分词技术提升大语言模型性能的关键", + "desc": "本文探讨了大语言模型中的分词技术对模型性能的影响。我们提出了一种新的框架——过度分词变换器,旨在通过解耦输入和输出词汇表来提升语言建模性能。研究表明,增大输入词汇表可以有效降低训练损失,从而提高模型性能。我们的实验结果显示,使用更大的输入词汇表可以在不增加成本的情况下,达到与双倍基线相当的性能。" + } + } + }, + { + "id": "https://huggingface.co/papers/2501.16764", + "title": "DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation", + "url": "https://huggingface.co/papers/2501.16764", + "abstract": "Recent advancements in 3D content generation from text or a single image struggle with limited high-quality 3D datasets and inconsistency from 2D multi-view generation. We introduce DiffSplat, a novel 3D generative framework that natively generates 3D Gaussian splats by taming large-scale text-to-image diffusion models. It differs from previous 3D generative models by effectively utilizing web-scale 2D priors while maintaining 3D consistency in a unified model. To bootstrap the training, a lightweight reconstruction model is proposed to instantly produce multi-view Gaussian splat grids for scalable dataset curation. In conjunction with the regular diffusion loss on these grids, a 3D rendering loss is introduced to facilitate 3D coherence across arbitrary views. The compatibility with image diffusion models enables seamless adaptions of numerous techniques for image generation to the 3D realm. Extensive experiments reveal the superiority of DiffSplat in text- and image-conditioned generation tasks and downstream applications. Thorough ablation studies validate the efficacy of each critical design choice and provide insights into the underlying mechanism.", + "score": 8, + "issue_id": 1921, + "pub_date": "2025-01-28", + "pub_date_card": { + "ru": "28 января", + "en": "January 28", + "zh": "1月28日" + }, + "hash": "00ee1a0338716711", + "authors": [ + "Chenguo Lin", + "Panwang Pan", + "Bangbang Yang", + "Zeming Li", + "Yadong Mu" + ], + "affiliations": [ + "ByteDance", + "Peking University" + ], + "pdf_title_img": "assets/pdf/title_img/2501.16764.jpg", + "data": { + "categories": [ + "#diffusion", + "#optimization", + "#training", + "#dataset", + "#3d" + ], + "emoji": "🎨", + "ru": { + "title": "DiffSplat: Генерация 3D контента на новом уровне", + "desc": "DiffSplat - это новая система генерации 3D контента, использующая диффузионные модели для создания трехмерных гауссовых сплатов. Она решает проблемы ограниченных 3D датасетов и несогласованности при мультиракурсной 2D генерации. DiffSplat объединяет масштабные 2D-приоры с 3D-согласованностью, используя легковесную модель реконструкции и специальную функцию потерь. Эксперименты показывают превосходство DiffSplat в задачах генерации по тексту и изображениям." + }, + "en": { + "title": "Revolutionizing 3D Generation with DiffSplat", + "desc": "DiffSplat is a new framework for generating 3D content from text or images, addressing challenges like the lack of high-quality 3D datasets. It uses advanced text-to-image diffusion models to create 3D Gaussian splats while ensuring consistency across different views. The framework includes a lightweight reconstruction model that helps quickly generate multi-view datasets for training. 
Through extensive testing, DiffSplat shows improved performance in generating 3D content and offers insights into its effective design choices." + }, + "zh": { + "title": "DiffSplat:3D生成的新突破", + "desc": "最近,3D内容生成从文本或单张图像中取得了进展,但高质量3D数据集有限,且2D多视图生成存在不一致性。我们提出了DiffSplat,这是一种新颖的3D生成框架,能够通过控制大规模文本到图像的扩散模型,原生生成3D高斯点云。与以往的3D生成模型不同,DiffSplat有效利用了网络规模的2D先验,同时在统一模型中保持3D一致性。通过引入轻量级重建模型和3D渲染损失,DiffSplat在文本和图像条件生成任务中表现出色,且在下游应用中也显示出其优越性。" + } + } + }, + { + "id": "https://huggingface.co/papers/2501.16496", + "title": "Open Problems in Mechanistic Interpretability", + "url": "https://huggingface.co/papers/2501.16496", + "abstract": "Mechanistic interpretability aims to understand the computational mechanisms underlying neural networks' capabilities in order to accomplish concrete scientific and engineering goals. Progress in this field thus promises to provide greater assurance over AI system behavior and shed light on exciting scientific questions about the nature of intelligence. Despite recent progress toward these goals, there are many open problems in the field that require solutions before many scientific and practical benefits can be realized: Our methods require both conceptual and practical improvements to reveal deeper insights; we must figure out how best to apply our methods in pursuit of specific goals; and the field must grapple with socio-technical challenges that influence and are influenced by our work. This forward-facing review discusses the current frontier of mechanistic interpretability and the open problems that the field may benefit from prioritizing.", + "score": 8, + "issue_id": 1920, + "pub_date": "2025-01-27", + "pub_date_card": { + "ru": "27 января", + "en": "January 27", + "zh": "1月27日" + }, + "hash": "5a7a914accebfa33", + "authors": [ + "Lee Sharkey", + "Bilal Chughtai", + "Joshua Batson", + "Jack Lindsey", + "Jeff Wu", + "Lucius Bushnaq", + "Nicholas Goldowsky-Dill", + "Stefan Heimersheim", + "Alejandro Ortega", + "Joseph Bloom", + "Stella Biderman", + "Adria Garriga-Alonso", + "Arthur Conmy", + "Neel Nanda", + "Jessica Rumbelow", + "Martin Wattenberg", + "Nandi Schoots", + "Joseph Miller", + "Eric J. Michaud", + "Stephen Casper", + "Max Tegmark", + "William Saunders", + "David Bau", + "Eric Todd", + "Atticus Geiger", + "Mor Geva", + "Jesse Hoogland", + "Daniel Murfet", + "Tom McGrath" + ], + "affiliations": [ + "Anthropic", + "Apollo Research", + "Google DeepMind", + "Harvard University", + "Imperial College London", + "Kings College London", + "Leap Laboratories", + "MIT", + "Northeastern University", + "Tel Aviv University", + "University of Melbourne" + ], + "pdf_title_img": "assets/pdf/title_img/2501.16496.jpg", + "data": { + "categories": [ + "#interpretability", + "#survey" + ], + "emoji": "🧠", + "ru": { + "title": "Раскрывая тайны нейронных сетей: путь к пониманию искусственного интеллекта", + "desc": "Статья посвящена механистической интерпретируемости нейронных сетей, цель которой - понять вычислительные механизмы, лежащие в основе их возможностей. Прогресс в этой области обещает обеспечить большую уверенность в поведении систем искусственного интеллекта и пролить свет на природу интеллекта. Авторы обсуждают открытые проблемы в области, требующие решения для реализации научных и практических преимуществ. Статья рассматривает текущие границы механистической интерпретируемости и приоритетные задачи для дальнейшего развития области." 
+ }, + "en": { + "title": "Unlocking the Secrets of Neural Networks for Reliable AI", + "desc": "Mechanistic interpretability focuses on understanding how neural networks work to achieve specific tasks, which can enhance the reliability of AI systems. This area of research aims to uncover the underlying processes that contribute to the intelligence exhibited by these models. Despite advancements, there are still significant challenges that need to be addressed, including improving methods for deeper insights and applying these methods effectively. Additionally, the field must consider socio-technical issues that affect and are affected by mechanistic interpretability efforts." + }, + "zh": { + "title": "揭示神经网络的计算机制", + "desc": "机械解释性旨在理解神经网络能力背后的计算机制,以实现具体的科学和工程目标。该领域的进展有望提高对人工智能系统行为的信心,并揭示关于智能本质的有趣科学问题。尽管最近在这些目标上取得了一些进展,但仍有许多未解决的问题需要解决,以便实现更多的科学和实际利益。本文回顾了机械解释性的当前前沿及该领域应优先解决的开放问题。" + } + } + }, + { + "id": "https://huggingface.co/papers/2501.16372", + "title": "Low-Rank Adapters Meet Neural Architecture Search for LLM Compression", + "url": "https://huggingface.co/papers/2501.16372", + "abstract": "The rapid expansion of Large Language Models (LLMs) has posed significant challenges regarding the computational resources required for fine-tuning and deployment. Recent advancements in low-rank adapters have demonstrated their efficacy in parameter-efficient fine-tuning (PEFT) of these models. This retrospective paper comprehensively discusses innovative approaches that synergize low-rank representations with Neural Architecture Search (NAS) techniques, particularly weight-sharing super-networks. Robust solutions for compressing and fine-tuning large pre-trained models are developed by integrating these methodologies. Our analysis highlights the potential of these combined strategies to democratize the use of LLMs, making them more accessible for deployment in resource-constrained environments. The resulting models exhibit reduced memory footprints and faster inference times, paving the way for more practical and scalable applications of LLMs. Models and code are available at https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning.", + "score": 5, + "issue_id": 1918, + "pub_date": "2025-01-23", + "pub_date_card": { + "ru": "23 января", + "en": "January 23", + "zh": "1月23日" + }, + "hash": "f1d43a985dbea0af", + "authors": [ + "J. Pablo Muñoz", + "Jinjie Yuan", + "Nilesh Jain" + ], + "affiliations": [ + "Intel Corporation", + "Intel Labs" + ], + "pdf_title_img": "assets/pdf/title_img/2501.16372.jpg", + "data": { + "categories": [ + "#inference", + "#optimization", + "#open_source", + "#training", + "#low_resource", + "#architecture" + ], + "emoji": "🧠", + "ru": { + "title": "Эффективная настройка крупных языковых моделей для ограниченных ресурсов", + "desc": "Эта статья рассматривает проблему больших вычислительных ресурсов, необходимых для настройки и развертывания крупных языковых моделей (LLM). Авторы предлагают комбинировать низкоранговые адаптеры и методы поиска нейронных архитектур (NAS) для эффективной настройки параметров. Такой подход позволяет сжимать и дообучать большие предобученные модели, делая их более доступными в условиях ограниченных ресурсов. В результате получаются модели с меньшим потреблением памяти и более быстрым выводом, что открывает путь к более практичному применению LLM." 
+ }, + "en": { + "title": "Democratizing Large Language Models with Efficient Fine-Tuning Techniques", + "desc": "This paper addresses the challenges of using Large Language Models (LLMs) due to their high computational demands. It explores the use of low-rank adapters for parameter-efficient fine-tuning (PEFT), which helps reduce the resources needed. The authors combine low-rank representations with Neural Architecture Search (NAS) techniques, particularly through weight-sharing super-networks, to create efficient solutions for model compression and fine-tuning. The findings suggest that these strategies can make LLMs more accessible and practical for deployment in environments with limited resources, resulting in models that are faster and require less memory." + }, + "zh": { + "title": "低秩适配器助力大型语言模型的高效微调", + "desc": "大型语言模型(LLMs)的快速发展带来了在微调和部署时对计算资源的巨大挑战。最近,低秩适配器在参数高效微调(PEFT)方面显示出了良好的效果。本文回顾了将低秩表示与神经架构搜索(NAS)技术相结合的创新方法,特别是权重共享超网络。通过整合这些方法,开发了压缩和微调大型预训练模型的稳健解决方案,使得LLMs在资源受限的环境中更易于部署。" + } + } + }, + { + "id": "https://huggingface.co/papers/2501.15747", + "title": "IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding", + "url": "https://huggingface.co/papers/2501.15747", + "abstract": "Known by more than 1.5 billion people in the Indian subcontinent, Indic languages present unique challenges and opportunities for natural language processing (NLP) research due to their rich cultural heritage, linguistic diversity, and complex structures. IndicMMLU-Pro is a comprehensive benchmark designed to evaluate Large Language Models (LLMs) across Indic languages, building upon the MMLU Pro (Massive Multitask Language Understanding) framework. Covering major languages such as Hindi, Bengali, Gujarati, Marathi, Kannada, Punjabi, Tamil, Telugu, and Urdu, our benchmark addresses the unique challenges and opportunities presented by the linguistic diversity of the Indian subcontinent. This benchmark encompasses a wide range of tasks in language comprehension, reasoning, and generation, meticulously crafted to capture the intricacies of Indian languages. IndicMMLU-Pro provides a standardized evaluation framework to push the research boundaries in Indic language AI, facilitating the development of more accurate, efficient, and culturally sensitive models. This paper outlines the benchmarks' design principles, task taxonomy, and data collection methodology, and presents baseline results from state-of-the-art multilingual models.", + "score": 4, + "issue_id": 1918, + "pub_date": "2025-01-27", + "pub_date_card": { + "ru": "27 января", + "en": "January 27", + "zh": "1月27日" + }, + "hash": "4b666d035c5e5c4c", + "authors": [ + "Sankalp KJ", + "Ashutosh Kumar", + "Laxmaan Balaji", + "Nikunj Kotecha", + "Vinija Jain", + "Aman Chadha", + "Sreyoshi Bhaduri" + ], + "affiliations": [ + "Amazon Gen AI", + "Artificial Intelligence Institute, University of South Carolina", + "Independent Researcher", + "Meta AI", + "Rochester Institute of Technology" + ], + "pdf_title_img": "assets/pdf/title_img/2501.15747.jpg", + "data": { + "categories": [ + "#reasoning", + "#low_resource", + "#multilingual", + "#benchmark" + ], + "emoji": "🇮🇳", + "ru": { + "title": "Новый рубеж в NLP: комплексная оценка языковых моделей для индийских языков", + "desc": "IndicMMLU-Pro - это комплексный бенчмарк для оценки языковых моделей в индийских языках. Он охватывает 9 основных языков Индийского субконтинента и включает широкий спектр задач по пониманию языка, рассуждению и генерации текста. 
Бенчмарк разработан с учетом уникальных особенностей и сложностей индийских языков. IndicMMLU-Pro предоставляет стандартизированную систему оценки для продвижения исследований в области ИИ для индийских языков." + }, + "en": { + "title": "Empowering Indic Languages with Advanced NLP Benchmarks", + "desc": "The paper introduces IndicMMLU-Pro, a benchmark specifically designed to assess Large Language Models (LLMs) in the context of Indic languages. It builds on the existing MMLU Pro framework and includes major languages like Hindi, Bengali, and Tamil, addressing the unique linguistic challenges of the Indian subcontinent. The benchmark features a variety of tasks that test language comprehension, reasoning, and generation, ensuring a comprehensive evaluation of models. By providing a standardized framework, IndicMMLU-Pro aims to enhance the development of more accurate and culturally aware AI models for Indic languages." + }, + "zh": { + "title": "推动印度语言AI研究的基准", + "desc": "IndicMMLU-Pro是一个专门为印度语言设计的基准,旨在评估大型语言模型(LLMs)的表现。该基准基于MMLU Pro框架,涵盖了印地语、孟加拉语、古吉拉特语等主要语言,解决了印度次大陆语言的多样性带来的挑战。它包括语言理解、推理和生成等多种任务,旨在捕捉印度语言的复杂性。通过提供标准化的评估框架,IndicMMLU-Pro推动了印度语言人工智能的研究,促进了更准确、高效和文化敏感的模型的发展。" + } + } + }, + { + "id": "https://huggingface.co/papers/2501.17117", + "title": "Histoires Morales: A French Dataset for Assessing Moral Alignment", + "url": "https://huggingface.co/papers/2501.17117", + "abstract": "Aligning language models with human values is crucial, especially as they become more integrated into everyday life. While models are often adapted to user preferences, it is equally important to ensure they align with moral norms and behaviours in real-world social situations. Despite significant progress in languages like English and Chinese, French has seen little attention in this area, leaving a gap in understanding how LLMs handle moral reasoning in this language. To address this gap, we introduce Histoires Morales, a French dataset derived from Moral Stories, created through translation and subsequently refined with the assistance of native speakers to guarantee grammatical accuracy and adaptation to the French cultural context. We also rely on annotations of the moral values within the dataset to ensure their alignment with French norms. Histoires Morales covers a wide range of social situations, including differences in tipping practices, expressions of honesty in relationships, and responsibilities toward animals. To foster future research, we also conduct preliminary experiments on the alignment of multilingual models on French and English data and the robustness of the alignment. 
We find that while LLMs are generally aligned with human moral norms by default, they can be easily influenced with user-preference optimization for both moral and immoral data.", + "score": 2, + "issue_id": 1924, + "pub_date": "2025-01-28", + "pub_date_card": { + "ru": "28 января", + "en": "January 28", + "zh": "1月28日" + }, + "hash": "d2d1461e245219e8", + "authors": [ + "Thibaud Leteno", + "Irina Proskurina", + "Antoine Gourru", + "Julien Velcin", + "Charlotte Laclau", + "Guillaume Metzler", + "Christophe Gravier" + ], + "affiliations": [ + "Laboratoire Hubert Curien, UMR CNRS 5516, Saint-Etienne, France", + "Télécom Paris, Institut Polytechnique de Paris, Paris, France", + "Université Lumière Lyon 2, Université Claude Bernard Lyon 1, ERIC, 69007, Lyon, France" + ], + "pdf_title_img": "assets/pdf/title_img/2501.17117.jpg", + "data": { + "categories": [ + "#dataset", + "#multilingual", + "#alignment", + "#ethics" + ], + "emoji": "🇫🇷", + "ru": { + "title": "Французский датасет для морального выравнивания языковых моделей", + "desc": "Статья представляет набор данных 'Histoires Morales' на французском языке для выравнивания языковых моделей с человеческими ценностями. Этот датасет создан на основе 'Moral Stories' путем перевода и адаптации к французскому культурному контексту. Исследование включает эксперименты по выравниванию мультиязычных моделей на французских и английских данных. Результаты показывают, что языковые модели в целом соответствуют человеческим моральным нормам, но могут быть легко подвержены влиянию при оптимизации под предпочтения пользователей." + }, + "en": { + "title": "Bridging Language Models and French Moral Values", + "desc": "This paper emphasizes the importance of aligning language models with human values, particularly in the context of the French language. It introduces Histoires Morales, a dataset created from Moral Stories, which has been translated and refined to reflect French cultural norms and moral reasoning. The dataset includes various social situations to better understand how language models handle moral values in French. Preliminary experiments show that while language models generally align with human morals, they can be swayed by user preferences, highlighting the need for careful optimization." 
+ }, + "zh": { + "title": "让语言模型与人类价值观对齐", + "desc": "本论文强调了将语言模型与人类价值观对齐的重要性,尤其是在日常生活中。我们介绍了一个名为Histoires Morales的法语数据集,旨在填补法语在道德推理方面的研究空白。该数据集通过翻译和母语者的帮助进行精细化,确保其语法准确并适应法国文化背景。我们的初步实验表明,尽管大型语言模型通常与人类道德规范一致,但它们可以通过用户偏好优化轻易受到影响。" + } + } + } + ], + "link_prev": "2025-01-28.html", + "link_next": "2025-01-30.html", + "link_month": "2025-01.html", + "short_date_prev": { + "ru": "28.01", + "en": "01/28", + "zh": "1月28日" + }, + "short_date_next": { + "ru": "30.01", + "en": "01/30", + "zh": "1月30日" + }, + "categories": { + "#dataset": 2, + "#data": 0, + "#benchmark": 1, + "#agents": 0, + "#cv": 0, + "#rl": 1, + "#rlhf": 0, + "#rag": 0, + "#plp": 0, + "#inference": 2, + "#3d": 1, + "#audio": 0, + "#video": 0, + "#multimodal": 1, + "#math": 0, + "#multilingual": 2, + "#architecture": 2, + "#healthcare": 0, + "#training": 5, + "#robotics": 0, + "#agi": 0, + "#games": 1, + "#interpretability": 1, + "#reasoning": 2, + "#transfer_learning": 0, + "#graphs": 0, + "#ethics": 1, + "#security": 0, + "#optimization": 5, + "#survey": 1, + "#diffusion": 1, + "#alignment": 1, + "#story_generation": 0, + "#hallucinations": 0, + "#long_context": 0, + "#synthetic": 0, + "#machine_translation": 0, + "#leakage": 0, + "#open_source": 1, + "#small_models": 0, + "#science": 0, + "#low_resource": 2 + }, + "zh": { + "text": "这篇文章比较了监督微调(SFT)和强化学习(RL)在基础模型上的作用。研究发现,RL在文本和视觉任务上都表现出更好的泛化能力。SFT倾向于记住训练数据,而RL能够处理未见过的变体。RL还提高了模型的视觉识别能力。然而,SFT对于RL的有效训练仍然不可或缺。", + "title": "SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training", + "pinyin": "这篇文章比较了监督微调(SFT)和强化学习(RL)在基础模型上的作用。研究发现,RL在文本和视觉任务上都表现出更好的泛化能力。SFT倾向于记住训练数据,而RL能够处理未见过的变体。RL还提高了模型的视觉识别能力。然而,SFT对于RL的有效训练仍然不可或缺。\n\nZhè piān wénzhāng bǐjiào le jiàndū wēitiáo (SFT) hé qiáng huà xuéxí (RL) zài jīchǔ móxíng shàng de zuòyòng. Yánjiū fāxiàn, RL zài wénběn hé shìjué rènwù shàng dōu biǎoxiàn chū gèng hǎo de fànhuà nénglì. SFT qīngxiàng yú jìzhù xùnliàn shùjù, ér RL nénggòu chǔlǐ wèi jiànguò de biàntǐ. RL hái tígāo le móxíng de shìjué shíbié nénglì. Rán'ér, SFT duìyú RL de yǒuxiào xùnliàn réngrán bùkě huòquē.", + "vocab": "[{'word': '监督', 'pinyin': 'jiàn dū', 'trans': 'supervised'},\n{'word': '微调', 'pinyin': 'wēi tiáo', 'trans': 'fine-tuning'},\n{'word': '强化学习', 'pinyin': 'qiáng huà xué xí', 'trans': 'reinforcement learning'},\n{'word': '基础模型', 'pinyin': 'jī chǔ mó xíng', 'trans': 'foundational model'},\n{'word': '作用', 'pinyin': 'zuò yòng', 'trans': 'effect'},\n{'word': '泛化', 'pinyin': 'fàn huà', 'trans': 'generalization'},\n{'word': '倾向于', 'pinyin': 'qīng xiàng yú', 'trans': 'tend to'},\n{'word': '未见过', 'pinyin': 'wèi jiàn guò', 'trans': 'unseen'},\n{'word': '变体', 'pinyin': 'biàn tǐ', 'trans': 'variant'},\n{'word': '视觉识别', 'pinyin': 'shì jué shí bié', 'trans': 'visual recognition'},\n{'word': '不可或缺', 'pinyin': 'bù kě huò quē', 'trans': 'indispensable'}]", + "trans": "This article compares the roles of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) on base models. The study found that RL demonstrates better generalization capabilities in both textual and visual tasks. SFT tends to memorize training data, while RL can handle unseen variants. RL also enhances the model's visual recognition capabilities. 
However, SFT remains indispensable for effective RL training.", + "update_ts": "2025-01-29 09:10" + } +} \ No newline at end of file diff --git a/hf_papers.json b/hf_papers.json index 86b937e9f..47f8f7105 100644 --- a/hf_papers.json +++ b/hf_papers.json @@ -1,12 +1,12 @@ { "date": { - "ru": "29 января", - "en": "January 29", - "zh": "1月29日" + "ru": "30 января", + "en": "January 30", + "zh": "1月30日" }, - "time_utc": "2025-01-29 23:09", - "weekday": 2, - "issue_id": 1937, + "time_utc": "2025-01-30 00:44", + "weekday": 3, + "issue_id": 1938, "home_page_url": "https://huggingface.co/papers", "papers": [ { @@ -14,7 +14,7 @@ "title": "SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training", "url": "https://huggingface.co/papers/2501.17161", "abstract": "Supervised fine-tuning (SFT) and reinforcement learning (RL) are widely used post-training techniques for foundation models. However, their roles in enhancing model generalization capabilities remain unclear. This paper studies the difference between SFT and RL on generalization and memorization, focusing on text-based rule variants and visual variants. We introduce GeneralPoints, an arithmetic reasoning card game, and adopt V-IRL, a real-world navigation environment, to assess how models trained with SFT and RL generalize to unseen variants in both textual and visual domains. We show that RL, especially when trained with an outcome-based reward, generalizes across both rule-based textual and visual variants. SFT, in contrast, tends to memorize training data and struggles to generalize out-of-distribution scenarios. Further analysis reveals that RL improves the model's underlying visual recognition capabilities, contributing to its enhanced generalization in the visual domain. Despite RL's superior generalization, we show that SFT remains essential for effective RL training; SFT stabilizes the model's output format, enabling subsequent RL to achieve its performance gains. These findings demonstrates the capability of RL for acquiring generalizable knowledge in complex, multi-modal tasks.", - "score": 28, + "score": 29, "issue_id": 1920, "pub_date": "2025-01-28", "pub_date_card": { @@ -70,7 +70,7 @@ "title": "Optimizing Large Language Model Training Using FP4 Quantization", "url": "https://huggingface.co/papers/2501.17116", "abstract": "The growing computational demands of training large language models (LLMs) necessitate more efficient methods. Quantized training presents a promising solution by enabling low-bit arithmetic operations to reduce these costs. While FP8 precision has demonstrated feasibility, leveraging FP4 remains a challenge due to significant quantization errors and limited representational capacity. This work introduces the first FP4 training framework for LLMs, addressing these challenges with two key innovations: a differentiable quantization estimator for precise weight updates and an outlier clamping and compensation strategy to prevent activation collapse. To ensure stability, the framework integrates a mixed-precision training scheme and vector-wise quantization. Experimental results demonstrate that our FP4 framework achieves accuracy comparable to BF16 and FP8, with minimal degradation, scaling effectively to 13B-parameter LLMs trained on up to 100B tokens. 
With the emergence of next-generation hardware supporting FP4, our framework sets a foundation for efficient ultra-low precision training.", - "score": 13, + "score": 14, "issue_id": 1920, "pub_date": "2025-01-28", "pub_date_card": { @@ -117,99 +117,99 @@ } }, { - "id": "https://huggingface.co/papers/2501.16975", - "title": "Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling", - "url": "https://huggingface.co/papers/2501.16975", - "abstract": "Tokenization is a fundamental component of large language models (LLMs), yet its influence on model scaling and performance is not fully explored. In this paper, we introduce Over-Tokenized Transformers, a novel framework that decouples input and output vocabularies to improve language modeling performance. Specifically, our approach scales up input vocabularies to leverage multi-gram tokens. Through extensive experiments, we uncover a log-linear relationship between input vocabulary size and training loss, demonstrating that larger input vocabularies consistently enhance model performance, regardless of model size. Using a large input vocabulary, we achieve performance comparable to double-sized baselines with no additional cost. Our findings highlight the importance of tokenization in scaling laws and provide practical insight for tokenizer design, paving the way for more efficient and powerful LLMs.", + "id": "https://huggingface.co/papers/2501.16764", + "title": "DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation", + "url": "https://huggingface.co/papers/2501.16764", + "abstract": "Recent advancements in 3D content generation from text or a single image struggle with limited high-quality 3D datasets and inconsistency from 2D multi-view generation. We introduce DiffSplat, a novel 3D generative framework that natively generates 3D Gaussian splats by taming large-scale text-to-image diffusion models. It differs from previous 3D generative models by effectively utilizing web-scale 2D priors while maintaining 3D consistency in a unified model. To bootstrap the training, a lightweight reconstruction model is proposed to instantly produce multi-view Gaussian splat grids for scalable dataset curation. In conjunction with the regular diffusion loss on these grids, a 3D rendering loss is introduced to facilitate 3D coherence across arbitrary views. The compatibility with image diffusion models enables seamless adaptions of numerous techniques for image generation to the 3D realm. Extensive experiments reveal the superiority of DiffSplat in text- and image-conditioned generation tasks and downstream applications. 
Thorough ablation studies validate the efficacy of each critical design choice and provide insights into the underlying mechanism.", "score": 10, - "issue_id": 1920, + "issue_id": 1921, "pub_date": "2025-01-28", "pub_date_card": { "ru": "28 января", "en": "January 28", "zh": "1月28日" }, - "hash": "27930c2f5d17471e", + "hash": "00ee1a0338716711", "authors": [ - "Hongzhi Huang", - "Defa Zhu", - "Banggu Wu", - "Yutao Zeng", - "Ya Wang", - "Qiyang Min", - "Xun Zhou" + "Chenguo Lin", + "Panwang Pan", + "Bangbang Yang", + "Zeming Li", + "Yadong Mu" ], "affiliations": [ - "Seed-Foundation-Model Team, Bytedance" + "ByteDance", + "Peking University" ], - "pdf_title_img": "assets/pdf/title_img/2501.16975.jpg", + "pdf_title_img": "assets/pdf/title_img/2501.16764.jpg", "data": { "categories": [ + "#diffusion", "#optimization", "#training", - "#architecture" + "#dataset", + "#3d" ], - "emoji": "🔤", + "emoji": "🎨", "ru": { - "title": "Больше токенов - выше эффективность: новый взгляд на масштабирование языковых моделей", - "desc": "Статья представляет новый подход к токенизации в больших языковых моделях, называемый Over-Tokenized Transformers. Авторы предлагают разделить входной и выходной словари, увеличивая размер входного словаря для использования мультиграммных токенов. Исследование выявило логарифмически-линейную зависимость между размером входного словаря и потерями при обучении. Результаты показывают, что увеличение входного словаря consistently улучшает производительность модели независимо от её размера." + "title": "DiffSplat: Генерация 3D контента на новом уровне", + "desc": "DiffSplat - это новая система генерации 3D контента, использующая диффузионные модели для создания трехмерных гауссовых сплатов. Она решает проблемы ограниченных 3D датасетов и несогласованности при мультиракурсной 2D генерации. DiffSplat объединяет масштабные 2D-приоры с 3D-согласованностью, используя легковесную модель реконструкции и специальную функцию потерь. Эксперименты показывают превосходство DiffSplat в задачах генерации по тексту и изображениям." }, "en": { - "title": "Unlocking Performance: The Power of Over-Tokenization in Language Models", - "desc": "This paper presents a new approach called Over-Tokenized Transformers, which focuses on improving the tokenization process in large language models (LLMs). By separating the input and output vocabularies, the authors demonstrate that increasing the input vocabulary size can significantly reduce training loss and enhance model performance. Their experiments reveal a consistent log-linear relationship between the size of the input vocabulary and the model's effectiveness, showing that larger vocabularies lead to better results without increasing computational costs. This research emphasizes the critical role of tokenization in the scaling of LLMs and offers valuable insights for designing more efficient tokenizers." + "title": "Revolutionizing 3D Generation with DiffSplat", + "desc": "DiffSplat is a new framework for generating 3D content from text or images, addressing challenges like the lack of high-quality 3D datasets. It uses advanced text-to-image diffusion models to create 3D Gaussian splats while ensuring consistency across different views. The framework includes a lightweight reconstruction model that helps quickly generate multi-view datasets for training. Through extensive testing, DiffSplat shows improved performance in generating 3D content and offers insights into its effective design choices." 
}, "zh": { - "title": "分词技术提升大语言模型性能的关键", - "desc": "本文探讨了大语言模型中的分词技术对模型性能的影响。我们提出了一种新的框架——过度分词变换器,旨在通过解耦输入和输出词汇表来提升语言建模性能。研究表明,增大输入词汇表可以有效降低训练损失,从而提高模型性能。我们的实验结果显示,使用更大的输入词汇表可以在不增加成本的情况下,达到与双倍基线相当的性能。" + "title": "DiffSplat:3D生成的新突破", + "desc": "最近,3D内容生成从文本或单张图像中取得了进展,但高质量3D数据集有限,且2D多视图生成存在不一致性。我们提出了DiffSplat,这是一种新颖的3D生成框架,能够通过控制大规模文本到图像的扩散模型,原生生成3D高斯点云。与以往的3D生成模型不同,DiffSplat有效利用了网络规模的2D先验,同时在统一模型中保持3D一致性。通过引入轻量级重建模型和3D渲染损失,DiffSplat在文本和图像条件生成任务中表现出色,且在下游应用中也显示出其优越性。" } } }, { - "id": "https://huggingface.co/papers/2501.16764", - "title": "DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation", - "url": "https://huggingface.co/papers/2501.16764", - "abstract": "Recent advancements in 3D content generation from text or a single image struggle with limited high-quality 3D datasets and inconsistency from 2D multi-view generation. We introduce DiffSplat, a novel 3D generative framework that natively generates 3D Gaussian splats by taming large-scale text-to-image diffusion models. It differs from previous 3D generative models by effectively utilizing web-scale 2D priors while maintaining 3D consistency in a unified model. To bootstrap the training, a lightweight reconstruction model is proposed to instantly produce multi-view Gaussian splat grids for scalable dataset curation. In conjunction with the regular diffusion loss on these grids, a 3D rendering loss is introduced to facilitate 3D coherence across arbitrary views. The compatibility with image diffusion models enables seamless adaptions of numerous techniques for image generation to the 3D realm. Extensive experiments reveal the superiority of DiffSplat in text- and image-conditioned generation tasks and downstream applications. Thorough ablation studies validate the efficacy of each critical design choice and provide insights into the underlying mechanism.", - "score": 8, - "issue_id": 1921, + "id": "https://huggingface.co/papers/2501.16975", + "title": "Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling", + "url": "https://huggingface.co/papers/2501.16975", + "abstract": "Tokenization is a fundamental component of large language models (LLMs), yet its influence on model scaling and performance is not fully explored. In this paper, we introduce Over-Tokenized Transformers, a novel framework that decouples input and output vocabularies to improve language modeling performance. Specifically, our approach scales up input vocabularies to leverage multi-gram tokens. Through extensive experiments, we uncover a log-linear relationship between input vocabulary size and training loss, demonstrating that larger input vocabularies consistently enhance model performance, regardless of model size. Using a large input vocabulary, we achieve performance comparable to double-sized baselines with no additional cost. 
Our findings highlight the importance of tokenization in scaling laws and provide practical insight for tokenizer design, paving the way for more efficient and powerful LLMs.", + "score": 10, + "issue_id": 1920, "pub_date": "2025-01-28", "pub_date_card": { "ru": "28 января", "en": "January 28", "zh": "1月28日" }, - "hash": "00ee1a0338716711", + "hash": "27930c2f5d17471e", "authors": [ - "Chenguo Lin", - "Panwang Pan", - "Bangbang Yang", - "Zeming Li", - "Yadong Mu" + "Hongzhi Huang", + "Defa Zhu", + "Banggu Wu", + "Yutao Zeng", + "Ya Wang", + "Qiyang Min", + "Xun Zhou" ], "affiliations": [ - "ByteDance", - "Peking University" + "Seed-Foundation-Model Team, Bytedance" ], - "pdf_title_img": "assets/pdf/title_img/2501.16764.jpg", + "pdf_title_img": "assets/pdf/title_img/2501.16975.jpg", "data": { "categories": [ - "#diffusion", "#optimization", "#training", - "#dataset", - "#3d" + "#architecture" ], - "emoji": "🎨", + "emoji": "🔤", "ru": { - "title": "DiffSplat: Генерация 3D контента на новом уровне", - "desc": "DiffSplat - это новая система генерации 3D контента, использующая диффузионные модели для создания трехмерных гауссовых сплатов. Она решает проблемы ограниченных 3D датасетов и несогласованности при мультиракурсной 2D генерации. DiffSplat объединяет масштабные 2D-приоры с 3D-согласованностью, используя легковесную модель реконструкции и специальную функцию потерь. Эксперименты показывают превосходство DiffSplat в задачах генерации по тексту и изображениям." + "title": "Больше токенов - выше эффективность: новый взгляд на масштабирование языковых моделей", + "desc": "Статья представляет новый подход к токенизации в больших языковых моделях, называемый Over-Tokenized Transformers. Авторы предлагают разделить входной и выходной словари, увеличивая размер входного словаря для использования мультиграммных токенов. Исследование выявило логарифмически-линейную зависимость между размером входного словаря и потерями при обучении. Результаты показывают, что увеличение входного словаря consistently улучшает производительность модели независимо от её размера." }, "en": { - "title": "Revolutionizing 3D Generation with DiffSplat", - "desc": "DiffSplat is a new framework for generating 3D content from text or images, addressing challenges like the lack of high-quality 3D datasets. It uses advanced text-to-image diffusion models to create 3D Gaussian splats while ensuring consistency across different views. The framework includes a lightweight reconstruction model that helps quickly generate multi-view datasets for training. Through extensive testing, DiffSplat shows improved performance in generating 3D content and offers insights into its effective design choices." + "title": "Unlocking Performance: The Power of Over-Tokenization in Language Models", + "desc": "This paper presents a new approach called Over-Tokenized Transformers, which focuses on improving the tokenization process in large language models (LLMs). By separating the input and output vocabularies, the authors demonstrate that increasing the input vocabulary size can significantly reduce training loss and enhance model performance. Their experiments reveal a consistent log-linear relationship between the size of the input vocabulary and the model's effectiveness, showing that larger vocabularies lead to better results without increasing computational costs. This research emphasizes the critical role of tokenization in the scaling of LLMs and offers valuable insights for designing more efficient tokenizers." 
}, "zh": { - "title": "DiffSplat:3D生成的新突破", - "desc": "最近,3D内容生成从文本或单张图像中取得了进展,但高质量3D数据集有限,且2D多视图生成存在不一致性。我们提出了DiffSplat,这是一种新颖的3D生成框架,能够通过控制大规模文本到图像的扩散模型,原生生成3D高斯点云。与以往的3D生成模型不同,DiffSplat有效利用了网络规模的2D先验,同时在统一模型中保持3D一致性。通过引入轻量级重建模型和3D渲染损失,DiffSplat在文本和图像条件生成任务中表现出色,且在下游应用中也显示出其优越性。" + "title": "分词技术提升大语言模型性能的关键", + "desc": "本文探讨了大语言模型中的分词技术对模型性能的影响。我们提出了一种新的框架——过度分词变换器,旨在通过解耦输入和输出词汇表来提升语言建模性能。研究表明,增大输入词汇表可以有效降低训练损失,从而提高模型性能。我们的实验结果显示,使用更大的输入词汇表可以在不增加成本的情况下,达到与双倍基线相当的性能。" } } }, @@ -218,7 +218,7 @@ "title": "Open Problems in Mechanistic Interpretability", "url": "https://huggingface.co/papers/2501.16496", "abstract": "Mechanistic interpretability aims to understand the computational mechanisms underlying neural networks' capabilities in order to accomplish concrete scientific and engineering goals. Progress in this field thus promises to provide greater assurance over AI system behavior and shed light on exciting scientific questions about the nature of intelligence. Despite recent progress toward these goals, there are many open problems in the field that require solutions before many scientific and practical benefits can be realized: Our methods require both conceptual and practical improvements to reveal deeper insights; we must figure out how best to apply our methods in pursuit of specific goals; and the field must grapple with socio-technical challenges that influence and are influenced by our work. This forward-facing review discusses the current frontier of mechanistic interpretability and the open problems that the field may benefit from prioritizing.", - "score": 8, + "score": 9, "issue_id": 1920, "pub_date": "2025-01-27", "pub_date_card": { @@ -445,18 +445,18 @@ } } ], - "link_prev": "2025-01-28.html", - "link_next": "2025-01-30.html", + "link_prev": "2025-01-29.html", + "link_next": "2025-01-31.html", "link_month": "2025-01.html", "short_date_prev": { - "ru": "28.01", - "en": "01/28", - "zh": "1月28日" + "ru": "29.01", + "en": "01/29", + "zh": "1月29日" }, "short_date_next": { - "ru": "30.01", - "en": "01/30", - "zh": "1月30日" + "ru": "31.01", + "en": "01/31", + "zh": "1月31日" }, "categories": { "#dataset": 2, diff --git a/index.html b/index.html index b5910dbb7..e261d103f 100644 --- a/index.html +++ b/index.html @@ -10,7 +10,7 @@ gtag('config', 'G-C1CRWDNJ1J'); - HF. 8 papers. January 29. + HF. 8 papers. January 30. @@ -765,7 +765,7 @@

🔺

hf daily

-

29 января | 8 papers

+

30 января | 8 papers