From d914ca4fc36cbf72e2abecb2355099ab88476af6 Mon Sep 17 00:00:00 2001 From: GitHub Actions Date: Thu, 30 Jan 2025 00:44:13 +0000 Subject: [PATCH] Auto. Make Doomgrad HF Review on 30 January --- d/2025-01-29_zh_reading_task.html | 180 ++++ d/2025-01-30.html | 1270 +++++++++++++++++++++++++++++ d/2025-01-30.json | 513 ++++++++++++ hf_papers.json | 130 +-- index.html | 22 +- log.txt | 6 +- logs/2025-01-30_last_log.txt | 90 ++ m/2025-01.html | 6 +- 8 files changed, 2135 insertions(+), 82 deletions(-) create mode 100644 d/2025-01-29_zh_reading_task.html create mode 100644 d/2025-01-30.html create mode 100644 d/2025-01-30.json create mode 100644 logs/2025-01-30_last_log.txt diff --git a/d/2025-01-29_zh_reading_task.html b/d/2025-01-29_zh_reading_task.html new file mode 100644 index 000000000..ffc6f226f --- /dev/null +++ b/d/2025-01-29_zh_reading_task.html @@ -0,0 +1,180 @@ + + + + + + + + + + + Chinese reading task about ML + + + +
+

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

+

1. 这篇文章比较了监督微调(SFT)和强化学习(RL)在基础模型上的作用。

+

2. 研究发现,RL在文本和视觉任务上都表现出更好的泛化能力。

+

3. SFT倾向于记住训练数据,而RL能够处理未见过的变体。

+

4. RL还提高了模型的视觉识别能力。

+

5. 然而,SFT对于RL的有效训练仍然不可或缺。

+
+

1. 这篇文章比较了监督微调(SFT)和强化学习(RL)在基础模型上的作用。研究发现,RL在文本和视觉任务上都表现出更好的泛化能力。SFT倾向于记住训练数据,而RL能够处理未见过的变体。RL还提高了模型的视觉识别能力。然而,SFT对于RL的有效训练仍然不可或缺。

Zhè piān wénzhāng bǐjiào le jiàndū wēitiáo (SFT) hé qiáng huà xuéxí (RL) zài jīchǔ móxíng shàng de zuòyòng

+

2. Yánjiū fāxiàn, RL zài wénběn hé shìjué rènwù shàng dōu biǎoxiàn chū gèng hǎo de fànhuà nénglì

+

3. SFT qīngxiàng yú jìzhù xùnliàn shùjù, ér RL nénggòu chǔlǐ wèi jiànguò de biàntǐ

+

4. RL hái tígāo le móxíng de shìjué shíbié nénglì

+

5. Rán'ér, SFT duìyú RL de yǒuxiào xùnliàn réngrán bùkě huòquē

+
+

1. This article compares the roles of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) on base models.

+

2. The study found that RL demonstrates better generalization capabilities in both textual and visual tasks.

+

3. SFT tends to memorize training data, while RL can handle unseen variants.

+

4. RL also enhances the model's visual recognition capabilities.

+

5. However, SFT remains indispensable for effective RL training.

+

Vocabulary

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Word | Pinyin | Translation
监督 | jiàn dū | supervised
微调 | wēi tiáo | fine-tuning
强化学习 | qiáng huà xué xí | reinforcement learning
基础模型 | jī chǔ mó xíng | foundational model
作用 | zuò yòng | effect
泛化 | fàn huà | generalization
倾向于 | qīng xiàng yú | tend to
未见过 | wèi jiàn guò | unseen
变体 | biàn tǐ | variant
视觉识别 | shì jué shí bié | visual recognition
不可或缺 | bù kě huò quē | indispensable
+
+ + + \ No newline at end of file diff --git a/d/2025-01-30.html b/d/2025-01-30.html new file mode 100644 index 000000000..b5910dbb7 --- /dev/null +++ b/d/2025-01-30.html @@ -0,0 +1,1270 @@ + + + + + + + + HF. 8 papers. January 29. + + + + + + + +
+
+

🔺

hf daily

+

29 января | 8 papers

+
+
+ +
+
+ +
+
+
+ +
+
+ + +
+
+
+
+
+ 🏷️ Фильтр + + + +
+
+
+ + +
+
+
+ 🧹 + +
+
+ +
+
+ + + + + \ No newline at end of file diff --git a/d/2025-01-30.json b/d/2025-01-30.json new file mode 100644 index 000000000..86b937e9f --- /dev/null +++ b/d/2025-01-30.json @@ -0,0 +1,513 @@ +{ + "date": { + "ru": "29 января", + "en": "January 29", + "zh": "1月29日" + }, + "time_utc": "2025-01-29 23:09", + "weekday": 2, + "issue_id": 1937, + "home_page_url": "https://huggingface.co/papers", + "papers": [ + { + "id": "https://huggingface.co/papers/2501.17161", + "title": "SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training", + "url": "https://huggingface.co/papers/2501.17161", + "abstract": "Supervised fine-tuning (SFT) and reinforcement learning (RL) are widely used post-training techniques for foundation models. However, their roles in enhancing model generalization capabilities remain unclear. This paper studies the difference between SFT and RL on generalization and memorization, focusing on text-based rule variants and visual variants. We introduce GeneralPoints, an arithmetic reasoning card game, and adopt V-IRL, a real-world navigation environment, to assess how models trained with SFT and RL generalize to unseen variants in both textual and visual domains. We show that RL, especially when trained with an outcome-based reward, generalizes across both rule-based textual and visual variants. SFT, in contrast, tends to memorize training data and struggles to generalize out-of-distribution scenarios. Further analysis reveals that RL improves the model's underlying visual recognition capabilities, contributing to its enhanced generalization in the visual domain. Despite RL's superior generalization, we show that SFT remains essential for effective RL training; SFT stabilizes the model's output format, enabling subsequent RL to achieve its performance gains. These findings demonstrates the capability of RL for acquiring generalizable knowledge in complex, multi-modal tasks.", + "score": 28, + "issue_id": 1920, + "pub_date": "2025-01-28", + "pub_date_card": { + "ru": "28 января", + "en": "January 28", + "zh": "1月28日" + }, + "hash": "ce9300709a3cdc7a", + "authors": [ + "Tianzhe Chu", + "Yuexiang Zhai", + "Jihan Yang", + "Shengbang Tong", + "Saining Xie", + "Dale Schuurmans", + "Quoc V. Le", + "Sergey Levine", + "Yi Ma" + ], + "affiliations": [ + "Google DeepMind", + "HKU", + "NYU", + "UC Berkeley" + ], + "pdf_title_img": "assets/pdf/title_img/2501.17161.jpg", + "data": { + "categories": [ + "#reasoning", + "#training", + "#optimization", + "#rl", + "#multimodal", + "#games" + ], + "emoji": "🧠", + "ru": { + "title": "RL превосходит SFT в обобщении для мультимодальных задач", + "desc": "Это исследование сравнивает методы дообучения языковых моделей: обучение с учителем (SFT) и обучение с подкреплением (RL). Авторы анализируют способность моделей к обобщению на новые текстовые и визуальные варианты задач. Результаты показывают, что RL лучше обобщается на новые ситуации, особенно при использовании награды, основанной на результате. SFT, напротив, склонно к запоминанию обучающих данных и хуже справляется с обобщением." + }, + "en": { + "title": "Unlocking Generalization: RL Outshines SFT in Multi-Modal Tasks", + "desc": "This paper investigates how supervised fine-tuning (SFT) and reinforcement learning (RL) affect the generalization abilities of foundation models. It highlights that while SFT often leads to memorization of training data, RL, particularly with outcome-based rewards, enhances generalization across unseen textual and visual variants. 
The study introduces GeneralPoints, a reasoning game, and V-IRL, a navigation environment, to evaluate model performance. The results indicate that RL not only improves generalization but also strengthens visual recognition, although SFT is still crucial for stabilizing the model before RL training." + }, + "zh": { + "title": "强化学习提升模型泛化能力的研究", + "desc": "这篇论文研究了监督微调(SFT)和强化学习(RL)在基础模型中的作用,特别是在提高模型的泛化能力方面。研究表明,RL在处理文本和视觉变体时,能够更好地泛化,而SFT则倾向于记忆训练数据,难以应对未见过的情况。通过引入算术推理卡牌游戏GeneralPoints和真实世界导航环境V-IRL,作者评估了这两种方法的效果。尽管RL在泛化能力上表现优越,但SFT仍然对有效的RL训练至关重要,因为它稳定了模型的输出格式。" + } + } + }, + { + "id": "https://huggingface.co/papers/2501.17116", + "title": "Optimizing Large Language Model Training Using FP4 Quantization", + "url": "https://huggingface.co/papers/2501.17116", + "abstract": "The growing computational demands of training large language models (LLMs) necessitate more efficient methods. Quantized training presents a promising solution by enabling low-bit arithmetic operations to reduce these costs. While FP8 precision has demonstrated feasibility, leveraging FP4 remains a challenge due to significant quantization errors and limited representational capacity. This work introduces the first FP4 training framework for LLMs, addressing these challenges with two key innovations: a differentiable quantization estimator for precise weight updates and an outlier clamping and compensation strategy to prevent activation collapse. To ensure stability, the framework integrates a mixed-precision training scheme and vector-wise quantization. Experimental results demonstrate that our FP4 framework achieves accuracy comparable to BF16 and FP8, with minimal degradation, scaling effectively to 13B-parameter LLMs trained on up to 100B tokens. With the emergence of next-generation hardware supporting FP4, our framework sets a foundation for efficient ultra-low precision training.", + "score": 13, + "issue_id": 1920, + "pub_date": "2025-01-28", + "pub_date_card": { + "ru": "28 января", + "en": "January 28", + "zh": "1月28日" + }, + "hash": "9ce85dc91aee17fc", + "authors": [ + "Ruizhe Wang", + "Yeyun Gong", + "Xiao Liu", + "Guoshuai Zhao", + "Ziyue Yang", + "Baining Guo", + "Zhengjun Zha", + "Peng Cheng" + ], + "affiliations": [ + "Microsoft Research Asia", + "Microsoft SIGMA Team", + "University of Science and Technology of China" + ], + "pdf_title_img": "assets/pdf/title_img/2501.17116.jpg", + "data": { + "categories": [ + "#optimization", + "#training", + "#inference" + ], + "emoji": "🔢", + "ru": { + "title": "FP4: Революция в эффективности обучения языковых моделей", + "desc": "Статья представляет первую систему обучения больших языковых моделей (LLM) с использованием 4-битной точности с плавающей запятой (FP4). Авторы разработали дифференцируемый оценщик квантования для точного обновления весов и стратегию ограничения и компенсации выбросов для предотвращения коллапса активаций. Система включает схему обучения со смешанной точностью и векторное квантование для обеспечения стабильности. Экспериментальные результаты показывают, что FP4-обучение достигает точности, сравнимой с BF16 и FP8, эффективно масштабируясь до LLM с 13 млрд параметров." + }, + "en": { + "title": "Efficient Training of Large Language Models with FP4 Precision", + "desc": "This paper addresses the high computational costs associated with training large language models (LLMs) by introducing a novel FP4 training framework. 
The framework utilizes quantized training techniques, specifically focusing on low-bit arithmetic to enhance efficiency while maintaining model accuracy. Key innovations include a differentiable quantization estimator for better weight updates and a strategy to manage outliers, which helps prevent activation collapse. Experimental results show that this FP4 approach achieves performance similar to higher precision formats like BF16 and FP8, making it suitable for large-scale LLMs." + }, + "zh": { + "title": "FP4训练框架:高效的超低精度训练新方案", + "desc": "随着大型语言模型(LLMs)训练对计算资源的需求不断增加,寻找更高效的方法变得尤为重要。量化训练通过允许低位数算术运算来降低这些成本,展现出良好的前景。尽管FP8精度已被证明可行,但FP4的应用仍面临显著的量化误差和有限的表示能力。本文提出了首个FP4训练框架,通过可微分量化估计器和异常值钳制与补偿策略,解决了这些挑战,并在稳定性方面结合了混合精度训练方案和向量级量化。" + } + } + }, + { + "id": "https://huggingface.co/papers/2501.16975", + "title": "Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling", + "url": "https://huggingface.co/papers/2501.16975", + "abstract": "Tokenization is a fundamental component of large language models (LLMs), yet its influence on model scaling and performance is not fully explored. In this paper, we introduce Over-Tokenized Transformers, a novel framework that decouples input and output vocabularies to improve language modeling performance. Specifically, our approach scales up input vocabularies to leverage multi-gram tokens. Through extensive experiments, we uncover a log-linear relationship between input vocabulary size and training loss, demonstrating that larger input vocabularies consistently enhance model performance, regardless of model size. Using a large input vocabulary, we achieve performance comparable to double-sized baselines with no additional cost. Our findings highlight the importance of tokenization in scaling laws and provide practical insight for tokenizer design, paving the way for more efficient and powerful LLMs.", + "score": 10, + "issue_id": 1920, + "pub_date": "2025-01-28", + "pub_date_card": { + "ru": "28 января", + "en": "January 28", + "zh": "1月28日" + }, + "hash": "27930c2f5d17471e", + "authors": [ + "Hongzhi Huang", + "Defa Zhu", + "Banggu Wu", + "Yutao Zeng", + "Ya Wang", + "Qiyang Min", + "Xun Zhou" + ], + "affiliations": [ + "Seed-Foundation-Model Team, Bytedance" + ], + "pdf_title_img": "assets/pdf/title_img/2501.16975.jpg", + "data": { + "categories": [ + "#optimization", + "#training", + "#architecture" + ], + "emoji": "🔤", + "ru": { + "title": "Больше токенов - выше эффективность: новый взгляд на масштабирование языковых моделей", + "desc": "Статья представляет новый подход к токенизации в больших языковых моделях, называемый Over-Tokenized Transformers. Авторы предлагают разделить входной и выходной словари, увеличивая размер входного словаря для использования мультиграммных токенов. Исследование выявило логарифмически-линейную зависимость между размером входного словаря и потерями при обучении. Результаты показывают, что увеличение входного словаря consistently улучшает производительность модели независимо от её размера." + }, + "en": { + "title": "Unlocking Performance: The Power of Over-Tokenization in Language Models", + "desc": "This paper presents a new approach called Over-Tokenized Transformers, which focuses on improving the tokenization process in large language models (LLMs). By separating the input and output vocabularies, the authors demonstrate that increasing the input vocabulary size can significantly reduce training loss and enhance model performance. 
Their experiments reveal a consistent log-linear relationship between the size of the input vocabulary and the model's effectiveness, showing that larger vocabularies lead to better results without increasing computational costs. This research emphasizes the critical role of tokenization in the scaling of LLMs and offers valuable insights for designing more efficient tokenizers." + }, + "zh": { + "title": "分词技术提升大语言模型性能的关键", + "desc": "本文探讨了大语言模型中的分词技术对模型性能的影响。我们提出了一种新的框架——过度分词变换器,旨在通过解耦输入和输出词汇表来提升语言建模性能。研究表明,增大输入词汇表可以有效降低训练损失,从而提高模型性能。我们的实验结果显示,使用更大的输入词汇表可以在不增加成本的情况下,达到与双倍基线相当的性能。" + } + } + }, + { + "id": "https://huggingface.co/papers/2501.16764", + "title": "DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation", + "url": "https://huggingface.co/papers/2501.16764", + "abstract": "Recent advancements in 3D content generation from text or a single image struggle with limited high-quality 3D datasets and inconsistency from 2D multi-view generation. We introduce DiffSplat, a novel 3D generative framework that natively generates 3D Gaussian splats by taming large-scale text-to-image diffusion models. It differs from previous 3D generative models by effectively utilizing web-scale 2D priors while maintaining 3D consistency in a unified model. To bootstrap the training, a lightweight reconstruction model is proposed to instantly produce multi-view Gaussian splat grids for scalable dataset curation. In conjunction with the regular diffusion loss on these grids, a 3D rendering loss is introduced to facilitate 3D coherence across arbitrary views. The compatibility with image diffusion models enables seamless adaptions of numerous techniques for image generation to the 3D realm. Extensive experiments reveal the superiority of DiffSplat in text- and image-conditioned generation tasks and downstream applications. Thorough ablation studies validate the efficacy of each critical design choice and provide insights into the underlying mechanism.", + "score": 8, + "issue_id": 1921, + "pub_date": "2025-01-28", + "pub_date_card": { + "ru": "28 января", + "en": "January 28", + "zh": "1月28日" + }, + "hash": "00ee1a0338716711", + "authors": [ + "Chenguo Lin", + "Panwang Pan", + "Bangbang Yang", + "Zeming Li", + "Yadong Mu" + ], + "affiliations": [ + "ByteDance", + "Peking University" + ], + "pdf_title_img": "assets/pdf/title_img/2501.16764.jpg", + "data": { + "categories": [ + "#diffusion", + "#optimization", + "#training", + "#dataset", + "#3d" + ], + "emoji": "🎨", + "ru": { + "title": "DiffSplat: Генерация 3D контента на новом уровне", + "desc": "DiffSplat - это новая система генерации 3D контента, использующая диффузионные модели для создания трехмерных гауссовых сплатов. Она решает проблемы ограниченных 3D датасетов и несогласованности при мультиракурсной 2D генерации. DiffSplat объединяет масштабные 2D-приоры с 3D-согласованностью, используя легковесную модель реконструкции и специальную функцию потерь. Эксперименты показывают превосходство DiffSplat в задачах генерации по тексту и изображениям." + }, + "en": { + "title": "Revolutionizing 3D Generation with DiffSplat", + "desc": "DiffSplat is a new framework for generating 3D content from text or images, addressing challenges like the lack of high-quality 3D datasets. It uses advanced text-to-image diffusion models to create 3D Gaussian splats while ensuring consistency across different views. The framework includes a lightweight reconstruction model that helps quickly generate multi-view datasets for training. 
Through extensive testing, DiffSplat shows improved performance in generating 3D content and offers insights into its effective design choices." + }, + "zh": { + "title": "DiffSplat:3D生成的新突破", + "desc": "最近,3D内容生成从文本或单张图像中取得了进展,但高质量3D数据集有限,且2D多视图生成存在不一致性。我们提出了DiffSplat,这是一种新颖的3D生成框架,能够通过控制大规模文本到图像的扩散模型,原生生成3D高斯点云。与以往的3D生成模型不同,DiffSplat有效利用了网络规模的2D先验,同时在统一模型中保持3D一致性。通过引入轻量级重建模型和3D渲染损失,DiffSplat在文本和图像条件生成任务中表现出色,且在下游应用中也显示出其优越性。" + } + } + }, + { + "id": "https://huggingface.co/papers/2501.16496", + "title": "Open Problems in Mechanistic Interpretability", + "url": "https://huggingface.co/papers/2501.16496", + "abstract": "Mechanistic interpretability aims to understand the computational mechanisms underlying neural networks' capabilities in order to accomplish concrete scientific and engineering goals. Progress in this field thus promises to provide greater assurance over AI system behavior and shed light on exciting scientific questions about the nature of intelligence. Despite recent progress toward these goals, there are many open problems in the field that require solutions before many scientific and practical benefits can be realized: Our methods require both conceptual and practical improvements to reveal deeper insights; we must figure out how best to apply our methods in pursuit of specific goals; and the field must grapple with socio-technical challenges that influence and are influenced by our work. This forward-facing review discusses the current frontier of mechanistic interpretability and the open problems that the field may benefit from prioritizing.", + "score": 8, + "issue_id": 1920, + "pub_date": "2025-01-27", + "pub_date_card": { + "ru": "27 января", + "en": "January 27", + "zh": "1月27日" + }, + "hash": "5a7a914accebfa33", + "authors": [ + "Lee Sharkey", + "Bilal Chughtai", + "Joshua Batson", + "Jack Lindsey", + "Jeff Wu", + "Lucius Bushnaq", + "Nicholas Goldowsky-Dill", + "Stefan Heimersheim", + "Alejandro Ortega", + "Joseph Bloom", + "Stella Biderman", + "Adria Garriga-Alonso", + "Arthur Conmy", + "Neel Nanda", + "Jessica Rumbelow", + "Martin Wattenberg", + "Nandi Schoots", + "Joseph Miller", + "Eric J. Michaud", + "Stephen Casper", + "Max Tegmark", + "William Saunders", + "David Bau", + "Eric Todd", + "Atticus Geiger", + "Mor Geva", + "Jesse Hoogland", + "Daniel Murfet", + "Tom McGrath" + ], + "affiliations": [ + "Anthropic", + "Apollo Research", + "Google DeepMind", + "Harvard University", + "Imperial College London", + "Kings College London", + "Leap Laboratories", + "MIT", + "Northeastern University", + "Tel Aviv University", + "University of Melbourne" + ], + "pdf_title_img": "assets/pdf/title_img/2501.16496.jpg", + "data": { + "categories": [ + "#interpretability", + "#survey" + ], + "emoji": "🧠", + "ru": { + "title": "Раскрывая тайны нейронных сетей: путь к пониманию искусственного интеллекта", + "desc": "Статья посвящена механистической интерпретируемости нейронных сетей, цель которой - понять вычислительные механизмы, лежащие в основе их возможностей. Прогресс в этой области обещает обеспечить большую уверенность в поведении систем искусственного интеллекта и пролить свет на природу интеллекта. Авторы обсуждают открытые проблемы в области, требующие решения для реализации научных и практических преимуществ. Статья рассматривает текущие границы механистической интерпретируемости и приоритетные задачи для дальнейшего развития области." 
+ }, + "en": { + "title": "Unlocking the Secrets of Neural Networks for Reliable AI", + "desc": "Mechanistic interpretability focuses on understanding how neural networks work to achieve specific tasks, which can enhance the reliability of AI systems. This area of research aims to uncover the underlying processes that contribute to the intelligence exhibited by these models. Despite advancements, there are still significant challenges that need to be addressed, including improving methods for deeper insights and applying these methods effectively. Additionally, the field must consider socio-technical issues that affect and are affected by mechanistic interpretability efforts." + }, + "zh": { + "title": "揭示神经网络的计算机制", + "desc": "机械解释性旨在理解神经网络能力背后的计算机制,以实现具体的科学和工程目标。该领域的进展有望提高对人工智能系统行为的信心,并揭示关于智能本质的有趣科学问题。尽管最近在这些目标上取得了一些进展,但仍有许多未解决的问题需要解决,以便实现更多的科学和实际利益。本文回顾了机械解释性的当前前沿及该领域应优先解决的开放问题。" + } + } + }, + { + "id": "https://huggingface.co/papers/2501.16372", + "title": "Low-Rank Adapters Meet Neural Architecture Search for LLM Compression", + "url": "https://huggingface.co/papers/2501.16372", + "abstract": "The rapid expansion of Large Language Models (LLMs) has posed significant challenges regarding the computational resources required for fine-tuning and deployment. Recent advancements in low-rank adapters have demonstrated their efficacy in parameter-efficient fine-tuning (PEFT) of these models. This retrospective paper comprehensively discusses innovative approaches that synergize low-rank representations with Neural Architecture Search (NAS) techniques, particularly weight-sharing super-networks. Robust solutions for compressing and fine-tuning large pre-trained models are developed by integrating these methodologies. Our analysis highlights the potential of these combined strategies to democratize the use of LLMs, making them more accessible for deployment in resource-constrained environments. The resulting models exhibit reduced memory footprints and faster inference times, paving the way for more practical and scalable applications of LLMs. Models and code are available at https://github.com/IntelLabs/Hardware-Aware-Automated-Machine-Learning.", + "score": 5, + "issue_id": 1918, + "pub_date": "2025-01-23", + "pub_date_card": { + "ru": "23 января", + "en": "January 23", + "zh": "1月23日" + }, + "hash": "f1d43a985dbea0af", + "authors": [ + "J. Pablo Muñoz", + "Jinjie Yuan", + "Nilesh Jain" + ], + "affiliations": [ + "Intel Corporation", + "Intel Labs" + ], + "pdf_title_img": "assets/pdf/title_img/2501.16372.jpg", + "data": { + "categories": [ + "#inference", + "#optimization", + "#open_source", + "#training", + "#low_resource", + "#architecture" + ], + "emoji": "🧠", + "ru": { + "title": "Эффективная настройка крупных языковых моделей для ограниченных ресурсов", + "desc": "Эта статья рассматривает проблему больших вычислительных ресурсов, необходимых для настройки и развертывания крупных языковых моделей (LLM). Авторы предлагают комбинировать низкоранговые адаптеры и методы поиска нейронных архитектур (NAS) для эффективной настройки параметров. Такой подход позволяет сжимать и дообучать большие предобученные модели, делая их более доступными в условиях ограниченных ресурсов. В результате получаются модели с меньшим потреблением памяти и более быстрым выводом, что открывает путь к более практичному применению LLM." 
+ }, + "en": { + "title": "Democratizing Large Language Models with Efficient Fine-Tuning Techniques", + "desc": "This paper addresses the challenges of using Large Language Models (LLMs) due to their high computational demands. It explores the use of low-rank adapters for parameter-efficient fine-tuning (PEFT), which helps reduce the resources needed. The authors combine low-rank representations with Neural Architecture Search (NAS) techniques, particularly through weight-sharing super-networks, to create efficient solutions for model compression and fine-tuning. The findings suggest that these strategies can make LLMs more accessible and practical for deployment in environments with limited resources, resulting in models that are faster and require less memory." + }, + "zh": { + "title": "低秩适配器助力大型语言模型的高效微调", + "desc": "大型语言模型(LLMs)的快速发展带来了在微调和部署时对计算资源的巨大挑战。最近,低秩适配器在参数高效微调(PEFT)方面显示出了良好的效果。本文回顾了将低秩表示与神经架构搜索(NAS)技术相结合的创新方法,特别是权重共享超网络。通过整合这些方法,开发了压缩和微调大型预训练模型的稳健解决方案,使得LLMs在资源受限的环境中更易于部署。" + } + } + }, + { + "id": "https://huggingface.co/papers/2501.15747", + "title": "IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding", + "url": "https://huggingface.co/papers/2501.15747", + "abstract": "Known by more than 1.5 billion people in the Indian subcontinent, Indic languages present unique challenges and opportunities for natural language processing (NLP) research due to their rich cultural heritage, linguistic diversity, and complex structures. IndicMMLU-Pro is a comprehensive benchmark designed to evaluate Large Language Models (LLMs) across Indic languages, building upon the MMLU Pro (Massive Multitask Language Understanding) framework. Covering major languages such as Hindi, Bengali, Gujarati, Marathi, Kannada, Punjabi, Tamil, Telugu, and Urdu, our benchmark addresses the unique challenges and opportunities presented by the linguistic diversity of the Indian subcontinent. This benchmark encompasses a wide range of tasks in language comprehension, reasoning, and generation, meticulously crafted to capture the intricacies of Indian languages. IndicMMLU-Pro provides a standardized evaluation framework to push the research boundaries in Indic language AI, facilitating the development of more accurate, efficient, and culturally sensitive models. This paper outlines the benchmarks' design principles, task taxonomy, and data collection methodology, and presents baseline results from state-of-the-art multilingual models.", + "score": 4, + "issue_id": 1918, + "pub_date": "2025-01-27", + "pub_date_card": { + "ru": "27 января", + "en": "January 27", + "zh": "1月27日" + }, + "hash": "4b666d035c5e5c4c", + "authors": [ + "Sankalp KJ", + "Ashutosh Kumar", + "Laxmaan Balaji", + "Nikunj Kotecha", + "Vinija Jain", + "Aman Chadha", + "Sreyoshi Bhaduri" + ], + "affiliations": [ + "Amazon Gen AI", + "Artificial Intelligence Institute, University of South Carolina", + "Independent Researcher", + "Meta AI", + "Rochester Institute of Technology" + ], + "pdf_title_img": "assets/pdf/title_img/2501.15747.jpg", + "data": { + "categories": [ + "#reasoning", + "#low_resource", + "#multilingual", + "#benchmark" + ], + "emoji": "🇮🇳", + "ru": { + "title": "Новый рубеж в NLP: комплексная оценка языковых моделей для индийских языков", + "desc": "IndicMMLU-Pro - это комплексный бенчмарк для оценки языковых моделей в индийских языках. Он охватывает 9 основных языков Индийского субконтинента и включает широкий спектр задач по пониманию языка, рассуждению и генерации текста. 
Бенчмарк разработан с учетом уникальных особенностей и сложностей индийских языков. IndicMMLU-Pro предоставляет стандартизированную систему оценки для продвижения исследований в области ИИ для индийских языков." + }, + "en": { + "title": "Empowering Indic Languages with Advanced NLP Benchmarks", + "desc": "The paper introduces IndicMMLU-Pro, a benchmark specifically designed to assess Large Language Models (LLMs) in the context of Indic languages. It builds on the existing MMLU Pro framework and includes major languages like Hindi, Bengali, and Tamil, addressing the unique linguistic challenges of the Indian subcontinent. The benchmark features a variety of tasks that test language comprehension, reasoning, and generation, ensuring a comprehensive evaluation of models. By providing a standardized framework, IndicMMLU-Pro aims to enhance the development of more accurate and culturally aware AI models for Indic languages." + }, + "zh": { + "title": "推动印度语言AI研究的基准", + "desc": "IndicMMLU-Pro是一个专门为印度语言设计的基准,旨在评估大型语言模型(LLMs)的表现。该基准基于MMLU Pro框架,涵盖了印地语、孟加拉语、古吉拉特语等主要语言,解决了印度次大陆语言的多样性带来的挑战。它包括语言理解、推理和生成等多种任务,旨在捕捉印度语言的复杂性。通过提供标准化的评估框架,IndicMMLU-Pro推动了印度语言人工智能的研究,促进了更准确、高效和文化敏感的模型的发展。" + } + } + }, + { + "id": "https://huggingface.co/papers/2501.17117", + "title": "Histoires Morales: A French Dataset for Assessing Moral Alignment", + "url": "https://huggingface.co/papers/2501.17117", + "abstract": "Aligning language models with human values is crucial, especially as they become more integrated into everyday life. While models are often adapted to user preferences, it is equally important to ensure they align with moral norms and behaviours in real-world social situations. Despite significant progress in languages like English and Chinese, French has seen little attention in this area, leaving a gap in understanding how LLMs handle moral reasoning in this language. To address this gap, we introduce Histoires Morales, a French dataset derived from Moral Stories, created through translation and subsequently refined with the assistance of native speakers to guarantee grammatical accuracy and adaptation to the French cultural context. We also rely on annotations of the moral values within the dataset to ensure their alignment with French norms. Histoires Morales covers a wide range of social situations, including differences in tipping practices, expressions of honesty in relationships, and responsibilities toward animals. To foster future research, we also conduct preliminary experiments on the alignment of multilingual models on French and English data and the robustness of the alignment. 
We find that while LLMs are generally aligned with human moral norms by default, they can be easily influenced with user-preference optimization for both moral and immoral data.", + "score": 2, + "issue_id": 1924, + "pub_date": "2025-01-28", + "pub_date_card": { + "ru": "28 января", + "en": "January 28", + "zh": "1月28日" + }, + "hash": "d2d1461e245219e8", + "authors": [ + "Thibaud Leteno", + "Irina Proskurina", + "Antoine Gourru", + "Julien Velcin", + "Charlotte Laclau", + "Guillaume Metzler", + "Christophe Gravier" + ], + "affiliations": [ + "Laboratoire Hubert Curien, UMR CNRS 5516, Saint-Etienne, France", + "Télécom Paris, Institut Polytechnique de Paris, Paris, France", + "Université Lumière Lyon 2, Université Claude Bernard Lyon 1, ERIC, 69007, Lyon, France" + ], + "pdf_title_img": "assets/pdf/title_img/2501.17117.jpg", + "data": { + "categories": [ + "#dataset", + "#multilingual", + "#alignment", + "#ethics" + ], + "emoji": "🇫🇷", + "ru": { + "title": "Французский датасет для морального выравнивания языковых моделей", + "desc": "Статья представляет набор данных 'Histoires Morales' на французском языке для выравнивания языковых моделей с человеческими ценностями. Этот датасет создан на основе 'Moral Stories' путем перевода и адаптации к французскому культурному контексту. Исследование включает эксперименты по выравниванию мультиязычных моделей на французских и английских данных. Результаты показывают, что языковые модели в целом соответствуют человеческим моральным нормам, но могут быть легко подвержены влиянию при оптимизации под предпочтения пользователей." + }, + "en": { + "title": "Bridging Language Models and French Moral Values", + "desc": "This paper emphasizes the importance of aligning language models with human values, particularly in the context of the French language. It introduces Histoires Morales, a dataset created from Moral Stories, which has been translated and refined to reflect French cultural norms and moral reasoning. The dataset includes various social situations to better understand how language models handle moral values in French. Preliminary experiments show that while language models generally align with human morals, they can be swayed by user preferences, highlighting the need for careful optimization." 
+ }, + "zh": { + "title": "让语言模型与人类价值观对齐", + "desc": "本论文强调了将语言模型与人类价值观对齐的重要性,尤其是在日常生活中。我们介绍了一个名为Histoires Morales的法语数据集,旨在填补法语在道德推理方面的研究空白。该数据集通过翻译和母语者的帮助进行精细化,确保其语法准确并适应法国文化背景。我们的初步实验表明,尽管大型语言模型通常与人类道德规范一致,但它们可以通过用户偏好优化轻易受到影响。" + } + } + } + ], + "link_prev": "2025-01-28.html", + "link_next": "2025-01-30.html", + "link_month": "2025-01.html", + "short_date_prev": { + "ru": "28.01", + "en": "01/28", + "zh": "1月28日" + }, + "short_date_next": { + "ru": "30.01", + "en": "01/30", + "zh": "1月30日" + }, + "categories": { + "#dataset": 2, + "#data": 0, + "#benchmark": 1, + "#agents": 0, + "#cv": 0, + "#rl": 1, + "#rlhf": 0, + "#rag": 0, + "#plp": 0, + "#inference": 2, + "#3d": 1, + "#audio": 0, + "#video": 0, + "#multimodal": 1, + "#math": 0, + "#multilingual": 2, + "#architecture": 2, + "#healthcare": 0, + "#training": 5, + "#robotics": 0, + "#agi": 0, + "#games": 1, + "#interpretability": 1, + "#reasoning": 2, + "#transfer_learning": 0, + "#graphs": 0, + "#ethics": 1, + "#security": 0, + "#optimization": 5, + "#survey": 1, + "#diffusion": 1, + "#alignment": 1, + "#story_generation": 0, + "#hallucinations": 0, + "#long_context": 0, + "#synthetic": 0, + "#machine_translation": 0, + "#leakage": 0, + "#open_source": 1, + "#small_models": 0, + "#science": 0, + "#low_resource": 2 + }, + "zh": { + "text": "这篇文章比较了监督微调(SFT)和强化学习(RL)在基础模型上的作用。研究发现,RL在文本和视觉任务上都表现出更好的泛化能力。SFT倾向于记住训练数据,而RL能够处理未见过的变体。RL还提高了模型的视觉识别能力。然而,SFT对于RL的有效训练仍然不可或缺。", + "title": "SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training", + "pinyin": "这篇文章比较了监督微调(SFT)和强化学习(RL)在基础模型上的作用。研究发现,RL在文本和视觉任务上都表现出更好的泛化能力。SFT倾向于记住训练数据,而RL能够处理未见过的变体。RL还提高了模型的视觉识别能力。然而,SFT对于RL的有效训练仍然不可或缺。\n\nZhè piān wénzhāng bǐjiào le jiàndū wēitiáo (SFT) hé qiáng huà xuéxí (RL) zài jīchǔ móxíng shàng de zuòyòng. Yánjiū fāxiàn, RL zài wénběn hé shìjué rènwù shàng dōu biǎoxiàn chū gèng hǎo de fànhuà nénglì. SFT qīngxiàng yú jìzhù xùnliàn shùjù, ér RL nénggòu chǔlǐ wèi jiànguò de biàntǐ. RL hái tígāo le móxíng de shìjué shíbié nénglì. Rán'ér, SFT duìyú RL de yǒuxiào xùnliàn réngrán bùkě huòquē.", + "vocab": "[{'word': '监督', 'pinyin': 'jiàn dū', 'trans': 'supervised'},\n{'word': '微调', 'pinyin': 'wēi tiáo', 'trans': 'fine-tuning'},\n{'word': '强化学习', 'pinyin': 'qiáng huà xué xí', 'trans': 'reinforcement learning'},\n{'word': '基础模型', 'pinyin': 'jī chǔ mó xíng', 'trans': 'foundational model'},\n{'word': '作用', 'pinyin': 'zuò yòng', 'trans': 'effect'},\n{'word': '泛化', 'pinyin': 'fàn huà', 'trans': 'generalization'},\n{'word': '倾向于', 'pinyin': 'qīng xiàng yú', 'trans': 'tend to'},\n{'word': '未见过', 'pinyin': 'wèi jiàn guò', 'trans': 'unseen'},\n{'word': '变体', 'pinyin': 'biàn tǐ', 'trans': 'variant'},\n{'word': '视觉识别', 'pinyin': 'shì jué shí bié', 'trans': 'visual recognition'},\n{'word': '不可或缺', 'pinyin': 'bù kě huò quē', 'trans': 'indispensable'}]", + "trans": "This article compares the roles of Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) on base models. The study found that RL demonstrates better generalization capabilities in both textual and visual tasks. SFT tends to memorize training data, while RL can handle unseen variants. RL also enhances the model's visual recognition capabilities. 
However, SFT remains indispensable for effective RL training.", + "update_ts": "2025-01-29 09:10" + } +} \ No newline at end of file diff --git a/hf_papers.json b/hf_papers.json index 86b937e9f..47f8f7105 100644 --- a/hf_papers.json +++ b/hf_papers.json @@ -1,12 +1,12 @@ { "date": { - "ru": "29 января", - "en": "January 29", - "zh": "1月29日" + "ru": "30 января", + "en": "January 30", + "zh": "1月30日" }, - "time_utc": "2025-01-29 23:09", - "weekday": 2, - "issue_id": 1937, + "time_utc": "2025-01-30 00:44", + "weekday": 3, + "issue_id": 1938, "home_page_url": "https://huggingface.co/papers", "papers": [ { @@ -14,7 +14,7 @@ "title": "SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training", "url": "https://huggingface.co/papers/2501.17161", "abstract": "Supervised fine-tuning (SFT) and reinforcement learning (RL) are widely used post-training techniques for foundation models. However, their roles in enhancing model generalization capabilities remain unclear. This paper studies the difference between SFT and RL on generalization and memorization, focusing on text-based rule variants and visual variants. We introduce GeneralPoints, an arithmetic reasoning card game, and adopt V-IRL, a real-world navigation environment, to assess how models trained with SFT and RL generalize to unseen variants in both textual and visual domains. We show that RL, especially when trained with an outcome-based reward, generalizes across both rule-based textual and visual variants. SFT, in contrast, tends to memorize training data and struggles to generalize out-of-distribution scenarios. Further analysis reveals that RL improves the model's underlying visual recognition capabilities, contributing to its enhanced generalization in the visual domain. Despite RL's superior generalization, we show that SFT remains essential for effective RL training; SFT stabilizes the model's output format, enabling subsequent RL to achieve its performance gains. These findings demonstrates the capability of RL for acquiring generalizable knowledge in complex, multi-modal tasks.", - "score": 28, + "score": 29, "issue_id": 1920, "pub_date": "2025-01-28", "pub_date_card": { @@ -70,7 +70,7 @@ "title": "Optimizing Large Language Model Training Using FP4 Quantization", "url": "https://huggingface.co/papers/2501.17116", "abstract": "The growing computational demands of training large language models (LLMs) necessitate more efficient methods. Quantized training presents a promising solution by enabling low-bit arithmetic operations to reduce these costs. While FP8 precision has demonstrated feasibility, leveraging FP4 remains a challenge due to significant quantization errors and limited representational capacity. This work introduces the first FP4 training framework for LLMs, addressing these challenges with two key innovations: a differentiable quantization estimator for precise weight updates and an outlier clamping and compensation strategy to prevent activation collapse. To ensure stability, the framework integrates a mixed-precision training scheme and vector-wise quantization. Experimental results demonstrate that our FP4 framework achieves accuracy comparable to BF16 and FP8, with minimal degradation, scaling effectively to 13B-parameter LLMs trained on up to 100B tokens. 
With the emergence of next-generation hardware supporting FP4, our framework sets a foundation for efficient ultra-low precision training.", - "score": 13, + "score": 14, "issue_id": 1920, "pub_date": "2025-01-28", "pub_date_card": { @@ -117,99 +117,99 @@ } }, { - "id": "https://huggingface.co/papers/2501.16975", - "title": "Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling", - "url": "https://huggingface.co/papers/2501.16975", - "abstract": "Tokenization is a fundamental component of large language models (LLMs), yet its influence on model scaling and performance is not fully explored. In this paper, we introduce Over-Tokenized Transformers, a novel framework that decouples input and output vocabularies to improve language modeling performance. Specifically, our approach scales up input vocabularies to leverage multi-gram tokens. Through extensive experiments, we uncover a log-linear relationship between input vocabulary size and training loss, demonstrating that larger input vocabularies consistently enhance model performance, regardless of model size. Using a large input vocabulary, we achieve performance comparable to double-sized baselines with no additional cost. Our findings highlight the importance of tokenization in scaling laws and provide practical insight for tokenizer design, paving the way for more efficient and powerful LLMs.", + "id": "https://huggingface.co/papers/2501.16764", + "title": "DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation", + "url": "https://huggingface.co/papers/2501.16764", + "abstract": "Recent advancements in 3D content generation from text or a single image struggle with limited high-quality 3D datasets and inconsistency from 2D multi-view generation. We introduce DiffSplat, a novel 3D generative framework that natively generates 3D Gaussian splats by taming large-scale text-to-image diffusion models. It differs from previous 3D generative models by effectively utilizing web-scale 2D priors while maintaining 3D consistency in a unified model. To bootstrap the training, a lightweight reconstruction model is proposed to instantly produce multi-view Gaussian splat grids for scalable dataset curation. In conjunction with the regular diffusion loss on these grids, a 3D rendering loss is introduced to facilitate 3D coherence across arbitrary views. The compatibility with image diffusion models enables seamless adaptions of numerous techniques for image generation to the 3D realm. Extensive experiments reveal the superiority of DiffSplat in text- and image-conditioned generation tasks and downstream applications. 
Thorough ablation studies validate the efficacy of each critical design choice and provide insights into the underlying mechanism.", "score": 10, - "issue_id": 1920, + "issue_id": 1921, "pub_date": "2025-01-28", "pub_date_card": { "ru": "28 января", "en": "January 28", "zh": "1月28日" }, - "hash": "27930c2f5d17471e", + "hash": "00ee1a0338716711", "authors": [ - "Hongzhi Huang", - "Defa Zhu", - "Banggu Wu", - "Yutao Zeng", - "Ya Wang", - "Qiyang Min", - "Xun Zhou" + "Chenguo Lin", + "Panwang Pan", + "Bangbang Yang", + "Zeming Li", + "Yadong Mu" ], "affiliations": [ - "Seed-Foundation-Model Team, Bytedance" + "ByteDance", + "Peking University" ], - "pdf_title_img": "assets/pdf/title_img/2501.16975.jpg", + "pdf_title_img": "assets/pdf/title_img/2501.16764.jpg", "data": { "categories": [ + "#diffusion", "#optimization", "#training", - "#architecture" + "#dataset", + "#3d" ], - "emoji": "🔤", + "emoji": "🎨", "ru": { - "title": "Больше токенов - выше эффективность: новый взгляд на масштабирование языковых моделей", - "desc": "Статья представляет новый подход к токенизации в больших языковых моделях, называемый Over-Tokenized Transformers. Авторы предлагают разделить входной и выходной словари, увеличивая размер входного словаря для использования мультиграммных токенов. Исследование выявило логарифмически-линейную зависимость между размером входного словаря и потерями при обучении. Результаты показывают, что увеличение входного словаря consistently улучшает производительность модели независимо от её размера." + "title": "DiffSplat: Генерация 3D контента на новом уровне", + "desc": "DiffSplat - это новая система генерации 3D контента, использующая диффузионные модели для создания трехмерных гауссовых сплатов. Она решает проблемы ограниченных 3D датасетов и несогласованности при мультиракурсной 2D генерации. DiffSplat объединяет масштабные 2D-приоры с 3D-согласованностью, используя легковесную модель реконструкции и специальную функцию потерь. Эксперименты показывают превосходство DiffSplat в задачах генерации по тексту и изображениям." }, "en": { - "title": "Unlocking Performance: The Power of Over-Tokenization in Language Models", - "desc": "This paper presents a new approach called Over-Tokenized Transformers, which focuses on improving the tokenization process in large language models (LLMs). By separating the input and output vocabularies, the authors demonstrate that increasing the input vocabulary size can significantly reduce training loss and enhance model performance. Their experiments reveal a consistent log-linear relationship between the size of the input vocabulary and the model's effectiveness, showing that larger vocabularies lead to better results without increasing computational costs. This research emphasizes the critical role of tokenization in the scaling of LLMs and offers valuable insights for designing more efficient tokenizers." + "title": "Revolutionizing 3D Generation with DiffSplat", + "desc": "DiffSplat is a new framework for generating 3D content from text or images, addressing challenges like the lack of high-quality 3D datasets. It uses advanced text-to-image diffusion models to create 3D Gaussian splats while ensuring consistency across different views. The framework includes a lightweight reconstruction model that helps quickly generate multi-view datasets for training. Through extensive testing, DiffSplat shows improved performance in generating 3D content and offers insights into its effective design choices." 
}, "zh": { - "title": "分词技术提升大语言模型性能的关键", - "desc": "本文探讨了大语言模型中的分词技术对模型性能的影响。我们提出了一种新的框架——过度分词变换器,旨在通过解耦输入和输出词汇表来提升语言建模性能。研究表明,增大输入词汇表可以有效降低训练损失,从而提高模型性能。我们的实验结果显示,使用更大的输入词汇表可以在不增加成本的情况下,达到与双倍基线相当的性能。" + "title": "DiffSplat:3D生成的新突破", + "desc": "最近,3D内容生成从文本或单张图像中取得了进展,但高质量3D数据集有限,且2D多视图生成存在不一致性。我们提出了DiffSplat,这是一种新颖的3D生成框架,能够通过控制大规模文本到图像的扩散模型,原生生成3D高斯点云。与以往的3D生成模型不同,DiffSplat有效利用了网络规模的2D先验,同时在统一模型中保持3D一致性。通过引入轻量级重建模型和3D渲染损失,DiffSplat在文本和图像条件生成任务中表现出色,且在下游应用中也显示出其优越性。" } } }, { - "id": "https://huggingface.co/papers/2501.16764", - "title": "DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation", - "url": "https://huggingface.co/papers/2501.16764", - "abstract": "Recent advancements in 3D content generation from text or a single image struggle with limited high-quality 3D datasets and inconsistency from 2D multi-view generation. We introduce DiffSplat, a novel 3D generative framework that natively generates 3D Gaussian splats by taming large-scale text-to-image diffusion models. It differs from previous 3D generative models by effectively utilizing web-scale 2D priors while maintaining 3D consistency in a unified model. To bootstrap the training, a lightweight reconstruction model is proposed to instantly produce multi-view Gaussian splat grids for scalable dataset curation. In conjunction with the regular diffusion loss on these grids, a 3D rendering loss is introduced to facilitate 3D coherence across arbitrary views. The compatibility with image diffusion models enables seamless adaptions of numerous techniques for image generation to the 3D realm. Extensive experiments reveal the superiority of DiffSplat in text- and image-conditioned generation tasks and downstream applications. Thorough ablation studies validate the efficacy of each critical design choice and provide insights into the underlying mechanism.", - "score": 8, - "issue_id": 1921, + "id": "https://huggingface.co/papers/2501.16975", + "title": "Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling", + "url": "https://huggingface.co/papers/2501.16975", + "abstract": "Tokenization is a fundamental component of large language models (LLMs), yet its influence on model scaling and performance is not fully explored. In this paper, we introduce Over-Tokenized Transformers, a novel framework that decouples input and output vocabularies to improve language modeling performance. Specifically, our approach scales up input vocabularies to leverage multi-gram tokens. Through extensive experiments, we uncover a log-linear relationship between input vocabulary size and training loss, demonstrating that larger input vocabularies consistently enhance model performance, regardless of model size. Using a large input vocabulary, we achieve performance comparable to double-sized baselines with no additional cost. 
Our findings highlight the importance of tokenization in scaling laws and provide practical insight for tokenizer design, paving the way for more efficient and powerful LLMs.", + "score": 10, + "issue_id": 1920, "pub_date": "2025-01-28", "pub_date_card": { "ru": "28 января", "en": "January 28", "zh": "1月28日" }, - "hash": "00ee1a0338716711", + "hash": "27930c2f5d17471e", "authors": [ - "Chenguo Lin", - "Panwang Pan", - "Bangbang Yang", - "Zeming Li", - "Yadong Mu" + "Hongzhi Huang", + "Defa Zhu", + "Banggu Wu", + "Yutao Zeng", + "Ya Wang", + "Qiyang Min", + "Xun Zhou" ], "affiliations": [ - "ByteDance", - "Peking University" + "Seed-Foundation-Model Team, Bytedance" ], - "pdf_title_img": "assets/pdf/title_img/2501.16764.jpg", + "pdf_title_img": "assets/pdf/title_img/2501.16975.jpg", "data": { "categories": [ - "#diffusion", "#optimization", "#training", - "#dataset", - "#3d" + "#architecture" ], - "emoji": "🎨", + "emoji": "🔤", "ru": { - "title": "DiffSplat: Генерация 3D контента на новом уровне", - "desc": "DiffSplat - это новая система генерации 3D контента, использующая диффузионные модели для создания трехмерных гауссовых сплатов. Она решает проблемы ограниченных 3D датасетов и несогласованности при мультиракурсной 2D генерации. DiffSplat объединяет масштабные 2D-приоры с 3D-согласованностью, используя легковесную модель реконструкции и специальную функцию потерь. Эксперименты показывают превосходство DiffSplat в задачах генерации по тексту и изображениям." + "title": "Больше токенов - выше эффективность: новый взгляд на масштабирование языковых моделей", + "desc": "Статья представляет новый подход к токенизации в больших языковых моделях, называемый Over-Tokenized Transformers. Авторы предлагают разделить входной и выходной словари, увеличивая размер входного словаря для использования мультиграммных токенов. Исследование выявило логарифмически-линейную зависимость между размером входного словаря и потерями при обучении. Результаты показывают, что увеличение входного словаря consistently улучшает производительность модели независимо от её размера." }, "en": { - "title": "Revolutionizing 3D Generation with DiffSplat", - "desc": "DiffSplat is a new framework for generating 3D content from text or images, addressing challenges like the lack of high-quality 3D datasets. It uses advanced text-to-image diffusion models to create 3D Gaussian splats while ensuring consistency across different views. The framework includes a lightweight reconstruction model that helps quickly generate multi-view datasets for training. Through extensive testing, DiffSplat shows improved performance in generating 3D content and offers insights into its effective design choices." + "title": "Unlocking Performance: The Power of Over-Tokenization in Language Models", + "desc": "This paper presents a new approach called Over-Tokenized Transformers, which focuses on improving the tokenization process in large language models (LLMs). By separating the input and output vocabularies, the authors demonstrate that increasing the input vocabulary size can significantly reduce training loss and enhance model performance. Their experiments reveal a consistent log-linear relationship between the size of the input vocabulary and the model's effectiveness, showing that larger vocabularies lead to better results without increasing computational costs. This research emphasizes the critical role of tokenization in the scaling of LLMs and offers valuable insights for designing more efficient tokenizers." 
}, "zh": { - "title": "DiffSplat:3D生成的新突破", - "desc": "最近,3D内容生成从文本或单张图像中取得了进展,但高质量3D数据集有限,且2D多视图生成存在不一致性。我们提出了DiffSplat,这是一种新颖的3D生成框架,能够通过控制大规模文本到图像的扩散模型,原生生成3D高斯点云。与以往的3D生成模型不同,DiffSplat有效利用了网络规模的2D先验,同时在统一模型中保持3D一致性。通过引入轻量级重建模型和3D渲染损失,DiffSplat在文本和图像条件生成任务中表现出色,且在下游应用中也显示出其优越性。" + "title": "分词技术提升大语言模型性能的关键", + "desc": "本文探讨了大语言模型中的分词技术对模型性能的影响。我们提出了一种新的框架——过度分词变换器,旨在通过解耦输入和输出词汇表来提升语言建模性能。研究表明,增大输入词汇表可以有效降低训练损失,从而提高模型性能。我们的实验结果显示,使用更大的输入词汇表可以在不增加成本的情况下,达到与双倍基线相当的性能。" } } }, @@ -218,7 +218,7 @@ "title": "Open Problems in Mechanistic Interpretability", "url": "https://huggingface.co/papers/2501.16496", "abstract": "Mechanistic interpretability aims to understand the computational mechanisms underlying neural networks' capabilities in order to accomplish concrete scientific and engineering goals. Progress in this field thus promises to provide greater assurance over AI system behavior and shed light on exciting scientific questions about the nature of intelligence. Despite recent progress toward these goals, there are many open problems in the field that require solutions before many scientific and practical benefits can be realized: Our methods require both conceptual and practical improvements to reveal deeper insights; we must figure out how best to apply our methods in pursuit of specific goals; and the field must grapple with socio-technical challenges that influence and are influenced by our work. This forward-facing review discusses the current frontier of mechanistic interpretability and the open problems that the field may benefit from prioritizing.", - "score": 8, + "score": 9, "issue_id": 1920, "pub_date": "2025-01-27", "pub_date_card": { @@ -445,18 +445,18 @@ } } ], - "link_prev": "2025-01-28.html", - "link_next": "2025-01-30.html", + "link_prev": "2025-01-29.html", + "link_next": "2025-01-31.html", "link_month": "2025-01.html", "short_date_prev": { - "ru": "28.01", - "en": "01/28", - "zh": "1月28日" + "ru": "29.01", + "en": "01/29", + "zh": "1月29日" }, "short_date_next": { - "ru": "30.01", - "en": "01/30", - "zh": "1月30日" + "ru": "31.01", + "en": "01/31", + "zh": "1月31日" }, "categories": { "#dataset": 2, diff --git a/index.html b/index.html index b5910dbb7..e261d103f 100644 --- a/index.html +++ b/index.html @@ -10,7 +10,7 @@ gtag('config', 'G-C1CRWDNJ1J'); - HF. 8 papers. January 29. + HF. 8 papers. January 30. @@ -765,7 +765,7 @@

🔺

hf daily

-

29 января | 8 papers

+

30 января | 8 papers