Commit

Auto. Make Doomgrad HF Review on 28 January
actions-user committed Jan 28, 2025
1 parent 5e5b460 commit 7f14cb4
Showing 7 changed files with 389 additions and 463 deletions.
14 changes: 7 additions & 7 deletions d/2025-01-28.html

Large diffs are not rendered by default.

254 changes: 177 additions & 77 deletions d/2025-01-28.json

Large diffs are not rendered by default.

108 changes: 54 additions & 54 deletions hf_papers.json
@@ -4,17 +4,17 @@
"en": "January 28",
"zh": "1月28日"
},
"time_utc": "2025-01-28 20:10",
"time_utc": "2025-01-28 21:09",
"weekday": 1,
"issue_id": 1912,
"issue_id": 1913,
"home_page_url": "https://huggingface.co/papers",
"papers": [
{
"id": "https://huggingface.co/papers/2501.15368",
"title": "Baichuan-Omni-1.5 Technical Report",
"url": "https://huggingface.co/papers/2501.15368",
"abstract": "We introduce Baichuan-Omni-1.5, an omni-modal model that not only has omni-modal understanding capabilities but also provides end-to-end audio generation capabilities. To achieve fluent and high-quality interaction across modalities without compromising the capabilities of any modality, we prioritized optimizing three key aspects. First, we establish a comprehensive data cleaning and synthesis pipeline for multimodal data, obtaining about 500B high-quality data (text, audio, and vision). Second, an audio-tokenizer (Baichuan-Audio-Tokenizer) has been designed to capture both semantic and acoustic information from audio, enabling seamless integration and enhanced compatibility with MLLM. Lastly, we designed a multi-stage training strategy that progressively integrates multimodal alignment and multitask fine-tuning, ensuring effective synergy across all modalities. Baichuan-Omni-1.5 leads contemporary models (including GPT4o-mini and MiniCPM-o 2.6) in terms of comprehensive omni-modal capabilities. Notably, it achieves results comparable to leading models such as Qwen2-VL-72B across various multimodal medical benchmarks.",
"score": 34,
"score": 35,
"issue_id": 1898,
"pub_date": "2025-01-26",
"pub_date_card": {
@@ -462,6 +462,57 @@
}
}
},
+ {
+ "id": "https://huggingface.co/papers/2501.14723",
+ "title": "CodeMonkeys: Scaling Test-Time Compute for Software Engineering",
+ "url": "https://huggingface.co/papers/2501.14723",
+ "abstract": "Scaling test-time compute is a promising axis for improving LLM capabilities. However, test-time compute can be scaled in a variety of ways, and effectively combining different approaches remains an active area of research. Here, we explore this problem in the context of solving real-world GitHub issues from the SWE-bench dataset. Our system, named CodeMonkeys, allows models to iteratively edit a codebase by jointly generating and running a testing script alongside their draft edit. We sample many of these multi-turn trajectories for every issue to generate a collection of candidate edits. This approach lets us scale \"serial\" test-time compute by increasing the number of iterations per trajectory and \"parallel\" test-time compute by increasing the number of trajectories per problem. With parallel scaling, we can amortize up-front costs across multiple downstream samples, allowing us to identify relevant codebase context using the simple method of letting an LLM read every file. In order to select between candidate edits, we combine voting using model-generated tests with a final multi-turn trajectory dedicated to selection. Overall, CodeMonkeys resolves 57.4% of issues from SWE-bench Verified using a budget of approximately 2300 USD. Our selection method can also be used to combine candidates from different sources. Selecting over an ensemble of edits from existing top SWE-bench Verified submissions obtains a score of 66.2% and outperforms the best member of the ensemble on its own. We fully release our code and data at https://scalingintelligence.stanford.edu/pubs/codemonkeys.",
+ "score": 3,
+ "issue_id": 1912,
+ "pub_date": "2025-01-24",
+ "pub_date_card": {
+ "ru": "24 января",
+ "en": "January 24",
+ "zh": "1月24日"
+ },
+ "hash": "0aee5401febd2bf6",
+ "authors": [
+ "Ryan Ehrlich",
+ "Bradley Brown",
+ "Jordan Juravsky",
+ "Ronald Clark",
+ "Christopher Ré",
+ "Azalia Mirhoseini"
+ ],
+ "affiliations": [
+ "Department of Computer Science, Stanford University",
+ "University of Oxford"
+ ],
+ "pdf_title_img": "assets/pdf/title_img/2501.14723.jpg",
+ "data": {
+ "categories": [
+ "#data",
+ "#optimization",
+ "#training",
+ "#dataset",
+ "#plp",
+ "#open_source"
+ ],
+ "emoji": "🐒",
+ "ru": {
+ "title": "CodeMonkeys: Масштабирование вычислений LLM для решения реальных задач программирования",
+ "desc": "Статья представляет систему CodeMonkeys для решения реальных проблем GitHub с помощью больших языковых моделей (LLM). Система позволяет моделям итеративно редактировать кодовую базу, генерируя и запуская тестовые скрипты вместе с черновыми правками. CodeMonkeys использует как последовательное, так и параллельное масштабирование вычислений во время тестирования, что позволяет эффективно идентифицировать релевантный контекст кодовой базы. Метод выбора кандидатов на основе голосования и финальной многоходовой траектории позволил системе решить 57.4% проблем из набора данных SWE-bench Verified."
+ },
+ "en": {
+ "title": "Enhancing LLMs with Scalable Test-Time Compute for Code Editing",
+ "desc": "This paper presents CodeMonkeys, a system designed to enhance the capabilities of large language models (LLMs) by scaling test-time compute during code editing tasks. It combines iterative code generation with testing script execution, allowing models to refine their edits through multiple iterations and trajectories. By leveraging both serial and parallel scaling, CodeMonkeys efficiently identifies relevant code context and selects the best candidate edits through a voting mechanism. The system demonstrates effectiveness by resolving over 57% of real-world GitHub issues while optimizing resource usage, and it shows improved performance when combining edits from various sources."
+ },
+ "zh": {
+ "title": "通过CodeMonkeys提升代码编辑能力",
+ "desc": "本文探讨了如何通过扩展测试时计算来提升大型语言模型(LLM)的能力。我们提出了一个名为CodeMonkeys的系统,它可以通过生成和运行测试脚本来迭代编辑代码库,从而解决实际的GitHub问题。该方法通过增加每个问题的迭代次数和轨迹数量,实现了串行和并行的测试时计算扩展。最终,CodeMonkeys成功解决了57.4%的问题,并且我们的选择方法也能有效结合来自不同来源的候选编辑。"
+ }
+ }
+ },
{
"id": "https://huggingface.co/papers/2403.09193",
"title": "Are Vision Language Models Texture or Shape Biased and Can We Steer Them?",
@@ -711,57 +762,6 @@
}
}
},
- {
- "id": "https://huggingface.co/papers/2501.14723",
- "title": "CodeMonkeys: Scaling Test-Time Compute for Software Engineering",
- "url": "https://huggingface.co/papers/2501.14723",
- "abstract": "Scaling test-time compute is a promising axis for improving LLM capabilities. However, test-time compute can be scaled in a variety of ways, and effectively combining different approaches remains an active area of research. Here, we explore this problem in the context of solving real-world GitHub issues from the SWE-bench dataset. Our system, named CodeMonkeys, allows models to iteratively edit a codebase by jointly generating and running a testing script alongside their draft edit. We sample many of these multi-turn trajectories for every issue to generate a collection of candidate edits. This approach lets us scale \"serial\" test-time compute by increasing the number of iterations per trajectory and \"parallel\" test-time compute by increasing the number of trajectories per problem. With parallel scaling, we can amortize up-front costs across multiple downstream samples, allowing us to identify relevant codebase context using the simple method of letting an LLM read every file. In order to select between candidate edits, we combine voting using model-generated tests with a final multi-turn trajectory dedicated to selection. Overall, CodeMonkeys resolves 57.4% of issues from SWE-bench Verified using a budget of approximately 2300 USD. Our selection method can also be used to combine candidates from different sources. Selecting over an ensemble of edits from existing top SWE-bench Verified submissions obtains a score of 66.2% and outperforms the best member of the ensemble on its own. We fully release our code and data at https://scalingintelligence.stanford.edu/pubs/codemonkeys.",
- "score": 1,
- "issue_id": 1912,
- "pub_date": "2025-01-24",
- "pub_date_card": {
- "ru": "24 января",
- "en": "January 24",
- "zh": "1月24日"
- },
- "hash": "0aee5401febd2bf6",
- "authors": [
- "Ryan Ehrlich",
- "Bradley Brown",
- "Jordan Juravsky",
- "Ronald Clark",
- "Christopher Ré",
- "Azalia Mirhoseini"
- ],
- "affiliations": [
- "Department of Computer Science, Stanford University",
- "University of Oxford"
- ],
- "pdf_title_img": "assets/pdf/title_img/2501.14723.jpg",
- "data": {
- "categories": [
- "#data",
- "#optimization",
- "#training",
- "#dataset",
- "#plp",
- "#open_source"
- ],
- "emoji": "🐒",
- "ru": {
- "title": "CodeMonkeys: Масштабирование вычислений LLM для решения реальных задач программирования",
- "desc": "Статья представляет систему CodeMonkeys для решения реальных проблем GitHub с помощью больших языковых моделей (LLM). Система позволяет моделям итеративно редактировать кодовую базу, генерируя и запуская тестовые скрипты вместе с черновыми правками. CodeMonkeys использует как последовательное, так и параллельное масштабирование вычислений во время тестирования, что позволяет эффективно идентифицировать релевантный контекст кодовой базы. Метод выбора кандидатов на основе голосования и финальной многоходовой траектории позволил системе решить 57.4% проблем из набора данных SWE-bench Verified."
- },
- "en": {
- "title": "Enhancing LLMs with Scalable Test-Time Compute for Code Editing",
- "desc": "This paper presents CodeMonkeys, a system designed to enhance the capabilities of large language models (LLMs) by scaling test-time compute during code editing tasks. It combines iterative code generation with testing script execution, allowing models to refine their edits through multiple iterations and trajectories. By leveraging both serial and parallel scaling, CodeMonkeys efficiently identifies relevant code context and selects the best candidate edits through a voting mechanism. The system demonstrates effectiveness by resolving over 57% of real-world GitHub issues while optimizing resource usage, and it shows improved performance when combining edits from various sources."
- },
- "zh": {
- "title": "通过CodeMonkeys提升代码编辑能力",
- "desc": "本文探讨了如何通过扩展测试时计算来提升大型语言模型(LLM)的能力。我们提出了一个名为CodeMonkeys的系统,它可以通过生成和运行测试脚本来迭代编辑代码库,从而解决实际的GitHub问题。该方法通过增加每个问题的迭代次数和轨迹数量,实现了串行和并行的测试时计算扩展。最终,CodeMonkeys成功解决了57.4%的问题,并且我们的选择方法也能有效结合来自不同来源的候选编辑。"
- }
- }
- },
{
"id": "https://huggingface.co/papers/2501.15420",
"title": "Visual Generation Without Guidance",
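Aside: the CodeMonkeys abstract in the diff above describes two scaling axes (more iterations per trajectory for "serial" compute, more trajectories per issue for "parallel" compute) followed by a selection step that votes between candidate edits using model-generated tests. A minimal sketch of that control flow, assuming hypothetical helpers draft_edit and run_generated_tests in place of the paper's actual model calls and test harness:

import random
from collections import Counter

def draft_edit(issue, prev_edit=None):
    # Hypothetical stand-in for one multi-turn LLM step that revises a
    # candidate edit alongside a generated testing script.
    return "edit-for-%s-v%d" % (issue, random.randint(0, 3))

def run_generated_tests(edit):
    # Hypothetical stand-in: number of model-generated tests the edit passes.
    return random.randint(0, 10)

def solve_issue(issue, n_trajectories=8, n_iterations=4):
    candidates = []
    for _ in range(n_trajectories):      # "parallel" axis: more trajectories per issue
        edit = None
        for _ in range(n_iterations):    # "serial" axis: more turns per trajectory
            edit = draft_edit(issue, prev_edit=edit)
        candidates.append(edit)
    votes = Counter()                    # vote between candidates via test results
    for edit in candidates:
        votes[edit] += run_generated_tests(edit)
    return votes.most_common(1)[0][0]    # highest-voted candidate edit

print(solve_issue("swe-bench-issue"))

Only the loop structure follows the abstract; the helper stubs and parameter values are illustrative, not taken from the released CodeMonkeys code.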
8 changes: 4 additions & 4 deletions index.html

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions log.txt
@@ -1,3 +1,3 @@
-[28.01.2025 20:11] Read previous papers.
-[28.01.2025 20:11] Generating top page (month).
-[28.01.2025 20:11] Writing top page (month).
+[28.01.2025 21:09] Read previous papers.
+[28.01.2025 21:09] Generating top page (month).
+[28.01.2025 21:09] Writing top page (month).
