Commit
Auto. Make Doomgrad HF Review on 17 January
actions-user committed Jan 17, 2025
1 parent de9ca21 commit 401c03a
Showing 7 changed files with 193 additions and 193 deletions.
8 changes: 4 additions & 4 deletions d/2025-01-17.html

Large diffs are not rendered by default.

102 changes: 51 additions & 51 deletions d/2025-01-17.json
@@ -4,9 +4,9 @@
"en": "January 17",
"zh": "1月17日"
},
"time_utc": "2025-01-17 20:10",
"time_utc": "2025-01-17 21:08",
"weekday": 4,
"issue_id": 1735,
"issue_id": 1736,
"home_page_url": "https://huggingface.co/papers",
"papers": [
{
@@ -68,7 +68,7 @@
"title": "Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps",
"url": "https://huggingface.co/papers/2501.09732",
"abstract": "Generative models have made significant impacts across various domains, largely due to their ability to scale during training by increasing data, computational resources, and model size, a phenomenon characterized by the scaling laws. Recent research has begun to explore inference-time scaling behavior in Large Language Models (LLMs), revealing how performance can further improve with additional computation during inference. Unlike LLMs, diffusion models inherently possess the flexibility to adjust inference-time computation via the number of denoising steps, although the performance gains typically flatten after a few dozen. In this work, we explore the inference-time scaling behavior of diffusion models beyond increasing denoising steps and investigate how the generation performance can further improve with increased computation. Specifically, we consider a search problem aimed at identifying better noises for the diffusion sampling process. We structure the design space along two axes: the verifiers used to provide feedback, and the algorithms used to find better noise candidates. Through extensive experiments on class-conditioned and text-conditioned image generation benchmarks, our findings reveal that increasing inference-time compute leads to substantial improvements in the quality of samples generated by diffusion models, and with the complicated nature of images, combinations of the components in the framework can be specifically chosen to conform with different application scenario.",
"score": 24,
"score": 25,
"issue_id": 1720,
"pub_date": "2025-01-16",
"pub_date_card": {
@@ -341,7 +341,7 @@
"title": "Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models",
"url": "https://huggingface.co/papers/2501.09686",
"abstract": "Language has long been conceived as an essential tool for human reasoning. The breakthrough of Large Language Models (LLMs) has sparked significant research interest in leveraging these models to tackle complex reasoning tasks. Researchers have moved beyond simple autoregressive token generation by introducing the concept of \"thought\" -- a sequence of tokens representing intermediate steps in the reasoning process. This innovative paradigm enables LLMs' to mimic complex human reasoning processes, such as tree search and reflective thinking. Recently, an emerging trend of learning to reason has applied reinforcement learning (RL) to train LLMs to master reasoning processes. This approach enables the automatic generation of high-quality reasoning trajectories through trial-and-error search algorithms, significantly expanding LLMs' reasoning capacity by providing substantially more training data. Furthermore, recent studies demonstrate that encouraging LLMs to \"think\" with more tokens during test-time inference can further significantly boost reasoning accuracy. Therefore, the train-time and test-time scaling combined to show a new research frontier -- a path toward Large Reasoning Model. The introduction of OpenAI's o1 series marks a significant milestone in this research direction. In this survey, we present a comprehensive review of recent progress in LLM reasoning. We begin by introducing the foundational background of LLMs and then explore the key technical components driving the development of large reasoning models, with a focus on automated data construction, learning-to-reason techniques, and test-time scaling. We also analyze popular open-source projects at building large reasoning models, and conclude with open challenges and future research directions.",
"score": 9,
"score": 10,
"issue_id": 1720,
"pub_date": "2025-01-16",
"pub_date_card": {
@@ -449,6 +449,53 @@
}
}
},
+ {
+ "id": "https://huggingface.co/papers/2501.09433",
+ "title": "CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation",
+ "url": "https://huggingface.co/papers/2501.09433",
+ "abstract": "The synthesis of high-quality 3D assets from textual or visual inputs has become a central objective in modern generative modeling. Despite the proliferation of 3D generation algorithms, they frequently grapple with challenges such as multi-view inconsistency, slow generation times, low fidelity, and surface reconstruction problems. While some studies have addressed some of these issues, a comprehensive solution remains elusive. In this paper, we introduce CaPa, a carve-and-paint framework that generates high-fidelity 3D assets efficiently. CaPa employs a two-stage process, decoupling geometry generation from texture synthesis. Initially, a 3D latent diffusion model generates geometry guided by multi-view inputs, ensuring structural consistency across perspectives. Subsequently, leveraging a novel, model-agnostic Spatially Decoupled Attention, the framework synthesizes high-resolution textures (up to 4K) for a given geometry. Furthermore, we propose a 3D-aware occlusion inpainting algorithm that fills untextured regions, resulting in cohesive results across the entire model. This pipeline generates high-quality 3D assets in less than 30 seconds, providing ready-to-use outputs for commercial applications. Experimental results demonstrate that CaPa excels in both texture fidelity and geometric stability, establishing a new standard for practical, scalable 3D asset generation.",
+ "score": 7,
+ "issue_id": 1721,
+ "pub_date": "2025-01-16",
+ "pub_date_card": {
+ "ru": "16 января",
+ "en": "January 16",
+ "zh": "1月16日"
+ },
+ "hash": "8c7a54f21e46af7a",
+ "authors": [
+ "Hwan Heo",
+ "Jangyeong Kim",
+ "Seongyeong Lee",
+ "Jeong A Wi",
+ "Junyoung Choi",
+ "Sangjun Ahn"
+ ],
+ "affiliations": [
+ "Graphics AI Lab, NC Research"
+ ],
+ "pdf_title_img": "assets/pdf/title_img/2501.09433.jpg",
+ "data": {
+ "categories": [
+ "#diffusion",
+ "#3d",
+ "#optimization"
+ ],
+ "emoji": "🎨",
+ "ru": {
+ "title": "CaPa: Революция в генерации 3D-моделей",
+ "desc": "В статье представлен CaPa - фреймворк для генерации высококачественных 3D-моделей. Он использует двухэтапный процесс, разделяя создание геометрии и текстур с помощью латентной диффузионной модели и пространственно-разделенного внимания. CaPa также предлагает алгоритм для заполнения нетекстурированных областей, обеспечивая целостность результатов. Фреймворк генерирует 3D-модели менее чем за 30 секунд, превосходя аналоги по качеству текстур и стабильности геометрии."
+ },
+ "en": {
+ "title": "CaPa: Fast and High-Fidelity 3D Asset Generation",
+ "desc": "This paper presents CaPa, a novel framework for generating high-quality 3D assets from textual or visual inputs. It addresses common challenges in 3D generation, such as multi-view inconsistency and slow generation times, by separating geometry generation from texture synthesis. The framework utilizes a 3D latent diffusion model for consistent geometry creation and a Spatially Decoupled Attention mechanism for high-resolution texture synthesis. CaPa also includes a 3D-aware occlusion inpainting algorithm to enhance the final output, achieving high fidelity and stability in under 30 seconds."
+ },
+ "zh": {
+ "title": "高效生成高保真3D资产的CaPa框架",
+ "desc": "本论文介绍了一种名为CaPa的框架,用于高效生成高保真度的3D资产。该框架采用两阶段的过程,将几何体生成与纹理合成解耦。首先,使用3D潜在扩散模型生成几何体,确保多视角之间的结构一致性。然后,通过一种新颖的空间解耦注意力机制合成高分辨率纹理,并提出了3D感知的遮挡修复算法,最终在30秒内生成高质量的3D资产。"
+ }
+ }
+ },
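For the CaPa entry added above, the decoupled "carve-and-paint" pipeline its abstract describes can be summarized as a skeleton; every callable here is a hypothetical placeholder, not the authors' code:

def generate_asset(carve, paint, inpaint, views, prompt):
    geometry = carve(views)                      # stage 1: geometry via 3D latent diffusion
    texture = paint(geometry, prompt, res=4096)  # stage 2: up-to-4K texture, geometry held fixed
    return geometry, inpaint(geometry, texture)  # fill occluded, untextured regions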
{
"id": "https://huggingface.co/papers/2501.08617",
"title": "RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation",
@@ -544,53 +591,6 @@
}
}
},
- {
- "id": "https://huggingface.co/papers/2501.09433",
- "title": "CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation",
- "url": "https://huggingface.co/papers/2501.09433",
- "abstract": "The synthesis of high-quality 3D assets from textual or visual inputs has become a central objective in modern generative modeling. Despite the proliferation of 3D generation algorithms, they frequently grapple with challenges such as multi-view inconsistency, slow generation times, low fidelity, and surface reconstruction problems. While some studies have addressed some of these issues, a comprehensive solution remains elusive. In this paper, we introduce CaPa, a carve-and-paint framework that generates high-fidelity 3D assets efficiently. CaPa employs a two-stage process, decoupling geometry generation from texture synthesis. Initially, a 3D latent diffusion model generates geometry guided by multi-view inputs, ensuring structural consistency across perspectives. Subsequently, leveraging a novel, model-agnostic Spatially Decoupled Attention, the framework synthesizes high-resolution textures (up to 4K) for a given geometry. Furthermore, we propose a 3D-aware occlusion inpainting algorithm that fills untextured regions, resulting in cohesive results across the entire model. This pipeline generates high-quality 3D assets in less than 30 seconds, providing ready-to-use outputs for commercial applications. Experimental results demonstrate that CaPa excels in both texture fidelity and geometric stability, establishing a new standard for practical, scalable 3D asset generation.",
- "score": 6,
- "issue_id": 1721,
- "pub_date": "2025-01-16",
- "pub_date_card": {
- "ru": "16 января",
- "en": "January 16",
- "zh": "1月16日"
- },
- "hash": "8c7a54f21e46af7a",
- "authors": [
- "Hwan Heo",
- "Jangyeong Kim",
- "Seongyeong Lee",
- "Jeong A Wi",
- "Junyoung Choi",
- "Sangjun Ahn"
- ],
- "affiliations": [
- "Graphics AI Lab, NC Research"
- ],
- "pdf_title_img": "assets/pdf/title_img/2501.09433.jpg",
- "data": {
- "categories": [
- "#diffusion",
- "#3d",
- "#optimization"
- ],
- "emoji": "🎨",
- "ru": {
- "title": "CaPa: Революция в генерации 3D-моделей",
- "desc": "В статье представлен CaPa - фреймворк для генерации высококачественных 3D-моделей. Он использует двухэтапный процесс, разделяя создание геометрии и текстур с помощью латентной диффузионной модели и пространственно-разделенного внимания. CaPa также предлагает алгоритм для заполнения нетекстурированных областей, обеспечивая целостность результатов. Фреймворк генерирует 3D-модели менее чем за 30 секунд, превосходя аналоги по качеству текстур и стабильности геометрии."
- },
- "en": {
- "title": "CaPa: Fast and High-Fidelity 3D Asset Generation",
- "desc": "This paper presents CaPa, a novel framework for generating high-quality 3D assets from textual or visual inputs. It addresses common challenges in 3D generation, such as multi-view inconsistency and slow generation times, by separating geometry generation from texture synthesis. The framework utilizes a 3D latent diffusion model for consistent geometry creation and a Spatially Decoupled Attention mechanism for high-resolution texture synthesis. CaPa also includes a 3D-aware occlusion inpainting algorithm to enhance the final output, achieving high fidelity and stability in under 30 seconds."
- },
- "zh": {
- "title": "高效生成高保真3D资产的CaPa框架",
- "desc": "本论文介绍了一种名为CaPa的框架,用于高效生成高保真度的3D资产。该框架采用两阶段的过程,将几何体生成与纹理合成解耦。首先,使用3D潜在扩散模型生成几何体,确保多视角之间的结构一致性。然后,通过一种新颖的空间解耦注意力机制合成高分辨率纹理,并提出了3D感知的遮挡修复算法,最终在30秒内生成高质量的3D资产。"
- }
- }
- },
{
"id": "https://huggingface.co/papers/2501.09038",
"title": "Do generative video models learn physical principles from watching videos?",
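The diff above also shows the record shape in d/2025-01-17.json. A small reader for that file, assuming the full document matches the fragment visible here (a top-level "papers" list with "score", "url", and per-language summaries under "data"):

import json

with open("d/2025-01-17.json", encoding="utf-8") as f:
    review = json.load(f)

for paper in sorted(review["papers"], key=lambda p: p["score"], reverse=True):
    summary = paper["data"]["en"]  # "ru" and "zh" variants are also present
    print(f'{paper["score"]:>3}  {summary["title"]}  {paper["url"]}')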
