Auto. Make Doomgrad HF Review on 30 January
actions-user committed Jan 30, 2025
1 parent 043dbd5 commit 67228ca
Showing 8 changed files with 287 additions and 234 deletions.
14 changes: 7 additions & 7 deletions d/2025-01-30.html

Large diffs are not rendered by default.

64 changes: 56 additions & 8 deletions d/2025-01-30.json
@@ -4,17 +4,17 @@
     "en": "January 30",
     "zh": "1月30日"
   },
-  "time_utc": "2025-01-30 07:09",
+  "time_utc": "2025-01-30 08:12",
   "weekday": 3,
-  "issue_id": 1944,
+  "issue_id": 1945,
   "home_page_url": "https://huggingface.co/papers",
   "papers": [
     {
       "id": "https://huggingface.co/papers/2501.17703",
       "title": "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate",
       "url": "https://huggingface.co/papers/2501.17703",
       "abstract": "Supervised Fine-Tuning (SFT) is commonly used to train language models to imitate annotated responses for given instructions. In this paper, we challenge this paradigm and propose Critique Fine-Tuning (CFT), a strategy where models learn to critique noisy responses rather than simply imitate correct ones. Inspired by human learning processes that emphasize critical thinking, CFT encourages deeper analysis and nuanced understanding-traits often overlooked by standard SFT. To validate the effectiveness of CFT, we construct a 50K-sample dataset from WebInstruct, using GPT-4o as the teacher to generate critiques in the form of (input=[query; noisy response], output=critique). CFT on this dataset yields a consistent 4-10% improvement over SFT on six math benchmarks with different base models like Qwen2.5, Qwen2.5-Math and DeepSeek-Math. We further expand to MetaMath and NuminaMath datasets and observe similar gains over SFT. Notably, our Qwen2.5-Math-CFT model-trained on just 50K samples-matches or outperforms competitive models such as AceMath and Qwen2.5-Math-Instruct on most benchmarks, both of which use over 2M samples. Ablation studies show that CFT is robust to the source of noisy response and teacher critique model. Through these findings, we argue that critique-based training offers a more effective alternative to advance the reasoning of language models.",
-      "score": 9,
+      "score": 10,
       "issue_id": 1940,
       "pub_date": "2025-01-29",
       "pub_date_card": {
@@ -57,12 +57,60 @@
         }
       }
     },
+    {
+      "id": "https://huggingface.co/papers/2501.14334",
+      "title": "Exploring the sustainable scaling of AI dilemma: A projective study of corporations' AI environmental impacts",
+      "url": "https://huggingface.co/papers/2501.14334",
+      "abstract": "The rapid growth of artificial intelligence (AI), particularly Large Language Models (LLMs), has raised concerns regarding its global environmental impact that extends beyond greenhouse gas emissions to include consideration of hardware fabrication and end-of-life processes. The opacity from major providers hinders companies' abilities to evaluate their AI-related environmental impacts and achieve net-zero targets. In this paper, we propose a methodology to estimate the environmental impact of a company's AI portfolio, providing actionable insights without necessitating extensive AI and Life-Cycle Assessment (LCA) expertise. Results confirm that large generative AI models consume up to 4600x more energy than traditional models. Our modelling approach, which accounts for increased AI usage, hardware computing efficiency, and changes in electricity mix in line with IPCC scenarios, forecasts AI electricity use up to 2030. Under a high adoption scenario, driven by widespread Generative AI and agents adoption associated to increasingly complex models and frameworks, AI electricity use is projected to rise by a factor of 24.4. Mitigating the environmental impact of Generative AI by 2030 requires coordinated efforts across the AI value chain. Isolated measures in hardware efficiency, model efficiency, or grid improvements alone are insufficient. We advocate for standardized environmental assessment frameworks, greater transparency from the all actors of the value chain and the introduction of a \"Return on Environment\" metric to align AI development with net-zero goals.",
+      "score": 4,
+      "issue_id": 1945,
+      "pub_date": "2025-01-24",
+      "pub_date_card": {
+        "ru": "24 января",
+        "en": "January 24",
+        "zh": "1月24日"
+      },
+      "hash": "af6e1a0fd9d77530",
+      "authors": [
+        "Clément Desroches",
+        "Martin Chauvin",
+        "Louis Ladan",
+        "Caroline Vateau",
+        "Simon Gosset",
+        "Philippe Cordier"
+      ],
+      "affiliations": [
+        "Capgemini Invent 145 quai du Président Roosevelt, 92130 Issy Les Moulineaux, France"
+      ],
+      "pdf_title_img": "assets/pdf/title_img/2501.14334.jpg",
+      "data": {
+        "categories": [
+          "#agents",
+          "#data",
+          "#benchmark",
+          "#ethics"
+        ],
+        "emoji": "🌱",
+        "ru": {
+          "title": "Зеленый ИИ: путь к устойчивому будущему технологий",
+          "desc": "Статья рассматривает экологическое воздействие искусственного интеллекта, особенно больших языковых моделей (LLM). Авторы предлагают методологию для оценки экологического следа AI-портфеля компании, не требующую глубоких знаний в области ИИ и анализа жизненного цикла. Результаты показывают, что генеративные модели ИИ потребляют до 4600 раз больше энергии, чем традиционные, а к 2030 году при высоком уровне внедрения использование электроэнергии ИИ может вырасти в 24,4 раза. Для смягчения экологического воздействия генеративного ИИ авторы призывают к стандартизации оценки, большей прозрачности и введению метрики 'возврата на окружающую среду'."
+        },
+        "en": {
+          "title": "Assessing AI's Environmental Footprint for a Sustainable Future",
+          "desc": "This paper addresses the environmental impact of artificial intelligence, especially focusing on Large Language Models (LLMs). It highlights that these models can consume significantly more energy than traditional models, with estimates showing up to 4600 times higher energy use. The authors propose a methodology for companies to assess their AI portfolio's environmental impact, making it easier to achieve net-zero targets without needing deep expertise in AI or Life-Cycle Assessment. They emphasize the need for coordinated efforts across the AI value chain and advocate for standardized frameworks and transparency to mitigate the environmental effects of Generative AI by 2030."
+        },
+        "zh": {
+          "title": "推动AI可持续发展,保护环境未来",
+          "desc": "这篇论文探讨了人工智能,特别是大型语言模型(LLMs)对环境的影响,包括硬件制造和生命周期结束的过程。研究表明,大型生成性AI模型的能耗是传统模型的4600倍。为了评估公司的AI投资组合的环境影响,论文提出了一种方法论,能够在不需要深入的AI和生命周期评估(LCA)专业知识的情况下提供可行的见解。为了在2030年前减轻生成性AI的环境影响,需要在AI价值链的各个环节进行协调努力,而不仅仅依靠硬件效率或模型效率的单一措施。"
+        }
+      }
+    },
     {
       "id": "https://huggingface.co/papers/2501.17749",
       "title": "Early External Safety Testing of OpenAI's o3-mini: Insights from the Pre-Deployment Evaluation",
       "url": "https://huggingface.co/papers/2501.17749",
       "abstract": "Large Language Models (LLMs) have become an integral part of our daily lives. However, they impose certain risks, including those that can harm individuals' privacy, perpetuate biases and spread misinformation. These risks highlight the need for robust safety mechanisms, ethical guidelines, and thorough testing to ensure their responsible deployment. Safety of LLMs is a key property that needs to be thoroughly tested prior the model to be deployed and accessible to the general users. This paper reports the external safety testing experience conducted by researchers from Mondragon University and University of Seville on OpenAI's new o3-mini LLM as part of OpenAI's early access for safety testing program. In particular, we apply our tool, ASTRAL, to automatically and systematically generate up to date unsafe test inputs (i.e., prompts) that helps us test and assess different safety categories of LLMs. We automatically generate and execute a total of 10,080 unsafe test input on a early o3-mini beta version. After manually verifying the test cases classified as unsafe by ASTRAL, we identify a total of 87 actual instances of unsafe LLM behavior. We highlight key insights and findings uncovered during the pre-deployment external testing phase of OpenAI's latest LLM.",
-      "score": 3,
+      "score": 4,
       "issue_id": 1940,
       "pub_date": "2025-01-29",
       "pub_date_card": {
@@ -230,9 +278,9 @@
   },
   "categories": {
     "#dataset": 2,
-    "#data": 3,
-    "#benchmark": 0,
-    "#agents": 0,
+    "#data": 4,
+    "#benchmark": 1,
+    "#agents": 1,
     "#cv": 0,
     "#rl": 0,
     "#rlhf": 1,
@@ -255,7 +303,7 @@
     "#reasoning": 1,
     "#transfer_learning": 0,
     "#graphs": 0,
-    "#ethics": 2,
+    "#ethics": 3,
     "#security": 2,
     "#optimization": 2,
     "#survey": 0,
