Title
Language Models Use Trigonometry to Do Addition
Published Date
2025-02-02
Source
arXiv
Head Name
Arithmetic Head
Summary
Innovation: The paper's core finding is that large language models (LLMs) represent numbers as generalized helices and manipulate those helices with a "Clock" algorithm to perform addition, providing the first representation-level explanation of an LLM's mathematical capability.
Tasks: The study reverse engineers how three mid-sized LLMs (GPT-J, Pythia-6.9B, and Llama3.1-8B) compute addition for integers in [0, 99], analyzing model components such as MLPs, attention heads, and individual neurons, and verifying the findings through causal interventions.
Significant Result: The research shows that LLMs encode operands on helices and rotate those helices to compute sums, with the "Clock" algorithm central to this computation, demonstrating that LLMs learn structured mathematical algorithms from general text.
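To make the summary concrete, the "Clock" mechanism can be sketched numerically: an integer is embedded as a linear component plus points on circles of several periods, and addition corresponds to rotating one helix by the angles of the other (via the trigonometric angle-addition identities). This is an illustrative sketch assuming the periods T = 2, 5, 10, 100 discussed in the paper, not the model's actual learned weights.

```python
import numpy as np

PERIODS = [2, 5, 10, 100]  # assumed periods; illustrative, not model weights

def helix(n):
    """Embed integer n as a generalized helix: one linear feature plus
    a (cos, sin) pair on a circle of period T for each T in PERIODS."""
    feats = [float(n)]
    for T in PERIODS:
        angle = 2 * np.pi * n / T
        feats.extend([np.cos(angle), np.sin(angle)])
    return np.array(feats)

def clock_add(a, b):
    """Clock-style addition sketch: rotate the helix of a by the angles
    of b. The angle-addition identities produce cos/sin of (a + b)."""
    out = [float(a + b)]  # the linear component adds directly
    for T in PERIODS:
        ta, tb = 2 * np.pi * a / T, 2 * np.pi * b / T
        # Rotation by tb: cos(ta + tb) and sin(ta + tb)
        out.extend([np.cos(ta) * np.cos(tb) - np.sin(ta) * np.sin(tb),
                    np.sin(ta) * np.cos(tb) + np.cos(ta) * np.sin(tb)])
    return np.array(out)

# Rotating helix(a) by the angles of b lands exactly on helix(a + b):
assert np.allclose(clock_add(3, 45), helix(48))
```

The assertion holds for any pair in [0, 99] because rotating by an angle is exactly addition on the circle; the modular periods (e.g. T = 10) are what let the mechanism track the units digit.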