Add new paper: #40

Open
wyzh0912 opened this issue Feb 23, 2025 · 0 comments
@wyzh0912 (Contributor)

Title

Language Models Use Trigonometry to Do Addition

Published Date

2025-02-02

Source

arXiv

Head Name

Arithmetic Head

Summary

  • Innovation: The paper's core finding is that large language models (LLMs) represent numbers as generalized helices (a linear component plus circular components with several periods) and perform addition with a "Clock" algorithm that rotates these helices, providing the first representation-level explanation of an LLM's mathematical capability (see the sketch after this list).

  • Tasks: The study reverse-engineers how three mid-sized LLMs (GPT-J, Pythia-6.9B, and Llama3.1-8B) compute a + b for integers a, b in [0, 99], analyzing model components (MLPs, attention heads, and individual neurons) and verifying the findings through causal interventions (a patching sketch follows this list).

  • Significant Result: The models compute sums by rotating their helical number representations, with the "Clock" algorithm central to this computation; this demonstrates that LLMs can learn structured mathematical algorithms from pretraining on general text (the angle-addition step is sketched below).
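The helical representation can be illustrated with a short sketch. This is not the paper's code; the periods T = (2, 5, 10, 100) match those reported in the paper, but the construction and the name helix_basis here are our own illustration:

```python
import numpy as np

def helix_basis(a, periods=(2, 5, 10, 100)):
    """Embed integer a as a generalized helix: one linear
    component plus a (cos, sin) pair for each period T."""
    feats = [float(a)]  # linear component
    for T in periods:
        feats.append(np.cos(2 * np.pi * a / T))
        feats.append(np.sin(2 * np.pi * a / T))
    return np.array(feats)

# Numbers 0..99 as points on the generalized helix.
H = np.stack([helix_basis(a) for a in range(100)])
print(H.shape)  # (100, 9): 1 linear + 2 coordinates per period
```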

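The "Clock" step itself reduces to the trigonometric angle-addition identities: rotating the circle coordinates for a by the angle for b lands on the coordinates for (a + b) mod T on each circle, with the linear component resolving the ambiguity across periods. A minimal sketch, again illustrative rather than the paper's implementation:

```python
import numpy as np

def clock_add(a, b, periods=(2, 5, 10, 100)):
    """Rotate the (cos, sin) pair for a by the angle for b on each
    circle; angle addition yields the pair for (a + b) mod T."""
    out = {}
    for T in periods:
        ca, sa = np.cos(2 * np.pi * a / T), np.sin(2 * np.pi * a / T)
        cb, sb = np.cos(2 * np.pi * b / T), np.sin(2 * np.pi * b / T)
        # cos(x+y) = cos x cos y - sin x sin y
        # sin(x+y) = sin x cos y + cos x sin y
        out[T] = (ca * cb - sa * sb, sa * cb + ca * sb)
    return out

# Decode the sum from the T = 100 circle: 27 + 48 = 75.
c, s = clock_add(27, 48)[100]
angle = np.arctan2(s, c) % (2 * np.pi)
print(round(angle * 100 / (2 * np.pi)))  # 75
```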
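The causal interventions are a form of activation patching: cache an activation from a clean run, then overwrite the corresponding activation in a corrupted run and check whether the clean behavior is restored. The sketch below shows the generic procedure on a toy PyTorch module; the model, layer choice, and inputs are placeholders, whereas the paper intervenes on real components of GPT-J, Pythia-6.9B, and Llama3.1-8B:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in; the paper's real targets are MLP outputs,
# attention heads, and individual neurons.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
layer = model[0]  # hypothetical component to intervene on

clean_x = torch.randn(1, 4)    # stand-in for the clean prompt
corrupt_x = torch.randn(1, 4)  # stand-in for the corrupted prompt

cache = {}

def save_hook(module, inputs, output):
    # Step 1: cache the activation from the clean run.
    cache["act"] = output.detach()

def patch_hook(module, inputs, output):
    # Step 2: replace the activation with the cached clean one.
    return cache["act"]

h = layer.register_forward_hook(save_hook)
clean_out = model(clean_x)
h.remove()

h = layer.register_forward_hook(patch_hook)
patched_out = model(corrupt_x)
h.remove()

# If patching moves the corrupted output toward the clean one, the
# component causally carries the patched-in information.
print(clean_out.item(), model(corrupt_x).item(), patched_out.item())
```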