Add new paper: #47

Open
wyzh0912 opened this issue Feb 23, 2025 · 0 comments

Title

Do Attention Heads Compete or Cooperate during Counting?

Published Date

2025-02-10

Source

arXiv

Head Name

Mixed Head

Summary

  • Innovation: The paper investigates how attention heads in transformers behave on a counting task, finding that they act as a pseudo-ensemble: each head solves the same subtask (counting), but their outputs must be aggregated non-uniformly to conform to the task's syntax.

  • Tasks: The study involves training small transformers on the Count01 language, a simple counting task where the model determines if a string has more '1's than '0's, with '2's as noise. The analysis focuses on understanding whether attention heads compete or cooperate during this task.

  • Significant Result: The research shows that while attention heads individually learn to solve the counting task, they do so as a pseudo-ensemble. The output layer must aggregate their outputs non-uniformly to address both the main task (counting) and an auxiliary syntactic task (ending the sentence).
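As a concrete illustration, the Count01 labeling rule described in the summary can be sketched as follows. This is a minimal sketch, not the paper's code; the function name and the exact label convention are assumptions:

```python
def count01_label(s: str) -> bool:
    """Label a Count01 string (hypothetical helper, not from the paper).

    Returns True when the string contains strictly more '1's than '0's;
    '2' tokens are treated as noise and ignored.
    """
    return s.count("1") > s.count("0")

# Example strings over the {0, 1, 2} alphabet
print(count01_label("112021"))  # three '1's vs. one '0' -> True
print(count01_label("0201"))    # one '1' vs. two '0's -> False
```

Because '2' never enters the comparison, a model trained on this language must learn to count selectively over the '0'/'1' tokens while ignoring the noise symbol.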
