Hadamard transform in the mlp_output #63

Open
seamoonlight-YBY opened this issue Jan 24, 2025 · 0 comments
Hi,
I would like to ask about the Hadamard transform in the mlp_output.

In your paper, you mention:

> We first insert a Hadamard operation into the feed-forward network, before the down-projection matrix. This operation is performed in full precision, and implemented using a fast kernel following Tseng et al. [2024]. This operation is implicitly reversed by fusing a Hadamard matrix into the down-projection matrix of the network: $W_{\text{down}} \leftarrow H W_{\text{down}}$.
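
Writing out the fusion explicitly (my reading, assuming the paper's $Y = XW$ convention and a normalized Hadamard matrix $H$, which satisfies $H^{\top} = H$ and $H H^{\top} = I$):

$$
Y \,=\, X W_{\text{down}} \,=\, X (H H^{\top}) W_{\text{down}} \,=\, (X H)\,(H W_{\text{down}}),
$$

so applying $H$ online to the activations and folding $H$ into $W_{\text{down}}$ leaves the layer output unchanged.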

I understand that the code implementation for this is in the function `apply_exact_had_to_linear`:

[screenshot of `apply_exact_had_to_linear`]

Specifically, line 99 performs an online Hadamard transform (since an activation function sits in between, a Hadamard matrix cannot be fused into an earlier weight to cancel it out).
Then, line 100 multiplies a Hadamard matrix into the weight to "implicitly reverse" it.
I believe this already achieves the goal of reducing incoherence while keeping the output unchanged; the toy check below illustrates this.
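
To make the point concrete, here is a minimal numerical sketch of that identity (toy random matrices, not the QuaRot code; the dimensions and names below are made up for illustration):

```python
import torch
from scipy.linalg import hadamard  # unnormalized power-of-two Hadamard matrix

d = 8                                                          # toy hidden dimension (power of two)
H = torch.tensor(hadamard(d), dtype=torch.float64) / d ** 0.5  # normalized: H == H.T and H @ H.T == I

X = torch.randn(4, d, dtype=torch.float64)       # toy activations entering down_proj
W_down = torch.randn(d, 3, dtype=torch.float64)  # toy down-projection weight in the paper's Y = X @ W layout

Y_ref = X @ W_down                   # original output
Y_rot = (X @ H) @ (H @ W_down)       # online Hadamard on X (like line 99) + H fused into W_down (like line 100)
print(torch.allclose(Y_ref, Y_rot))  # True: output unchanged, only the activations seen by the quantizer are rotated
```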

However, why is `ActQuantizer.fp32_had` set to `True` for the `down_proj` layer during inference, so that another Hadamard transform is applied during the forward pass?

[screenshot]

This is the part I don't understand, and I look forward to your explanation!
