Hi,
I would like to ask a question about the Hadamard transform applied in the mlp_output.
In your paper, you mentioned “We first insert a Hadamard operation into the feed-forward network, before the down-projection matrix. This operation is performed in full precision, and implemented using a fast kernel following Tseng et al. [2024]. This operation is implicitly reversed by fusing a Hadamard matrix into the down-projection matrix of the network: Wdown ← HWdown”
I understand that this is implemented in the function "apply_exact_had_to_linear".
Specifically, in line 99 an online Hadamard transform is applied to the activation (because an activation function sits in between, a Hadamard matrix cannot be inserted earlier to cancel it out).
Then, in line 100, a Hadamard matrix is fused into the down-projection weight to "implicitly reverse" that transform.
I believe this already achieves the goal of reducing incoherence (suppressing outliers) while keeping the layer output unchanged.
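To make the cancellation I am describing concrete, here is a minimal numerical sketch (my own toy example, not code from this repository), assuming a normalized Hadamard matrix H so that H @ H.T = I:

```python
# Toy check of the identity: applying an online Hadamard transform to the
# activation and fusing H into the down-projection weight leaves the layer
# output unchanged, since (x H)(W H)^T = x H H^T W^T = x W^T.
import numpy as np
from scipy.linalg import hadamard

d_in, d_out, batch = 8, 4, 3
H = hadamard(d_in) / np.sqrt(d_in)      # normalized, orthogonal, symmetric

rng = np.random.default_rng(0)
x = rng.standard_normal((batch, d_in))  # activation entering down_proj
W = rng.standard_normal((d_out, d_in))  # Linear weight, shape [out, in]

y_ref   = x @ W.T                       # original down_proj output
x_had   = x @ H                         # "online" Hadamard on the activation
W_fused = W @ H                         # H fused into the weight
y_fused = x_had @ W_fused.T             # the two transforms cancel

assert np.allclose(y_ref, y_fused)
```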
However, why is ActQuantizer.fp32_had for the down_proj layer set to True during inference, so that another Hadamard transform is applied in the forward pass?
This is the part I don't understand, and I look forward to your explanation!