Hi,
I would like to ask a question about the Hadamard transform applied in the mlp_output.
In your paper, you mentioned “We first insert a Hadamard operation into the feed-forward network, before the down-projection matrix. This operation is performed in full precision, and implemented using a fast kernel following Tseng et al. [2024]. This operation is implicitly reversed by fusing a Hadamard matrix into the down-projection matrix of the network: Wdown ← HWdown”
I understand that this is implemented in the function "apply_exact_had_to_linear".
Specifically, in line 99 an online Hadamard transform is applied to the activation (because an activation function sits in between, a Hadamard matrix cannot be inserted earlier to cancel it out).
Then, in line 100, a Hadamard matrix is fused into the down-projection weight to "implicitly reverse" that transform.
I believe this already achieves the goal of reducing incoherence (suppressing outliers) while keeping the layer output unchanged.
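To make the cancellation I am describing concrete, here is a minimal numerical sketch (my own toy example, not code from this repository), assuming a normalized Hadamard matrix H so that H @ H.T = I:

```python
# Toy check of the identity: applying an online Hadamard transform to the
# activation and fusing H into the down-projection weight leaves the layer
# output unchanged, since (x H)(W H)^T = x H H^T W^T = x W^T.
import numpy as np
from scipy.linalg import hadamard

d_in, d_out, batch = 8, 4, 3
H = hadamard(d_in) / np.sqrt(d_in)      # normalized, orthogonal, symmetric

rng = np.random.default_rng(0)
x = rng.standard_normal((batch, d_in))  # activation entering down_proj
W = rng.standard_normal((d_out, d_in))  # Linear weight, shape [out, in]

y_ref   = x @ W.T                       # original down_proj output
x_had   = x @ H                         # "online" Hadamard on the activation
W_fused = W @ H                         # H fused into the weight
y_fused = x_had @ W_fused.T             # the two transforms cancel

assert np.allclose(y_ref, y_fused)
```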
However, why is ActQuantizer.fp32_had for the down_proj layer set to True during inference, so that another Hadamard transform is applied in the forward pass?
This is the part I don't understand, and I look forward to your explanation!