Implementation issues of the Efficient Self-Attention module #148

K1t3 · 2024-05-03T10:18:01Z

Firstly, thank you to all the authors for their impressive work. SegFormer has indeed demonstrated extraordinary performance.

While studying the code carefully, I noticed what looks like a mismatch with the article. The article says that the Efficient Self-Attention module reshapes the $K$ matrix and applies a linear projection to reduce the sequence length by a factor of $R$, with a different reduction ratio at each stage ($[64, 16, 4, 1]$ from stage-1 to stage-4). I assume this part is implemented in the Attention class, but after reading it several times I can only find ordinary multi-head self-attention there, and nothing that matches the Efficient Self-Attention module described in the article.
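To make clear what I was looking for, here is a minimal sketch of the reduction as I understand it from the article: $\hat{K} = \text{Reshape}(N/R,\ C \cdot R)(K)$ followed by $K = \text{Linear}(C \cdot R,\ C)(\hat{K})$. This is not taken from the repository. The function names, the toy sizes, and the averaging weights are all my own assumptions, and the bias term is left out for brevity.

```python
# Sketch of the sequence reduction described in the article (assumed form):
#   K_hat = Reshape(N/R, C*R)(K);  K_new = Linear(C*R, C)(K_hat)
# Pure Python, no bias. All names and values below are hypothetical.

def reshape_rows(K, R):
    """Group every R consecutive rows of an N x C matrix into one row of length C*R."""
    N = len(K)
    if N % R != 0:
        raise ValueError("sequence length N must be divisible by R")
    return [[x for row in K[i:i + R] for x in row] for i in range(0, N, R)]

def linear(X, W):
    """Multiply X (M x D_in) by W (D_in x D_out), without a bias term."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*W)] for row in X]

def reduce_sequence(K, R, W):
    """Shorten K from N rows to N/R rows while keeping C channels."""
    return linear(reshape_rows(K, R), W)

# Toy sizes: N tokens, C channels, reduction ratio R.
N, C, R = 8, 2, 4
K = [[float(i * C + j) for j in range(C)] for i in range(N)]

# Placeholder weights (C*R x C) that simply average each group of R tokens.
# In a real model these would be learned.
W = [[1.0 / (C * R)] * C for _ in range(C * R)]

K_reduced = reduce_sequence(K, R, W)
print(len(K), "x", len(K[0]), "->", len(K_reduced), "x", len(K_reduced[0]))
```

With these toy sizes, $K$ goes from $8 \times 2$ to $2 \times 2$, so the attention matrix $QK^\top$ would shrink from $N \times N$ to $N \times (N/R)$. This is the kind of step I expected to find in the Attention class.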

Did I misunderstand something, or is part of the code missing? I hope to receive your answer, as this is very important to me. Thanks!
