I have a question about the training process of TS-CAM.
During training, you do not use the cls_token to conduct classification, but use the remaining patch tokens instead.
In this circumstance, can the first row of the attention weights (the attention for the cls_token) accurately reflect the relative importance of different patches for classification?
Looking forward to your reply!
Thanks!
Hi, thanks for your attention.
I think the attention for the cls_token can reflect the relative importance of different patches for classification. The reason is that the cls_token is updated in every transformer layer by a weighted sum of the token values, where the weights are the attention scores computed between the cls_token and the other patch tokens.
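To make this concrete, here is a minimal single-head self-attention sketch in numpy (not the actual TS-CAM code; all shapes and weight matrices are illustrative assumptions). It shows that row 0 of the attention matrix holds the cls_token's attention over every token, and that the cls_token's update is exactly a weighted sum of the value vectors with those weights, so they act as per-patch importance scores:

```python
import numpy as np

rng = np.random.default_rng(0)

num_patches, dim = 4, 8
# Token sequence: row 0 is the cls_token, rows 1..N are patch tokens.
tokens = rng.standard_normal((1 + num_patches, dim))

# Hypothetical projection matrices for queries, keys, and values.
w_q = rng.standard_normal((dim, dim))
w_k = rng.standard_normal((dim, dim))
w_v = rng.standard_normal((dim, dim))

q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v

# Scaled dot-product attention; each row is a softmax distribution.
scores = q @ k.T / np.sqrt(dim)                        # (5, 5)
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)

# Row 0, columns 1..N: how strongly the cls_token attends to each patch.
cls_attn_to_patches = attn[0, 1:]
print(cls_attn_to_patches)

# The updated cls_token is a weighted sum of value vectors with exactly
# these attention weights, which is why they serve as importance scores.
new_cls = attn[0] @ v
assert np.allclose(new_cls, (attn[0][:, None] * v).sum(axis=0))
```

Since each row of `attn` sums to 1, `cls_attn_to_patches` can be read directly as a (normalized, up to the self-attention term) distribution of importance over the patches.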