I noticed that during self-supervised pre-training, MoCo's projection output dimension is set to 128, and the InfoNCE loss is computed on these 128-dimensional features. However, when training the linear head, the fully connected classifier is attached to the 2048-dimensional backbone features. In my opinion, if the 128-dimensional output represents the latent features, it would make more sense to attach the classification head to the 128-dimensional output instead.
So may I ask what the reason is for this implementation choice in the code?
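To illustrate what I mean, here is a minimal sketch of the two heads in question. This is not the repository's actual code; it assumes a standard torchvision ResNet-50 backbone and a hypothetical `num_classes`, with an MLP projection head as in MoCo v2:

```python
import torch
import torch.nn as nn
import torchvision.models as models

backbone = models.resnet50()
feat_dim = backbone.fc.in_features   # 2048-d backbone features
backbone.fc = nn.Identity()          # expose the 2048-d features directly

# Projection head used only during self-supervised pre-training;
# the InfoNCE loss is computed on its 128-d output.
proj_head = nn.Sequential(
    nn.Linear(feat_dim, feat_dim),
    nn.ReLU(inplace=True),
    nn.Linear(feat_dim, 128),
)

# Linear evaluation: the projection head is discarded and the classifier
# is attached to the 2048-d backbone features, not the 128-d output.
num_classes = 1000                   # hypothetical, depends on the dataset
linear_probe = nn.Linear(feat_dim, num_classes)

x = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    feats = backbone(x)              # shape: (2, 2048)
z = nn.functional.normalize(proj_head(feats), dim=1)  # (2, 128), for InfoNCE
logits = linear_probe(feats)         # (2, num_classes), for evaluation
```

My question is why the linear probe is built on `feats` (2048-d) rather than on `z` (128-d).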