I am having exactly the same issue as this one: #162
The loaded model should be 630k-audioset-best.pt. I am trying to use ESC50 to fine-tune the CLAP model; later I want to fine-tune it on my own dataset.
Thanks a lot for your help!
I am able to train the model with a parameter setting like "--amodel=HTSAT-tiny", but I don't know how it differs from HTSAT-base, or why the model at https://huggingface.co/lukewys/laion_clap/tree/main doesn't work. Also, how can I change the input dimension of the 'audio_projection.0.weight' layer in the audio encoder's projection, where the expected shape is [512, 2048]?
The shape is only expected to be 2048 if you use the PANN-14 architecture, i.e. --amodel PANN-14.
However, as far as I know, the authors haven't released any pretrained weights based on the PANN-14 architecture.
Both HTSAT-tiny and HTSAT-base will use a 768 output dimension. The former is just a smaller version of the latter. HTSAT-tiny requires significantly less GPU memory and might be able to support larger batch sizes, but personally I found its performance to be worse than HTSAT-base.
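To check which architecture a checkpoint matches before picking --amodel, you can look at the input width of its audio projection weight. Below is a minimal sketch in plain Python, using only the dimensions mentioned in this thread (2048 for PANN-14, 768 for HTSAT-tiny and HTSAT-base). The helper name `guess_amodel` is hypothetical and not part of laion_clap, and the idea that the checkpoint keeps its weights under a "state_dict" key is an assumption you should verify on your own file.

```python
from typing import Dict, Tuple

# Input widths of the audio projection layer, as stated in this thread.
PROJECTION_INPUT_DIMS = {
    2048: "PANN-14",
    768: "HTSAT-tiny or HTSAT-base",
}


def guess_amodel(shapes: Dict[str, Tuple[int, ...]]) -> str:
    """Guess the audio encoder family from parameter shapes (hypothetical helper).

    `shapes` maps parameter names to weight shapes. A linear layer's weight
    is stored as [out_features, in_features], so the second entry is the
    width the projection expects from the audio encoder.
    """
    suffix = "audio_projection.0.weight"
    for name, shape in shapes.items():
        # endswith() tolerates prefixes such as "module." in front of the name.
        if name.endswith(suffix):
            in_dim = shape[1]
            return PROJECTION_INPUT_DIMS.get(in_dim, f"unknown (input dim {in_dim})")
    raise KeyError(f"no parameter ending in {suffix!r} found")


# Building `shapes` from a real checkpoint (needs PyTorch; layout is an assumption):
# import torch
# ckpt = torch.load("630k-audioset-best.pt", map_location="cpu")
# state = ckpt.get("state_dict", ckpt)  # assumed: weights may sit under "state_dict"
# shapes = {k: tuple(v.shape) for k, v in state.items()}
# print(guess_amodel(shapes))
```

If the result does not match the --amodel you pass on the command line, a size mismatch on 'audio_projection.0.weight' is the expected symptom; changing the --amodel flag is probably a better fix than editing the layer itself.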