size mismatch between model and ckpt #165

Open
kayleeliyx opened this issue Oct 6, 2024 · 2 comments

Comments

@kayleeliyx

I am having exactly the same issue as #162.
The checkpoint I am loading is 630k-audioset-best.pt. I am trying to fine-tune the CLAP model on ESC50, and later I want to fine-tune it on my own dataset.

Thanks a lot for your help!

@kayleeliyx
Author

kayleeliyx commented Oct 8, 2024

I am able to train the model with "--amodel=HTSAT-tiny", but I don't understand how it differs from HTSAT-base, or why the model at https://huggingface.co/lukewys/laion_clap/tree/main doesn't work. Also, how can I change the input dimension of the 'audio_projection.0.weight' layer in the audio encoder's projection head, which is expected to be [512, 2048]?

@tbrouns

tbrouns commented Oct 29, 2024

An input dimension of 2048 is only expected if you use the PANN-14 architecture, i.e. --amodel PANN-14.

However, as far as I know, the authors haven't released any pretrained weights based on the PANN-14 architecture.

Both HTSAT-tiny and HTSAT-base will use a 768 output dimension. The former is just a smaller version of the latter. HTSAT-tiny requires significantly less GPU memory and might be able to support larger batch sizes, but personally I found its performance to be worse than HTSAT-base.
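A quick way to see which architecture a checkpoint expects is to compare parameter shapes directly. `torch.nn.Linear` stores its weight as `(out_features, in_features)`, so the second entry of `audio_projection.0.weight` is the width of the audio encoder's output: 2048 for PANN-14 and 768 for HTSAT, per the explanation above. The sketch below is plain Python that works on any mapping of names to objects with a `.shape` (such as a PyTorch state dict); the helper names are hypothetical and not part of the CLAP codebase.

```python
from typing import Any, Dict, Mapping, Tuple

Shape = Tuple[int, ...]


def strip_module_prefix(state: Mapping[str, Any]) -> Dict[str, Any]:
    """Drop the "module." prefix that DataParallel/DDP-wrapped models add to keys.

    Keys without the prefix are passed through unchanged.
    """
    prefix = "module."
    return {k[len(prefix):] if k.startswith(prefix) else k: v for k, v in state.items()}


def shape_mismatches(model_state: Mapping[str, Any],
                     ckpt_state: Mapping[str, Any]) -> Dict[str, Tuple[Shape, Shape]]:
    """Return {name: (model_shape, ckpt_shape)} for every parameter that exists
    in both state dicts but has a different shape."""
    ckpt = strip_module_prefix(ckpt_state)
    mismatches = {}
    for name, param in model_state.items():
        if name in ckpt:
            model_shape = tuple(param.shape)
            ckpt_shape = tuple(ckpt[name].shape)
            if model_shape != ckpt_shape:
                mismatches[name] = (model_shape, ckpt_shape)
    return mismatches


def audio_embed_dim(state: Mapping[str, Any]) -> int:
    """Input width of the first audio projection layer.

    nn.Linear weights are (out_features, in_features), so index 1 is the
    dimension of the audio encoder output that the projection consumes.
    """
    weight = strip_module_prefix(state)["audio_projection.0.weight"]
    return tuple(weight.shape)[1]
```

With PyTorch installed, `model.state_dict()` and the result of `torch.load(path, map_location="cpu")` can be passed in directly. This assumes the released file either is a state dict or holds one under a `"state_dict"` key, which is worth confirming on your own copy before relying on it.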

These weights use the HTSAT-tiny architecture:

630k-audioset-best.pt
630k-audioset-fusion-best.pt
630k-best.pt
630k-fusion-best.pt

And these use the HTSAT-base:

music_audioset_epoch_15_esc_90.14.pt
music_speech_audioset_epoch_15_esc_89.98.pt
music_speech_epoch_15_esc_89.25.pt
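The two lists above can also be captured as a lookup, so that a training or evaluation script picks the matching --amodel value from the checkpoint file name. This is a hypothetical helper; the table is transcribed from this comment rather than from an official source, so it should be checked against the repository before use.

```python
import os

# Checkpoint file name -> --amodel value, transcribed from the lists in this
# comment. Not an official table; verify against the project before relying on it.
CKPT_AMODEL = {
    "630k-audioset-best.pt": "HTSAT-tiny",
    "630k-audioset-fusion-best.pt": "HTSAT-tiny",
    "630k-best.pt": "HTSAT-tiny",
    "630k-fusion-best.pt": "HTSAT-tiny",
    "music_audioset_epoch_15_esc_90.14.pt": "HTSAT-base",
    "music_speech_audioset_epoch_15_esc_89.98.pt": "HTSAT-base",
    "music_speech_epoch_15_esc_89.25.pt": "HTSAT-base",
}


def amodel_for(ckpt_path: str) -> str:
    """Look up the audio architecture for a released checkpoint by file name."""
    name = os.path.basename(ckpt_path)
    try:
        return CKPT_AMODEL[name]
    except KeyError:
        # Unknown file: fall back to inspecting audio_projection.0.weight instead.
        raise KeyError(f"unknown checkpoint {name!r}; inspect its parameter shapes") from None


def amodel_flag(ckpt_path: str) -> str:
    """Build the command-line flag string, e.g. "--amodel=HTSAT-tiny"."""
    return f"--amodel={amodel_for(ckpt_path)}"
```

Keying on the base name means the same lookup works whether the checkpoint sits in the working directory or under a download cache path.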

Hope this helps
