Hello, I wonder which dataset you used to train CLAP (especially for music).
I ask because audio embeddings of audio synthesized from MIDI are not closely aligned with the text embeddings of captions from MusicCaps, AudioStock, LP-MusicCaps, and AudioSet when I plot samples in t-SNE space.
GTZAN embeddings show a similar pattern.
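For reference, the kind of plot I mean can be produced along these lines (a minimal sketch assuming the `laion_clap` package; the file paths and captions below are placeholders, not my actual data):

```python
# Minimal sketch of the t-SNE comparison described above, assuming the
# `laion_clap` package; audio paths and captions are placeholders.
import laion_clap
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

# Load a pretrained CLAP checkpoint (downloads default weights on first call).
model = laion_clap.CLAP_Module(enable_fusion=False)
model.load_ckpt()

# Hypothetical inputs: WAVs rendered from MIDI, and captions from a
# dataset such as MusicCaps.
audio_files = ["midi_render_000.wav", "midi_render_001.wav"]
captions = ["a gentle solo piano melody", "an upbeat rock track with drums"]

audio_emb = model.get_audio_embedding_from_filelist(x=audio_files, use_tensor=False)
text_emb = model.get_text_embedding(captions, use_tensor=False)

# Project both embedding sets into one shared 2-D t-SNE space.
# perplexity is tiny here only because this placeholder set has 4 points;
# keep the sklearn default for a real dataset.
all_emb = np.concatenate([audio_emb, text_emb], axis=0)
coords = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(all_emb)

n_audio = len(audio_files)
plt.scatter(coords[:n_audio, 0], coords[:n_audio, 1], label="audio (MIDI-synthesized)")
plt.scatter(coords[n_audio:, 0], coords[n_audio:, 1], label="text (captions)")
plt.legend()
plt.title("CLAP audio vs. text embeddings (t-SNE)")
plt.show()
```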
Alternatively, could you share some example captions used for training?
Regards.