Train / Validation / Test splits for million song dataset #20

codyhesse · 2023-03-01T16:17:01Z

Hi!
Thank you for releasing this repo :)

I was wondering where I can find the train/validation/test splits you used for MSD. My team and I are trying to reproduce this study but, unfortunately, we can't find the 201 680 / 11 774 / 28 435 splits and the corresponding tags from Last.FM. Any assistance with this would be very helpful!

Kind regards,
Cody

carlthome · 2023-04-24T07:10:15Z

I can add that we have been able to access the audio data itself; it's really just the splits that were used that we're looking for. Curiously, the total number doesn't add up to one million, so I guess some filtering/concatenation has been done as well.
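
To make the mismatch concrete, here is a minimal sketch that sums the three split sizes quoted in the first comment. It assumes the numbers follow the train / validation / test order of the issue title; the dictionary keys are labels chosen for illustration, not names from the CLMR code. The total comes to 241 889 tracks, roughly a quarter of the one million songs in MSD.

```python
# Split sizes quoted in the first comment.
# ASSUMPTION: the order is train / validation / test, as in the issue title.
splits = {"train": 201_680, "validation": 11_774, "test": 28_435}

# Total number of tracks covered by the three splits.
total = sum(splits.values())

# Share of each split relative to that total (not relative to the full MSD).
fractions = {name: count / total for name, count in splits.items()}

print(f"total tracks in splits: {total}")
for name, count in splits.items():
    print(f"{name:>10}: {count:>7} ({fractions[name]:.1%})")
```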

yiyiyi0817 · 2023-08-14T08:12:55Z

Hello, have you found the 'processed_annotations' data, such as output_labels_msd.txt, index_msd.tsv, and train_gt_msd.tsv? @codyhesse @carlthome @Spijkervet Thanks for your comments.

carlthome · 2023-08-14T08:16:25Z

@yiyiyi0817 don't know about those files specifically (@codyhesse and @SebastianLoef might know more), but we believe the splits used in CLMR were the ones by @keunwoochoi over in https://github.com/keunwoochoi/MSD_split_for_tagging at least.

yiyiyi0817 · 2023-08-14T10:34:26Z

Thank you very much. @carlthome

lix4 · 2024-03-25T01:06:33Z

> @yiyiyi0817 don't know about those files specifically (@codyhesse and @SebastianLoef might know more), but we believe the splits used in CLMR were the ones by @keunwoochoi over in https://github.com/keunwoochoi/MSD_split_for_tagging at least.

Where can I get the npy files?

keunwoochoi · 2024-03-25T04:33:16Z

@lix4 it's on the mentioned repo - https://github.com/keunwoochoi/MSD_split_for_tagging

lix4 · 2024-03-25T20:24:19Z

> @lix4 it's on the mentioned repo - https://github.com/keunwoochoi/MSD_split_for_tagging

I mean the original data files like "3/6/36122424.npy". Is there a place I can download it?
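
For anyone checking which of these files they already have locally, here is a small sketch that builds such a relative path from a numeric ID. It is only a guess based on the single example above: it assumes the first two characters of the ID are used as nested directory names. The function name `guess_npy_path` is hypothetical and not taken from any repository mentioned in this thread.

```python
from pathlib import PurePosixPath


def guess_npy_path(track_id: str) -> PurePosixPath:
    """Build a relative .npy path from a numeric ID.

    ASSUMPTION: the layout is <first char>/<second char>/<id>.npy,
    inferred only from the one example path "3/6/36122424.npy".
    """
    if len(track_id) < 2:
        raise ValueError("ID must have at least two characters")
    return PurePosixPath(track_id[0]) / track_id[1] / f"{track_id}.npy"


# Reproduces the example path quoted in the comment above.
example = guess_npy_path("36122424")
print(example)
```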

keunwoochoi · 2024-03-25T22:53:01Z

You mean the audio files. Sorry, you'll have to ask around for people who might have them, as the crawling API doesn't work anymore. It's so problematic that I even wrote a short paper about it:
https://arxiv.org/abs/2308.16389

lix4 · 2024-03-26T02:10:09Z

> you meant the audio files. sorry you should ask around people who might have them as the crawling API doesn't work anymore. it's very problematic that i even wrote a short paper about it. https://arxiv.org/abs/2308.16389

All right, thank you for your explanation. I have been searching for them for a while.
