Train / Validation / Test splits for million song dataset #20

Open
codyhesse opened this issue Mar 1, 2023 · 9 comments

Comments

@codyhesse

Hi!
Thank you for releasing this repo :)

I was wondering where I can find the train/validation/test splits you used for MSD. My team and I are trying to reproduce this study, but unfortunately we can't find the 201 680 / 11 774 / 28 435 splits or the corresponding tags from Last.FM. Any assistance with this would be very helpful!

Kind regards,
Cody

@carlthome

To add to this: we have been able to access the audio data itself, so it's really just the splits used that we're looking for. Curiously, the total number doesn't add up to one million, so I guess some filtering/concatenation has been done as well.
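As a quick check of the arithmetic behind that observation, here is a minimal Python sketch that sums the three split sizes quoted in the opening post. The numbers come from the issue; the code only adds them up and compares the total against the nominal one million tracks.

```python
# Split sizes quoted in the opening post (train / validation / test).
split_sizes = {"train": 201_680, "validation": 11_774, "test": 28_435}

total = sum(split_sizes.values())
msd_size = 1_000_000  # nominal size of the Million Song Dataset

print(f"tracks across all splits: {total:,}")               # 241,889
print(f"share of the nominal MSD: {total / msd_size:.1%}")  # 24.2%
```

So the splits cover roughly a quarter of the MSD; why the remaining tracks were excluded is not stated in this thread.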

@yiyiyi0817

Hello, have you found the 'processed_annotations' data, such as output_labels_msd.txt, index_msd.tsv and train_gt_msd.tsv? @codyhesse @carlthome @Spijkervet Thanks for your comment.

@carlthome

carlthome commented Aug 14, 2023

@yiyiyi0817 I don't know about those files specifically (@codyhesse and @SebastianLoef might know more), but at least we believe the splits used in CLMR were the ones by @keunwoochoi over at https://github.com/keunwoochoi/MSD_split_for_tagging.
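If you do obtain the split lists from that repo, a sanity check along these lines could confirm that they match the sizes reported in the opening post and do not overlap. This is a minimal sketch: `check_splits` is a hypothetical helper, and it takes already-loaded lists of track IDs because the file format of that repo is not assumed here.

```python
from itertools import combinations

# Split sizes reported in the opening post for the CLMR MSD experiments.
EXPECTED = {"train": 201_680, "valid": 11_774, "test": 28_435}


def check_splits(splits: dict[str, list[str]],
                 expected: dict[str, int] = EXPECTED) -> list[str]:
    """Hypothetical helper: report duplicates, size mismatches and overlaps.

    `splits` maps a split name to the track IDs in it, however they were
    loaded. Returns a list of human-readable problems (empty if none found).
    """
    problems = []
    for name, ids in splits.items():
        # Duplicate IDs inside a single split.
        if len(set(ids)) != len(ids):
            problems.append(f"{name}: contains duplicate IDs")
        # Size check, only for splits we have an expected count for.
        want = expected.get(name)
        if want is not None and len(ids) != want:
            problems.append(f"{name}: {len(ids)} tracks, expected {want}")
    # Every pair of splits should be disjoint.
    for (a, ids_a), (b, ids_b) in combinations(splits.items(), 2):
        overlap = set(ids_a) & set(ids_b)
        if overlap:
            problems.append(f"{a} and {b} share {len(overlap)} track(s)")
    return problems


# Toy usage with made-up IDs and overridden expected sizes:
toy = {"train": ["TR1", "TR2"], "valid": ["TR3"], "test": ["TR2"]}
print(check_splits(toy, {"train": 2, "valid": 1, "test": 1}))
# ['train and test share 1 track(s)']
```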

@yiyiyi0817

Thank you very much. @carlthome

@lix4

lix4 commented Mar 25, 2024

Where can I get the npy files?

@keunwoochoi

@lix4 It's in the repo mentioned above: https://github.com/keunwoochoi/MSD_split_for_tagging

@lix4

lix4 commented Mar 25, 2024

I mean the original data files, like "3/6/36122424.npy". Is there somewhere I can download them?
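The example path above appears to shard files into nested directories by the first two digits of a numeric ID. That layout is an assumption inferred from this single example, not a documented convention; a hypothetical helper (`npy_path`) reproducing it might look like:

```python
from pathlib import PurePosixPath


def npy_path(track_id: str) -> PurePosixPath:
    """Hypothetical helper: map a numeric ID to <d1>/<d2>/<id>.npy.

    The layout is inferred from the single example "3/6/36122424.npy"
    in this thread; it is not a documented convention.
    """
    if len(track_id) < 2 or not track_id.isdigit():
        raise ValueError(f"expected a numeric ID with at least 2 digits, got {track_id!r}")
    return PurePosixPath(track_id[0], track_id[1], f"{track_id}.npy")


print(npy_path("36122424"))  # 3/6/36122424.npy
```

PurePosixPath is used so the forward-slash layout is the same regardless of the operating system the script runs on.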

@keunwoochoi

You mean the audio files. Sorry, you'll have to ask around among people who might have them, as the crawling API doesn't work anymore. It's so problematic that I even wrote a short paper about it.
https://arxiv.org/abs/2308.16389

@lix4

lix4 commented Mar 26, 2024

All right, thank you for your explanation. I have been searching for them for a while.


5 participants