the problem of download training data #2

Open
feibin95 opened this issue Jun 26, 2019 · 7 comments

@feibin95

The README says: "First, download the training data from the website (http://www.msceleb.org/download/lowshot)".

However, the dataset has been deleted from the official website. Could you please provide a download link? Thank you very much!

@wuyuebupt
Owner

@feifei9099
I am afraid I cannot provide a download link for such a large dataset. You may find some alternative download links at:
https://github.com/deepinsight/insightface/wiki/Dataset-Zoo
https://ibug.doc.ic.ac.uk/resources/lightweight-face-recognition-challenge-workshop/

I would also like to share some news about the dataset:
https://www.vice.com/en_us/article/a3x4mp/microsoft-deleted-a-facial-recognition-database-but-its-not-dead
https://megapixels.cc/datasets/msceleb/

Hope this helps.

@feibin95
Author

Thank you very much for the news you provided. I found a download link here:
https://academictorrents.com/details/9e67eb7cc23c9417f39778a8e06cca5e26196a97/tech

However, I still have a question about studying the low-shot part of MS-Celeb-1M as described in your paper “Low-shot Face Recognition with Hybrid Classifiers” (ICCV Workshop, 2017). Should I download the entire dataset, and does it contain the low-shot part?
In your paper, you describe the low-shot part as follows: “Base Set consists of 20,000 people, with an average of 58 training samples per person. Novel Set has the rest 1,000 people, of which each comes with 1, 2 or 5 training images.”
Thank you very much!
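The base/novel setting quoted above can be illustrated with a small sketch. Assuming you already have a mapping from identity ID to that identity's image paths (a hypothetical layout), the helper below keeps k images per novel identity, for k = 1, 2 or 5. It is not the challenge organizers' actual split, which was fixed in advance; `sample_k_shot` and the constants are names made up for illustration. The constant `APPROX_BASE_IMAGES` just works out the rough base-set size implied by the paper's numbers.

```python
import random

# Numbers quoted from the paper: 20,000 base identities with ~58 images each,
# and 1,000 novel identities with 1, 2 or 5 training images each.
NUM_BASE, AVG_BASE_IMAGES = 20_000, 58
NUM_NOVEL = 1_000
APPROX_BASE_IMAGES = NUM_BASE * AVG_BASE_IMAGES  # about 1.16 million images


def sample_k_shot(images_by_id, k, seed=0):
    """Hypothetical helper: keep k randomly chosen images per identity.

    images_by_id maps an identity ID to a list of image paths (assumed layout).
    Sorting before sampling plus a fixed seed makes the subset reproducible.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    subset = {}
    for class_id, images in images_by_id.items():
        if len(images) < k:
            raise ValueError(f"identity {class_id} has {len(images)} images, need {k}")
        subset[class_id] = rng.sample(sorted(images), k)
    return subset


if __name__ == "__main__":
    # Toy data with made-up file names, only to show the call.
    toy = {20000: ["a.jpg", "b.jpg", "c.jpg"], 20001: ["d.jpg", "e.jpg"]}
    for k in (1, 2):
        print(k, sample_k_shot(toy, k))
    print(APPROX_BASE_IMAGES)
```

Fixing the seed is a design choice so that repeated runs produce the same 1-, 2- or 5-shot subset, which matters when comparing methods on the same novel images.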

@wuyuebupt
Owner

wuyuebupt commented Jun 28, 2019

@feifei9099

The data split was provided by the challenge organizers. I believe the first paper to describe this setting is "One-shot Face Recognition by Promoting Underrepresented Classes" (https://arxiv.org/pdf/1707.05574.pdf).

The training data are a subset of the full dataset. I am not sure whether you can extract them directly from the full data, since the low-shot data is a cleaned version.

I found a list of the training IDs: IDs 0-19999 are the base set and 20000-20999 are the novel set. The link is:
https://drive.google.com/file/d/14n5f6ZfmxP20j3iDGRiDSKGVybs2Sy8y/view?usp=sharing
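As a minimal sketch of the ID convention above, the snippet below maps an integer training ID to its split. It assumes the IDs in the shared list are plain integers in the 0-20999 range; the list's exact file format is not described in this thread, and `which_split` is a hypothetical helper name.

```python
# Assumption: each low-shot training identity carries an integer ID in 0-20999,
# following the split described above. The shared list's file format is not assumed.
BASE_IDS = range(0, 20000)       # 0-19999: base set (20,000 identities)
NOVEL_IDS = range(20000, 21000)  # 20000-20999: novel set (1,000 identities)


def which_split(class_id: int) -> str:
    """Hypothetical helper: return 'base' or 'novel' for a low-shot training ID."""
    if class_id in BASE_IDS:  # range membership for ints is a constant-time bounds check
        return "base"
    if class_id in NOVEL_IDS:
        return "novel"
    raise ValueError(f"ID {class_id} is outside the 0-20999 training range")


if __name__ == "__main__":
    print(which_split(0), which_split(19999), which_split(20000), which_split(20999))
    # prints: base base novel novel
```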

I cannot find the original training data in TSV format. I do have a copy of the image data, which is about 24 GB, but even if I found the original training file, it would still be too large for me to upload.

I did find two parts of the data:

  1. training data for the novel set: https://drive.google.com/file/d/18g4Cn7uSWxLM1IHxVMHbC-eI60juuXDn/view?usp=sharing
  2. validation set containing 25,000 images: https://drive.google.com/file/d/1R0yky3CT6Uuvu6z2KQxggxsRV9XrrLRs/view?usp=sharing

There is also a list of all training images for the base set: https://drive.google.com/file/d/1by9zWY2xcocYdne8_sGKILnCARXLJsB7/view?usp=sharing

If the number of training images matches, your copy should be the same data.
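One way to do the count comparison suggested above is sketched below. It assumes (hypothetically) that the base-set list has one `image_path class_id` entry per line and that your local copy stores each identity in a sub-directory named by its integer ID. Neither layout is confirmed in this thread, so adjust the parsing to the real files; all function names are made up for illustration.

```python
from collections import Counter
from pathlib import Path


def count_list_lines(lines):
    """Count images per class from lines of the (assumed) form 'image_path class_id'.

    The real format of the shared base-set list is an assumption here; change
    the parsing if the class ID is stored in a different column.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if parts:  # skip blank lines
            counts[int(parts[-1])] += 1
    return counts


def count_local_images(root, exts=(".jpg", ".jpeg", ".png")):
    """Count images per class, assuming one sub-directory per class named by its integer ID."""
    counts = Counter()
    for class_dir in Path(root).iterdir():
        if class_dir.is_dir():
            counts[int(class_dir.name)] = sum(
                1 for p in class_dir.iterdir() if p.suffix.lower() in exts
            )
    return counts


def mismatches(expected, actual):
    """Return {class_id: (expected, actual)} for every class whose image counts differ.

    Both arguments are Counters, so a class missing on one side counts as 0.
    """
    ids = sorted(set(expected) | set(actual))
    return {i: (expected[i], actual[i]) for i in ids if expected[i] != actual[i]}
```

For example, reading the list with `with open("base_list.txt") as f: expected = count_list_lines(f)` and then calling `mismatches(expected, count_local_images("base_images"))` would return an empty dict if every class has the same number of images in both places (both file names are placeholders).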

Hope these help.

@feibin95
Author

Thank you very much for your help. It means a lot to me.

@wtongping

@wuyuebupt
Thank you for providing these links. I have been able to sort out the training data, but despite searching online for a long time, I still cannot find the test data. Could you please share the lists (image names) of the test data for the base and novel sets?

@wuyuebupt
Owner

@wtongping

I did not find the test data. Even if I had, the labels were not available, since, if I remember correctly, the challenge evaluation was run by the organizers.

@wtongping

@wuyuebupt OK, thank you!
