Mapping between old and new filenames #6
Comments
Hello! What are the file name changes you are seeing? I had changed the filenames in the public release temporarily and rolled it back. Could you check now to see if the file names are still different? |
Hi! I downloaded the dataset from here on December 17th 2020. After unpacking it consisted of 3 directories ( I downloaded the dataset again 1 or 2 days ago and all of a sudden the directory At first I thought I mixed something up but I had documented everything in December when I first downloaded it and even my browser remembered, that I downloaded it from exactly the same URL. So, what's going on? Could you clarify what the correct version is (I assume the latter with 10k images)? But where do the additional images in the directory I downloaded in December come from? Best regards |
Hi @expectopatronum & @junsukchoe, the current dataset release (the one with 10k images that you can download right now) is the correct one. We had a mix-up with our S3 bucket in October 2020 and all our files got deleted, and we re-uploaded the dataset to the same locations. The long names like "7e4a8987a9a330189cc38c4098b1c57ac301713f" are our internal candidate ids; they were added to the release so that you can merge the images with the data structures/labels found in this repository and in our other project: https://github.com/modestyachts/evaluating_machine_accuracy_on_imagenet. You were right, @junsukchoe: this is indeed a change in our current release relative to our old release from before October 2020. The extra 10k images are duplicates, so you can ignore them! I can dig up the exact mapping between the filenames in the old release (from before October 2020) and the new release if you need it! Thanks, |
Hi! The new directory names 0, 1, ..., 999 cause trouble for using
import os, glob
for path in glob.glob('../dataset/imagenetv2*'):
    if os.path.isdir(path):
        for subpath in glob.glob(f'{path}/*'):
            dirname = subpath.split('/')[-1]
            os.rename(subpath, '/'.join(subpath.split('/')[:-1]) + '/' + dirname.zfill(4))
|
hi @Vaishaal just downloaded the dataset
any news on the mapping? @expectopatronum did you find a way around this? i really appreciate your help |
The tar.gz should have 1000 sub-folders, one for each of the 1000 ImageNet classes (https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a). Is this not what you see? |
if you are using pytorch you can use https://github.com/modestyachts/ImageNetV2_pytorch to load the dataset. |
hi, thanks |
Oh, I did not realize there was a dependency on the filenames! We actually lost the old version of the dataset because the newer version with the candidate ids allows us to associate each image in the release to the rest of the metadata we've released in https://github.com/modestyachts/ImageNetV2. If you have a copy of the old dataset lying around, I can probably generate the mapping quite easily, but right now I don't have access to the old dataset. |
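For anyone who does still have a copy of the old release, here is a minimal sketch of how such a mapping could be generated by matching file contents. It assumes the images are byte-identical between the two releases, which this thread does not confirm; `file_digest` and `build_filename_mapping` are hypothetical helper names, not part of any ImageNetV2 code.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 1 MiB chunks so large files are not loaded at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_filename_mapping(old_root, new_root):
    """Map old relative paths to new relative paths by file content.

    Assumption: an image has identical bytes in both releases.
    Old files without a byte-identical counterpart are left out.
    """
    old_root, new_root = Path(old_root), Path(new_root)

    # Index the new release by content digest (first path wins on duplicates).
    new_by_digest = {}
    for p in sorted(new_root.rglob("*")):
        if p.is_file():
            rel = p.relative_to(new_root).as_posix()
            new_by_digest.setdefault(file_digest(p), rel)

    # Look up every old file in that index.
    mapping = {}
    for p in sorted(old_root.rglob("*")):
        if p.is_file():
            match = new_by_digest.get(file_digest(p))
            if match is not None:
                mapping[p.relative_to(old_root).as_posix()] = match
    return mapping
```

If the images were re-encoded between releases, the digests will not match and a perceptual comparison would be needed instead.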
I don't have the old dataset, but the author of the additional annotation probably might. @junsukchoe Thanks |
Thank you for your snippet! It solves the problem. I made the following minor adjustments to make it more robust w.r.t. OS. (Windows 10 has a different path separator from Linux.)
import glob
import os
for path in glob.glob('../dataset/imagenetv2*'):
    if os.path.isdir(path):
        for subpath in glob.glob(os.path.join(path, '*')):
            dirname = os.path.basename(subpath)
            # Zero-pad the class folder name, e.g. "7" becomes "0007".
            os.rename(subpath, os.path.join(os.path.dirname(subpath), dirname.zfill(4)))
|
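A note on why the padding helps, plus an alternative that leaves the folders untouched. The thread does not say which loader is affected, so this is an assumption: any loader that sorts the class folder names as strings will place `10` before `2`, so the folder order no longer matches the class index. The sketch below derives the index from the folder name numerically instead of renaming; `numeric_class_to_idx` is a hypothetical helper name.

```python
from pathlib import Path


def numeric_class_to_idx(root):
    """Map each all-digit subdirectory name under root to its integer value.

    Non-numeric entries and plain files are ignored. The returned dict is
    ordered numerically, so "2" comes before "10".
    """
    names = [
        p.name
        for p in Path(root).iterdir()
        if p.is_dir() and p.name.isdigit()
    ]
    return {name: int(name) for name in sorted(names, key=int)}


# String sorting vs. numeric sorting of the same folder names.
names = ["0", "1", "2", "10", "11"]
print(sorted(names))           # ['0', '1', '10', '11', '2']
print(sorted(names, key=int))  # ['0', '1', '2', '10', '11']
```

The resulting dict can be used to translate a folder name into a class index at load time, which avoids modifying the unpacked dataset on disk.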
So what's the mapping between old and new filenames? Why not just keep them consistent with the original val set? |
Ah sorry we lost the old filenames. You can use the ImageNetV2 pytorch dataloader: https://github.com/modestyachts/ImageNetV2_pytorch if you'd like code that loads the dataset correctly (so it is compatible with ImageNet-Val) |
Hello,
It seems that the file names of ImageNetV2 have been changed.
Could you provide the mappings between old and new filenames?
Thanks!