Mapping between old and new filenames #6

junsukchoe · 2020-12-28T06:51:03Z

Hello,

It seems that the file names of ImageNetV2 have been changed.
Could you provide the mappings between old and new filenames?

Thanks!

Vaishaal · 2021-01-11T07:02:23Z

Hello! What are the file name changes you are seeing? I had changed the filenames in the public release temporarily and rolled it back. Could you check now to see if the file names are still different?

expectopatronum · 2021-01-14T06:55:42Z

Hi!
It looks like not only the file names have changed, but also the number of files.

I downloaded the dataset from here on December 17th 2020. After unpacking it consisted of 3 directories (imagenet-matched-frequency-format-val, imagenet-threshold-0.7-format-val, imagenet-top-images-format-val), each containing 1000 directories with names like n03461385. The directories in imagenet-matched-frequency-format-val for example each contains the images 0.jpeg ... 19.jpeg (which amounts to 20,000 images instead of the mentioned 10,000).

I downloaded the dataset again 1 or 2 days ago and all of a sudden the directory imagenet-matched-frequency-format-val contains directories 0 ... 999 and each of them contains 10 images each with names like 7e4a8987a9a330189cc38c4098b1c57ac301713f.jpeg.

At first I thought I had mixed something up, but I documented everything in December when I first downloaded it, and even my browser remembered that I downloaded it from exactly the same URL.

So, what's going on? Could you clarify what the correct version is (I assume the latter with 10k images)? But where do the additional images in the directory I downloaded in December come from?

Best regards
Verena
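
For anyone trying to tell which of the two layouts described above they have on disk, a check along these lines may help. It is only a sketch: the function names are made up for illustration, and the "old release" test is a heuristic based solely on the directory names quoted in this thread (names like n03461385 versus 0 ... 999).

```python
import os
from collections import Counter


def summarize_layout(root):
    """Return (number of class directories, Counter {images per class: how many classes})."""
    class_dirs = [d for d in os.listdir(root)
                  if os.path.isdir(os.path.join(root, d))]
    per_class = Counter()
    for d in class_dirs:
        images = [f for f in os.listdir(os.path.join(root, d))
                  if f.lower().endswith(('.jpeg', '.jpg', '.png'))]
        per_class[len(images)] += 1
    return len(class_dirs), per_class


def looks_like_old_release(root):
    """Heuristic (assumption): old copies used directory names such as n03461385."""
    names = [d for d in os.listdir(root)
             if os.path.isdir(os.path.join(root, d))]
    return bool(names) and all(n.startswith('n') and n[1:].isdigit() for n in names)
```

Going by the descriptions in this thread, a December 2020 copy of imagenet-matched-frequency-format-val would report 1000 directories with 20 images each, and the current release 1000 directories with 10 images each.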

Vaishaal · 2021-01-14T07:21:08Z

Hi @expectopatronum & @junsukchoe

The current dataset release (the one you can download right now, with 10k images) is the correct one. We had a mixup with our S3 bucket in October 2020 and all our files got deleted, so we re-uploaded the dataset to the same locations.

The long names like "7e4a8987a9a330189cc38c4098b1c57ac301713f" are our internal candidate ids. They were added to the release so that you can merge the images with the data structures/labels found in this repository and in our other project: https://github.com/modestyachts/evaluating_machine_accuracy_on_imagenet. You were right, @junsukchoe: this is indeed a change in our current release relative to our old, pre-October 2020 release.

The extra 10k images are duplicates so you can ignore them!

I can dig up the exact mapping between the filenames in the old release (pre-October 2020) and the new release if you need it!

Thanks,
Vaishaal Shankar
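
If you still have a copy with 20 images per class and want to see which files are the duplicates, one option is to group files by content hash. This is a sketch under the assumption that the duplicates are byte-identical, which this thread does not confirm; images that were re-encoded would not be caught. The function names are illustrative.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's bytes, read in chunks to keep memory use small."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups (lists of paths) of files under root with identical content."""
    groups = defaultdict(list)
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            groups[file_digest(path)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Each returned group lists paths whose contents match exactly, so all but one file in a group can be ignored.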

m-Just · 2021-03-12T10:08:24Z

Hi! The new directory names 0, 1, ..., 999 cause trouble with torchvision.datasets.ImageFolder, which sorts the names as strings into 0, 1, 10, 100, ..., 999. That order differs from the numeric class order. As a workaround, I zero-padded all directory names to 4 digits, and it worked. In Python:

import os, glob

# Rename every class directory so its name is zero-padded to 4 digits (e.g. 7 -> 0007).
for path in glob.glob('../dataset/imagenetv2*'):
    if os.path.isdir(path):
        for subpath in glob.glob(f'{path}/*'):
            # Last path component is the class directory name (assumes '/' separators).
            dirname = subpath.split('/')[-1]
            os.rename(subpath, '/'.join(subpath.split('/')[:-1]) + '/' + dirname.zfill(4))
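
The ordering problem described above can be reproduced with plain Python, without torchvision. The snippet below is only an illustration of string sorting versus zero-padded sorting; it does not touch any files.

```python
# Class directory names as they appear in the current release.
names = [str(i) for i in range(1000)]

# Sorting as strings puts '10' and '100' ahead of '2'.
lexicographic = sorted(names)
print(lexicographic[:5])         # ['0', '1', '10', '100', '101']
print(lexicographic.index('2'))  # 112, so class 2 would get label 112

# After zero-padding, string order and numeric order agree.
padded = sorted(n.zfill(4) for n in names)
print(padded[:5])                # ['0000', '0001', '0002', '0003', '0004']
print(all(int(n) == i for i, n in enumerate(padded)))  # True
```

Since int('0007') == 7, the original class index can still be recovered from a padded directory name.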

sbelharbi · 2021-09-11T15:34:02Z

hi @Vaishaal

I just downloaded the imagenetv2-threshold0.7 dataset using this link from here.
After untarring, there is only one folder, imagenetv2-threshold0.7-format-val,
and, as others mentioned, the files have names like faa7b8da1c2a3f0fee1814d01d1afffb4b5952f7.jpeg.

"I can dig up the exact mapping between the filenames in the old release (pre-October 2020) and the new release if you need it!"

Any news on the mapping?

@expectopatronum did you find a way around this?

I really appreciate your help.
Thanks

Vaishaal · 2021-09-20T23:22:28Z

The tar.gz should have 1000 sub-folders, one for each of the 1000 ImageNet classes (https://gist.github.com/yrevar/942d3a0ac09ec9e5eb3a).

Is this not what you see?

Vaishaal · 2021-09-20T23:23:06Z

If you are using PyTorch, you can use https://github.com/modestyachts/ImageNetV2_pytorch to load the dataset.

sbelharbi · 2021-09-21T01:21:03Z

Hi,
The issue is not with the dataset itself.
The problem is that the file names have changed, which breaks another repo that relies on the old names.
A file used to be named 0.jpeg, for example, and is now named faa7b8da1c2a3f0fee1814d01d1afffb4b5952f7.jpeg.
Someone created additional annotations based on the old naming.
Since you said above that you changed the naming scheme, I was wondering whether you still have the mapping from the old names to the new ones.

thanks

Vaishaal · 2021-09-27T06:27:20Z

Oh, I did not realize there was a dependency on the filenames! We actually lost the old version of the dataset, because the newer version with the candidate ids allows us to associate each image in the release with the rest of the metadata we've released in https://github.com/modestyachts/ImageNetV2.

If you have a copy of the old dataset lying around I can probably generate the mapping quite easily but right now I don't have access to the old dataset.
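
For anyone who does still have an old copy, one way such a mapping could be generated is to match files by content hash. This is a sketch, not the maintainers' method: it assumes the image bytes are identical between the old and new releases, which this thread does not confirm, and the function names are illustrative.

```python
import hashlib
import os


def digests(root):
    """Return {path relative to root: SHA-256 hex digest} for every file under root."""
    out = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, 'rb') as f:
                out[os.path.relpath(path, root)] = hashlib.sha256(f.read()).hexdigest()
    return out


def build_mapping(old_root, new_root):
    """Map old relative paths to new relative paths by matching file contents.

    Returns (mapping, unmatched), where unmatched lists old files whose
    content was not found anywhere in the new release.
    """
    old = digests(old_root)
    new_by_digest = {d: p for p, d in digests(new_root).items()}
    mapping = {p: new_by_digest[d] for p, d in old.items() if d in new_by_digest}
    unmatched = sorted(p for p, d in old.items() if d not in new_by_digest)
    return mapping, unmatched
```

Duplicate files in the old copy all map to the same new file. A long unmatched list would suggest the byte-identical assumption does not hold.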

sbelharbi · 2021-09-27T11:56:10Z

I don't have the old dataset, but the author of the additional annotations, @junsukchoe, might.

thanks

ludles · 2021-10-17T19:55:25Z

@m-Just Thank you for your snippet! It solves the problem. I made the following minor adjustments to make it portable across operating systems (Windows uses a different path separator from Linux).

import glob
import os

# Zero-pad each class directory name to 4 digits (e.g. 7 -> 0007).
for path in glob.glob('../dataset/imagenetv2*'):
    if os.path.isdir(path):
        for subpath in glob.glob(os.path.join(path, '*')):
            dirname = os.path.basename(subpath)
            os.rename(subpath, os.path.join(os.path.dirname(subpath), dirname.zfill(4)))

tangtaogo · 2022-04-09T13:12:00Z

So what's the mapping between old and new filenames? Why not keep the names consistent with the original validation set?

Vaishaal · 2022-04-25T23:04:23Z

Ah, sorry, we lost the old filenames. You can use the ImageNetV2 PyTorch dataloader, https://github.com/modestyachts/ImageNetV2_pytorch, if you'd like code that loads the dataset correctly (so it is compatible with ImageNet-Val).

junsukchoe mentioned this issue Jun 22, 2021

About ImageNetV2 file name clovaai/wsolevaluation#42

Open

mckinziebrandon mentioned this issue Aug 10, 2022

generate_class_info.py uses incorrect constructor for ImageNetData #9

Closed

Vaishaal closed this as completed Oct 6, 2022
