About ImageNetV2 file name #42

hyunOO · 2020-12-26T05:22:28Z

Hi,

I downloaded the Threshold0.7 version of ImageNetV2 to use it as train-fullsup.
However, the image file names are not 0.jpeg to 9.jpeg; they are in a format like 0af3f1b55de791c4144e2fb6d7dfe96dfc22d3fc.jpeg, 8e1374a4e20d7af22665b7749158b7eb9fa3826e.jpeg, etc.

How can I rename the files so that I can correctly use the box labels you annotated?

Thanks.


junsukchoe · 2021-06-22T04:41:23Z

Recently, the file names in ImageNetV2 were changed. We are looking into this issue; in the meantime, you can ask the authors of ImageNetV2 to provide the mapping between the old and current file names. Please refer to modestyachts/ImageNetV2#6 for more details.

sbelharbi · 2021-09-11T15:36:48Z

Hi @junsukchoe,
the issue is still not solved.
I have contacted the authors.

Is there a way around this?

Meanwhile, could you share the data you have with the old naming, if that is OK with the authors?

Thanks

sbelharbi · 2021-09-11T15:38:51Z

@hyunOO did you find a way to solve this?
thanks

sbelharbi · 2021-09-13T02:24:26Z

Tomorrow, I'll try to brute-force the mapping between images based on their size (h x w)... hopefully it is unique. I will also check whether the folder names have changed, so I can use them to help identify samples. I already see samples with the same size in different folders...

There are only 10k images, so it can be done given some time.
If you have the data or the mapping, please post it here.
Thanks
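For reference, the size-based idea can be sketched as below. `match_by_size` is a hypothetical helper, not code from any repository; it assumes the old and new sizes are already loaded into dicts keyed by `<folder>/<file>` ids (as the script later in this thread does) and only accepts a pair when the (w, h) size is unique among the old images of that class folder.

```python
from collections import defaultdict


def match_by_size(old_sizes: dict, new_sizes: dict):
    """Pair old and new image ids whose (w, h) size is unique within a folder.

    Both arguments map '<folder>/<file>' ids to (w, h) tuples (assumed layout).
    Returns (pairs, ambiguous): pairs maps old id -> new id, and ambiguous
    lists the new ids that size alone cannot resolve.
    """
    # Index old images by (class folder, size) for constant-time lookups.
    buckets = defaultdict(list)
    for old_id, size in old_sizes.items():
        folder = old_id.split('/', 1)[0]
        buckets[(folder, size)].append(old_id)

    pairs, ambiguous = {}, []
    for new_id, size in new_sizes.items():
        folder = new_id.split('/', 1)[0]
        candidates = buckets.get((folder, size), [])
        if len(candidates) == 1:
            pairs[candidates[0]] = new_id
        else:
            # Zero or several old images share this size in the folder.
            ambiguous.append(new_id)
    return pairs, ambiguous
```

Restricting candidates to the same class folder is what lets two images with the same size in different folders still be paired.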

junsukchoe · 2021-09-13T12:05:08Z

Hello,

As a quick solution, I have made a mapping list based on SSIM scores: mapping.txt.
It hasn't been thoroughly verified yet, but when I checked a few samples, the mappings were correct.
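The exact SSIM procedure behind mapping.txt is not described in this thread, so the following is only an illustrative sketch, not the method that was actually used. It computes a simplified single-window (global) SSIM over flattened grayscale pixels, instead of the usual locally windowed SSIM, and picks the same-size candidate with the highest score. `global_ssim` and `best_match` are hypothetical names; for real images, a library implementation such as scikit-image's `skimage.metrics.structural_similarity` would be the more standard choice.

```python
def global_ssim(x, y, data_range=255.0):
    """Simplified single-window SSIM between two equal-length pixel sequences.

    Uses the usual constants C1 = (0.01 * L)^2 and C2 = (0.03 * L)^2 with
    L = data_range, but computes the statistics over the whole image rather
    than over local windows.
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError('images must have the same, non-zero number of pixels')
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx = sum(x) / n
    my = sum(y) / n
    var_x = sum((a - mx) ** 2 for a in x) / n
    var_y = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return (((2 * mx * my + c1) * (2 * cov + c2))
            / ((mx ** 2 + my ** 2 + c1) * (var_x + var_y + c2)))


def best_match(new_img, candidates):
    """Return (id, score) of the candidate most similar to new_img.

    new_img and each value of candidates are (size, pixels) tuples, where
    pixels is a flat sequence of grayscale values (assumed representation).
    """
    size, pixels = new_img
    best_id, best_score = None, float('-inf')
    for cand_id, (cand_size, cand_pixels) in candidates.items():
        if cand_size != size:
            continue  # a renamed copy of the same image should keep its size
        score = global_ssim(pixels, cand_pixels)
        if score > best_score:
            best_id, best_score = cand_id, score
    return best_id, best_score
```

With Pillow, flat grayscale pixels could be obtained with `list(Image.open(path).convert('L').getdata())`. If the renamed files contain the same pixels, a correct match should score at or very near 1.0, so a noticeably lower best score is a hint to check that pair by hand.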

I hope this helps until the official mapping is released.

Thanks!
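Once a mapping is available, applying it is mostly file copying. The sketch below assumes each line of mapping.txt has the form `<old_id>, <new_id>`, with ids relative to the validation folder (e.g. `0/0.jpeg, 0/<hash>.jpeg`), which is how the brute-force script in this thread parses it. `read_mapping` and `restore_old_names` are hypothetical helpers; they copy instead of renaming, so the downloaded ImageNetV2 folder stays untouched.

```python
import os
import shutil


def read_mapping(path):
    """Parse lines of the form '<old_id>, <new_id>' into an old -> new dict."""
    mapping = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            old_id, new_id = (part.strip() for part in line.split(','))
            mapping[old_id] = new_id
    return mapping


def restore_old_names(mapping, new_root, out_root):
    """Copy every renamed image back to its old relative path under out_root."""
    for old_id, new_id in mapping.items():
        src = os.path.join(new_root, new_id)
        dst = os.path.join(out_root, old_id)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves file metadata
```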

sbelharbi · 2021-09-14T04:19:52Z

Hi,
thanks for your quick and helpful reply.
I will work with this while waiting for the official mapping.
I did a brute-force mapping within each folder using image sizes and
found only 4556 unique pairs; each of the remaining images shares its size with at least one other image in the same folder.

All 4556 pairs I found match the mapping you provided.

Thanks again

Here is the output of the script:

100%|██████████████████████████████████████| 1000/1000 [00:06<00:00, 159.20it/s]
BFORCE: found 4556 possibly correct pairs.
BFORCE: found 5444 failed matching due to duplicate sizes.
found 0 failed comparison.

script:

import os
from os.path import join

from PIL import Image
from tqdm import tqdm


def get_ids(img_id_file: str) -> list:
    image_ids = []
    with open(img_id_file, 'r') as f:
        for line in f.readlines():
            image_ids.append(line.strip('\n').replace('val2/', ''))
    return image_ids


def get_image_sizes(path_img_sz: str) -> dict:
    """
    image_sizes.txt has the structure

    <path>,<w>,<h>
    path/to/image1.jpg,500,300
    path/to/image2.jpg,1000,600
    path/to/image3.jpg,500,300
    ...
    """
    image_sizes = {}
    with open(path_img_sz, 'r') as f:
        for line in f.readlines():
            image_id, ws, hs = line.strip('\n').split(',')
            image_id = image_id.replace('val2/', '')
            w, h = int(ws), int(hs)
            image_sizes[image_id] = (w, h)
    return image_sizes


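def compare_bforce_with_mapping(path_provided_map_1: str, bf: dict) -> list:
    """Return the brute-force pairs that disagree with the provided mapping."""
    mapz = dict()
    with open(path_provided_map_1, 'r') as fin:
        for line in fin:
            line = line.strip()
            if not line:
                continue  # skip blank lines, e.g. a trailing newline
            org_k, new_k = line.replace(' ', '').split(',')
            assert org_k not in mapz
            mapz[org_k] = new_k

    failed = []
    for k in bf:
        # .get() avoids a KeyError when an id is missing from the provided mapping.
        if bf[k] != mapz.get(k):
            failed.append(f'{k}, {bf[k]}, {mapz.get(k)}')

    return failed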


if __name__ == '__main__':
    # hard paths.
    vlddir = 'folds/wsol-done-right-splits/ILSVRC/val'

    # original valid data.
    org_img_id_path = join(vlddir, 'image_ids.txt')
    org_img_sz_path = join(vlddir, 'image_sizes.txt')

    org_ids = get_ids(img_id_file=org_img_id_path)
    org_sz = get_image_sizes(path_img_sz=org_img_sz_path)

    # new valid data.
    data_valid = 'wsol-done-right/ILSVRC/val2'

    # Class sub-folders ('0', '1', ..., '999') directly under data_valid.
    subfds = [d for d in os.listdir(data_valid)
              if os.path.isdir(join(data_valid, d))]
    subfds.sort(key=int)
    new_ids = []
    new_sz = dict()
    mappings = dict()  # orig: new
    failed_mappings = []

    for fd in tqdm(subfds, ncols=80, total=len(subfds)):
        c_or_ids = [k for k in org_ids if k.startswith(fd + '/')]
        for file in os.listdir(join(data_valid, fd)):
            if file.endswith(".jpeg"):
                pfile = join(data_valid, fd, file)
                # .size only needs the header; the context manager closes the file.
                with Image.open(pfile) as image:
                    w, h = image.size
                new_k = f'{fd}/{file}'
                new_ids.append(new_k)

                new_sz[new_k] = (w, h)

                # Brute force: old images in the same folder with the same size.
                matches = [k for k in c_or_ids if org_sz[k] == new_sz[new_k]]

                if len(matches) == 1:
                    orig_k = matches[0]
                    assert orig_k not in mappings
                    mappings[orig_k] = new_k
                else:
                    # Zero or several candidates: size alone is ambiguous.
                    failed_mappings.append(new_k)

    with open('bfmapping.txt', 'w') as fout:
        for k in mappings:
            fout.write(f'{k}, {mappings[k]}\n')

    # compare bf results with the provided mapping.
    pathmp = 'mapping.txt'
    failed = compare_bforce_with_mapping(path_provided_map_1=pathmp,
                                         bf=mappings)

    print(f'BFORCE: found {len(mappings)} possibly correct pairs.')
    print(f'BFORCE: found {len(failed_mappings)} failed matching due to '
          f'duplicate sizes.')

    print(f'found {len(failed)} failed comparison.')

jason718 · 2021-09-30T23:44:23Z

Submitted PR #53 regarding this. It uses the mapping file provided in this thread.

sbelharbi mentioned this issue on Sep 21, 2021 in "Mapping between old and new filenames" (modestyachts/ImageNetV2#6, closed).

hbai98 mentioned this issue on Dec 28, 2022 in "About dataset" (hbai98/SCM#4, open).

Chanfeechen mentioned this issue on Oct 30, 2023 in "ImageNet-V2 naming error" (anguyen8/gScoreCAM#9, closed).
