
Shuffling in ImageNet dataloader #5

Closed
alevine0 opened this issue Jan 5, 2020 · 1 comment
alevine0 commented Jan 5, 2020

Hi,

When attempting to replicate, I've noticed that the dataloader loads ImageNet validation samples in order, so the true label is correlated with the index:

idx	label	predict	radius	correct	time
0	0	-1	0.0	0	0:01:05.891589
100	2	2	1.06	1	0:00:49.151814
200	4	-1	0.0	0	0:00:49.389205
300	6	6	0.145	1	0:00:49.527328
400	8	8	1.55	1	0:00:49.616975
500	10	10	1.8	1	0:00:49.704927
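Concretely, the pattern above is what you'd expect if the validation set is sorted by class (torchvision's `ImageFolder` yields 50 consecutive images per class, so `label = idx // 50`). A minimal sketch of that assumption, reproducing the idx/label pairs in the table (this is a hypothetical reconstruction, not the repo's actual loader code):

```python
# If the val set is class-sorted with 50 images per class,
# the label is a deterministic function of the index.
IMAGES_PER_CLASS = 50
SKIP = 100  # certify.py's --skip 100 samples every 100th image

def label_of(idx):
    return idx // IMAGES_PER_CLASS

sampled = [(idx, label_of(idx)) for idx in range(0, 600, SKIP)]
print(sampled)
# [(0, 0), (100, 2), (200, 4), (300, 6), (400, 8), (500, 10)]
```

With `--skip 100`, only even class labels are ever sampled, exactly as in the table.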

But in the provided data, this isn't the case:

smoothing-adversarial/data/certify/imagenet/replication/resnet50/noise_0.50/test/sigma_0.50:
idx	label	predict	radius	correct	time
0	65	-1	0.0000	0	0:02:21.932991
100	473	700	0.0642	0	0:02:17.267692
200	704	704	0.1656	1	0:02:17.283181
300	329	116	0.2634	0	0:02:17.284284
400	359	359	1.1224	1	0:02:17.282202
500	270	270	0.4027	1	0:02:17.290370

I'm running the command:

export model="pretrained_models/imagenet/replication/resnet50/noise_0.50/checkpoint.pth.tar"
export output="certification_output_standard_0.50" 
python code/certify.py imagenet $model 0.50 $output --skip 100 --batch 400

For what it's worth, in data_cohen, the samples are in fact in order...

/data_cohen/certify/imagenet/resnet50/noise_0.50/test/sigma_0.50:
idx	label	predict	radius	correct	time
0	0	394	0.0125	0	0:02:32.717239
100	2	2	1.86	1	0:02:30.318316
200	4	-1	0.0	0	0:02:31.564715
300	6	6	0.709	1	0:02:30.939915
400	8	8	1.53	1	0:02:31.558977
500	10	10	1.96	1	0:02:31.548283

(I'm assuming the values differ because the replication model was trained independently, so that isn't the issue; I'm just wondering why the selection of samples in the reported data differs from what I get when I try to replicate it.)

Hadisalman (Owner) commented:
@alevine0 I apologize for the late reply. That is an interesting observation; we hadn't noticed it until now. We are using a version of ImageNet that is zipped and already exists on our cluster, so I suspect the validation set is already shuffled there.

This shuffled ordering is consistent throughout all of our experiments, and as you mentioned, it shouldn't affect any of our results, especially since we replicate Cohen's results using this same version of ImageNet.

As such, we'll close this issue, but your post will certainly help future researchers avoid confusion when they try to replicate our results!

@Hadisalman Hadisalman pinned this issue Feb 18, 2020