This project is based on PyTorch and integrates the current mainstream network architectures, including VGGNet, ResNet, DenseNet, MobileNet, and DarkNet (YOLOv2 and YOLOv3).
Each implementation follows the details given in the corresponding paper as closely as possible; where a paper leaves structural details unspecified, we add our own design choices. The input size of all networks is uniformly set to (224, 224, 3) (H, W, C).
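As a hedged sketch of how the Params column in the tables below can be reproduced for any torch.nn.Module (torchvision's ResNet18 stands in for this project's own implementation, so the figure may differ slightly):

```python
# Count parameters in millions; torchvision's ResNet18 is only a stand-in for the
# project's implementation, so the value may differ slightly from the table below.
import torchvision

model = torchvision.models.resnet18()
params_m = sum(p.numel() for p in model.parameters()) / 1e6
print(f"params: {params_m:.6f} M")   # roughly 11.69 M, close to the ResNet18 row
```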
Model | Params/Million | FLOPs/G | Time cost/ms | Top-1/% | Top-5/% |
---|---|---|---|---|---|
--- 2015 --- | |||||
Vgg11 | 9.738984 | 15.02879 | 205.59 | 70.4 | 89.6 |
Vgg13 | 9.92388 | 22.45644 | 324.13 | 71.3 | 90.1 |
Vgg16 | 15.236136 | 30.78787 | 397.33 | 74.4 | 91.9 |
Vgg19 | 20.548392 | 39.11929 | 451.11 | 74.5 | 92.0 |
--- 2016 --- | |||||
ResNet18 | 11.693736 | 3.65921 | 86.56 | ||
ResNet34 | 21.801896 | 7.36109 | 123.07 | 75.81 | 92.6 |
ResNet50 | 25.557032 | 8.27887 | 293.62 | 77.15 | 93.29 |
ResNet101 | 44.54916 | 15.71355 | 413.51 | 78.25 | 93.95 |
ResNet152 | 60.192808 | 23.15064 | 573.09 | 78.57 | 94.29 |
PreActResNet18 | 11.690792 | 3.65840 | 86.12 | ||
PreActResNet34 | 21.798952 | 7.36029 | 142.51 | ||
PreActResNet50 | 25.545256 | 8.27566 | 296.39 | ||
PreActResNet101 | 44.537384 | 15.71034 | 418.37 | ||
PreActResNet152 | 60.181032 | 23.14743 | 578.81 | 78.90 | 94.50 |
DarkNet19(YOLOv2) | 8.01556 | 10.90831 | 139.21 | ||
--- 2017 --- | |||||
DenseNet121(k=32) | 7.978734 | 5.69836 | 286.45 | ||
DenseNet169(k=32) | 14.149358 | 6.75643 | 375.47 | ||
DenseNet201(k=32) | 20.013806 | 8.63084 | 486.14 | ||
DenseNet264(k=32) | 33.337582 | 11.57003 | 689.63 | ||
DenseNet161(k=48) | 28.680814 | 15.50790 | 708.36 | ||
DPN92 | 36.779704 | 12.77985 | 366.11 | 79.30 | 94.60 |
DPN98 | 60.21588 | 22.92897 | 573.04 | 79.80 | 94.80 |
ResNeXt50_2x40d | 25.425 | 8.29756 | 364.24 | 77.00 | |
ResNeXt50_4x24d | 25.292968 | 8.37150 | 416.01 | 77.40 | |
ResNeXt50_8x14d | 25.603016 | 8.58994 | 444.33 | 77.70 | |
ResNeXt50_32x4d | 25.028904 | 8.51937 | 460.20 | 77.80 | |
ResNeXt101_2x40d | 44.456296 | 15.75783 | 640.83 | 78.3 | |
ResNeXt101_4x24d | 44.363432 | 15.84712 | 627.48 | 78.6 | |
ResNeXt101_8x14d | 45.104328 | 16.23445 | 870.31 | 78.7 | |
ResNeXt101_32x4d | 44.177704 | 16.02570 | 952.88 | 78.8 | |
MobileNet | 4.231976 | 1.14757 | 100.45 | 70.60 | |
SqueezeNet | 1.2524 | 1.69362 | 90.97 | 57.50 | 80.30 |
SqueezeNet + Simple Bypass | 1.2524 | 1.69550 | 96.82 | 60.40 | 82.50 |
SqueezeNet + Complex Bypass | 1.594928 | 2.40896 | 130.98 | 58.80 | 82.00 |
--- 2018 --- | |||||
PeleeNet | 4.51988 | 4.96656 | 237.18 | 72.6 | 90.6 |
1.0-SqNxt-23 | 0.690824 | 0.48130 | 69.93 | 59.05 | 82.60 |
1.0-SqNxt-23v5 | 0.909704 | 0.47743 | 58.40 | 59.24 | 82.41 |
2.0-SqNxt-23 | 2.2474 | 1.12928 | 111.89 | 67.18 | 88.17 |
2.0-SqNxt-23v5 | 3.11524 | 1.12155 | 93.54 | 67.44 | 88.20 |
MobileNetV2 | 3.56468 | 0.66214 | 138.15 | 74.07 | |
DarkNet53(YOLOv3) | 41.609928 | 14.25625 | 275.50 | ||
DLA-34 | 15.784869 | 2.27950 | 70.17 | ||
DLA-46-C | 1.310885 | 0.40895 | 40.29 | 64.9 | 86.7 |
DLA-60 | 22.335141 | 2.93399 | 110.80 | ||
DLA-102 | 33.732773 | 4.42848 | 154.27 | ||
DLA-169 | 53.990053 | 6.65083 | 230.39 | ||
DLA-X-46-C | 1.077925 | 0.37765 | 44.74 | 66.0 | 87.0 |
DLA-X-60-C | 1.337765 | 0.40313 | 50.84 | 68.0 | 88.4 |
DLA-X-60 | 17.650853 | 2.39033 | 131.93 | ||
DLA-X-102 | 26.773157 | 3.58778 | 164.93 | ||
IGCV3-D (0.7) | 2.490294 | 0.31910 | 165.14 | 68.45 | |
IGCV3-D (1.0) | 3.491688 | 0.60653 | 263.80 | 72.20 | |
IGCV3-D (1.4) | 6.015164 | 1.11491 | 318.40 | 74.70 | |
--- 2019 --- | |||||
EfficientNet-B0 | 5.288548 | 0.01604 | 186.61 | 76.30 | 93.20 |
EfficientNet-B1 | 7.794184 | 0.02124 | 266.05 | 78.80 | 94.40 |
EfficientNet-B2 | 9.109994 | 0.02240 | 277.94 | 79.80 | 94.90 |
EfficientNet-B3 | 12.233232 | 0.02905 | 376.24 | 81.10 | 95.50 |
EfficientNet-B4 | 19.341616 | 0.03762 | 513.91 | 82.60 | 96.30 |
EfficientNet-B5 | 30.389784 | 0.05086 | 721.95 | 83.30 | 96.70 |
EfficientNet-B6 | 43.040704 | 0.06443 | 1062.64 | 84.00 | 96.90 |
EfficientNet-B7 | 66.34796 | 0.08516 | 1520.88 | 84.40 | 97.10 |
Model | Params/Million | FLOPs/G | Time cost/ms | Top-1/% | Top-5/% |
---|---|---|---|---|---|
--- 2014 --- | |||||
GoogleNet V1 | 6.998552 | 3.20387 | 85.95 | ||
GoogleNet V1 (LRN) | 6.998552 | 3.20387 | 192.64 | 71.00 | 90.80 |
GoogleNet V1 (Bn) | 7.013112 | 3.21032 | 139.42 | 73.20 | |
--- 2015 --- | |||||
GoogleNet V2 | 11.204936 | 4.08437 | 127.71 | 76.60 | |
GoogleNet V3 | 23.834568 | 7.60887 | 208.01 | 78.80 | 94.40 |
--- 2016 --- | |||||
GoogleNet V4 | 42.679816 | 12.31977 | 324.36 | 80.00 | 95.10 |
Note: GoogleNet V1 does not include BatchNorm layers; instead, LocalResponseNorm is applied after the first two convolution layers, and this operation increases the model's computation time. This is why GoogleNet V1 (LRN) is slower than GoogleNet V1 (Bn).
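To illustrate this note, here is a rough comparison of the cost of LocalResponseNorm versus BatchNorm2d on a single feature map; the tensor size and round count are illustrative assumptions, not the project's benchmark setting.

```python
# Rough timing of LocalResponseNorm vs BatchNorm2d on one feature map.
import time
import torch
import torch.nn as nn

x = torch.randn(4, 64, 112, 112)
layers = {"LRN": nn.LocalResponseNorm(size=5), "BN": nn.BatchNorm2d(64).eval()}

with torch.no_grad():
    for name, layer in layers.items():
        layer(x)                                  # warm-up pass
        start = time.perf_counter()
        for _ in range(10):
            layer(x)
        print(name, f"{(time.perf_counter() - start) / 10 * 1000:.2f} ms")
```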
For the time cost, we use an input of size (4, 3, 224, 224) and average over multiple test rounds (timings are easily affected by the CPU's operating state).
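A minimal timing sketch under the same protocol: a (4, 3, 224, 224) input, several forward passes, and the average elapsed time. Torchvision's VGG11 is only a stand-in for the project's own model implementations.

```python
# Average forward time over several rounds for a (4, 3, 224, 224) input.
import time
import torch
import torchvision

model = torchvision.models.vgg11().eval()
x = torch.randn(4, 3, 224, 224)

with torch.no_grad():
    model(x)                                      # warm-up pass
    rounds = 10
    start = time.perf_counter()
    for _ in range(rounds):
        model(x)
elapsed_ms = (time.perf_counter() - start) / rounds * 1000
print(f"average forward time: {elapsed_ms:.2f} ms")
```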
http://www.image-net.org/challenges/LSVRC/2012/downloads
We need the training set and the validation set (which serves as the test set); papers generally report only validation-set results (Top-1 & Top-5).
Development kit (Task 1 & 2). 2.5MB. (not actually used here)
Training images (Task 1 & 2). 138GB. MD5: 1d675b47d978889d74fa0da5fadfb00e
Validation images (all tasks). 6.3GB. MD5: 29b22e2961454d5413ddabcf34fc5622
Method 1: https://github.com/pytorch/examples/blob/main/imagenet/extract_ILSVRC.sh
Method 2:
Extract the downloaded data files (this may take a while):
mkdir -p train val
tar xvf ILSVRC2012_img_train.tar -C ./train
tar xvf ILSVRC2012_img_val.tar -C ./val
For the train data, extraction yields 1000 tar files (one per class) that must be extracted again. The extraction script dataset/unzip.sh is as follows:
dir=/data/srd/data/Image/ImageNet/train
# extract each per-class tar into a directory named after the file, then remove the tars
for x in "$dir"/*.tar
do
  filename=$(basename "$x" .tar)
  mkdir -p "$dir/$filename"
  tar -xvf "$x" -C "$dir/$filename"
done
rm "$dir"/*.tar
Note: change 'dir' to your own data directory.
Then run:
sh unzip.sh
For the val data, extraction yields 50,000 images. We need to group the images of each class into its own folder, matching the train layout. Copy the project's dataset/valprep.sh script into the val folder and run:
sh valprep.sh
In the downloaded training set, each folder holds the images of one class. The label corresponding to each folder name is defined in meta.mat, the label file in the downloaded Development kit; this is a MATLAB file whose contents can be read with scipy.io.loadmat. The validation set contains 50,000 images, and the label of each image is listed in ILSVRC2012_validation_ground_truth.txt.
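A hedged sketch of reading that label mapping with scipy; the field names ('synsets', 'WNID', 'words') follow the ILSVRC2012 devkit layout and are an assumption here, so check them against your own copy of meta.mat.

```python
# Read the class-folder -> class-name mapping from the devkit's meta.mat.
import scipy.io

meta = scipy.io.loadmat("meta.mat", squeeze_me=True)
synsets = meta["synsets"]
for entry in synsets[:5]:
    print(entry["WNID"], entry["words"])   # e.g. folder name n01440764 and its class name
```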
Data augmentation: images are sampled randomly; each image is resized so that its shorter side is 256, a 224x224 patch is then randomly cropped, the per-channel mean is subtracted from each channel, and the image is randomly flipped horizontally.
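A hedged sketch of this augmentation pipeline with torchvision transforms; the description above mentions only mean subtraction, so the std values here (the common ImageNet statistics) are an extra assumption.

```python
import torchvision.transforms as T

train_transform = T.Compose([
    T.Resize(256),                  # shorter side -> 256
    T.RandomCrop(224),              # random 224x224 crop
    T.RandomHorizontalFlip(),       # random left-right flip
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],   # per-channel mean subtraction
                std=[0.229, 0.224, 0.225]),   # std division is an extra assumption
])
```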
Because of the fully connected layers, we need to fix the size of the images fed into the network. The input size is set to 224x224, but the images in the test set are not of a fixed size. Center cropping alone can easily miss part of the target object, so we crop the image at multiple locations.
One-crop takes a single 224 × 224 region from the center of a 256 × 256 image; five-crop takes five 224 × 224 regions from the top left, top right, bottom left, bottom right, and center of the original image; ten-crop horizontally flips each of the five-crop regions, giving ten crops in total.
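A hedged sketch of ten-crop evaluation with torchvision's TenCrop (five crops plus their horizontal flips); averaging the predictions over the crop dimension is a common choice, not necessarily this project's exact protocol.

```python
import torch
import torchvision.transforms as T

test_transform = T.Compose([
    T.Resize(256),
    T.TenCrop(224),                  # 5 crops + their horizontal flips
    T.Lambda(lambda crops: torch.stack([T.ToTensor()(c) for c in crops])),
])

# At evaluation time, fold the crop dimension into the batch, run the model once,
# then average the outputs back over the 10 crops:
# bs, ncrops, c, h, w = inputs.shape
# outputs = model(inputs.view(-1, c, h, w)).view(bs, ncrops, -1).mean(1)
```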
Use PyTorch,