This is the official repository for our paper: Pruning Self-attentions into Convolutional Layers in Single Path, by Haoyu He, Jing Liu, Zizheng Pan, Jianfei Cai, Jing Zhang, Dacheng Tao and Bohan Zhuang.
- [2023-06-09]: Update distillation configurations and pre-trained checkpoints.
- [2021-12-04]: Release pre-trained models.
- [2021-11-25]: Release code.
To reduce the massive computational cost of ViTs and introduce convolutional inductive bias, SPViT prunes pre-trained ViT models into accurate and compact hybrid models by pruning self-attention layers into convolutional layers. A weight-sharing scheme between self-attention and convolutional layers casts the search problem as finding which subset of parameters to use, which significantly reduces the search cost.
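Below is a minimal, self-contained PyTorch sketch of the single-path weight-sharing idea, not the implementation in this repository: a learnable gate shares the MSA value and projection weights between an attention path and a simplified convolutional path (a fixed average-pooling kernel stands in for the bottleneck convolution derived in the paper), so the search reduces to learning which path's parameters to keep.

```python
# Illustrative sketch only: the gate, kernel, and conv path are simplified assumptions,
# not the SPViT code. It shows how one set of shared weights can serve both operators.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SinglePathBlock(nn.Module):
    def __init__(self, dim, num_heads, kernel_size=3):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)   # shared Q/K/V weights
        self.proj = nn.Linear(dim, dim)      # shared output projection
        self.kernel_size = kernel_size
        # architecture parameter: >0 favours self-attention, <0 favours convolution
        self.gate = nn.Parameter(torch.zeros(1))

    def attn(self, x):
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        a = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
        out = (a.softmax(-1) @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

    def conv(self, x, hw):
        # Convolutional path reusing the shared V and projection weights: values are
        # computed with the same linear layer, then aggregated by a fixed spatial
        # kernel instead of data-dependent attention scores.
        B, N, C = x.shape
        H, W = hw
        v = self.qkv(x).chunk(3, dim=-1)[2]      # reuse only the V weights
        v = v.transpose(1, 2).reshape(B, C, H, W)
        v = F.avg_pool2d(v, self.kernel_size, stride=1, padding=self.kernel_size // 2)
        v = v.flatten(2).transpose(1, 2)
        return self.proj(v)

    def forward(self, x, hw):
        # During search the two paths are mixed by a sigmoid gate; after search,
        # each layer keeps only the path its gate selected.
        g = torch.sigmoid(self.gate)
        return g * self.attn(x) + (1 - g) * self.conv(x, hw)

# Example: DeiT-Ti-sized tokens, 14x14 spatial grid.
tokens = torch.randn(2, 196, 192)
block = SinglePathBlock(dim=192, num_heads=3)
print(block(tokens, (14, 14)).shape)  # torch.Size([2, 196, 192])
```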
We provide experimental results and pre-trained models for SPViT:
Name | Acc@1 | Acc@5 | # parameters | FLOPs | Model |
---|---|---|---|---|---|
SPViT-DeiT-Ti | 70.7 | 90.3 | 4.9M | 1.0G | Model |
SPViT-DeiT-Ti* | 73.2 | 91.4 | 4.9M | 1.0G | Model |
SPViT-DeiT-S | 78.3 | 94.3 | 16.4M | 3.3G | Model |
SPViT-DeiT-S* | 80.3 | 95.1 | 16.4M | 3.3G | Model |
SPViT-DeiT-B | 81.5 | 95.7 | 46.2M | 8.3G | Model |
SPViT-DeiT-B* | 82.4 | 96.1 | 46.2M | 8.3G | Model |
Name | Acc@1 | Acc@5 | # parameters | FLOPs | Model |
---|---|---|---|---|---|
SPViT-Swin-Ti | 80.1 | 94.9 | 26.3M | 3.3G | Model |
SPViT-Swin-Ti* | 81.0 | 95.3 | 26.3M | 3.3G | Model |
SPViT-Swin-S | 82.4 | 96.0 | 39.2M | 6.1G | Model |
SPViT-Swin-S* | 83.0 | 96.4 | 39.2M | 6.1G | Model |
\* indicates models trained with knowledge distillation.
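To cross-check a downloaded checkpoint against the parameter counts in the tables above, a quick inspection like the following can help (the filename and the `model` key are assumptions; adjust them to the released files):

```python
# Sketch for sanity-checking a released checkpoint; not part of the training code.
import torch

ckpt = torch.load("spvit_deit_small.pth", map_location="cpu")  # hypothetical filename
state_dict = ckpt.get("model", ckpt)  # DeiT-style checkpoints typically nest weights under 'model'
num_params = sum(p.numel() for p in state_dict.values())
print(f"parameters: {num_params / 1e6:.1f}M")  # should roughly match the table above
```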
In this repository, we provide code for pruning two representative ViT models.
- SPViT-DeiT that prunes DeiT. Please see SPViT_DeiT/README.md for details.
- SPViT-Swin that prunes Swin. Please see SPViT_Swin/README.md for details.
If you find our paper useful, please consider citing:
```bibtex
@article{he2021Pruning,
  title={Pruning Self-attentions into Convolutional Layers in Single Path},
  author={He, Haoyu and Liu, Jing and Pan, Zizheng and Cai, Jianfei and Zhang, Jing and Tao, Dacheng and Zhuang, Bohan},
  journal={arXiv preprint arXiv:2111.11802},
  year={2021}
}
```