Names: Praditya Raudi Avinanto (UNI: pra2118) and Rifqi Luthfan (UNI: rl3154)
This repository contains the final project for COMS 6998 Practical Deep Learning System Performance
Quantization is a technique for reducing model size and computational complexity so that models can be deployed on edge devices (mobile phones, IoT devices). However, PyTorch and TensorFlow currently support only 8-bit integer quantization.
In this project, we explore converting a 32-bit float neural network (NN) model into an NN model with precision lower than 8-bit integer.
- We experimented with 8, 7, 6, 5, and 4 bit quantization for two models (ResNet-18 and ResNet-50) on two datasets (CIFAR10 and ImageNette)
- We experimented with both Post Training Quantization and Quantization Aware Training
- We found that each model responds differently to quantization at different bit-widths
For the web server part, refer to these repositories: Front End Module and Back End Quantization Project.
This repository contains a ResNet quantization implementation in PyTorch, and currently supports Post Training Quantization (PTQ) and Quantization Aware Training (QAT).
- Launch a Deep Learning VM on Google Cloud Platform: Deploy VM Image
- The hardware we used in this project is an NVIDIA Tesla V100 GPU with 8 vCPUs and 30 GB RAM (n1-standard-8 on GCP)
- Install additional libraries:
```
pip install -r requirements.txt
```
The code used for this project is contained in the following notebooks and Python files:
We implemented custom Convolutional, Batch Normalization, ReLU, Linear, and Addition layers, with separate versions for PTQ and QAT. In PTQ, all weights and activation outputs are already integers, while QAT uses FakeQuantize and QParam. The PTQ modules also contain a quantized-model conversion that converts the weights and activations from a saved PyTorch model.
Under the `quantization_functions` folder, we have:
QAT Modules:
- `quant_aware_layers.py`
  - Contains custom layers: CONV, BN, RELU, LINEAR, ADD
  - Uses PyTorch's `register_buffer` mechanism in `QParam` to keep quantization parameters
  - A `FakeQuantize` function is used to simulate quantization loss during training (see the sketch after this list)
  - `fold_bn` is used to replace a Convolution + Batch Normalization pair with a single Convolution with adjusted weights
  - Here, the parameters are still in float32
- `quant_aware_resnet_model.py`
  - Contains the Basic and Bottleneck blocks of the ResNet model, built from the custom layers
  - ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152 are implemented, but we only compared ResNet-18 and ResNet-50 in our experiments
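For concreteness, here is a minimal sketch of the QAT building blocks named above. The class and function names (`QParam`, `fake_quantize`) mirror the concepts described, but the exact signatures are illustrative assumptions, not this repo's API:

```python
import torch
import torch.nn as nn

class QParam(nn.Module):
    """Tracks scale/zero-point for an n-bit quantizer via register_buffer."""
    def __init__(self, num_bits=8):
        super().__init__()
        self.num_bits = num_bits
        self.register_buffer('scale', torch.tensor(1.0))
        self.register_buffer('zero_point', torch.tensor(0.0))

    def update(self, tensor):
        # Asymmetric min/max calibration over the observed tensor range.
        qmin, qmax = 0.0, 2.0 ** self.num_bits - 1.0
        t_min = min(tensor.min().item(), 0.0)  # range must include zero
        t_max = max(tensor.max().item(), 0.0)
        scale = max((t_max - t_min) / (qmax - qmin), 1e-8)
        self.scale.fill_(scale)
        self.zero_point.fill_(round(qmax - t_max / scale))

def fake_quantize(x, qp):
    # Quantize then dequantize: values stay float32 but carry the same
    # rounding error a real n-bit integer pipeline would introduce.
    qmin, qmax = 0.0, 2.0 ** qp.num_bits - 1.0
    q = torch.clamp(torch.round(x / qp.scale + qp.zero_point), qmin, qmax)
    x_hat = (q - qp.zero_point) * qp.scale
    # Straight-through estimator: gradients flow as if quantization were identity.
    return x + (x_hat - x).detach()

# Usage inside a layer's forward pass:
qp = QParam(num_bits=4)
x = torch.randn(2, 8, requires_grad=True)
qp.update(x.detach())
y = fake_quantize(x, qp)  # float32 values snapped to 16 levels, still differentiable
```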
PTQ Modules:
- `post_training_quant_layers.py`
  - Contains custom layers: CONV, BN, RELU, MAXPOOL, AVG_POOL, LINEAR, ADD
  - Here, the parameters are already integers
  - Has a method that converts existing weights to a quantized model (see the sketch after this list)
- `post_training_quant_model.py`
  - Uses the PTQ layers to create a quantized ResNet model
  - Parameters are also already integers
  - Also has a method that converts existing weights to a quantized model
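The PTQ conversion idea in miniature: trained float32 weights are mapped to n-bit integers plus a scale/zero-point pair. This is a hedged sketch; the helper names (`quantize_tensor`, `dequantize_tensor`) are illustrative, not the repo's exact API:

```python
import torch

def quantize_tensor(w, num_bits=8):
    # Affine quantization: w ~= scale * (q - zero_point), with q an integer.
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = torch.clamp(torch.round(qmin - w.min() / scale), qmin, qmax)
    q = torch.clamp(torch.round(w / scale + zero_point), qmin, qmax).to(torch.int32)
    return q, scale, zero_point

def dequantize_tensor(q, scale, zero_point):
    return scale * (q.float() - zero_point)

# Converting one layer's weights from a (hypothetical) saved float model:
w = torch.randn(64, 3, 7, 7)            # e.g. a conv weight from a checkpoint
q, s, z = quantize_tensor(w, num_bits=4)
w_hat = dequantize_tensor(q, s, z)      # reconstruction, used to gauge error
print((w - w_hat).abs().max())          # quantization error grows as bits shrink
```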
Utils:
- `train_loop.py` -> custom training and validation loops utilizing the `tqdm` library
- `generate_onnx.py` -> generates an ONNX model from a PyTorch model (see the export sketch after this list)
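Conceptually, the export utility comes down to a `torch.onnx.export` call. A minimal sketch, assuming an illustrative input shape and output path (not necessarily the repo's exact code):

```python
import torch
import torchvision

model = torchvision.models.resnet50()
# In the repo this would load a trained/quantized checkpoint, e.g.:
# model.load_state_dict(torch.load('checkpoint/resnet50.pth', map_location='cpu'))
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)  # NCHW batch used for tracing
torch.onnx.export(
    model,
    dummy_input,
    'resnet50.onnx',
    input_names=['input'],
    output_names=['output'],
)
```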
Examples of how to use the models and quantization functions are in the notebooks.
Under the main folder, we have several notebooks for our experiments.
For experiments with different datasets and models, we use separate notebooks named with the {dataset}-{model}.ipynb convention:
- CIFAR10-ResNet18.ipynb
- CIFAR10-ResNet50.ipynb
- ImageNette-ResNet18.ipynb
- ImageNette-ResNet50.ipynb
In each of those 4 notebooks, we trained a full-precision 32-bit model, Post Training Quantization at 8, 7, 6, 5, and 4 bits, and Quantization Aware Training at 8, 7, 6, 5, and 4 bits.
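The intuition behind sweeping bit-widths: each bit removed halves the number of representable levels, which is why accuracy tends to degrade sharply at the lowest precisions. A quick, runnable illustration (the [0, 6] activation range is just an assumed example):

```python
for num_bits in [8, 7, 6, 5, 4]:
    levels = 2 ** num_bits            # 256, 128, 64, 32, 16
    step = 6.0 / (levels - 1)         # quantization step for an assumed [0, 6] range
    print(f'{num_bits} bits -> {levels:4d} levels, step {step:.4f}')
```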
Lastly, we have the compiled analysis comparing metrics & charts in:
- Analysis-of-results.ipynb
Generate the ONNX models with (note that `python -m` takes a module path, without the `.py` extension):
```
python -m quantization_functions.generate_onnx
```
- Clone the repo
- Download the ONNX models and place them under `pytorch-quantization/checkpoint/onnx/imagenette/`
```
sudo docker pull mcr.microsoft.com/onnxruntime/server
sudo docker run -it -v $(pwd):$(pwd) -p 9001:8001 mcr.microsoft.com/onnxruntime/server --model_path $(pwd)/pytorch-quantization/checkpoint/onnx/imagenette/resnet50_4bit.onnx
```
- You can choose the model by changing the `--model_path` argument above.
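As an optional sanity check before (or instead of) serving, the exported model can be run locally with the `onnxruntime` Python package. This sketch assumes the 4-bit ResNet-50 model downloaded above and a 1x3x224x224 input:

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    'pytorch-quantization/checkpoint/onnx/imagenette/resnet50_4bit.onnx')
x = np.random.randn(1, 3, 224, 224).astype(np.float32)  # assumed input shape
input_name = sess.get_inputs()[0].name  # avoids hard-coding the input name
outputs = sess.run(None, {input_name: x})
print(outputs[0].shape)                 # expect (1, num_classes)
```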