This is a PyTorch implementation of the ICCV 2023 paper Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning. The paper introduces a self-supervised augmentation tool for data-agnostic representation learning: each input channel is quantized by a non-uniform quantizer whose quantization bins are randomly generated and whose quantized values are randomly sampled within those bins. Applying randomized quantization in conjunction with sequential augmentations to self-supervised contrastive models achieves results on par with modality-specific augmentations on vision tasks, and state-of-the-art results on 3D point clouds as well as on audio. We also demonstrate that the method can augment intermediate embeddings of a deep neural network on the comprehensive DABS benchmark, which comprises a variety of data modalities.
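For intuition, here is a minimal, hypothetical sketch of the idea in PyTorch. It is deliberately simplified (a plain per-channel loop) and is not the repo's `RandomizedQuantizationAugModule` implementation:

```python
import torch

def randomized_quantize(x: torch.Tensor, region_num: int = 8) -> torch.Tensor:
    """Sketch of randomized quantization: per channel, draw random bin edges
    within the channel's value range, then map every value in a bin to a
    single level sampled uniformly at random inside that bin."""
    out = torch.empty_like(x)
    for c in range(x.shape[0]):  # x is (C, H, W); each channel is quantized independently
        lo, hi = x[c].min(), x[c].max()
        # randomly generated bin edges within [lo, hi]
        inner = torch.sort((hi - lo) * torch.rand(region_num - 1, device=x.device) + lo).values
        edges = torch.cat([lo.view(1), inner, hi.view(1)])  # region_num + 1 edges
        # one quantized level per bin, sampled uniformly within the bin
        levels = edges[:-1] + torch.rand(region_num, device=x.device) * (edges[1:] - edges[:-1])
        idx = torch.bucketize(x[c], edges[1:-1])  # bin index of each element
        out[c] = levels[idx]
    return out
```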
Pretrained checkpoints on ImageNet with moco-v3:
| Augmentations | Pre-trained checkpoint | Linear probe (top-1 acc., %) |
|---|---|---|
| Randomized Quantization (100 epochs) | model | 42.9 |
| RRC + Randomized Quantization (100 epochs) | model | 67.9 |
| RRC + Randomized Quantization (300 epochs) | model | 71.6 |
| RRC + Randomized Quantization (800 epochs) | model | 72.1 |
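As a hedged sketch of consuming one of these checkpoints for linear probing: the snippet below assumes a ResNet-50 backbone and the standard moco-v3 checkpoint format (a `state_dict` whose keys carry the `module.base_encoder.` prefix), mirroring moco-v3's main_lincls.py; the file name is illustrative.

```python
import torch
import torchvision.models as models

ckpt = torch.load("rrc_randquant_800ep.pth.tar", map_location="cpu")  # illustrative path
state_dict = ckpt["state_dict"]

model = models.resnet50()
linear_keyword = "fc"  # moco-v3's name for the ResNet-50 linear head
for k in list(state_dict.keys()):
    # keep base-encoder weights (minus the linear head) and strip the prefix
    if k.startswith("module.base_encoder.") and not k.startswith(f"module.base_encoder.{linear_keyword}"):
        state_dict[k[len("module.base_encoder."):]] = state_dict[k]
    del state_dict[k]

msg = model.load_state_dict(state_dict, strict=False)
assert set(msg.missing_keys) == {f"{linear_keyword}.weight", f"{linear_keyword}.bias"}
```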
We largely follow the experimental settings of BYOL-A and treat it as our baseline, replacing the Mixup augmentation used in BYOL-A with our randomized quantization. The network is trained on AudioSet for 100 epochs. Linear probing results on six downstream audio classification datasets, NSynth (NS), UrbanSound8K (US8K), VoxCeleb1 (VC1), VoxForge (VF), the 12-class subset of Speech Commands V2 (SPCV2/12), and Speech Commands V2 (SPCV2), are reported below:
| Method | Augmentations | NS | US8K | VC1 | VF | SPCV2/12 | SPCV2 | Average |
|---|---|---|---|---|---|---|---|---|
| BYOL-A | RRC + Mixup | 74.1 | 79.1 | 40.1 | 90.2 | 91.0 | 92.2 | 77.8 |
| Our model | RRC + Randomized Quantization | 74.2 | 78.0 | 45.7 | 92.6 | 95.1 | 92.1 | 79.6 |
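As a rough, hypothetical sketch of the swap (not BYOL-A's actual code): the Mixup stage of the spectrogram augmentation pipeline is replaced by this repo's module, with the crop size and wiring purely illustrative.

```python
from torchvision import transforms
# RandomizedQuantizationAugModule comes from this repo.

audio_aug = transforms.Compose([
    transforms.RandomResizedCrop((64, 96)),                    # stand-in for BYOL-A's RRC on log-mel spectrograms
    RandomizedQuantizationAugModule(8, transforms_like=True),  # replaces BYOL-A's Mixup stage
])
# view = audio_aug(spec)  # spec: a (1, F, T) log-mel tensor in this sketch
```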
The code has been tested with PyTorch 1.10.0, CUDA 11.3, and cuDNN 8.2.0. We recommend working with this docker image. Below are minimal-effort use cases, based on moco-v3, that let interested readers inject our augmentation directly into their own projects.
- Call the augmentation like a torchvision.transforms module:
```python
region_num = 8
# https://github.com/facebookresearch/moco-v3/blob/c349e6e24f40d3fedb22d973f92defa4cedf37a7/main_moco.py#L262-L285
augmentation1 = [
    transforms.RandomResizedCrop(224, scale=(args.crop_min, 1.)),
    transforms.ToTensor(),  # convert to a tensor first; the quantizer operates on tensors
    RandomizedQuantizationAugModule(region_num, transforms_like=True),
]
augmentation2 = [
    transforms.RandomResizedCrop(224, scale=(args.crop_min, 1.)),
    transforms.ToTensor(),
    RandomizedQuantizationAugModule(region_num, transforms_like=True),
]
```
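These lists can then be wired into moco-v3's data loading exactly as in the upstream repo, e.g. via its TwoCropsTransform wrapper (names below are from moco-v3; `traindir` is illustrative):

```python
import moco.loader
import torchvision.datasets as datasets

train_dataset = datasets.ImageFolder(
    traindir,  # illustrative: path to the ImageNet train split
    moco.loader.TwoCropsTransform(transforms.Compose(augmentation1),
                                  transforms.Compose(augmentation2)))
```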
- Randomly apply our augmentation with a given probability:
```python
region_num = 8
p_random_apply1, p_random_apply2 = 0.5, 0.5
# https://github.com/facebookresearch/moco-v3/blob/c349e6e24f40d3fedb22d973f92defa4cedf37a7/main_moco.py#L262
augmentation1 = [
    transforms.RandomResizedCrop(224, scale=(args.crop_min, 1.)),
    transforms.ToTensor(),  # convert to a tensor first; the quantizer operates on tensors
    RandomizedQuantizationAugModule(region_num, p_random_apply_rand_quant=p_random_apply1),
]
augmentation2 = [
    transforms.RandomResizedCrop(224, scale=(args.crop_min, 1.)),
    transforms.ToTensor(),
    RandomizedQuantizationAugModule(region_num, p_random_apply_rand_quant=p_random_apply2),
]
```
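Alternatively, torchvision's built-in transforms.RandomApply can wrap the module to similar effect; this is our sketch, not an interface provided by the repo:

```python
rand_quant_with_p = transforms.RandomApply(
    [RandomizedQuantizationAugModule(region_num, transforms_like=True)], p=0.5)
```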
- Call the augmentation in forward(). This is faster than the two usages above, since the augmentation runs on the GPU:
```python
# https://github.com/facebookresearch/moco-v3/blob/c349e6e24f40d3fedb22d973f92defa4cedf37a7/moco/builder.py#L35
region_num = 8
self.rand_quant_layer = RandomizedQuantizationAugModule(region_num)

# https://github.com/facebookresearch/moco-v3/blob/c349e6e24f40d3fedb22d973f92defa4cedf37a7/moco/builder.py#L86-L94
q1 = self.predictor(self.base_encoder(self.rand_quant_layer(x1)))
q2 = self.predictor(self.base_encoder(self.rand_quant_layer(x2)))

with torch.no_grad():  # no gradient
    self._update_momentum_encoder(m)  # update the momentum encoder

    # compute momentum features as targets
    k1 = self.momentum_encoder(self.rand_quant_layer(x1))
    k2 = self.momentum_encoder(self.rand_quant_layer(x2))
```
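A quick standalone check of the GPU path (our sketch; assumes the module accepts batched (B, C, H, W) tensors, consistent with the builder usage above):

```python
import torch

x = torch.rand(32, 3, 224, 224, device="cuda")  # batch of images in [0, 1]
rand_quant = RandomizedQuantizationAugModule(8).cuda()
x_aug = rand_quant(x)                            # augmented entirely on the GPU
assert x_aug.shape == x.shape
```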
```bibtex
@inproceedings{wu2023randomized,
  title={Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning},
  author={Wu, Huimin and Lei, Chenyang and Sun, Xiao and Wang, Peng-Shuai and Chen, Qifeng and Cheng, Kwang-Ting and Lin, Stephen and Wu, Zhirong},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={16305--16316},
  year={2023}
}

@article{wu2022randomized,
  title={Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning},
  author={Wu, Huimin and Lei, Chenyang and Sun, Xiao and Wang, Peng-Shuai and Chen, Qifeng and Cheng, Kwang-Ting and Lin, Stephen and Wu, Zhirong},
  journal={arXiv preprint arXiv:2212.08663},
  year={2022}
}
```
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.