Bad performance on CIFAR using low bit widths #3
Thanks for your interest in our work. Could you please try using bit widths [3, 4, 5] to see if there is still this issue? Also, what is the performance of [2]?
Hi @Ahmad-Jarrar, sorry for this; the quantization scheme proposed in the paper does not converge for low bit widths, and some modification is necessary. I remember posting this before... For proper convergence, it is better to have a vanishing mean for the weights, besides the proper variance requirements. This guarantees a centered distribution for the weights. The code is something like this:
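The original snippet did not come through here, so the following is only a minimal sketch of such a zero-mean weight quantizer, reconstructed from the level mapping described later in this thread; the function name, the floor-and-clamp rounding, and the straight-through trick are assumptions, not necessarily the repo's exact code:

```python
import torch

def quantize_weight(w, bits=4):
    n = 2 ** bits
    w_c = w.clamp(-1, 1)
    x = (w_c + 1) / 2                                # [-1, 1] -> [0, 1]
    k = torch.clamp(torch.floor(x * n), max=n - 1)   # -> integer levels {0, 1, ..., n - 1}
    q = (k + 0.5) / n                                # +0.5 -> bin midpoints {1/(2n), ..., (2n-1)/(2n)}
    q = 2 * q - 1                                    # outermost remapping 2x - 1 -> levels symmetric about 0
    return w_c + (q - w_c).detach()                  # straight-through estimator for gradients
```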
You could also try this for activation quantization (without applying the outermost remapping 2x - 1), but I have not tried this before. I will update the code and readme accordingly. Best.
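For activations, a corresponding sketch (again a reconstruction, and untested, just as the comment above says) would simply drop the final 2x - 1 step so the output stays in (0, 1):

```python
import torch

def quantize_activation(a, bits=4):
    n = 2 ** bits
    x = a.clamp(0, 1)                                # activations assumed already in [0, 1]
    k = torch.clamp(torch.floor(x * n), max=n - 1)   # -> integer levels {0, 1, ..., n - 1}
    return (k + 0.5) / n                             # bin midpoints, no outermost 2x - 1
```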
Hi @Ahmad-Jarrar, I have updated the readme. Hope it is clear. Thanks again for your interest in our work.
If I'm not wrong, the code given does not apply the outermost 2x - 1.
https://github.com/deJQK/AdaBits/blob/master/models/quant_ops.py#L142-L143
Yes, I noticed it later. Thank you so much for your help.
Hello @deJQK, I don't understand the statement "For proper convergence, it should be better to have vanishing mean for weights, besides proper variance requirements." Why does proper convergence require a vanishing mean for the weights? Could you give a specific explanation? Additionally, the formula does not seem to match the code. Looking forward to your reply, thank you.
Hi @haiduo, you could check these papers: https://arxiv.org/pdf/1502.01852.pdf, https://arxiv.org/pdf/1606.05340.pdf, https://arxiv.org/pdf/1611.01232.pdf, all of which analyze training dynamics for centered weights. I am not sure how to analyze weights with nonzero mean.
Thank you for your reply, @deJQK! So "vanishing mean for weights" just means adding 0.5 after rounding to the integer levels?
Hi @haiduo, thanks again for your interest. For b=4, it maps [-1, 1] to [0, 1], to {0, 1, ..., 15}, to {0.5, 1.5, ..., 15.5}, to {1/32, 3/32, ..., 31/32}, to {-15/16, -13/16, ..., 13/16, 15/16}. Code for all four schemes is available in the repo and you can check the related lines.
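To make this chain of mappings concrete, here is a quick numerical check for b = 4 (a hypothetical snippet, not code from the repo; it just reproduces the levels listed above and their zero mean):

```python
import torch

bits = 4
n = 2 ** bits                                    # 16 levels
w = torch.linspace(-1, 1, steps=1000)
x = (w + 1) / 2                                  # [-1, 1] -> [0, 1]
k = torch.clamp(torch.floor(x * n), max=n - 1)   # {0, 1, ..., 15}
q = 2 * ((k + 0.5) / n) - 1                      # {1/32, ..., 31/32} -> {-15/16, ..., 15/16}
levels = q.unique()
print(levels)          # 16 evenly spaced values from -15/16 to 15/16
print(levels.mean())   # ~0: the quantized weights are centered (vanishing mean)
```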
OK, thank you!
Hi @deJQK, sorry, one more question. Could you answer two questions about the following:
@haiduo, yes for both.
I am trying to run your experiments on CIFAR10 as described in q_resnet_uint8_train_val.yml. However, I am getting poor performance at lower bit widths. I have tried several tweaks to the config file. The result of the latest experiment is:
I have used these parameters:
Kindly let me know how I can improve the results and what I am doing wrong.