About the pretrained model #5

Open
tongtyr opened this issue Jun 12, 2019 · 5 comments

tongtyr commented Jun 12, 2019

Hello, I am confused about the training epochs. In your code you set the epochs to 4; if I want to quantize resnet18, do I need to change that? And do you have quantized resnet18 models at bit widths other than 5? Thank you!

Mxbonn (Owner) commented Jun 12, 2019

Training epochs can be a bit confusing, I agree. In Incremental Network Quantization you have two kinds of iterations. The first is the number of times you perform a new weight partitioning, where you determine which weights get fixed and which will still be trained. The second is the number of training loops you run within each of these quantization iterations. With epochs in the code I mean the latter; the former is defined by iterative_steps = [0.5, 0.75, 0.875, 1].
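Roughly, in pseudo-code, the structure is as follows (a minimal sketch; the helper names train_one_epoch/validate and the exact scheduler call are illustrative, not the repo's literal API):

```python
iterative_steps = [0.5, 0.75, 0.875, 1]  # fraction of weights quantized and fixed so far
epochs = 4                               # retraining epochs per quantization step

for step in iterative_steps:             # outer loop: new weight partitioning
    quantization_scheduler.step()        # quantize and fix the next fraction of weights
    for epoch in range(epochs):          # inner loop: the "epochs" setting in the code
        train_one_epoch(model)           # retrain the still-trainable weights
        validate(model)                  # track accuracy after each retraining epoch
```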

The only thing you should have to change in the code is the path to ImageNet; the rest is already set up to quantize resnet18.
I did not train at bit widths other than 5 bits with this setup.
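For reference, assuming the standard torchvision ImageFolder setup that the example follows, pointing the loader at your local copy is all that is needed (the path below is a placeholder):

```python
import torchvision.datasets as datasets
import torchvision.transforms as transforms

# Placeholder path -- replace with the location of your ImageNet copy.
traindir = "/path/to/imagenet/train"

train_dataset = datasets.ImageFolder(
    traindir,
    transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ]))
```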

tongtyr commented

I changed nothing in the code except the data path, but I found that when I began to quantize the model, the loss was large. Is that correct? My environment is Python 3.6 and PyTorch 1.0.1.

=> using pre-trained model 'resnet18'
Test: [0/196] Time 7.270 (7.270) Loss 0.6744 (0.6744) Acc@1 80.078 (80.078) Acc@5 96.094 (96.094)
Test: [10/196] Time 0.076 (0.728) Loss 1.1976 (0.8823) Acc@1 67.969 (77.592) Acc@5 90.234 (92.898)
Test: [20/196] Time 0.088 (0.509) Loss 0.8896 (0.9078) Acc@1 80.469 (76.860) Acc@5 91.016 (92.615)
Test: [30/196] Time 0.076 (0.443) Loss 0.9277 (0.8707) Acc@1 77.344 (77.923) Acc@5 92.969 (92.868)
Test: [40/196] Time 0.072 (0.414) Loss 0.8867 (0.9135) Acc@1 75.391 (76.229) Acc@5 95.703 (93.035)
Test: [50/196] Time 0.072 (0.391) Loss 0.6245 (0.9081) Acc@1 83.984 (75.973) Acc@5 95.703 (93.275)
Test: [60/196] Time 0.071 (0.382) Loss 1.1484 (0.9200) Acc@1 71.875 (75.666) Acc@5 94.141 (93.411)
Test: [70/196] Time 0.072 (0.394) Loss 0.8141 (0.9027) Acc@1 79.297 (76.276) Acc@5 94.141 (93.563)
Test: [80/196] Time 0.073 (0.386) Loss 1.5958 (0.9238) Acc@1 64.453 (76.013) Acc@5 85.547 (93.277)
Test: [90/196] Time 0.072 (0.367) Loss 2.2526 (0.9903) Acc@1 49.219 (74.704) Acc@5 79.688 (92.449)
Test: [100/196] Time 1.257 (0.368) Loss 1.6807 (1.0526) Acc@1 57.031 (73.434) Acc@5 84.766 (91.646)
Test: [110/196] Time 0.078 (0.366) Loss 1.1636 (1.0807) Acc@1 71.875 (72.934) Acc@5 89.844 (91.248)
Test: [120/196] Time 0.076 (0.363) Loss 1.8209 (1.1061) Acc@1 58.594 (72.569) Acc@5 79.297 (90.825)
Test: [130/196] Time 0.540 (0.356) Loss 0.9303 (1.1458) Acc@1 76.562 (71.678) Acc@5 94.141 (90.392)
Test: [140/196] Time 0.735 (0.355) Loss 1.3774 (1.1680) Acc@1 65.234 (71.271) Acc@5 85.938 (90.129)
Test: [150/196] Time 0.072 (0.351) Loss 1.3500 (1.1946) Acc@1 73.438 (70.791) Acc@5 85.938 (89.727)
Test: [160/196] Time 0.072 (0.350) Loss 1.0733 (1.2141) Acc@1 76.953 (70.477) Acc@5 90.625 (89.490)
Test: [170/196] Time 0.083 (0.347) Loss 0.8934 (1.2377) Acc@1 77.344 (69.970) Acc@5 91.406 (89.163)
Test: [180/196] Time 0.072 (0.345) Loss 1.4582 (1.2560) Acc@1 62.109 (69.615) Acc@5 89.453 (88.946)
Test: [190/196] Time 0.073 (0.344) Loss 1.3996 (1.2547) Acc@1 63.281 (69.582) Acc@5 91.797 (88.993)

* Acc@1 69.758 Acc@5 89.078
Epoch: [0][0/4985] Time 16.063 (16.063) Data 15.005 (15.005) Loss 8.6419 (8.6419) Acc@1 17.969 (17.969) Acc@5 28.125 (28.125)
Epoch: [0][10/4985] Time 0.202 (2.764) Data 0.000 (2.504) Loss 7.5821 (8.2289) Acc@1 22.656 (20.455) Acc@5 31.250 (29.261)
Epoch: [0][20/4985] Time 0.193 (2.266) Data 0.000 (2.055) Loss 6.9952 (8.0044) Acc@1 16.797 (19.159) Acc@5 28.125 (28.144)
Epoch: [0][30/4985] Time 0.205 (2.162) Data 0.000 (1.972) Loss 6.8403 (7.6641) Acc@1 12.500 (17.944) Acc@5 25.781 (27.092)
Epoch: [0][40/4985] Time 0.196 (2.054) Data 0.000 (1.875) Loss 6.1470 (7.3678) Acc@1 16.016 (16.949) Acc@5 26.953 (26.296)
Epoch: [0][50/4985] Time 0.216 (1.977) Data 0.000 (1.800) Loss 6.2507 (7.1539) Acc@1 10.938 (16.176) Acc@5 20.703 (25.705)
Epoch: [0][60/4985] Time 0.211 (2.198) Data 0.000 (2.025) Loss 6.3317 (7.0041) Acc@1 13.672 (15.811) Acc@5 22.656 (25.307)
Epoch: [0][70/4985] Time 0.199 (2.095) Data 0.000 (1.920) Loss 6.0656 (6.8798) Acc@1 14.453 (15.454) Acc@5 21.875 (24.983)
Epoch: [0][80/4985] Time 0.194 (1.997) Data 0.000 (1.822) Loss 6.3854 (6.7550) Acc@1 11.719 (15.451) Acc@5 21.094 (25.125)

Mxbonn (Owner) commented Jun 27, 2019

Hey, I just reran the example file in a Docker container, only modifying the data path.

My output looks like this:

2019-06-27T08:34:25.806910467Z Test: [0/196]	Time 19.438 (19.438)	Loss 0.6744 (0.6744)	Acc@1 80.078 (80.078)	Acc@5 96.094 (96.094)
2019-06-27T08:34:25.806964309Z Test: [10/196]	Time 0.037 (1.797)	Loss 1.1976 (0.8823)	Acc@1 67.969 (77.592)	Acc@5 90.234 (92.898)
2019-06-27T08:34:25.806970136Z Test: [20/196]	Time 0.031 (0.964)	Loss 0.8896 (0.9078)	Acc@1 80.469 (76.860)	Acc@5 91.016 (92.615)
2019-06-27T08:34:25.806976041Z Test: [30/196]	Time 0.045 (0.666)	Loss 0.9277 (0.8707)	Acc@1 77.344 (77.923)	Acc@5 92.969 (92.868)
2019-06-27T08:34:25.806980598Z Test: [40/196]	Time 0.103 (0.551)	Loss 0.8867 (0.9135)	Acc@1 75.391 (76.229)	Acc@5 95.703 (93.035)
2019-06-27T08:34:25.806985306Z Test: [50/196]	Time 0.041 (0.452)	Loss 0.6245 (0.9081)	Acc@1 83.984 (75.973)	Acc@5 95.703 (93.275)
2019-06-27T08:34:25.806989953Z Test: [60/196]	Time 0.120 (0.424)	Loss 1.1484 (0.9200)	Acc@1 71.875 (75.666)	Acc@5 94.141 (93.411)
2019-06-27T08:34:25.806994431Z Test: [70/196]	Time 0.049 (0.378)	Loss 0.8141 (0.9027)	Acc@1 79.297 (76.276)	Acc@5 94.141 (93.563)
2019-06-27T08:34:25.806998700Z Test: [80/196]	Time 0.088 (0.361)	Loss 1.5958 (0.9238)	Acc@1 64.453 (76.013)	Acc@5 85.547 (93.277)
2019-06-27T08:34:25.807003230Z Test: [90/196]	Time 0.057 (0.331)	Loss 2.2526 (0.9903)	Acc@1 49.219 (74.704)	Acc@5 79.688 (92.449)
2019-06-27T08:34:25.807008141Z Test: [100/196]	Time 0.106 (0.320)	Loss 1.6807 (1.0526)	Acc@1 57.031 (73.434)	Acc@5 84.766 (91.646)
2019-06-27T08:34:25.807013085Z Test: [110/196]	Time 1.150 (0.313)	Loss 1.1636 (1.0807)	Acc@1 71.875 (72.934)	Acc@5 89.844 (91.248)
2019-06-27T08:34:25.807018346Z Test: [120/196]	Time 0.088 (0.298)	Loss 1.8209 (1.1061)	Acc@1 58.594 (72.569)	Acc@5 79.297 (90.825)
2019-06-27T08:34:25.807023922Z Test: [130/196]	Time 0.072 (0.295)	Loss 0.9303 (1.1458)	Acc@1 76.562 (71.678)	Acc@5 94.141 (90.392)
2019-06-27T08:34:25.807028979Z Test: [140/196]	Time 0.044 (0.279)	Loss 1.3774 (1.1680)	Acc@1 65.234 (71.271)	Acc@5 85.938 (90.129)
2019-06-27T08:34:25.807034751Z Test: [150/196]	Time 0.072 (0.274)	Loss 1.3500 (1.1946)	Acc@1 73.438 (70.791)	Acc@5 85.938 (89.727)
2019-06-27T08:34:25.807056351Z Test: [160/196]	Time 0.090 (0.262)	Loss 1.0733 (1.2141)	Acc@1 76.953 (70.477)	Acc@5 90.625 (89.490)
2019-06-27T08:34:25.807061511Z Test: [170/196]	Time 0.104 (0.262)	Loss 0.8934 (1.2377)	Acc@1 77.344 (69.970)	Acc@5 91.406 (89.163)
2019-06-27T08:34:25.807066914Z Test: [180/196]	Time 0.033 (0.250)	Loss 1.4582 (1.2560)	Acc@1 62.109 (69.615)	Acc@5 89.453 (88.946)
2019-06-27T08:34:25.807071710Z Test: [190/196]	Time 0.032 (0.246)	Loss 1.3996 (1.2547)	Acc@1 63.281 (69.582)	Acc@5 91.797 (88.993)
2019-06-27T08:34:25.807075897Z  * Acc@1 69.758 Acc@5 89.078
2019-06-27T08:36:10.019742694Z Epoch: [0][0/5005]	Time 6.768 (6.768)	Data 3.291 (3.291)	Loss 1.5907 (1.5907)	Acc@1 61.719 (61.719)	Acc@5 85.156 (85.156)
2019-06-27T08:36:10.019795449Z Epoch: [0][10/5005]	Time 0.064 (0.679)	Data 0.000 (0.300)	Loss 1.4104 (1.6633)	Acc@1 62.891 (60.440)	Acc@5 85.547 (83.310)
2019-06-27T08:36:10.019802350Z Epoch: [0][20/5005]	Time 0.079 (0.391)	Data 0.000 (0.158)	Loss 1.4512 (1.6237)	Acc@1 66.016 (61.979)	Acc@5 85.547 (83.743)
2019-06-27T08:36:10.019806941Z Epoch: [0][30/5005]	Time 0.105 (0.306)	Data 0.000 (0.107)	Loss 1.3594 (1.5989)	Acc@1 69.141 (62.462)	Acc@5 87.500 (83.984)
2019-06-27T08:36:10.019811663Z Epoch: [0][40/5005]	Time 0.125 (0.265)	Data 0.005 (0.082)	Loss 1.4409 (1.5905)	Acc@1 66.406 (62.367)	Acc@5 86.328 (84.261)
2019-06-27T08:36:10.019816956Z Epoch: [0][50/5005]	Time 0.080 (0.240)	Data 0.000 (0.067)	Loss 1.4476 (1.5773)	Acc@1 64.844 (62.661)	Acc@5 85.547 (84.383)
2019-06-27T08:36:10.019821265Z Epoch: [0][60/5005]	Time 0.252 (0.222)	Data 0.084 (0.058)	Loss 1.4737 (1.5590)	Acc@1 65.234 (63.025)	Acc@5 86.328 (84.548)
2019-06-27T08:36:10.019825678Z Epoch: [0][70/5005]	Time 0.089 (0.210)	Data 0.000 (0.050)	Loss 1.6264 (1.5505)	Acc@1 63.672 (63.248)	Acc@5 84.375 (84.617)
2019-06-27T08:36:10.019829956Z Epoch: [0][80/5005]	Time 0.103 (0.203)	Data 0.000 (0.048)	Loss 1.3459 (1.5431)	Acc@1 66.797 (63.600)	Acc@5 89.453 (84.727)
2019-06-27T08:36:10.019834473Z Epoch: [0][90/5005]	Time 0.106 (0.193)	Data 0.000 (0.043)	Loss 1.6379 (1.5359)	Acc@1 62.109 (63.805)	Acc@5 81.641 (84.779)

So the loss should be way lower and the accuracy higher. I also noticed you have fewer iterations per epoch (4985 vs. 5005), which suggests a slightly smaller training set. Are you sure you have the correct dataset? (The fact that your validation accuracy is correct makes me think you do have the right dataset, however.) On how many GPUs are you running the code? (Not that it should matter.)

-- In case someone else stumbles upon this issue and has run the code, could you let me know whether everything worked fine for you?


xysun commented Jul 15, 2019

I find this line a bit odd: quantization_scheduler.step() should be called outside of the retraining epoch loop, I believe? I.e., step only when advancing iterative_steps.
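To illustrate what I mean (simplified, not the exact code from the repo):

```python
# Current placement (simplified): step() advances the partitioning every epoch.
for step in iterative_steps:
    for epoch in range(epochs):
        quantization_scheduler.step()  # called once per epoch -- too often?
        train(model)

# Expected placement: step() only when moving to the next iterative step.
for step in iterative_steps:
    quantization_scheduler.step()      # advance the partitioning once
    for epoch in range(epochs):
        train(model)                   # retrain with the new partition fixed
```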

Mxbonn (Owner) commented Jul 29, 2019

> I find this line a bit odd: quantization_scheduler.step() should be called outside of the retraining epoch loop, I believe? I.e., step only when advancing iterative_steps.

Following up on this in #8.
