
Training speed #48

Closed
Lucien66 opened this issue Aug 25, 2022 · 4 comments

Comments

@Lucien66

Nice work!
I am training the ClassSR structure for a denoising task. The branches use small networks, whose computational cost is only about 5 to 16 GMACs each.
Two problems are bothering me:
First, training is too slow. With two RTX 3090 cards it takes 12 minutes to run 100 iterations. However, when I increased to four 3090 cards, the speed did not improve, and the utilization of each GPU was very low. Could you please give me some suggestions?
Second, because training is so slow, I tested with the .pth checkpoint at 30,000 iterations, and the classification is uneven. The output images show strong demarcation artifacts, mainly because neighboring patches are routed to different branches. Will this improve with further training?

@Xiangtaokong
Member

Xiangtaokong commented Aug 27, 2022

Thanks for your attention.
1. When adapting ClassSR to a denoising task, you may need to pay attention to the training patch size. SR models process images at low resolution and upsample them at the end of the network, but denoising models process images at the same resolution as the GT, which brings much more computation (see the rough sketch at the end of this comment).
2. First, you need well-trained branches, each of which can handle its corresponding inputs well. I guess this phenomenon is caused either by branches that are not good enough for their inputs (check the branch performance) or by a mismatch between branches and inputs (100k+ iterations of classifier training should be enough).
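
To make the cost difference in point 1 concrete, here is a rough back-of-envelope sketch (the patch size, channel width, and layer count are illustrative assumptions, not the real ClassSR branch configuration). The point is that a denoising branch running its body at GT resolution does roughly scale² times the per-patch work of an SR branch running at LR resolution:

```python
# Rough, hypothetical comparison of per-patch compute for an SR branch vs. a
# denoising branch with the same output size. Assumes a x4 SR model whose body
# runs at LR resolution and upsamples at the end, versus a denoiser that runs
# entirely at GT resolution. Numbers are illustrative, not the real branches.

def conv_macs(h, w, c_in, c_out, k=3):
    """MACs of a single k x k convolution on an h x w feature map."""
    return h * w * c_in * c_out * k * k

gt_size, scale, channels, layers = 128, 4, 64, 16

# SR branch: body operates on the 32x32 LR patch (128 / 4).
sr_macs = layers * conv_macs(gt_size // scale, gt_size // scale, channels, channels)

# Denoising branch: body operates directly on the 128x128 GT-sized patch.
dn_macs = layers * conv_macs(gt_size, gt_size, channels, channels)

print(f"SR branch:      {sr_macs / 1e9:.2f} GMACs per patch")
print(f"Denoise branch: {dn_macs / 1e9:.2f} GMACs per patch")
print(f"Ratio: {dn_macs / sr_macs:.0f}x")  # scale**2 = 16x more work per patch
```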

@Lucien66
Author

Thank you very much for your detailed reply!

  1. If I reduce the patch size, will it hurt the final performance?
  2. In fact, I have pre-trained the branch models, and their PSNR values are 39.3/39.7/39.9 respectively. However, the PSNR of the model tested at 30,000 iterations is only 38.3. I don't understand this. Should I check my code?

@Xiangtaokong
Member

  1. It will cause a small performance drop.
  2. This is normal. You tested the branches on sub-images but evaluated the whole ClassSR model on complete images, and this difference in testing procedure causes the gap (see the short illustration below). To compare fairly, test the ClassSR model and the most complex branch (trained with all data) on the same test images and compare their performance.
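
To make point 2 a bit more concrete, here is a minimal, purely synthetic illustration (random data and hypothetical error levels, not the actual ClassSR test protocol) of one reason per-sub-image PSNR and whole-image PSNR are not directly comparable: PSNR is a log of the mean error, so averaging it over crops does not equal computing it once over the full image.

```python
import numpy as np

def psnr(ref, out, peak=255.0):
    """Peak signal-to-noise ratio between two images in [0, 255]."""
    mse = np.mean((ref.astype(np.float64) - out.astype(np.float64)) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
gt = rng.integers(0, 256, size=(256, 256), dtype=np.uint8)

# Simulate an output whose error level varies across regions, as it does when
# different patches are routed to branches of different capacity.
noise = np.concatenate([rng.normal(0, 2, (256, 128)),   # "easy" half, small error
                        rng.normal(0, 8, (256, 128))],  # "hard" half, large error
                       axis=1)
out = np.clip(gt + noise, 0, 255).astype(np.uint8)

# Mean of per-crop PSNRs vs. PSNR of the whole image: the two differ, because
# PSNR is a log of the mean error, not a mean of per-crop log errors.
crops = [psnr(gt[:, :128], out[:, :128]), psnr(gt[:, 128:], out[:, 128:])]
print(f"mean of per-crop PSNR: {np.mean(crops):.2f} dB")
print(f"whole-image PSNR:      {psnr(gt, out):.2f} dB")
```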

@Lucien66
Author

I will do as you suggest! Thank you very much for your reply!
