Error in the Resume Training module of 4_efficientdet, occurring after completing 5 epochs #56
Comments
Thank you for pointing out the issue. We will try to resolve it as soon as possible. On your end, please check by downgrading PyTorch to version 1.4. |
Okay. |
Did a version downgrade help your case? |
Not tried yet. |
We are unable to reproduce that error with PyTorch v1.4. Please check and let us know. |
Okay. Sometimes I get the error at epoch 5 and sometimes at epoch 12. |
The error is because ONNX is still incompatible with torch 1.6; hence downgrading to torch 1.4 and torchvision 0.5 will resolve the errors. The requirement files have been updated accordingly. |
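A minimal sketch (my addition, not part of the repo) to confirm the pinned versions in the runtime before starting training:

```python
# Assumes the fix above: torch 1.4.x and torchvision 0.5.x installed before training starts.
import torch
import torchvision

assert torch.__version__.startswith("1.4"), "expected torch 1.4.x, got " + torch.__version__
assert torchvision.__version__.startswith("0.5"), "expected torchvision 0.5.x, got " + torchvision.__version__
print("torch", torch.__version__, "| torchvision", torchvision.__version__)
```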
Thanks. |
When I use torch 1.4 and torchvision 0.5, I am getting: loading annotations into memory... |
Earlier I was able to reach epoch 5 or sometimes 13. Now training starts, but after a minute I get this (not using torch == 1.4 and torchvision == 0.5, since with those training does not start and directly gives the above error): 100% /usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py:3335: RuntimeWarning: Mean of empty slice. |
Don't mix up versions when resuming training. Keep every training run restricted to PyTorch 1.4 and torchvision 0.5 starting from the very first training itself. A model serialized in version 1.5 or 1.6 may not load in version 1.4. |
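As an aside (not suggested in the thread): torch 1.6 switched to a zipfile-based checkpoint format that torch 1.4 cannot read, so a checkpoint accidentally saved under 1.6 would have to be re-saved in the legacy format from that newer environment before resuming under 1.4. A hedged sketch, with a hypothetical path modelled on the one used later in this thread:

```python
# Hedged sketch; run this once in the torch 1.6 environment where the checkpoint was created,
# with the Monk_Object_Detection model code importable. The paths are hypothetical.
import torch

state = torch.load("/content/trained/signatrix_efficientdet_coco.pth", map_location="cpu")
torch.save(state, "/content/trained/signatrix_efficientdet_coco_legacy.pth",
           _use_new_zipfile_serialization=False)
```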
Please let me know how I can deal with this error. |
WAY 1: Switch to torch==1.4, torchvision==0.5 and efficientnet_pytorch==0.6.3.
WAY 2: When you clone the library, comment out lines 393-396 and 400-403 in the file Monk_Object_Detection/4_efficientdet/lib/train_detector.py. |
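The referenced lines themselves are not reproduced in this thread (they were screenshots); judging from the traceback at the end of the thread, they presumably cover the in-training ONNX export. A purely hypothetical sketch of the same idea, skipping a failing export instead of aborting training (the function and argument names are mine, not the library's):

```python
# Hypothetical illustration only; it does not reproduce train_detector.py lines 393-396/400-403.
# The idea of WAY 2 is that a failed ONNX export should not abort the training loop.
import os
import torch

def export_onnx_safely(model, dummy_input, model_output_dir):
    try:
        torch.onnx.export(model, dummy_input,
                          os.path.join(model_output_dir, "signatrix_efficientdet_coco.onnx"),
                          verbose=False)
    except RuntimeError as err:
        print("Skipping ONNX export:", err)
```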
WAY 2 did not work. |
Please share your code. |
Shared. |
The image size is 32? For EfficientNet-B0 the image size should be 512. See this example: https://github.com/Tessellate-Imaging/Monk_Object_Detection/blob/master/example_notebooks/4_efficientdet/train%20-%20with%20validation%20dataset.ipynb |
How was it working earlier? |
Earlier, if the image shapes were inconsistent, the library auto-switched to default shapes. Since the latest efficientnet_pytorch upgrade requires a manual input of shapes, we have made the argument a required entity and it can no longer take in inconsistencies. |
Keep the image shape as 512 with the B0 version and the training engine will scale the annotations accordingly. |
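For reference, a minimal sketch of the dataset call with the recommended size, reusing the folder names from the code shared later in this thread:

```python
from train_detector import Detector

gtf = Detector()

# Folder names taken from the code shared later in this thread; adjust to your dataset.
root_dir = "./"
coco_dir = "cellphone"
img_dir = "./"
set_dir = "Images"

# image_size=512 as recommended for EfficientNet-B0 (instead of 32).
gtf.Train_Dataset(root_dir, coco_dir, img_dir, set_dir,
                  batch_size=8, image_size=512, use_gpu=True)
```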
'The size of tensor a (2) must match the size of tensor b (8) at non-singleton dimension 3': this error is gone. |
I used WAY 1 and could successfully train the model; resume training also worked fine. Today when I tried resume training again, I got the error attached in the text file. |
Since you are using Colab, make sure the versioning is correct, and comment out the lines mentioned in WAY 2. |
The versioning is as per your colab_requirement.txt; commenting out the lines did not help either. |
Try to add these in WAY 2. |
Hello @abhi-kumar, I used 786 for B2 but got the same error. Any suggestions? |
I obtain the same error. It only disappears when I use image_size = 512, regardless of the chosen model version. E.g. image_size = 786 and model version B2 fails, while image_size = 512 and model version B2 works. I tried modifying dummy_input from torch.rand(1, 3, 512, 512) to torch.rand(1, 3, image_size, image_size) in lines 387 and 452 of train_detector.py, but nothing changed. |
Thank you for mentioning the issue. The issue will be taken into consideration very soon (most probably post Christmas). |
@abhi-kumar |
@abhi-kumar Any update on the issue? |
I am using torch 1.6.0, efficientnet-pytorch 0.6.3, tensorboardX 2.1.
This is my code:
```python
from train_detector import Detector

gtf = Detector()

# directs the model towards the file structure
root_dir = "./"
coco_dir = "cellphone"
img_dir = "./"
set_dir = "Images"

# smells like some free compute from Colab, nice
gtf.Train_Dataset(root_dir, coco_dir, img_dir, set_dir, batch_size=8, image_size=32, use_gpu=True)

gtf.Model(model_name="efficientnet-b0", load_pretrained_model_from="/content/trained/signatrix_efficientdet_coco.pth")
gtf.Set_Hyperparams(lr=0.0001, val_interval=1, es_min_delta=0.0, es_patience=0)
gtf.Train(num_epochs=50, model_output_dir="trained/")
```
My error is:
```
Epoch: 1/50. Iteration: 910/910. Cls loss: 0.12021. Reg loss: 0.26245. Batch loss: 0.38265 Total loss: 0.50293
100% 910/910 [24:24<00:00, 1.58s/it]
/content/Monk_Object_Detection/4_efficientdet/lib/src/model.py:251: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if len(inputs) == 2:
/content/Monk_Object_Detection/4_efficientdet/lib/src/utils.py:84: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
image_shape = np.array(image_shape)
/content/Monk_Object_Detection/4_efficientdet/lib/src/utils.py:96: TracerWarning: torch.from_numpy results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
anchors = torch.from_numpy(all_anchors.astype(np.float32))
/content/Monk_Object_Detection/4_efficientdet/lib/src/model.py:282: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if scores_over_thresh.sum() == 0:
Epoch: 2/50. Iteration: 910/910. Cls loss: 0.17044. Reg loss: 0.19580. Batch loss: 0.36624 Total loss: 0.48137
100% 910/910 [24:31<00:00, 1.57s/it]
Epoch: 3/50. Iteration: 910/910. Cls loss: 0.22575. Reg loss: 0.32424. Batch loss: 0.54999 Total loss: 0.46841
100% 910/910 [24:36<00:00, 1.60s/it]
Epoch: 4/50. Iteration: 910/910. Cls loss: 0.13469. Reg loss: 0.25157. Batch loss: 0.38626 Total loss: 0.45206
100% 910/910 [24:40<00:00, 1.57s/it]
Epoch: 5/50. Iteration: 910/910. Cls loss: 0.24624. Reg loss: 0.34335. Batch loss: 0.58959 Total loss: 0.44057
100% 910/910 [23:59<00:00, 1.54s/it]
Epoch: 6/50. Iteration: 910/910. Cls loss: 0.20909. Reg loss: 0.26789. Batch loss: 0.47698 Total loss: 0.42917
100% 910/910 [23:53<00:00, 1.52s/it]
/usr/local/lib/python3.6/dist-packages/torch/onnx/symbolic_helper.py:253: UserWarning: You are trying to export the model with onnx:Upsample for ONNX opset version 9. This operator might cause results to not match the expected results by PyTorch.
ONNX's Upsample/Resize operator did not match Pytorch's Interpolation until opset 11. Attributes to determine how to transform the input were added in onnx:Resize in opset 11 to support Pytorch's behavior (like coordinate_transformation_mode and nearest_mode).
We recommend using opset 11 and above for models using this operator.
"" + str(_export_onnx_opset_version) + ". "
RuntimeError Traceback (most recent call last)
in ()
1 gtf.Set_Hyperparams(lr=0.0001, val_interval=1, es_min_delta=0.0, es_patience=0)
----> 2 gtf.Train(num_epochs=50, model_output_dir="trained/");
9 frames
/usr/local/lib/python3.6/dist-packages/torch/onnx/symbolic_helper.py in _onnx_opset_unsupported(op_name, current_opset, supported_opset)
184 def _onnx_opset_unsupported(op_name, current_opset, supported_opset):
185 raise RuntimeError('Unsupported: ONNX export of {} in '
--> 186 'opset {}. Please try opset version {}.'.format(op_name, current_opset, supported_opset))
187
188
RuntimeError: Unsupported: ONNX export of index_put in opset 9. Please try opset version 11.
```
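The RuntimeError itself points at an alternative on the export side: index_put is only supported from ONNX opset 11 onward. A hedged sketch (not the library's actual code) of passing opset_version to the export call, should you edit train_detector.py instead of downgrading torch:

```python
# Hypothetical helper; model, dummy_input and onnx_path stand in for the objects already
# available at the export step inside train_detector.py.
import torch

def export_with_opset11(model, dummy_input, onnx_path):
    # dummy_input should match the training image size, e.g. torch.rand(1, 3, 512, 512).
    # opset_version=11 is what the error message above asks for under torch 1.6.
    torch.onnx.export(model, dummy_input, onnx_path, verbose=False, opset_version=11)
```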