
How to resume training from a certain epoch, or how to update an already trained model in Torch? #1305

Open
maryam089 opened this issue Jan 31, 2018 · 1 comment

Comments

@maryam089

I have an AlexNet model trained on 100K images, and now I want to update this model by training it on a few thousand more images. But when I try to load the model and resume training, I get the following error. Any help?

/home/maryam/torch/install/bin/lua: /home/maryam/torch/install/share/lua/5.2/nn/Module.lua:327: check that you are sharing parameters and gradParameters
stack traceback:
[C]: in function 'assert'
/home/maryam/torch/install/share/lua/5.2/nn/Module.lua:327: in function 'getParameters'
train.lua:270: in main chunk
[C]: in function 'dofile'
...ryam/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: in ?
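
For context, the assertion at Module.lua:327 comes from getParameters() flattening all parameters into a single storage; it fires when parameter sharing and gradParameter sharing do not match, for example when :share() was applied to 'weight' and 'bias' but not to 'gradWeight' and 'gradBias', or when the network is modified after getParameters() has already been called. A minimal sketch of the usual ordering (the file name and variable names here are hypothetical, not taken from the issue):

require 'torch'
require 'nn'

-- Load the serialized AlexNet (hypothetical path).
local model = torch.load('pretrained_alexnet.t7')

-- If any modules share weights, share the gradients too, e.g.
-- moduleB:share(moduleA, 'weight', 'bias', 'gradWeight', 'gradBias').
-- Only after all sharing/surgery is done, flatten the parameters once
-- and reuse the returned tensors in the training loop.
local params, gradParams = model:getParameters()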

@kysunami

Check this out: https://debuggercafe.com/effective-model-saving-and-resuming-training-in-pytorch/
You can first save a checkpoint and then reload it when you want to resume training. Hope this helps.
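
The linked article is PyTorch-specific, but the same idea applies to the Lua Torch setup in the question: serialize the model together with the optimizer state and the epoch counter, then load that table to pick up training where it stopped. A rough sketch under that assumption (the helper names and file path are made up for illustration, and an optim-style training loop is assumed):

require 'torch'
require 'nn'
require 'optim'

-- Save everything needed to resume: model, optimizer state, current epoch.
local function saveCheckpoint(path, model, optimState, epoch)
   model:clearState()  -- drop cached activations so the file stays small
   torch.save(path, { model = model, optimState = optimState, epoch = epoch })
end

-- Reload the checkpoint and continue from the saved epoch.
local function loadCheckpoint(path)
   local checkpoint = torch.load(path)
   return checkpoint.model, checkpoint.optimState, checkpoint.epoch
end

local model, optimState, startEpoch = loadCheckpoint('checkpoint.t7')
local params, gradParams = model:getParameters()  -- call once, then train as usual

From there, the additional images can be fed through the normal training loop starting at startEpoch, with optim.sgd (or whichever optimizer is in use) receiving the restored optimState table.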
