
How to resume training from a certain epoch, or how to update an already trained model in Torch? #1305

Open
maryam089 opened this issue Jan 31, 2018 · 1 comment

Comments

@maryam089

I have an AlexNet model trained on 100K images, and now I want to update this model by training it on a few thousand more images. But when I try to load the model and resume training, I get the following error. Any help?

/home/maryam/torch/install/bin/lua: /home/maryam/torch/install/share/lua/5.2/nn/Module.lua:327: check that you are sharing parameters and gradParameters
stack traceback:
[C]: in function 'assert'
/home/maryam/torch/install/share/lua/5.2/nn/Module.lua:327: in function 'getParameters'
train.lua:270: in main chunk
[C]: in function 'dofile'
...ryam/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: in ?
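
For context, the assertion at Module.lua:327 comes from getParameters() flattening all parameters into a single storage; it fires when parameter sharing and gradParameter sharing do not match, for example when :share() was applied to 'weight' and 'bias' but not to 'gradWeight' and 'gradBias', or when the network is modified after getParameters() has already been called. A minimal sketch of the usual ordering (the file name and variable names here are hypothetical, not taken from the issue):

require 'torch'
require 'nn'

-- Load the serialized AlexNet (hypothetical path).
local model = torch.load('pretrained_alexnet.t7')

-- If any modules share weights, share the gradients too, e.g.
-- moduleB:share(moduleA, 'weight', 'bias', 'gradWeight', 'gradBias').
-- Only after all sharing/surgery is done, flatten the parameters once
-- and reuse the returned tensors in the training loop.
local params, gradParams = model:getParameters()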

@kysunami

Check this out: https://debuggercafe.com/effective-model-saving-and-resuming-training-in-pytorch/
You can first save a checkpoint and then reload it when you want to resume training. Hope this helps.
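
The linked article is PyTorch-specific, but the same idea applies to the Lua Torch setup in the question: serialize the model together with the optimizer state and the epoch counter, then load that table to pick up training where it stopped. A rough sketch under that assumption (the helper names and file path are made up for illustration, and an optim-style training loop is assumed):

require 'torch'
require 'nn'
require 'optim'

-- Save everything needed to resume: model, optimizer state, current epoch.
local function saveCheckpoint(path, model, optimState, epoch)
   model:clearState()  -- drop cached activations so the file stays small
   torch.save(path, { model = model, optimState = optimState, epoch = epoch })
end

-- Reload the checkpoint and continue from the saved epoch.
local function loadCheckpoint(path)
   local checkpoint = torch.load(path)
   return checkpoint.model, checkpoint.optimState, checkpoint.epoch
end

local model, optimState, startEpoch = loadCheckpoint('checkpoint.t7')
local params, gradParams = model:getParameters()  -- call once, then train as usual

From there, the additional images can be fed through the normal training loop starting at startEpoch, with optim.sgd (or whichever optimizer is in use) receiving the restored optimState table.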
