Parallel trainings for faster training #1

Open
Laurae2 opened this issue Sep 1, 2017 · 1 comment

Laurae2 commented Sep 1, 2017

If you have enough RAM, it is always better to run several trainings in parallel, each with a low number of training threads, than to run them one after another with each training parallelized across all threads.

Example on a 20-core/40-thread server, with approximate theoretical times:

| Mode | Threads (train) | Threads (parallel) | Time for 1 model | Time for 40 models (passes) |
| --- | --- | --- | --- | --- |
| Demo Single | 1 | 1 | 500 | 20000 (40) |
| Demo Dual | 2 | 1 | 300 | 12000 (40) |
| Demo Multi | 20 | 1 | 100 | 4000 (40) |
| Demo Multi + Hyperthreading | 40 | 1 | 70 | 2800 (40) |
| Parallel Single | 1 | 20 (RAM Single x20) | 500 | 1000 (2) |
| Parallel Single + Hyperthreading | 1 | 40 (RAM Single x40) | 500 | 700 (1) |
| Parallel Dual | 2 | 10 (RAM Dual x10) | 300 | 1200 (4) |
| Parallel Dual + Hyperthreading | 2 | 20 (RAM Dual x20) | 300 | 840 (2) |
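
The non-hyperthreaded totals in the table follow from simple arithmetic: the number of passes is the model count divided by the number of parallel trainings (rounded up), and each pass lasts as long as one training. A minimal Python sketch of that calculation is below. The function names are made up for illustration, and the 30% hyperthreading gain is the theoretical figure from this comment, applied here as an assumption on top of the matching non-hyperthreaded total.

```python
import math

# Assumption: theoretical hyperthreading gain of about 30%, as stated in the comment.
HT_GAIN_PERCENT = 30


def passes(n_models, n_parallel):
    """Number of sequential batches needed to train n_models, n_parallel at a time."""
    return math.ceil(n_models / n_parallel)


def total_time(n_models, time_one_model, n_parallel):
    """Theoretical wall time: each pass lasts as long as one training."""
    return passes(n_models, n_parallel) * time_one_model


def with_hyperthreading(base_total):
    """Apply the assumed hyperthreading gain to a non-hyperthreaded total."""
    return base_total * (100 - HT_GAIN_PERCENT) / 100


# (mode, time for 1 model, trainings run in parallel), taken from the table
rows = [
    ("Demo Single", 500, 1),
    ("Demo Multi", 100, 1),
    ("Parallel Single", 500, 20),
    ("Parallel Dual", 300, 10),
]

for mode, time_one_model, n_parallel in rows:
    n_passes = passes(40, n_parallel)
    total = total_time(40, time_one_model, n_parallel)
    print(f"{mode}: {n_passes} passes, total {total}")

# Hyperthreaded variant of "Parallel Single" under the assumed 30% gain
print("Parallel Single + Hyperthreading:", with_hyperthreading(total_time(40, 500, 20)))
```

Even the slowest parallel setting here (Parallel Dual) needs less than a third of the time of the fastest non-hyperthreaded sequential one (Demo Multi), which is the point of the comparison.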

With hyperthreading, timings decrease by about 30% in theory; in reality the gain is about 15-25% due to overhead. The parallel versions only have the overhead of merging the results together (and of copying the data, if not forking), which is nearly non-existent (use a parallel lapply rather than a parallel for loop to remove most of the overhead).

It also lets you avoid the negative efficiency issue you may have.
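
The comment refers to R's parallel lapply. As a rough Python analogue, the same pattern can be sketched with a process pool that maps one single-threaded training function over a list of parameter sets. Everything named here (`train_one`, `train_all`, `n_parallel_jobs`, the toy "score") is a hypothetical placeholder rather than part of any benchmark code. A real `train_one` would fit a model with its thread-count parameter set to a low value.

```python
import concurrent.futures


def n_parallel_jobs(hw_threads, threads_per_training):
    """How many trainings fit side by side on the available hardware threads."""
    return max(1, hw_threads // threads_per_training)


def train_one(params):
    """Hypothetical stand-in for one training run restricted to a single thread."""
    seed, n_rounds = params
    # Placeholder work; a real implementation would fit and evaluate a model here.
    score = sum((seed * i) % 7 for i in range(n_rounds))
    return {"seed": seed, "n_rounds": n_rounds, "score": score}


def train_all(param_grid, n_workers,
              executor_cls=concurrent.futures.ProcessPoolExecutor):
    """Map train_one over param_grid with n_workers concurrent workers.

    pool.map returns results in input order, so merging them is just
    collecting them into a list.
    """
    with executor_cls(max_workers=n_workers) as pool:
        return list(pool.map(train_one, param_grid))
```

With the default process pool, `train_all` should be called from under an `if __name__ == "__main__":` guard, for example `train_all([(seed, 1000) for seed in range(40)], n_workers=n_parallel_jobs(20, 1))`. Each worker process holds its own copy of whatever data it loads, which is why the approach depends on having enough RAM.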

szilard (Owner) commented Sep 1, 2017

Yes, absolutely, that's what I recommend to people too.

For everyone interested (I know @Laurae2 you've seen this already):

https://github.com/szilard/GBM-multicore

and also my KDD invited talk:

https://speakerdeck.com/szilard/machine-learning-software-in-practice-quo-vadis-invited-talk-kdd-conference-applied-data-science-track-august-2017-halifax-canada
