update environment and notebook 2 with dask-searchcv #104
I agree that Regarding the
Do you have any indication how dask-searchcv changed the runtime on your machine?
Here are the results with dask-searchcv:
Fitting CV for model: full
runtime: 0:03:09.570861
Fitting CV for model: expressions
runtime: 0:03:14.775702
Fitting CV for model: covariates
runtime: 0:00:27.543525
Also, GitHub is not showing the diff for 2.mutation-classifier.ipynb. I'm assuming an image changed, causing a large diff. Any idea what changed from a local git diff?
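For anyone wanting to reproduce per-model timings like the ones reported above, here is a minimal sketch of how they could be collected. The helper name `timed_fit` and the stand-in workload are hypothetical; the notebook's actual timing code is not shown in this thread.

```python
from datetime import timedelta
from time import perf_counter


def timed_fit(label, fit):
    """Run fit() once and report its wall time in the style used above.

    `fit` is any zero-argument callable, e.g. `lambda: cv_pipeline.fit(X, y)`
    (a hypothetical name, not taken from the notebook).
    """
    print(f"Fitting CV for model: {label}")
    start = perf_counter()
    result = fit()
    runtime = timedelta(seconds=perf_counter() - start)
    print(f"runtime: {runtime}")
    return result, runtime


# Stand-in workload so the sketch runs without scikit-learn or dask installed.
result, runtime = timed_fit("full", lambda: sum(range(1000)))
```

Wrapping the elapsed seconds in a `timedelta` gives the same `H:MM:SS.ffffff` layout as the runtimes quoted in the comment.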
environment.yml
Outdated
- conda-forge::vega=0.4.4
- pip:
- neo4j-driver==1.3.1
There should be a newline at the end of file, but no spaces on the newline
environment.yml
Outdated
@@ -12,6 +12,8 @@ dependencies:
- anaconda::setuptools=27.2.0
- anaconda::statsmodels=0.8.0
- conda-forge::altair=1.2.0
- conda-forge::dask-searchcv |
Let's also specify a dask channel and version
I think a version number for dask-searchcv would be good. Are you also thinking we would explicitly import dask? So it would be:
conda-forge::dask=0.14.3
conda-forge::dask-searchcv=0.0.2
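The `channel::package=version` form used in these entries can be checked mechanically. The sketch below is only an illustration of that spec layout; `CondaSpec` and `parse_spec` are hypothetical helpers, not part of conda, and they handle only exact `=` pins as written in this environment.yml.

```python
from typing import NamedTuple, Optional


class CondaSpec(NamedTuple):
    channel: Optional[str]
    name: str
    version: Optional[str]


def parse_spec(spec: str) -> CondaSpec:
    """Split 'channel::name=version' into its parts (channel and version optional)."""
    channel, sep, rest = spec.partition("::")
    if not sep:
        # No explicit channel prefix: the whole string is 'name[=version]'.
        channel, rest = None, spec
    name, sep, version = rest.partition("=")
    return CondaSpec(channel, name, version if sep else None)


specs = [
    "conda-forge::dask-searchcv",        # the entry as first submitted
    "conda-forge::dask=0.14.3",          # the pinned forms proposed above
    "conda-forge::dask-searchcv=0.0.2",
]
unpinned = [parse_spec(s).name for s in specs if parse_spec(s).version is None]
print("Entries without a version pin:", unpinned)
```

A check like this flags the original unversioned `dask-searchcv` line, which is the issue the review comment raises.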
Explicitly add dask-searchcv and dask to environment.yml, both with versions. As for importing dask in the notebook, do so only if it's necessary. Also consider using anaconda dask unless that doesn't work. Currently, we prioritize the anaconda channel over conda-forge.
I added version numbers to both dask and dask-searchcv and used anaconda for dask. Everything works fine.
dask-searchcv was about twice as fast. Average of three runs without dask-searchcv = 14.5 minutes; average of three runs with dask-searchcv = 6.5 minutes.
Is this what you're looking for?
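As a quick check on "about twice as fast", the speedup implied by the two reported averages can be worked out directly. Only the 14.5 and 6.5 minute figures come from the comment; the variable names are illustrative.

```python
# Averages of three runs each, in minutes, as reported above.
without_dask_searchcv = 14.5
with_dask_searchcv = 6.5

speedup = without_dask_searchcv / with_dask_searchcv
minutes_saved = without_dask_searchcv - with_dask_searchcv

print(f"speedup: {speedup:.2f}x")
print(f"time saved per run: {minutes_saved:.1f} minutes")
```

The ratio comes out slightly above 2, consistent with the description in the comment.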
Yeah. Just make sure all changes in there make sense.
Nice, hopefully we also reduced memory footprint. Was this with only 1 core? In production, we could parallelize.
Both changes make sense to me: The first change was importing
Yep, only 1 core. It will be cool to see how fast this runs in production!
Here's the very simple PR to add dask-searchCV to the environment and dask-searchCV's implementation of GridSearchCV to notebook 2.
A few notes:
GridSearchCV about 3 times faster. The wall times for GridSearchCV in this PR are similar to those from @dhimmel's notebook 2 times with the SK-learn GridSearchCV, not because dask-searchCV doesn't speed things up but because my computer is slow. :(
GridSearchCV ... I debated renaming it to include "dask" or "dask-searchCV" (by using from dask_searchcv import GridSearchCV as dask_GridSearchCV), as this would make it easier for someone to see that it wasn't the SK-learn implementation, but I thought there was potential for confusion either way, so I would just keep it the same.
I'm not sure what caused this. I searched the dask-searchcv GitHub page for 'grid_scores_' and didn't see anything. If this becomes an issue in the future, we can look into it further or revert that specific case to SK-learn's GridSearchCV.
Closes #94
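On the naming question raised in the notes, one way to keep the plain GridSearchCV name while still making its origin visible is to record which module supplied it. This is only a sketch under assumptions: `load_grid_search` is a hypothetical helper, not something in the notebook, and it falls back to scikit-learn (or to nothing) when dask-searchcv is not installed.

```python
import importlib


def load_grid_search(candidates=("dask_searchcv", "sklearn.model_selection")):
    """Return (GridSearchCV class, providing module name) from the first
    importable candidate, or (None, None) if none can be imported."""
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        cls = getattr(module, "GridSearchCV", None)
        if cls is not None:
            return cls, module_name
    return None, None


GridSearchCV, grid_search_source = load_grid_search()
print(f"GridSearchCV provided by: {grid_search_source}")
```

Printing the source once near the imports would let a reader of the notebook see at a glance whether the dask-searchcv or the scikit-learn implementation is in use, without renaming the class.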