Memory issue #70
Okay, I think this memory issue probably started after we merged #54. It may be worth considering reverting our pipeline to the old, incorrect ordering.
@dhimmel Thanks for your reply! In the older version of the pipeline, ...
Previously the grid_search only included the SGDClassifier. Now the grid_search includes the entire pipeline. Therefore, cross-validation now refits the feature selection (if used) and standardization (if used) on every training fold rather than once on the entire X_train.
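To make that distinction concrete, here is a minimal sketch of the two setups. It is not the notebook's exact code; the placeholder data, the k value, and the alpha grid are assumptions.

```python
# Hypothetical sketch of the two setups described above; the notebook's
# actual estimators, parameter grid, and feature selection differ in detail.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for the real expression matrix.
X_train, y_train = make_classification(n_samples=1000, n_features=500,
                                        random_state=0)

# Old setup: standardization and feature selection are fit once on all of
# X_train; only the classifier is cross-validated.
X_scaled = StandardScaler().fit_transform(X_train)
X_selected = SelectKBest(k=100).fit_transform(X_scaled, y_train)
old_search = GridSearchCV(SGDClassifier(random_state=0),
                          {'alpha': [1e-4, 1e-3, 1e-2]})
old_search.fit(X_selected, y_train)

# New setup (after #54): the whole pipeline sits inside GridSearchCV, so the
# scaler and the selector are refit on every training fold of every candidate,
# which avoids leaking information from the held-out folds but costs more
# time and memory.
pipeline = Pipeline([
    ('standardize', StandardScaler()),
    ('select', SelectKBest(k=100)),
    ('classify', SGDClassifier(random_state=0)),
])
new_search = GridSearchCV(pipeline,
                          {'classify__alpha': [1e-4, 1e-3, 1e-2]})
new_search.fit(X_train, y_train)
```

With the pipeline inside GridSearchCV, every fold of every parameter candidate re-runs standardization and feature selection, so both runtime and peak memory grow accordingly.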
Daniel, thank you for your confirmation! This information is very helpful.
There is a fix suggested here. After implementing the fix, I tried using Isomap to do some dimensionality reduction, but my Jupyter Notebook still yielded ...
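For illustration, a minimal sketch of adding an Isomap step to a scikit-learn pipeline; the sample data, component count, neighbor count, and pipeline layout are assumptions rather than the notebook's actual configuration.

```python
# Hypothetical sketch: Isomap as a dimensionality-reduction step before the
# classifier; the data, n_components, and n_neighbors are placeholders.
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.manifold import Isomap
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=200, random_state=0)

pipeline = Pipeline([
    ('standardize', StandardScaler()),
    ('reduce', Isomap(n_neighbors=10, n_components=50)),
    ('classify', SGDClassifier(random_state=0)),
])
pipeline.fit(X, y)
print(pipeline.score(X, y))
```

Note that Isomap builds a neighbor graph and eigendecomposes a matrix that scales with the number of samples, so on a large training set it can be memory-hungry in its own right.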
@KT12 So you kept ...? You can also set ...
I kept the default ... The SGDClassifier had ... On various attempts, it's been mostly the classifier that ran out of memory. The few times I was able to run the classifier, the "Investigate the predictions" block is what gave me an issue.
That may speed things up, but it will only make memory issues worse!
I just found and experimented with a solution for the memory issue on Ubuntu 14.04; a similar solution is also available for 16.04. Mainly, more swap space can be added via a swap file. I added a 16 GB swap file and finished running the latest version of the sample notebook. However, it takes ~40 minutes to finish the training, and the highest memory usage goes beyond 25 GB according to the activity monitor.
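As an aside, peak memory can also be checked from inside the notebook rather than the activity monitor. A minimal sketch using the standard-library resource module (Unix only):

```python
# Illustrative only: report this process's peak resident set size so far.
# resource is part of the Python standard library on Unix systems.
import resource
import sys

peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
# ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
peak_gb = peak / 1e9 if sys.platform == 'darwin' else peak / 1e6
print('Peak memory usage so far: {:.1f} GB'.format(peak_gb))
```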
I am running my notebook, obtained by revising the latest 2.TCGA-MLexample, in Ubuntu on my laptop (8 GB RAM and 8 GB swap). I used over-sampling, which increased the size of the training data by about 7%. My machine keeps running into memory problems (OSError: [Errno 12] Cannot allocate memory, as well as other exceptions). There is no problem after I discard the pipeline. I will use my MacBook (which uses compressed memory) to run the notebook, but it will be much slower.
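For reference, a minimal sketch of the kind of over-sampling described above, using sklearn.utils.resample; the actual notebook revision may implement it differently.

```python
# Hypothetical sketch of up-sampling the positive class before training;
# the real notebook revision may use a different strategy or ratio.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.utils import resample

# Placeholder imbalanced data standing in for the real training set.
X, y = make_classification(n_samples=1000, n_features=50,
                           weights=[0.9, 0.1], random_state=0)

X_pos, y_pos = X[y == 1], y[y == 1]

# Draw extra copies of positive samples so the training set grows by
# roughly 7%, mirroring the figure mentioned above.
n_extra = int(0.07 * len(y))
X_extra, y_extra = resample(X_pos, y_pos, n_samples=n_extra, random_state=0)

X_train = np.vstack([X, X_extra])
y_train = np.concatenate([y, y_extra])
print(X.shape, '->', X_train.shape)
```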