
Memory issue #70

Closed
htcai opened this issue Nov 19, 2016 · 9 comments

Comments

@htcai
Member

htcai commented Nov 19, 2016

I am running my notebook, obtained by revising the latest 2.TCGA-MLexample, on Ubuntu on my laptop (8 GB RAM & 8 GB swap). I used over-sampling, which increased the size of the training data by about 7%. My machine keeps running into memory problems: OSError: [Errno 12] Cannot allocate memory, as well as other exceptions.

There is no problem after I discard the pipeline. I will use my MacBook (which uses compressed memory) to run the notebook, but it will be much slower.

@dhimmel
Member

dhimmel commented Dec 9, 2016

Okay, I think this memory issue probably started after we merged #54. It may be worth considering reverting our pipeline to the old, incorrect ordering.

@htcai
Member Author

htcai commented Dec 9, 2016

@dhimmel Thanks for your reply! In the older version of the pipeline, k is fixed inside the pipeline, while in the current version we feed a singleton list (e.g., [2000]) to the grid search. Does this lead to the difference that GridSearchCV fits SelectKBest only once in the former case, while it is refitted for each training fold in the latter?

@dhimmel
Member

dhimmel commented Dec 9, 2016

Does this lead to the difference that GridSearchCV will run SelectKBest only once in the former case while it will be run for each training fold in the latter?

Previously, the grid_search only included the SGDClassifier. Now the grid_search includes the entire pipeline. Therefore, cross-validation now refits the feature selection (if used) and standardization (if used) on every training fold rather than once on the entire X_train.
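The contrast can be sketched as follows; the variable names, k value, and alpha grid are illustrative, not the notebook's actual ones:

```python
# Sketch of the two orderings discussed above (synthetic data, assumed parameters).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.rand(100, 50)
y = rng.randint(0, 2, 100)

# Old ordering (cheap, but leaks information across folds): feature
# selection is fit once on all of X, and only the classifier is
# cross-validated inside GridSearchCV.
select = SelectKBest(f_classif, k=10).fit(X, y)
X_reduced = select.transform(X)
grid_old = GridSearchCV(SGDClassifier(random_state=0),
                        {'alpha': [1e-4, 1e-3]}, cv=3)
grid_old.fit(X_reduced, y)

# New ordering (correct, but memory-hungry): the whole pipeline is
# cross-validated, so SelectKBest and StandardScaler are refit on every
# training fold for every parameter combination.
pipeline = Pipeline([
    ('select', SelectKBest(f_classif)),
    ('scale', StandardScaler()),
    ('classify', SGDClassifier(random_state=0)),
])
grid_new = GridSearchCV(pipeline,
                        {'select__k': [10], 'classify__alpha': [1e-4, 1e-3]},
                        cv=3)
grid_new.fit(X, y)
```

The second form keeps per-fold copies of the transformed training data alive during the search, which is consistent with the higher peak memory reported in this thread.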

@htcai
Member Author

htcai commented Dec 9, 2016

Daniel, thank you for your confirmation! This information is very helpful.

@KT12
Contributor

KT12 commented Jan 16, 2017

There is a fix suggested here.

After implementing the fix, I tried using Isomap to do some dimensionality reduction but my Jupyter Notebook still yielded OSError: [Errno 12] Cannot allocate memory.

@dhimmel
Member

dhimmel commented Jan 16, 2017

@KT12 So you kept n_jobs=1 in sklearn.manifold.Isomap? It's possible that even with only one job running Isomap, you could run out of memory.

You can also set n_jobs=1 in GridSearchCV; I'm not sure exactly which stage is causing you to run out of memory. See #43 (comment) for more information on memory usage at different stages of our pipeline.
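A minimal sketch of the single-worker setup suggested above, on synthetic data (the dimensionality and alpha grid are assumptions, not the notebook's values):

```python
# Each extra worker holds its own copy of the data, so parallelism
# multiplies peak memory; n_jobs=1 keeps both steps single-process.
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.manifold import Isomap
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.rand(60, 20)
y = rng.randint(0, 2, 60)

# Single-job Isomap embedding for dimensionality reduction.
embedding = Isomap(n_components=5, n_jobs=1).fit_transform(X)

# Single-job grid search over the classifier.
grid = GridSearchCV(SGDClassifier(random_state=0),
                    {'alpha': [1e-4, 1e-3]}, cv=3, n_jobs=1)
grid.fit(embedding, y)
```

Even with n_jobs=1 everywhere, Isomap itself builds a dense pairwise-distance structure, so it can still exhaust memory on large inputs.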

@KT12
Contributor

KT12 commented Jan 16, 2017

I kept the default n_jobs=1 in Isomap. I'll try it again using all cores.

The SGDClassifier had n_jobs=-1.

On various attempts, it's been mostly the classifier that ran out of memory. The few times I was able to run the classifier, the "Investigate the predictions" block is what gave me an issue.

@dhimmel
Member

dhimmel commented Jan 16, 2017

I kept the default n_jobs=1 in Isomap. I'll try it again using all cores.

That may speed things up, but it will only make the memory issues worse!

@htcai
Member Author

htcai commented Jan 16, 2017

I just found and experimented with a solution for the memory issue in Ubuntu 14.04. For 16.04, a similar solution is also available. Mainly, more swap space can be added via a swap file. I added a swap file of 16 GB and finished running the latest version of the sample notebook 2.TCGA-MLexample.ipynb for the first time. I will restore the usage of the pipeline in my own notebook.

However, it takes ~40 min to finish the training, and the highest memory usage is beyond 25 GB according to the activity monitor.
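For reference, the usual swap-file recipe on Ubuntu looks roughly like the following; the size and path are examples, and the commands require root:

```shell
# Create and enable a 16 GB swap file (path and size are illustrative).
sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile   # restrict access; swapon warns otherwise
sudo mkswap /swapfile
sudo swapon /swapfile

# Verify the new swap is active.
swapon --show

# To persist across reboots, append this line to /etc/fstab:
# /swapfile none swap sw 0 0
```

On filesystems where fallocate is not supported for swap, `dd if=/dev/zero of=/swapfile bs=1M count=16384` is the common fallback.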
