Turn ParallelAnalysisBase into dask custom collection #136
base: master
Conversation
Hello @yuxuanzhuang! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2020-08-20 10:39:58 UTC
This PR really depends on PR #132, so we should look at that one first. Then you can rebase this one and it will become much cleaner.
Overall this looks like a really interesting way to move forward. This, together with the notebook, is a good study of how the next version of PMDA could look.
I have a bunch of initial questions/comments inline.
Also note that we would first need to merge PR #132 before really moving forward here.
We would also need to remove Python 2 as soon as we become dependent on MDA 2.0.0 (but that's for PR #132).
Tests will obviously be needed...
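For context, here is a minimal sketch of dask's custom collection protocol that this PR adopts; the class name, graph contents, and finalize choice below are illustrative and not PMDA's actual implementation, only the __dask_* methods are dask's documented interface:

import dask.multiprocessing
from dask.base import DaskMethodsMixin


class MinimalCollection(DaskMethodsMixin):
    """Illustrative custom collection, not PMDA's actual implementation."""

    def __init__(self, graph, keys):
        self._graph = graph  # dict mapping keys to task tuples
        self._keys = keys    # keys whose values compute() should return

    def __dask_graph__(self):
        return self._graph

    def __dask_keys__(self):
        return self._keys

    def __dask_tokenize__(self):
        # deterministic token used by dask for hashing/naming
        return tuple(self._keys)

    def __dask_postcompute__(self):
        # finalize(results, *extra_args) is applied to the computed values
        return list, ()

    def __dask_postpersist__(self):
        raise NotImplementedError

    # default scheduler used when the user does not configure one
    __dask_scheduler__ = staticmethod(dask.multiprocessing.get)

With this in place, MinimalCollection({"a": (sum, [1, 2, 3])}, ["a"]).compute(scheduler="synchronous") returns [6]; DaskMethodsMixin supplies compute(), persist(), and visualize() on top of the dunder methods.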
return self._keys

# it uses the multiprocessing scheduler by default
__dask_scheduler__ = staticmethod(dask.multiprocessing.get)
Even though multiprocessing is the default scheduler, one can still use distributed, right?
Right, the scheduler can be set via a global dask config, a context manager, or an argument to self.compute(). (https://docs.dask.org/en/latest/scheduler-overview.html#configuring-the-schedulers)
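For reference, a sketch of those three options; analysis is a placeholder for any dask collection (for example a ParallelAnalysisBase instance once this PR is in):

import dask

# 1. global default for the whole session
dask.config.set(scheduler="processes")

# 2. scoped to a block via a context manager
with dask.config.set(scheduler="synchronous"):
    result = analysis.compute()

# 3. per-call argument to compute()
result = analysis.compute(scheduler="threads")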
pmda/parallel.py
Outdated
np.array([el[5] for el in res]))

# this is crucial if the analysis does not iterate over
# the whole trajectory.
Why is this crucial? What would happen otherwise? Add a more detailed comment.
Discussed here: https://github.com/MDAnalysis/pmda/pull/132/files#r455247843
def __dask_postpersist__(self):
    # we don't need a persist implementation.
    raise NotImplementedError
Will it not be possible to persist?
Presumably, that would have been possible previously if we had chosen persist in run instead of compute().
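If persist were to be supported, dask's protocol expects __dask_postpersist__ to return a rebuild callable plus extra arguments; a rough sketch, assuming the collection keeps its graph in a self._graph attribute (the attribute name and mixin class are hypothetical, not part of this PR):

import copy


class PersistableMixin:
    """Hypothetical sketch of persist support."""

    def __dask_postpersist__(self):
        # dask calls rebuild(persisted_graph, *extra_args) and expects an
        # equivalent collection whose graph holds the already-computed values
        def rebuild(dsk, *extra_args):
            new = copy.copy(self)
            new._graph = dsk
            return new

        return rebuild, ()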
pmda/parallel.py
Outdated
times_io), np.sum(times_compute)

@staticmethod
def _reduce(res, result_single_frame):
    """'append' action for a time series"""
    res.append(result_single_frame)
    return res

def __getstate__(self):
Does DaskMixin require the whole class to be picklable?
Yes. In the old implementation the whole class had to be picklable as well.
FYI, the code here is no longer needed once MDAnalysis/mdanalysis#2893 is merged.
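A quick way to check that requirement, sketched with a placeholder analysis instance:

import pickle

# the multiprocessing and distributed schedulers ship tasks to worker
# processes by serialising them, so everything the graph closes over,
# including the analysis object itself, must survive a pickle round trip
restored = pickle.loads(pickle.dumps(analysis))  # `analysis` is a placeholder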
pmda/parallel.py
Outdated
@@ -284,6 +281,69 @@ def _single_frame(self, ts, atomgroups): | |||
""" | |||
raise NotImplementedError | |||
|
|||
def prepare_jobs(self, |
prepare_jobs sounds confusing to me: what "jobs"? If it's part of the documented workflow, then it could just be prepare.
prepare_dask would be more explicit but also a bit pointless, because PMDA is fully intertwined with dask, so that's the only thing we would be preparing for. create_dask_graph is too long and says too much about implementation details.
All in all, I'd just call it prepare and add more docs stating clearly what is being prepared and under which circumstances a user needs to run it.
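In other words, the user-facing workflow the reviewer has in mind would look roughly like the following; the prepare/compute method names and arguments are illustrative suggestions for this PR, not an existing API, and the file names are placeholders:

import MDAnalysis as mda
from pmda.rms import RMSF

u = mda.Universe("topol.tpr", "traj.xtc")      # placeholder input files
rmsf = RMSF(u.atoms)                           # a ParallelAnalysisBase subclass
rmsf.prepare(n_blocks=4)                       # suggested name for prepare_jobs
result = rmsf.compute(scheduler="processes")   # run the prepared dask graph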
Fixes #135
Note that the only file changed compared to #132 is parallel.py.
You can read https://github.com/yuxuanzhuang/pmda/pull/1/files to see the actual changes.
Changes made in this Pull Request:
PR Checklist