
DreamPool implementation seems incompatible with Python3.8 multiprocess? #16

Closed
lschwetlick opened this issue Feb 6, 2020 · 7 comments · Fixed by #17

Comments

@lschwetlick

lschwetlick commented Feb 6, 2020

I have been having trouble getting PyDREAM to work with Python 3.8, and I think I have tracked down the root of the problem: the DreamPool class builds on a multiprocess interface that changed in the Python 3.8 version of multiprocess.

What?

With Python 3.8, after installing PyDREAM in a clean environment, a very simple test script kept producing this error:

  File "/virtual_envs/py38_dream/lib/python3.8/site-packages/multiprocess/pool.py", line 212, in __init__
    self._repopulate_pool()
  File "/virtual_envs/py38_dream/lib/python3.8/site-packages/multiprocess/pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/virtual_envs/py38_dream/lib/python3.8/site-packages/multiprocess/pool.py", line 319, in _repopulate_pool_static
    w = Process(ctx, target=worker,
  File "/PyDREAM/pydream/Dream.py", line 996, in __init__
    mp.Process.__init__(self, group, target, name, args, kwargs)
  File "virtual_envs/py38_dream/lib/python3.8/site-packages/multiprocess/process.py", line 82, in __init__
    assert group is None, 'group argument must be None for now'
AssertionError: group argument must be None for now

Doing the exact same thing (fresh environment, pip install ., same test script) under an earlier Python version works as expected.

Why?

In previous implementations of multiprocess, the Pool class defined processes with just args and kwargs (see here).
In the Python 3.8 version, Pool takes an extra argument, ctx (see here).
In PyDREAM, the Process member of Pool gets overwritten by the custom class NoDaemonProcess. NoDaemonProcess does not expect to be passed the ctx variable and interprets it as the group argument. group must be None, hence the error.

Ideas

In order to be compatible with multiprocess, I assume we need to make NoDaemonProcess accept ctx (a multiprocess.context.ForkContext object, which incidentally hides another layer of indirection through yet another Process subclass). Unfortunately, I don't really understand what the ctx variable does.

I tried simply catching it with

    def Process(self, ctx, *args, **kwds):
        return NoDaemonProcess(*args, **kwds)

and that seemed to work, but I'm not entirely sure whether it has unwanted consequences.
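For context, here is a self-contained sketch of that workaround on Python 3.8+ (the class names NoDaemonProcess and NoDaemonPool follow PyDREAM's naming, but this is an illustrative stand-alone version, not PyDREAM's actual code):

```python
import multiprocessing as mp
import multiprocessing.pool

class NoDaemonProcess(mp.Process):
    # Report daemon=False and ignore attempts to set it, so that
    # pool workers are allowed to spawn child processes.
    @property
    def daemon(self):
        return False

    @daemon.setter
    def daemon(self, value):
        pass

class NoDaemonPool(multiprocessing.pool.Pool):
    # Since Python 3.8, Pool passes its context as the first argument
    # to self.Process; swallow it and build a NoDaemonProcess instead.
    def Process(self, ctx, *args, **kwds):
        return NoDaemonProcess(*args, **kwds)

if __name__ == '__main__':
    with NoDaemonPool(2) as pool:
        print(pool.map(abs, [-1, 2, -3]))  # prints [1, 2, 3]
```

Note that this variant only works on Python 3.8+; on older versions Pool.Process does not receive ctx, so a version check would be needed to support both.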

@lschwetlick
Author

Thinking about this some more, I was wondering if anyone could explain why using multiprocess as a dependency is better than using the standard multiprocessing library?
Since multiprocess changes its interface depending on the Python version, supporting different versions of Python in PyDREAM will be extremely cumbersome as well!

@ortega2247
Contributor

Hi @lschwetlick, thank you for reporting this problem. I just checked, and you are right: there is no need to use the multiprocess package. Regarding the issue with Python 3.8, you are also right about the problem with the ctx argument. There is a workaround in this post: http://stackoverflow.com/questions/6974695/python-process-pool-non-daemonic

Also, it is possible to use the newer concurrent.futures API, which supports launching child processes, so there is no need to subclass the multiprocessing Process and Pool classes. I'm going to implement this solution and release a new PyDREAM version soon.

@lschwetlick
Author

Thanks so much for actively maintaining the package!

I use it a lot, so if I can contribute in some way, I'd be happy to.

@ortega2247
Contributor

I am glad you find PyDREAM useful!

I have made a pull request that should enable users to run PyDREAM with Python 3.8. I'll have to wait until someone from the lab reviews the PR before releasing a new version. In the meantime, you can install PyDREAM from the dream_pool_all_python branch using pip: pip install git+https://github.com/LoLab-VU/PyDREAM.git@dream_pool_all_python

I would appreciate it if you could take a look at the PR and see if anything doesn't make sense :) Also, if you know about Bayesian hierarchical modeling, could you take a look at issue #15? I haven't yet figured out how to do hierarchical Bayes with PyDREAM.

Thanks!

@lschwetlick
Author

lschwetlick commented Feb 18, 2020

Hi,

I only got around to trying it now, and I do have a question about it.
Out of the box I get a very strange error (see below). Googling freeze_support I found this, which is extremely odd, since I'm running macOS Catalina...
Implementing freeze_support does resolve the issue, though, which is even odder, because the docs say that on non-Windows machines it should do nothing. Perhaps you are overwriting some default with the ctx?

Although, as far as I can tell, you are getting the context from mp.get_context, so I'm really unsure what could be causing this...

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/Users/lisa/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/Users/lisa/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/Users/lisa/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/Users/lisa/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/Users/lisa/.pyenv/versions/3.8.0/lib/python3.8/runpy.py", line 262, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/Users/lisa/.pyenv/versions/3.8.0/lib/python3.8/runpy.py", line 95, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/Users/lisa/.pyenv/versions/3.8.0/lib/python3.8/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/lisa/Documents/SFB1294_B05/PyDREAM/try/run_dream.py", line 76, in <module>
    dream_estim_and_save(priors)
  File "/Users/lisa/Documents/SFB1294_B05/PyDREAM/try/run_dream.py", line 60, in dream_estim_and_save
    sampled_params, log_ps = pd_run(list(priors.values()), custom_loglik, nchains=3, niterations=5000, restart=False, verbose=False, model_name=folderPath+"fit_"+estim_id)
  File "/Users/lisa/Documents/virtual_envs/new_pydream/lib/python3.8/site-packages/pydream/core.py", line 66, in run_dream
    pool = _setup_mp_dream_pool(nchains, niterations, step_instance, start_pt=start, mp_context=mp_context)
  File "/Users/lisa/Documents/virtual_envs/new_pydream/lib/python3.8/site-packages/pydream/core.py", line 307, in _setup_mp_dream_pool
    p = DreamPool(nchains, context=ctx, initializer=_mp_dream_init,
  File "/Users/lisa/Documents/virtual_envs/new_pydream/lib/python3.8/site-packages/pydream/Dream.py", line 1068, in __init__
    super(DreamPool, self).__init__(processes=processes,
  File "/Users/lisa/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/pool.py", line 212, in __init__
    self._repopulate_pool()
  File "/Users/lisa/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/pool.py", line 303, in _repopulate_pool
    return self._repopulate_pool_static(self._ctx, self.Process,
  File "/Users/lisa/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/pool.py", line 326, in _repopulate_pool_static
    w.start()
  File "/Users/lisa/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/Users/lisa/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
    return Popen(process_obj)
  File "/Users/lisa/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/Users/lisa/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/Users/lisa/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/Users/lisa/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "/Users/lisa/.pyenv/versions/3.8.0/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

@ortega2247
Contributor

Hi, thanks for reporting this issue. Since Python 3.8, the default method for starting processes on macOS changed from fork to spawn. According to the Python multiprocessing guidelines, spawn comes with a few extra restrictions that don't apply to the fork start method. Thus, if you want to use Python 3.8 with the default spawn start method, you have to put the run_dream call inside the if statement:

if __name__ == '__main__':
    run_dream(...)

As you mentioned above, there is no need to call the freeze_support() function on macOS.

Alternatively, you can run PyDREAM with a fork context, although this is not recommended, as the Python developers mention in the docs: "The fork start method should be considered unsafe as it can lead to crashes of the subprocess. See bpo-33725."

from multiprocessing import get_context
run_dream(..., mp_context=get_context('fork'))
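A minimal stand-alone demonstration of the two start methods (double is just a toy worker function; note that the fork method is unavailable on Windows):

```python
import multiprocessing as mp

def double(x):
    return 2 * x

if __name__ == '__main__':
    # Default: 'fork' on Linux; 'spawn' on Windows, and on macOS
    # since Python 3.8.
    print(mp.get_start_method())

    # Explicitly request a fork context, as in the snippet above
    # (not available on Windows).
    ctx = mp.get_context('fork')
    with ctx.Pool(2) as pool:
        print(pool.map(double, [1, 2, 3]))  # prints [2, 4, 6]
```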

I hope this is helpful. If you keep getting errors, please let us know and we can reopen this issue.

@lschwetlick
Author

Ah, I see! Thanks for clearing that up for me. I was quite mystified! I'm getting no further errors, so it all looks good. Thanks again for fixing :)
