
Program hangs when instantiating a GP using multiprocessing #521

brendenpetersen opened this issue Jul 6, 2017 · 12 comments
@brendenpetersen

I'm trying to do what seems like a simple task: use multiprocessing to parallelize the optimize() call over many unique GPs. Here's a minimal example of what I'm trying to do.

from GPy.core.gp import GP
from GPy.kern import White
from GPy.likelihoods.gaussian import Gaussian
from GPy.inference.latent_function_inference.laplace import Laplace
from multiprocessing import Pool
import numpy as np

# Wrapper needed so the function is pickleable, which is required for multiprocessing.Pool
def opt_wrapper(gp):
    return gp.optimize() # Can replace with 'return 1' and program still hangs

size = 100 # Program works when this is low enough
inference_method = Laplace() # Program works when this is None
models = [GP(X=np.arange(size).reshape(size,1), Y=np.arange(size).reshape(size,1), kernel=White(1), likelihood=Gaussian(), inference_method=inference_method) for _ in range(1)]

print "Starting pool..."
pool = Pool(1)
print pool.map(opt_wrapper, models)
pool.close()
pool.join()

The program simply hangs after printing "Starting pool..." Annoyingly, it also results in a zombie process for each worker in the pool (just 1 in this example).

The program works just fine when any one of the following conditions is true:

  1. size is less than about 60. For larger values, it hangs.
  2. Laplace() is replaced with None. The Gaussian likelihood then defaults to ExactGaussianInference(); however, my actual project's likelihood is custom and requires Laplace().
  3. pool.map is replaced with the built-in map, so the problem is specific to running in a worker process.

Lastly, it still breaks when you replace return gp.optimize() with return 1. Similarly, the following program hangs (same imports):

def make_gp(dummy):
    inference_method = Laplace() # Again, program works when Laplace() becomes None
    gp = GP(X=np.arange(size).reshape(size,1), Y=np.arange(size).reshape(size,1), kernel=White(1), likelihood=Gaussian(), inference_method=inference_method)
    return 1

size = 100 # Again, program works when this is small
pool = Pool(1)
print pool.map(make_gp, ['dummy']) # Again, works with `map`
pool.close()
pool.join()

It seems to be an issue of instantiating/copying a GP--both with Laplace and above a certain size--within a new process. Seems highly odd and highly specific. Any help greatly appreciated.

@avehtari
Contributor

avehtari commented Jul 7, 2017

Are the GPs instantiated on the same node? Are you running out of memory? When my student @opkoisti tested a distributed GP approach with GPy, there were some problems with the GPy implementation. Unfortunately I don't remember whether they were properly fixed. Maybe @alansaul remembers?

@brendenpetersen
Author

brendenpetersen commented Jul 7, 2017

Thank you for the response! Yes, they're all on the same node. I'm just trying this out on a single laptop with 4 cores. (I'm also considering distributed, though it wouldn't be with the multiprocessing package.) I'm not running out of memory either--good question. The memory requirements are very small.

It does seem to be a GPy implementation issue. I've tried toy examples with other objects and they work just fine. I know multiprocessing has issues with unpickleable data (hence the need to wrap the optimize() call). Could it also be an issue with multiple instances of multiprocessing? I looked into what's happening under the hood when Laplace() is called, but it goes pretty deep into some other packages. I also tried replacing Laplace() with expectation_propagation.EP(), which resulted in a NotImplementedError about object 'super' not having __getstate__.
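For reference, here's the quick check I'm using to see which inference objects survive pickling at all, independently of the model and the pool. This is just a sketch; the EP import path is my guess based on the module name above.

import pickle
from GPy.inference.latent_function_inference.laplace import Laplace
from GPy.inference.latent_function_inference.expectation_propagation import EP

# Round-trip each inference object through pickle to see which ones
# can be serialized at all (a requirement for sending them to workers).
for cls in (Laplace, EP):
    try:
        method = cls()
        pickle.loads(pickle.dumps(method))
        print("%s pickles fine" % cls.__name__)
    except Exception as e:
        print("%s failed: %r" % (cls.__name__, e))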

I look forward to help from @opkoisti and/or @alansaul. Thanks again.

@mzwiessele
Member

mzwiessele commented Jul 9, 2017 via email

@brendenpetersen
Author

brendenpetersen commented Jul 10, 2017 via email

@mzwiessele
Member

I am running on Python 2.7.9 too...

@brendenpetersen
Author

@mzwiessele Can you try increasing size and see if it still works?

Could you check your versions of GPy and numpy (I'm on 1.13.1)?
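Something as simple as this should print both (assuming both packages expose a __version__ attribute, which I believe they do):

# Print the installed GPy and numpy versions for comparison.
import GPy
import numpy
print(GPy.__version__)
print(numpy.__version__)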

@ahartikainen

ahartikainen commented Mar 19, 2018

Hi, this is probably a problem with the Jupyter Notebook.

You need to create an external Python file (e.g. myfunc.py) and put opt_wrapper in there.

Then import the opt_wrapper and call your code normally.

from GPy.core.gp import GP
from GPy.kern import White
from GPy.likelihoods.gaussian import Gaussian
from GPy.inference.latent_function_inference.laplace import Laplace
from multiprocessing import Pool
import numpy as np
from myfunc import opt_wrapper

size = 100 # Program works when this is low enough
inference_method = Laplace() # Program works when this is None
models = [GP(X=np.arange(size).reshape(size,1), Y=np.arange(size).reshape(size,1), kernel=White(1), likelihood=Gaussian(), inference_method=inference_method) for _ in range(1)]

print "Starting pool..."
pool = Pool(1)
print pool.map(opt_wrapper, models)
pool.close()
pool.join()

If you are calling your script from the command line, use an if __name__ == '__main__' block to get multiprocessing working. That way you don't need the external file for your function.

from GPy.core.gp import GP
from GPy.kern import White
from GPy.likelihoods.gaussian import Gaussian
from GPy.inference.latent_function_inference.laplace import Laplace
from multiprocessing import Pool
import numpy as np

def opt_wrapper(gp):
    return gp.optimize()

if __name__ == '__main__':
    size = 100 # Program works when this is low enough
    inference_method = Laplace() # Program works when this is None
    models = [GP(X=np.arange(size).reshape(size,1), Y=np.arange(size).reshape(size,1), kernel=White(1), likelihood=Gaussian(), inference_method=inference_method) for _ in range(1)]

    print "Starting pool..."
    pool = Pool(1)
    print pool.map(opt_wrapper, models)
    pool.close()
    pool.join()

Also, consider wrapping your pool.close and pool.join in a try/finally block. In Python 3 you can use with Pool(1) as p: instead.

pool = Pool(1) # create the pool outside the try block so it is defined in finally
try:
    print "Starting pool..."
    print pool.map(opt_wrapper, models)
finally:
    pool.close()
    pool.join()
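For example, in Python 3 the whole thing could look like this. This is just a sketch of the same example using the context manager; note that Pool's with block terminates the pool on exit, which is fine here because map() is synchronous.

from multiprocessing import Pool

import numpy as np
from GPy.core.gp import GP
from GPy.kern import White
from GPy.likelihoods.gaussian import Gaussian
from GPy.inference.latent_function_inference.laplace import Laplace

def opt_wrapper(gp):
    return gp.optimize()

if __name__ == '__main__':
    size = 100
    models = [GP(X=np.arange(size).reshape(size, 1), Y=np.arange(size).reshape(size, 1),
                 kernel=White(1), likelihood=Gaussian(), inference_method=Laplace())
              for _ in range(1)]

    print("Starting pool...")
    # The with block terminates the pool on exit; map() has already
    # collected all results by then.
    with Pool(1) as pool:
        print(pool.map(opt_wrapper, models))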

@brendenpetersen
Author

Hi @ahartikainen, I was not using a Jupyter Notebook; Python was executed from the command line, so I don't think those changes would fix the problem.

I've moved on from this project, but the issue was actually a limitation of Python's multiprocessing, which uses OS pipes under the hood and is therefore limited by their buffer sizes. This explains why the program works when size is small enough (the pickled data then fits within the buffer), and why it worked for @mzwiessele, whose OS likely had a different buffer size.

One (of many) explanations here: https://sopython.com/canon/82/programs-using-multiprocessing-hang-deadlock-and-never-complete/
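If anyone wants to check whether they're in that regime, here is a rough diagnostic sketch (assuming the bottleneck really is the pickled payload being pushed through the pipe, and that the model pickles at all): measure how large the pickled model gets as size grows and compare it against a typical pipe buffer.

import pickle

import numpy as np
from GPy.core.gp import GP
from GPy.kern import White
from GPy.likelihoods.gaussian import Gaussian
from GPy.inference.latent_function_inference.laplace import Laplace

# Measure how big the pickled model is, i.e. roughly what Pool has to push
# through an OS pipe to each worker, and compare against the pipe buffer
# size (commonly on the order of 64 kB).
for size in (10, 30, 60, 100):
    gp = GP(X=np.arange(size).reshape(size, 1), Y=np.arange(size).reshape(size, 1),
            kernel=White(1), likelihood=Gaussian(), inference_method=Laplace())
    n_bytes = len(pickle.dumps(gp, protocol=pickle.HIGHEST_PROTOCOL))
    print("size=%d -> %d bytes pickled" % (size, n_bytes))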

@ahartikainen

Thanks for the follow-up. Interesting problem. What was your OS, if I may ask?

@brendenpetersen
Author

@ahartikainen no problem! It's a bummer, since GPs can get quite data-intensive. I imagine it can be fixed by chunking up the size of the GP, but that was more work than I could afford at the time.

I'm on Mac OS X El Capitan, which I believe can handle up to 64 kB buffers, but I have no idea how they break that up.

@patel-zeel

I had a similar problem while benchmarking on AMD and Intel CPUs. AMD performed very poorly with GPy multiprocessing, while Intel did well. Does anyone have a similar experience?

@thihaa2019

I have the same issue on my Mac when I try to run GPy in parallel on macOS.
