Parallelization issues with pocoMC #591

Open
lisaotten opened this issue Aug 5, 2024 · 7 comments

@lisaotten
When employing the pocoMC package for Bayesian inference runs that use tellurium for modeling, we have encountered issues with parallelization. Using multiprocess(ing), we noticed a large discrepancy between the results of parallelized and non-parallelized runs (see the attached corner plots). Both runs complete smoothly, without error messages or other notable differences. I have been able to reproduce the results of both runs separately multiple times, both on an HPC cluster and on my personal notebook. Apart from switching parallelization on or off, all other parameters are kept exactly the same, and changing the number of parallel kernels does not affect the results. From experience, the results of the non-parallelized run are what we would expect as the correct results.

I have attached self-contained code, including the environment I am running it in. The config file has an option under `bayesian_inference` to turn parallelization on/off as well as to specify the number of kernels.

parallel.pdf
not_parallel.pdf
Bayesian_Transporter.zip
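
For context, the parallel and serial setups differ only in whether a worker pool is passed to the sampler, roughly along the lines of this simplified sketch (not the attached code; `prior` and `log_likelihood` are placeholders for the tellurium-based model, and the exact `Sampler` signature, including whether `pool` takes a pool object or a worker count, depends on the installed pocoMC version):

```python
# Simplified sketch (not the attached code): how a worker pool is typically
# handed to pocoMC. `prior` and `log_likelihood` stand in for the
# tellurium-based model; exact keyword names depend on the pocoMC version.
import multiprocessing as mp

import numpy as np
import pocomc as pc
from scipy.stats import uniform

n_dim = 4
prior = pc.Prior(n_dim * [uniform(0.0, 1.0)])  # placeholder flat prior

def log_likelihood(theta):
    # placeholder for the tellurium simulation + comparison to data
    return -0.5 * np.sum(theta**2)

if __name__ == "__main__":
    # serial run
    sampler = pc.Sampler(prior=prior, likelihood=log_likelihood)
    sampler.run()

    # parallel run: identical setup except for the pool
    with mp.Pool(processes=4) as pool:
        sampler_par = pc.Sampler(prior=prior, likelihood=log_likelihood, pool=pool)
        sampler_par.run()
        samples, weights, logl, logp = sampler_par.posterior()
```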

@matthiaskoenig
Collaborator

Hi @lisaotten,

I had a quick look at the plots, and I am not sure they are really different.
To compare them visually, you have to set identical axis ranges for the two plots.
Often a few rare samples lie far outside the bulk of the distribution, which results in very different axis ranges when the axes are scaled automatically. I suspect you have just one or two outliers (very unlikely samples) in one of the runs, which makes the distributions appear very narrow simply because the axis limits change.

You should run an actual test of whether the distributions are different, e.g. by comparing the modes of your multi-dimensional distributions or by using something like:
EFECT – A Method and Metric to Assess the Reproducibility of Stochastic Simulation Studies
T.J. Sego, Matthias König, Luis L. Fonseca, Baylor Fain, Adam C. Knapp, Krishna Tiwari, Henning Hermjakob, Herbert M. Sauro, James A. Glazier, Reinhard C. Laubenbacher, Rahuman S. Malik-Sheriff
arXiv:2406.16820 (preprint). doi:10.48550/arXiv.2406.16820

Hope this helps.
TL;DR: most likely a plotting issue, not a sampling issue.
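
As a rough illustration of what I mean (a sketch only; `samples_serial` and `samples_parallel` are placeholder arrays of posterior draws from the two runs, and the file names are hypothetical), you could plot both runs with shared axis ranges and run a simple per-parameter two-sample test:

```python
# Sketch of the suggested checks: shared axis ranges for the corner plots
# plus a per-parameter two-sample KS test. Inputs are placeholder
# (n_samples, n_dim) arrays of posterior draws from the two runs.
import corner
import numpy as np
from scipy.stats import ks_2samp

samples_serial = np.load("samples_serial.npy")      # hypothetical files
samples_parallel = np.load("samples_parallel.npy")

# shared axis ranges, so an outlier in one run cannot visually shrink the other
both = np.vstack([samples_serial, samples_parallel])
ranges = list(zip(both.min(axis=0), both.max(axis=0)))

fig = corner.corner(samples_serial, range=ranges, color="black")
corner.corner(samples_parallel, range=ranges, color="red", fig=fig)
fig.savefig("comparison_corner.png")

# simple quantitative check: two-sample KS test per parameter
for i in range(samples_serial.shape[1]):
    stat, p = ks_2samp(samples_serial[:, i], samples_parallel[:, i])
    print(f"parameter {i}: KS statistic = {stat:.3f}, p = {p:.3g}")
```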

@lisaotten
Author

Hi @matthiaskoenig,

Thanks for your reply!
Both plots actually have exactly the same axis ranges, so the parallel results really are much broader than those of the non-parallel run, and that is exactly where my problem lies. I have reproduced these results multiple times with very similar outcomes, both on an HPC cluster and on my personal notebook.

@luciansmith
Contributor

The first thing I can think of is that if the seeds are set from the system clock (which they are by default), you might be getting the same seed on multiple runs. I can imagine this happening in both the parallel and the non-parallel case, so you might want to try setting the seed for each run explicitly to ensure they are all unique.

You could also examine the individual results to see if this is actually happening or not.
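
Something along these lines (a generic numpy sketch, not specific to pocoMC's own seeding options) would guarantee distinct, reproducible streams per run or worker:

```python
# Generic numpy sketch (not pocoMC-specific): derive one explicit,
# guaranteed-distinct random stream per run/worker from a fixed base seed.
import numpy as np

base = np.random.SeedSequence(12345)   # explicit base seed
child_seeds = base.spawn(4)            # one child per worker/run

rngs = [np.random.default_rng(s) for s in child_seeds]
for i, rng in enumerate(rngs):
    print(f"worker {i}: first draw {rng.uniform():.6f}")  # all differ
```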

@lisaotten
Author

I played around with the seed quite a bit while looking for a possible source of the error, but even setting the seed manually resulted in the same distributions.

@lisaotten
Author

I have attached a combined corner plot that may illustrate the issue a little better: the black line corresponds to a non-parallel run, while the other three lines correspond to parallel runs using the multiprocess, multiprocessing, and pathos packages. They were all created with the same fixed seed at the start of the runs.

The last parameter in the corner plot corresponds to the deviation of the parameter fits from the data points we are trying to analyze, and it is much larger for the parallel runs. This clearly indicates that the parallel runs fit the data much worse.

[combined corner plot: corner_16Dpococheck_3]

@hsauro
Contributor

hsauro commented Sep 23, 2024 via email

@hsauro
Contributor

hsauro commented Sep 24, 2024 via email
