Parallelization issues with pocoMC #591

Open
lisaotten opened this issue Aug 5, 2024 · 7 comments

@lisaotten
When employing the pocoMC package for Bayesian inference runs that use tellurium for modeling, we have encountered issues with parallelization. Using multiprocess(ing), we noticed a large discrepancy between the results of parallelized and non-parallelized runs (see the attached corner plots). Both runs complete smoothly, without error messages or other notable differences. I have been able to reproduce the results of both runs separately multiple times, both on an HPC cluster and on my personal notebook. Apart from switching parallelization on or off, all other parameters are kept exactly the same, and changing the number of parallel kernels does not affect the results. From experience, the results of the non-parallelized run are what we would expect as the correct results.

I have attached self-contained code, including the environment I am running it in. The config file has an option under `bayesian_inference` to turn parallelization on/off as well as to specify the number of kernels.

parallel.pdf
not_parallel.pdf
Bayesian_Transporter.zip
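
For context, the parallel and serial setups differ only in whether a worker pool is passed to the sampler, roughly along the lines of this simplified sketch (not the attached code; `prior` and `log_likelihood` are placeholders for the tellurium-based model, and the exact `Sampler` signature, including whether `pool` takes a pool object or a worker count, depends on the installed pocoMC version):

```python
# Simplified sketch (not the attached code): how a worker pool is typically
# handed to pocoMC. `prior` and `log_likelihood` stand in for the
# tellurium-based model; exact keyword names depend on the pocoMC version.
import multiprocessing as mp

import numpy as np
import pocomc as pc
from scipy.stats import uniform

n_dim = 4
prior = pc.Prior(n_dim * [uniform(0.0, 1.0)])  # placeholder flat prior

def log_likelihood(theta):
    # placeholder for the tellurium simulation + comparison to data
    return -0.5 * np.sum(theta**2)

if __name__ == "__main__":
    # serial run
    sampler = pc.Sampler(prior=prior, likelihood=log_likelihood)
    sampler.run()

    # parallel run: identical setup except for the pool
    with mp.Pool(processes=4) as pool:
        sampler_par = pc.Sampler(prior=prior, likelihood=log_likelihood, pool=pool)
        sampler_par.run()
        samples, weights, logl, logp = sampler_par.posterior()
```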

@matthiaskoenig
Collaborator

Hi @lisaotten,

I had a quick look at the plots, and I am not sure they are really different.
To compare them visually, you have to set identical axis ranges for the two plots.
Often a few rare samples lie far outside the bulk of the distribution, which results in very different axis ranges when the axes are scaled automatically. I suspect you have just one or two outliers (very unlikely samples) in one of the runs, which makes the distributions appear very narrow simply because the axis limits change.

You should run an actual test of whether the distributions are different, e.g. by comparing the modes of your multi-dimensional distributions or by using something like:
EFECT – A Method and Metric to Assess the Reproducibility of Stochastic Simulation Studies
T.J. Sego, Matthias König, Luis L. Fonseca, Baylor Fain, Adam C. Knapp, Krishna Tiwari, Henning Hermjakob, Herbert M. Sauro, James A. Glazier, Reinhard C. Laubenbacher, Rahuman S. Malik-Sheriff
arXiv:2406.16820 (preprint). doi:10.48550/arXiv.2406.16820

Hope this helps.
TL;DR: most likely a plotting issue, not a sampling issue.
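
As a rough illustration of what I mean (a sketch only; `samples_serial` and `samples_parallel` are placeholder arrays of posterior draws from the two runs, and the file names are hypothetical), you could plot both runs with shared axis ranges and run a simple per-parameter two-sample test:

```python
# Sketch of the suggested checks: shared axis ranges for the corner plots
# plus a per-parameter two-sample KS test. Inputs are placeholder
# (n_samples, n_dim) arrays of posterior draws from the two runs.
import corner
import numpy as np
from scipy.stats import ks_2samp

samples_serial = np.load("samples_serial.npy")      # hypothetical files
samples_parallel = np.load("samples_parallel.npy")

# shared axis ranges, so an outlier in one run cannot visually shrink the other
both = np.vstack([samples_serial, samples_parallel])
ranges = list(zip(both.min(axis=0), both.max(axis=0)))

fig = corner.corner(samples_serial, range=ranges, color="black")
corner.corner(samples_parallel, range=ranges, color="red", fig=fig)
fig.savefig("comparison_corner.png")

# simple quantitative check: two-sample KS test per parameter
for i in range(samples_serial.shape[1]):
    stat, p = ks_2samp(samples_serial[:, i], samples_parallel[:, i])
    print(f"parameter {i}: KS statistic = {stat:.3f}, p = {p:.3g}")
```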

@lisaotten
Author

Hi @matthiaskoenig,

Thanks for your reply!
Both plots actually have exactly the same axis ranges, so the parallel results really are much broader than those of the non-parallel run, and that is exactly where my problem lies. I have reproduced these results multiple times with very similar outcomes, both on an HPC cluster and on my personal notebook.

@luciansmith
Contributor

The first thing I can think of is that if the seeds are set from the system clock (which they are by default), you might be getting the same seed on multiple runs. I can imagine this happening in both the parallel and the non-parallel case, so you might want to try setting the seed for each run explicitly to ensure they are all unique.

You could also examine the individual results to see if this is actually happening or not.
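
Something along these lines (a generic numpy sketch, not specific to pocoMC's own seeding options) would guarantee distinct, reproducible streams per run or worker:

```python
# Generic numpy sketch (not pocoMC-specific): derive one explicit,
# guaranteed-distinct random stream per run/worker from a fixed base seed.
import numpy as np

base = np.random.SeedSequence(12345)   # explicit base seed
child_seeds = base.spawn(4)            # one child per worker/run

rngs = [np.random.default_rng(s) for s in child_seeds]
for i, rng in enumerate(rngs):
    print(f"worker {i}: first draw {rng.uniform():.6f}")  # all differ
```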

@lisaotten
Author

I played around with the seed quite a bit while looking for a possible source of the error, but even setting the seed manually resulted in the same distributions.

@lisaotten
Author

I have attached a combined corner plot that may illustrate the issue a little better: the black line corresponds to a non-parallel run, while the other three lines correspond to parallel runs using the multiprocess, multiprocessing, and pathos packages. They were all created with the same fixed seed at the start of the runs.

The last parameter in the corner plot corresponds to the deviation of the parameter fits from the data points we are trying to analyze, and it is much larger for the parallel runs. This clearly indicates that the parallel runs fit the data much worse.

[combined corner plot: corner_16Dpococheck_3]

@hsauro
Contributor

hsauro commented Sep 23, 2024 via email

@hsauro
Contributor

hsauro commented Sep 24, 2024 via email
