Running parallel scans in Jupyter/IJulia #317
Thank you very much, this is amazing! @nkotsina I think this may solve an issue you've had as well? One question: when you say it "crashed out", did you get an error within Julia or did Julia itself crash? We've seen some issues running the continuous integration via GitHub Actions on parallel scans as well, and I'm wondering whether this could be related.
Ah, yes, good question - I was a little slap-dash in the above posting! The behaviour I see if trying to run a parallel job directly from a notebook is that parallel workers are launched, and seem to start up OK, then crash out on a per-worker basis with As a side-note, I did have a couple of successful runs from a notebook at some point, but I've no idea why! Here's a snippet of the output from an attempted run:
... and so on with similar errors for other I can share a full example notebook and outputs if useful.
A full (non)working example would definitely be helpful. What's confusing here is that the error you're getting,
Hmmm, that is curious. I initially tested for parameters quite similar to the Luna docs (these are as currently set in the demo notebook at https://github.com/phockett/Luna.jl-jupyterDispatch/blob/main/scan_parallel_Luna_template_160223.ipynb).
Physically I think these parameters should be OK, but I also don't have that much experience here. All that said, I've since encountered some of these types of errors when running in parallel from the shell too, so it's not impossible that there is something else going on here! I can certainly test a little more carefully given your comments. Perhaps you could suggest some "safe" and "not safe" parameter sets for testing, to see whether the errors are encountered reliably and when expected?
In this particular case, the issue isn't in the parallel or serial execution, but in a small difference between your scan scripts. The order in which you add variables to the scan matters! In the one which fails you have
but in the one which works you have
I.e. in the failing case you add pressure, then energy to the scan, rather than the other way around. But in both cases your
which assumes energy comes first. So in the failing case you are running a scan over 0.6 to 1.4 joules of pulse energy, which corresponds to 16 petawatts of peak power, and that is why the PPT rate fails. It does say this in the manual, but from looking at this it needs to be far more obvious. I'll turn it into a big flashing warning box. Sorry for the confusion! I don't have a properly working Julia+Jupyter setup right now; could you check whether fixing this just fixes everything?
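To illustrate why the ordering matters, here is a minimal sketch in plain Julia. It does not use Luna: `ToyScan`, `addvar!` and `runtoy` are hypothetical stand-ins that only mimic the behaviour described above, namely that scan values are handed to the function positionally, in the order the variables were added. The energy values and the interpretation of 0.6 to 1.4 as pressures are assumptions for illustration.

```julia
# Hypothetical stand-in for a scan; variables are stored in the order they are added.
struct ToyScan
    names::Vector{Symbol}
    values::Vector{Vector{Float64}}
end
ToyScan() = ToyScan(Symbol[], Vector{Float64}[])

function addvar!(scan::ToyScan, name::Symbol, vals)
    push!(scan.names, name)
    push!(scan.values, collect(Float64, vals))
    return scan
end

# Call f once per combination, passing the values positionally in insertion order.
runtoy(f, scan::ToyScan) = vec([f(combo...) for combo in Iterators.product(scan.values...)])

energies = [50e-6, 100e-6]   # assumed pulse energies in joules
pressures = [0.6, 1.4]       # assumed pressure values

good = ToyScan()
addvar!(good, :energy, energies)
addvar!(good, :pressure, pressures)

bad = ToyScan()
addvar!(bad, :pressure, pressures)   # pressure added first
addvar!(bad, :energy, energies)

# The function body assumes energy is the first argument in both cases.
body(energy, pressure) = (energy=energy, pressure=pressure)

good_runs = runtoy(body, good)
bad_runs = runtoy(body, bad)

println("good: ", first(good_runs))
println("bad:  ", first(bad_runs))
```

In the `bad` case the function receives the pressure values in its `energy` argument, which is the kind of mix-up described above. Keeping the order of the added variables identical to the order of the function arguments avoids it.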
Ah, the meat-space "stupid-user" debug was required! 😜 Thanks for the gentle correction. It certainly looks like mea culpa, and with apologies... I do vaguely recall playing around with the ordering of the parameter passing when trying to get the parallel scans running, and had indeed probably read about the same in the docs, but clearly got in a mess there regardless and may have tricked myself into thinking it was not the issue... but it would certainly explain the inconsistencies I was seeing. (And, as a side-note, I probably hadn't quite appreciated the Julia In any case, I will have a more careful/rigorous play starting with a clean notebook and see if things make more sense, or if any other issues crop up.
The plot thickens - after a bit more fiddling I now recall that I was playing with the parameter ordering because creating a Scan() object was buggy in IJulia, and gives I think this is a more fundamental issue, presumably related to how Jupyter/IJulia is wrapping these calls (since it usually seems OK from the CLI). The crashes discussed above are, presumably, as you mention, a different issue related to the actual code I ended up with at the times it worked.
Aha! Now we're getting somewhere. The problem here seems to come from the fact that IJulia internally runs Julia with some added command-line arguments. The In a classic case of one nice feature causing really hard bugs, command-line arguments also override any One thing to try is to print the command-line arguments with [pop!(ARGS) for _ in eachindex(ARGS)] This will remove any command-line arguments currently present and should hopefully enable the I will try to think of a better way of dealing with this. Hopefully there is a way of fixing this issue while still keeping all of the functionality.
Hi @chrisbrahms perhaps you can check if
Thanks for the suggestion @chrisbrahms , and the further reminder/prod to look at this again @michaelhemsworth! On the IJulia point, it seems like
So it makes sense this is messing up the general arg passing! Adding As @michaelhemsworth mentioned, wrapping this with an IJulia check should suffice for general use, e.g. I quickly tested this (in a notebook only), and it seemed to work as expected:
It might be germane to store the session handle somewhere too; I've no idea whether deleting it will/can cause other issues! (Tested in Julia 1.9.3/IJulia 1.24.2, Luna 0.4.0.)
Full working demo notebook: https://github.com/phockett/Luna.jl-jupyterDispatch/blob/main/demo/luna_scan_demo_271123.ipynb
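For anyone landing here later, a minimal sketch of the workaround discussed above: clear the command-line arguments only when running under IJulia, and keep a copy of what was removed. The helper names `running_in_ijulia` and `clear_args!` and the `only_if` keyword are hypothetical and not part of Luna or IJulia; the check via `isdefined(Main, :IJulia)` is a common idiom, assumed here to be sufficient.

```julia
# True only inside an initialised IJulia kernel; false from the REPL or a script.
running_in_ijulia() = isdefined(Main, :IJulia) && Main.IJulia.inited

# Empty `args` in place when `only_if` is true and return a copy of the
# original contents, so that whatever IJulia passed in is not lost.
function clear_args!(args::Vector{String}=ARGS; only_if::Bool=running_in_ijulia())
    saved = copy(args)
    only_if && empty!(args)
    return saved
end

# Demonstration on a stand-in vector rather than the real ARGS.
demo_args = ["hypothetical-kernel-file.json"]
saved_args = clear_args!(demo_args; only_if=true)
println("remaining: ", demo_args, ", saved: ", saved_args)
```

In a notebook, calling `saved = clear_args!()` before constructing the scan would play the same role as the `pop!` comprehension suggested earlier, while leaving `ARGS` untouched when the same code is run from the command line.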
I had a lot of issues dispatching parallel scan jobs from a Jupyter/IJulia notebook (although single parameter set runs in a notebook worked fine). This occasionally seemed to work, but generally didn't, and usually crashed out after spawning workers.
As a quick work-around, I wrote a basic job dispatch notebook instead, which writes a job script to file and then launches the parallel execution via a shell script. In case it's of interest/use to anyone else, it's now at https://github.com/phockett/Luna.jl-jupyterDispatch.
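The general idea can be sketched in a few lines of Julia. This is not the code from the linked repository, only an illustration of the write-then-launch pattern: `write_job` and `job_command` are hypothetical helpers, the script contents are a placeholder, and any scan-specific command-line flags would have to be supplied through `extra_args`.

```julia
# Write the job script to a file and return its path.
function write_job(script::AbstractString; dir::AbstractString=mktempdir(),
                   name::AbstractString="scan_job.jl")
    path = joinpath(dir, name)
    write(path, script)
    return path
end

# Build (but do not run) a command that executes the script in a fresh Julia
# process, so the job never sees the arguments of the notebook kernel.
function job_command(path::AbstractString; extra_args::Vector{String}=String[])
    return `$(Base.julia_cmd()) $path $extra_args`
end

script = """
println("placeholder for the actual scan script")
"""

path = write_job(script)
cmd = job_command(path)
println(cmd)
```

From a notebook cell, `run(cmd)` would then block until the job finishes, while `run(cmd; wait=false)` would start it in the background.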