Fix slurm MPI submission bug #214
base: main
Conversation
Thank you so much! From a quick look at the CI, I think it fails for an orthogonal reason that I should be able to fix later this week. Then I'll be able to look at this PR. Let me create a separate issue so we can reference it.
The upcoming fix for #215 should fix those CI builds.
Sorry for being slow on this! It comes just in time, as our local cluster at UBC also recently migrated to SLURM. Interestingly, their instructions still recommend using mpiexec to distribute MPI jobs across several machines. From a quick Google search, this page seems to suggest that mpiexec is more performant and portable: https://users.open-mpi.narkive.com/a97KsQwJ/ompi-openmpi-slurm-mpiexec-mpirun-vs-srun

I was wondering if you have a link to a page supporting the srun route? Maybe it is a more up-to-date approach (the link above is 6 years old)? Or it could be a particularity of the cluster you are using? Even if it is specific to one cluster, I want to make sure there is enough flexibility to configure it correctly, but in that case I would not make it the default route. Thanks again!
Here's additional information from the SchedMD SLURM guide that also suggests the use of srun.
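For context, a minimal sketch of what the srun-based submission path looks like in practice. This is an illustrative sbatch script (node counts, task counts, and the program name are placeholders, not taken from this project):

```shell
#!/bin/bash
# Hypothetical SLURM batch script: request an allocation, then let srun
# place the MPI ranks across all allocated nodes. With plain mpiexec,
# ranks may end up only on the submission node, depending on how the
# MPI library was built against SLURM.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=00:10:00

srun ./mpi_program
```

Submitted with `sbatch script.sh`, srun inherits the allocation's geometry (2 nodes x 4 tasks here) without needing an explicit `-n` flag.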
Codecov Report — Attention: Patch coverage is
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main     #214      +/-   ##
==========================================
- Coverage   86.82%   86.56%   -0.27%
==========================================
  Files          95       95
  Lines        2429     2419      -10
==========================================
- Hits         2109     2094      -15
- Misses        320      325       +5
```

☔ View full report in Codecov by Sentry.
I tried adding an additional concept to the
`mpiexec` typically only submits jobs locally on `slurm` machines. The scheduler typically uses `srun` for global submission, so I changed the exec string to use `srun` when the scheduler is set to `slurm`.
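The change described above can be sketched as a small launcher-selection helper. This is a minimal illustration of the idea, not the project's actual code: the function name, parameters, and command format are all hypothetical.

```python
def build_exec_string(scheduler: str, n_ranks: int, program: str) -> str:
    """Return an MPI launch command appropriate for the given scheduler.

    Hypothetical sketch: on SLURM, use srun so ranks are placed across
    all allocated nodes; otherwise fall back to mpiexec, which may only
    launch ranks on the local machine under SLURM.
    """
    if scheduler == "slurm":
        # srun is SLURM-aware and distributes ranks over the whole allocation.
        return f"srun -n {n_ranks} {program}"
    # Default path for local runs or other schedulers.
    return f"mpiexec -n {n_ranks} {program}"


print(build_exec_string("slurm", 8, "./solver"))  # srun -n 8 ./solver
print(build_exec_string("pbs", 8, "./solver"))    # mpiexec -n 8 ./solver
```

Keeping the launcher a function of the configured scheduler (rather than hard-coding `srun`) preserves the flexibility requested earlier in the thread for clusters that prefer mpiexec.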