I built a simple MPI+OpenACC example on Gust. It compiled successfully but crashed when I turned on MPS.
The error message looks like:
gu0017.hsn.gu.hpc.ucar.edu: rank 0 died from signal 11 and dumped core
If I turned MPS off and re-ran the program, it worked just fine.
The run command is
mpiexec --cpu-bind depth -n 1 -ppn 1 -d 1 ./mpi_mps.exe
and I only request a single GPU. I also tried 2 MPI ranks per node and got the same error.
My environment module list is:
  1) ncarenv/22.08 (S)   3) nvhpc/22.7            5) cray-mpich/8.1.18
  2) craype/2.7.17 (S)   4) ncarcompilers/0.6.2
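For reference, a minimal reproducer along these lines can be sketched as follows. This is only an assumed illustration, not the actual source behind mpi_mps.exe; the file name mpi_mps.c, the build line, and the device-selection logic are placeholders. Each rank initializes MPI, picks an NVIDIA device through the OpenACC runtime, and runs a trivial reduction on the GPU:

/* Hypothetical minimal MPI+OpenACC reproducer (assumed name: mpi_mps.c).
 * Example build with the modules listed above, e.g.:
 *   cc -acc mpi_mps.c -o mpi_mps.exe
 */
#include <mpi.h>
#include <openacc.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* With a single GPU requested, every rank maps to device 0 and
     * shares it -- the situation MPS is meant to handle. */
    int ngpus = acc_get_num_devices(acc_device_nvidia);
    acc_set_device_num(ngpus > 0 ? rank % ngpus : 0, acc_device_nvidia);

    /* Trivial OpenACC kernel: sum n ones on the device. */
    const int n = 1 << 20;
    double sum = 0.0;
    #pragma acc parallel loop reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += 1.0;

    printf("rank %d of %d: sum = %.0f\n", rank, nranks, sum);

    MPI_Finalize();
    return 0;
}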
This might be tied to an issue involving MPI, the GPUs, and the OS version on the compute nodes. @jbaksta is looking at upgrading the OS to the supported version soon. If nothing else, that will allow us to report segfaults and get support.
Currently cray-mpich is not working with MPS; we are working with HPE to resolve this. An OpenMPI build has been introduced on Gust, and I recommend you give that a try.