An Intel18 + OpenMP executable (single-thread test) crashes or hangs for MOM6 test cases on all three machines: theta (KNL), lscsky50 (Skylake), and theia.
Here's the crash output for the global_ALE_z test case on theta and lscsky50:
_pmiu_daemon(SIGCHLD): [NID 00471] [c2-0c1s5n3] [Fri Apr 20 16:31:03 2018] PE RANK 19 exit signal Bus error
[NID 00471] 2018-04-20 16:31:03 Apid 4349434: initiated application termination
[NID 00471] 2018-04-20 16:31:04 Apid 4349434: Error detected during page fault processing. Process terminated via bus error.
on KNL box:
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 182461 RUNNING AT lscsky50-d
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Intel(R) MPI Library troubleshooting guide:
https://software.intel.com/node/561764
===================================================================================
No such issues with Intel17.
No such issue with the non-OpenMP executable built with Intel18.
@byrdman1982 thanks for pointing that out.
I get the same bad behavior with both repro-openmp (-O2) and prod-openmp (-O3) on the three Intel18 platforms.
With debug-openmp (-O0) the model hangs on theta and theia, but it runs fine on the Skylake box!
@nikizadehgfdl have you tried watching the memory? If it's related to OpenMP, there could be a data race. We also saw some memory leaks with OpenMP when using pointers.
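One rough way to watch the memory, sketched here under the assumption of a Linux node where /proc/<pid>/status is readable, is to poll the VmRSS field of the running executable and see whether it keeps growing before the crash or hang. The helper names below are hypothetical and are not part of MOM6 or of any tool mentioned in this thread.

```python
import re
import time


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid="self"):
    """Read the current resident set size in kB of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vm_rss_kb(status_file.read())


def watch_rss(pid, interval_s=1.0, samples=10):
    """Collect a few RSS samples; a steadily rising series hints at a leak."""
    readings = []
    for _ in range(samples):
        readings.append(read_rss_kb(pid))
        time.sleep(interval_s)
    return readings
```

Calling `watch_rss(pid)` with the PID of one model rank returns a list of RSS readings in kB. This only shows growth in resident memory; it does not detect a data race, for which a thread-checking tool would be needed.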