Update toolchains on tioga, lassen, ruby and poodle #1712
…ed, module is not enough
…ming convention to match LC’s
@adrienbernede I will get back to you for a recommendation about testing OpenMP target.
@rhornung67 I think at least some of the above failures should be addressed by the RAJA teams. For the others, we can decide to allow the jobs to fail.
@adrienbernede the test failure on intel2023 (poodle and ruby) is a known issue. @artv3 is looking into it, I think. I haven't seen the cce18 failure before. Is that a new version of Cray compiler in our CI?
@rhornung67 Yes, it’s even the new default.
Yikes! We'll look into the cce18 failure.
@adrienbernede we identified the cce18 failure as a compiler issue (we can reproduce outside of RAJA). A ticket has been submitted and it is being tracked by one of our HPE POCs. The Intel failures may also be compiler issues. The errors go away if we build with -O0 or -O1. We reported to LC and are waiting on their recommendation to address. So for now, I think we go with allowing failures for cce18 and intel. Also, we should probably add cce17 back in until the cce18 issue is resolved.
I just allowed the intel and cce 18 jobs to fail, and added a cce 17 job just for RAJA (still using cce 18 for other jobs). Is that OK? Also, could you confirm that, on ruby and poodle, you want:
Also, do we still need to default to blt@develop in CI? If so, why?
…oga, add cce 17 job on tioga
@adrienbernede the changes you described make sense. We can return to not allowing failures after we have the issues resolved. The specs for ruby and poodle you mention are good. I don't know why we are defaulting to BLT@develop. I think you set that up a while ago. I think it makes sense to point to the BLT 0.6.2 release, which is what we are using in the RAJA submodule.
@rhornung67 this is ready. Since your approval is a month old, I’d like a quick second look from you.
This PR addresses most of #1683.
Update corona to ROCm 6.0.2 -> same error as with tioga, need ROCm 6.1.x, reverted to ROCm 5.7.1
❗ TODO before merging ❗:
Errors to investigate:
The same single test fails on both machines.
Only one test failing with several of these:
Notes
-> In conclusion, we use the LC-defined clang 16.0.6 + cuda 11.8.0 + gcc 11.2.1, associated with xlf 16.1.1.14 + cuda 11.8.0 + gcc 11.2.1. It looks like we are increasingly bound to use LC wrappers (after having to use them to enforce the gcc toolchain in the spack context).