UFS-dev PR#189 #486
…ated. Checkout to v4, setup-python to v5, cache to v4, upload-artifact to v4, setup-miniconda to v3.
@scrasmussen @mkavulich @dustinswales There are several CI issues outstanding:
Do we want to try to fix any of these in this PR or handle them separately?
Also @ligiabernardet @scrasmussen @dustinswales @mkavulich UFS recently updated their modulefiles for Hera: ufs-community/ufs-weather-model#2093. In particular, the GNU version went from 9 to 13! I'm guessing that we should follow suit. I can do this in this PR. What do you think?
@grantfirl We can move to Ubuntu 24.04 for the RT test, which has GNU 13.
Looks good
@grantfirl I'm still working through the CI issues. Don't let me hold this PR up. I will follow up with a CI PR once it's all working again.
I'm going to update the Hera modulefiles at least before merging.
@mkavulich We'll need to upload the single-precision artifact to the FTP server so that the single-precision RTs don't fail in the RT CI script.
@mkavulich Could you double-check my changes to the Hera modulefiles? I tried to make them compatible with ufs-community/ufs-weather-model#2093. Was there a reason that we needed cmake 3.28.1 or to load miniconda separately? It looks like this is already included in spack-stack 1.6.0. FYI, when using Hera GNU, I'm getting cmake warnings about policy CMP0074 for ccpp-framework and ccpp-physics. We may need to open an issue to resolve this warning in those repos.
@grantfirl Is there a way to get the artifacts from these failed tests? Right now it's failing with an error because it can't download the data. My thinking is that I could create a fake baseline file that's just a copy of the double-precision tests for it to download and compare against. That should give us a "failed" test, but the run should complete without an error, which would give us a real single-precision artifact that we can upload. Does that sound like a good plan?
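For illustration, the placeholder-baseline idea in the comment above could be sketched roughly as follows. This is only a sketch under stated assumptions: the directory names `baseline_dp` and `baseline_sp`, the file `output.nc`, and the helper `stage_placeholder_baseline` are hypothetical and do not come from the repository or its CI scripts, and it assumes the double-precision baseline is available as a directory tree.

```python
import shutil
import tempfile
from pathlib import Path


def stage_placeholder_baseline(dp_dir: Path, sp_dir: Path) -> Path:
    """Copy a double-precision baseline tree to the single-precision location.

    The copy is only a stand-in so that the download/compare step has
    something to fetch; the comparison itself is expected to report a failure.
    """
    if sp_dir.exists():
        raise FileExistsError(f"{sp_dir} already exists; refusing to overwrite")
    shutil.copytree(dp_dir, sp_dir)
    return sp_dir


# Self-contained demo in a temporary directory (all names are hypothetical).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    dp = root / "baseline_dp"
    dp.mkdir()
    (dp / "output.nc").write_bytes(b"placeholder")  # dummy file, not real model output
    sp = stage_placeholder_baseline(dp, root / "baseline_sp")
    staged = sorted(p.name for p in sp.iterdir())
    print(staged)
```

Refusing to overwrite an existing target is a deliberate choice in this sketch, so that a real single-precision baseline is never silently replaced by the placeholder.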
I'm thinking that your proposal would work. I don't know how else to do it.
Okay the "fake" artifacts are in place (baseline and plots). Let me know if you need anything else.
@grantfirl Can you make a quick change before this PR is merged? I got an email from Lara Ziady suggesting I move our staged artifacts to a new location on the web server. It's a one-line change, and shouldn't need any additional testing assuming the tests still pass after this change (I already copied the artifacts to the new location):
@mkavulich Done.
@dustinswales @mkavulich @scrasmussen Unfortunately, there are some runtime failures in the CI RTs for some cases/suites. I cannot replicate the failures locally, so I don't know how to debug. Any ideas?
@grantfirl Looking into the CI failures now.
@grantfirl I also cannot replicate this failure on Hera.
@grantfirl I'm going to try GNU13 using Ubuntu24.04 and see what happens.
@grantfirl Same story with GNU13. RELEASE and SP pass. Errors in DEBUG mode, error code 136. I will look into this more later on today.
@grantfirl For some reason unknown to me, if you apply this change, all the tests run w/o error.
@dustinswales @mkavulich @scrasmussen This doesn't seem to be the case. With 93732db, I'm still seeing some status 136s and 139s; the only difference is that the output is more verbose. My hunch is that an MPI issue in the GitHub workflow is somehow causing this. I think that we can debug this more effectively using containers after the release. I don't think that this should hold anything up, since we can't replicate the failures on any other machines.
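As a side note on the status codes mentioned above: by shell convention, an exit status above 128 usually means the process was terminated by signal number (status - 128), so 136 corresponds to signal 8 (SIGFPE) and 139 to signal 11 (SIGSEGV). The sketch below decodes statuses that way. It assumes the CI runner reports statuses following this convention, which has not been confirmed for this workflow, and the helper name `describe_exit_status` is made up for illustration.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status.

    Statuses above 128 are conventionally 128 + the number of the signal
    that terminated the process.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"unknown signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"


for status in (136, 139):
    print(status, describe_exit_status(status))
```

If that convention holds here, a SIGFPE appearing only in DEBUG mode would be consistent with debug builds that trap floating-point exceptions, though that remains a guess rather than a diagnosis.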
@dustinswales I removed the extra verbosity flag for the RTs for now, since it just added length and made failures harder to find.
Contains changes from:
NOAA-EMC/fv3atm#816
NOAA-EMC/fv3atm#831
NOAA-EMC/fv3atm#807
Plus:
cdata%thread_cnt initialization