[develop] Update ufs-weather-model hash and UPP hash and use upp-addon-env spack-stack environment #1136
…19) and UPP hash to 81b38a8 (Aug 13). Point to upp-addon-env spack-stack environment on Hera. Update srw_common.lua to use g2/3.5.1 and g2tmpl/1.13.0. Updated exregional_plot_allvars.py to handle updates made to postxconfig-NT-fv3lam.txt.
… for Hercules in build_hercules_intel.lua
… in build_jet_intel.lua
… in build_gaea_intel.lua
* .cicd/Jenkinsfile - Replaced cheyenne with derecho in commented sections and commented out Derecho.
* doc/tables/Tests.csv - Removed nco-mode WE2E tests since these have been removed from the repository.
* modulefiles/build_derecho_intel.lua - Update spack-stack environment to upp-addon-env.
… upp-addon-env in build_noaacloud_intel.lua
* .github/CODEOWNERS - Added Bruce Kropp as a reviewer from the Platform team.
* modulefiles/build_orion_intel.lua - Updated spack-stack environment to upp-addon-env.
…ace SLP name and comment out REFC sections
* scripts/exregional_plot_allvars.py - Found method to successfully plot REFC using seek() and readline() pygrib commands
* scripts/exregional_plot_allvars_diff.py - Same
…ther than the deprecated atmos_nthreads, to correct issue with threading in the weather model
While all 6 fundamental WE2E tests successfully pass following the latest updates (example given was run on Hercules):
the comprehensive tests are failing with the following error:
It isn't clear what the issue is. Since the indicated include file is part of FV3, I'll reach out to the FV3 developers and see what might be happening for the tests that are now failing with this error message.
@MichaelLueken Thanks for your work on the OMP problem. I came here to report a similar problem: I applied your changes and noticed there is still at least one fundamental test failing on Hera:
@mkavulich That could certainly be the issue. Another potential issue is with respect to …
Issue #362 was opened in NOAA-GFDL/GFDL_atmos_cubed_sphere asking about the strange …
The OMP issue appears to be related to the configuration of the job_card in rocoto. In order for a WE2E test to properly run the … For threading purposes, the number of nodes required is equal to the tasks per node (40 on Hera) divided by the number of threads used. This gives us a … While submitting the …
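The threading arithmetic mentioned in the comment above can be illustrated with a short sketch. This is not the SRW App's or rocoto's actual job-card logic; it only shows the common hybrid MPI/OpenMP sizing rule, in which each MPI task reserves one core per thread. The function name `job_geometry` and the example value of 24 MPI tasks are hypothetical; only the figure of 40 tasks per node on Hera comes from the discussion.

```python
import math


def job_geometry(total_mpi_tasks, cores_per_node, omp_threads):
    """Return (nodes, mpi_tasks_per_node) for a hybrid MPI/OpenMP job.

    Assumption: every MPI task occupies `omp_threads` cores, so a node
    with `cores_per_node` cores holds cores_per_node // omp_threads tasks.
    """
    if omp_threads < 1 or omp_threads > cores_per_node:
        raise ValueError("omp_threads must be between 1 and cores_per_node")
    tasks_per_node = cores_per_node // omp_threads
    # Round up so that a partially filled last node is still requested.
    nodes = math.ceil(total_mpi_tasks / tasks_per_node)
    return nodes, tasks_per_node


# Hypothetical example: 24 MPI tasks with 2 threads each on 40-core nodes.
nodes, tasks_per_node = job_geometry(24, cores_per_node=40, omp_threads=2)
print(f"nodes={nodes}, tasks per node={tasks_per_node}")
```

With one thread per task, the same function reduces to the unthreaded case, where all 40 cores on a node are available to MPI tasks.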
…nsive WE2E tests to successfully run on Hera
… task to run on Orion
… build and run following update to Cray PE. Allow all comprehensive WE2E tests to successfully run on Gaea.
Derecho, Gaea, Hera, and Orion have been successfully updated and tested. There are still issues with Jet (high resolution tests needing 136 tasks are still in queue) and Hercules (AQM). Will continue to monitor these two issues tomorrow.
…cst task to run on Hercules; updates for AQM WE2E
…sing a single thread
…o properly run on Jet
…. The tests are failing due to being unable to find the required files. Rearrange WE2E test suites to allow tests to successfully run on all platforms.
LGTM
LGTM
Tested on Hera:
----------------------------------------------------------------------------------------------------
Experiment name | Status | Core hours used
----------------------------------------------------------------------------------------------------
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_RAP_suite_RRFS_v1beta_2 COMPLETE 14.56
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v15p2_20241 COMPLETE 12.92
grid_RRFS_CONUS_25km_ics_FV3GFS_lbcs_FV3GFS_suite_GFS_v17_p8_plot COMPLETE 29.21
grid_RRFS_CONUScompact_25km_ics_HRRR_lbcs_HRRR_suite_HRRR_2024103 COMPLETE 47.55
grid_SUBCONUS_Ind_3km_ics_HRRR_lbcs_RAP_suite_WoFS_v0_20241031170 COMPLETE 26.38
grid_RRFS_CONUS_25km_ics_NAM_lbcs_NAM_suite_GFS_v16_2024103117095 COMPLETE 51.00
----------------------------------------------------------------------------------------------------
Total COMPLETE 181.62
Approved.
The Jenkins tests successfully passed for all platforms that they were run on - Derecho, Gaea, Hera GNU, Hera Intel, Jet, and Orion. The Hercules label was down yesterday, leading to the Jenkins tests skipping Hercules. Manual runs of the WE2E coverage suite on Hercules have successfully passed:
Moving forward with merging this PR now.
DESCRIPTION OF CHANGES:
* The modulefiles/build_*.lua files have been updated to use the upp-addon-env spack-stack environment.
* srw_common.lua was updated to use g2/3.5.1 and g2tmpl/1.13.0 (these are required for UPP).
* .cicd/Jenkinsfile was updated to replace cheyenne entries with derecho.
* The doc/tables/Tests.csv table had nco-mode WE2E tests removed.
* The doc/UsersGuide/CustomizingTheWorkflow/ConfigWorkflow.rst documentation was updated to match the updated ush/config_defaults.yaml file.
* The .github/CODEOWNERS file was updated to add Bruce Kropp to the list of reviewers.
* The exregional_plot_allvars.py and exregional_plot_allvars_diff.py scripts were updated to address changes made to the postxconfig-NT-fv3lam.txt file.
* ush/config_defaults.yaml was updated to change the PE_MEMBER01 calculation and the documentation for OMP_NUM_THREADS_RUN_FCST, so that the run_fcst task runs properly on Tier-1 platforms now that threading functions correctly.
* The ush/machine/*.yaml files were updated so that the run_fcst task runs properly on Tier-1 platforms now that threading functions correctly.
* … 136 (ReqNodeNotAvail)). Commented out the tests in the comprehensive.jet test suite and removed one test from the coverage.jet test suite.
* The ufs-case-studies WE2E tests are currently failing on Derecho. The get_extrn_ics/lbcs tasks fail, stating that the required file isn't present, even though the file in question is named correctly and is available. Commented out these tests in comprehensive.derecho and rearranged WE2E tests to remove them from coverage.derecho. Issue #1144 (ufs-case-studies WE2E tests fail on Derecho in get_extrn_ics/lbcs) was opened to track this issue on Derecho.

Type of change
TESTS CONDUCTED:
DOCUMENTATION:
Updated documentation related to the table defining the WE2E tests currently available in the SRW App. The nco-mode WE2E tests had been removed, but were still present in the Tests.csv file. Removed these entries. Additional documentation updates were made following updates to the config_defaults.yaml file.
CHECKLIST