Upstream merge 2024-05-20 #2835

Merged
merged 524 commits into from
Jun 11, 2024

Conversation

bartgol
Contributor

@bartgol bartgol commented May 20, 2024

This PR brings Kokkos 4.2 into EAMxx.

No conflicts (except git choking a bit with the ekat submodule, so had to update it manually).

whannah1 and others added 30 commits April 24, 2024 10:13
Previously, this pointed to a mask where glc was allowed to be active
everywhere. This change points to the correct mask and correct mask name
that has been updated so that GIS footprint describing the mask was made
from the current GIS 20km init. cond. file. Testing confirms the correct
mask is pulled from the repo and the relevant test cases run to
completion as expected when using this mask file.
Uses the Fortran linker for the Cray compiler on Frontier

* add "set(E3SM_LINK_WITH_FORTRAN "TRUE")" in crayclang_frontier.cmake

Some ELM external source files (such as sbetr/src/betr/betr_rxns/Tracer1beckBGCReactionsType.F90)
use Fortran-compiler-specific intrinsic functions such as 'erfc',
which require linking with the Fortran compiler.

[No baseline for Frontier yet]
This set of changes enables compatibility with FATES API 33, which brings in two-stream radiation for vegetation canopies.

[nonBfB] for FATES
The switch from climdat to e3sm in the dir name.

Fixes #6267

[BFB]
semoab_mod.F90 file is moved from
components/eam/src/dynamics/se folder to
components/homme/src/share folder.

It belongs naturally there, as it involves just grid
routines from the homme base code.

Also, it should prepare for the SCREAM-moab connection later on.

Also fix some complaints about non-allocated arrays when there
are no cells on some dynamic tasks (when the number of tasks is larger
than the number of spectral cells in homme).
Besides doing nothing, InitArguments is deprecated in Kokkos 4.0
Kokkos 4.0 no longer allows use of volatile in this context
Some of the exec spaces static methods are no longer static
The KOKKOS_TARGET macro prevents Kokkos::initialize from being called
twice in kokkos targets. Kokkos 4.2 no longer tolerates double
initialization, so we must prevent it.
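
The commit above guards against double initialization with a compile-time macro. A runtime variant of the same idea is to ask the library whether it is already initialized before initializing it (Kokkos exposes `Kokkos::is_initialized()` for this). The following is only a minimal Python sketch of that guard pattern; the `Runtime` class and `initialize_once` helper are hypothetical stand-ins, not Kokkos or EAMxx code.

```python
class Runtime:
    """Hypothetical stand-in for a library that rejects double initialization."""

    def __init__(self):
        self._initialized = False

    def is_initialized(self):
        return self._initialized

    def initialize(self):
        # Mimic a library that treats a second initialize() as an error.
        if self._initialized:
            raise RuntimeError("runtime initialized twice")
        self._initialized = True


def initialize_once(runtime):
    """Initialize only if nobody has yet; return True if this call did the work."""
    if runtime.is_initialized():
        return False
    runtime.initialize()
    return True


runtime = Runtime()
first = initialize_once(runtime)   # performs the initialization
second = initialize_once(runtime)  # no-op: already initialized
```

Whichever component reaches `initialize_once` first does the work, and every later caller becomes a no-op instead of an error.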
Single Column Model updates & fixes for v3

Modify and fix the single column model (SCM) to make it compatible with v3. Specifically:

1. The config_component for SCM_EAM has been changed to “scam_generic”
   (which should have never been changed).  When running cases with the standardized
   scripts prescribed aerosols are used, which is not compatible with the
   scm_generic_chemUCI-Linoz-mam5-vbs component currently specified.
   (Aside: a new prescribed aerosol file has been generated by running E3SM with mam5.
   This file has been uploaded to the E3SM data server and the E3SM SCM scripts
   have been updated to use this file).

2. Idealization flags have been added to the P3 microphysics to allow for:
    i. Runs with no precipitation.  Note that in the P3 implementation this only
       turns off liquid precipitation, which is used in several GCSS boundary
       layer cloud cases in the E3SM SCM library.
   ii. Prescribed droplet concentration.  This was already a feature in place
       in P3 but this PR connects this functionality to the EAM namelist.

3. Fixes to allow the ability to “replay” a column once again (back by semi-popular demand!).
   This was a feature that worked in E3SMv1 but has not been functional for some time.  The fix in
   this PR allows users to generate the output needed to make IOP forcing files to replay a single column.
   Note that additional post-processing is required to actually do an SCM run (for which the user
   should consult the E3SM wiki page).

Further documentation and scripts have been updated on the E3SM SCM wiki to reflect changes made in this PR
(https://github.com/E3SM-Project/scmlib/wiki/E3SM-Single-Column-Model-Home).

[BFB]
Create the provis_state subpool at RK4 initialization to avoid memory leak

This PR fixes a memory leak in the RK4 timestepping when running 125 day
single-layer barotropic tides cases with the vr45to5 mesh on pm-cpu.
Previously, it could only get through about 42 days of simulation before
running out of memory. This issue is related to creating/destroying the
provis_state subpool at each timestep.

Since RK4 is not used in E3SM, this PR is B4B for all E3SM tests. The
mpas_pool_copy_pool routine modified here is not used in MPAS-Seaice or
MALI.

[BFB]
xylar and others added 5 commits May 31, 2024 04:38
A bug was introduced in #6310 that made the ocean shallower
rather than deeper than the minimum allowed depth.  This
merge fixes that bug.
modify ``bin_to_cube'' according to https://github.com/NCAR/Topo.git for cubed base topo generation
read parameters from ``bin_to_cube.nl''
Reduce the warning printing from the atm chemistry implicit solver

This PR is to reduce unnecessary warning messages from the atm chemistry
implicit solver.

The implicit solver for atmospheric chemistry prints out warning messages
every time at each location when the solver can't converge at the current
delta-t. For most times, the convergence issue could be resolved by reducing
the chemistry solver delta-t. Therefore, these messages can be overwhelming
and bury other important messages.

For production runs, we don't need these messages if the implicit solver can
converge by reducing the delta-t two times (halving delta-t each time).
We removed the message at line 530, which is responsible for the excessive
warnings and duplicated the information especially for the production runs.
Additionally, we lift the threshold to report the convergence issue from 0 to 2.
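
The reporting policy described in this commit message can be sketched as a retry loop that halves the step on each failed attempt and only warns once more than two halvings were needed. This is an illustrative Python sketch, not the Fortran solver; `solve_with_halving`, `try_solve`, `max_halvings`, and `warn_after` are hypothetical names, and the exact comparison used for the threshold is an assumption.

```python
import warnings


def solve_with_halving(try_solve, dt, max_halvings=5, warn_after=2):
    """Retry `try_solve(dt)` with dt halved after each failure.

    Returns (dt_used, halvings). Warns only when more than `warn_after`
    halvings were needed; raises if the solver never converges.
    """
    for halvings in range(max_halvings + 1):
        if try_solve(dt):
            if halvings > warn_after:
                warnings.warn(
                    f"converged only after {halvings} halvings (dt={dt})"
                )
            return dt, halvings
        dt /= 2.0
    raise RuntimeError("solver did not converge")


def converges(dt):
    """Toy solver (assumption): converges once dt <= 0.3."""
    return dt <= 0.3


# 1.0 fails, 0.5 fails, 0.25 converges: two halvings, so no warning.
dt_used, halvings = solve_with_halving(converges, dt=1.0)
```

Starting from a larger step (for example `dt=4.0`) needs four halvings in this toy setup and therefore does emit the warning.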

[BFB]
Adding path to the timing binary directory to the set of include
directories. This change is required for these tests that need
path to the fortran perf mod.

This fixes build issues with the HOMMEBFB test suite using the
latest version of the SCORPIO library (new versions of the SCORPIO
library no longer export internal SCORPIO build paths with the
SCORPIO library targets)
…#6454)

Fix bottom depth deepening in global_ocean init mode

A bug was introduced in #6310 that made the ocean shallower rather than
deeper than the minimum allowed depth. This merge fixes that bug.

Fixes #6453
[BFB] -- mpas-ocean standalone only
jgfouca previously approved these changes Jun 3, 2024
rljacob and others added 9 commits June 3, 2024 16:54
Fixes standalone HOMME build issues with SCORPIO 1.6.3

[BFB]
Change Ocean conservationCheck output frequency

For reasons that are not entirely clear, we have found that the ocean's
conservation analysis member outputs with the wrong start time when the
output frequency is 1 month. This issue is fixed when the frequency is
increased (1 day and 10 days were tested). Here, we change the default
output frequency of the conservationCheckOutput stream to 1 day.

[BFB]
Carbon balance was never checked correctly and let through negative values.
Initialization of Ecosystem variables needed to be re-adjusted, and balance
checking for FATES needed to be changed, which causes round-off differences
in `CMASS_BALANCE_ERROR` for `fate_cold_allvars` test.

Also included divide-by-zero check in PhenologyMod.F90 to fix fpe in debug mode.

[non-BFB] for one FATES test.
Fixes #6120
Fixes #6203
Fixes #6177
Generalize cice-qc scripts to work with E3SM-Polar-Developer.sh output

Original scripts assumed that the run directories being compared are
named 'run'. The E3SM-Polar-Developer scripts append the run directory
names with .k000, .k001 etc for different combinations of configuration
options. This PR generalizes the cice-qc scripts to allow other run
directory names, which are input by the user. Also changed the sample
paths in the anvil script to use anvil paths rather than chrysalis
paths.

Changes only post-processing scripts. Tested on a new pair of runs
updating the icepack submodule (QC passed).

[BFB]
tweak test system to add -c option when threading

*********1*********2*********3*********4*********5*********6*********7**
Longer commit message body describing the commit.
Can contain lists as follows:
	* Item 1
	* Item 2
	* Item 3

A good commit message should be written like an email, a subject
followed by a blank line, followed by a more descriptive body.

Can also contain a tag at the bottom describing what type of commit this is.
[BFB] - Bit-For-Bit
[FCC] - Flag Climate Changing
[Non-BFB] - Non Bit-For-Bit
[CC] - Climate Changing
[NML] - Namelist Changing

See confluence for a more detailed description about these tags.
Add compsets used in v2 BGC simulations to master for better provenance and reproducibility.

The branch is a rebase of Xiaoying Shi's `acme-y9s/BGCV2_compsets` .

[BFB]
Add support for running standalone HOMME on PM-CPU

With this PR, standalone HOMME can be built on PM-CPU, with either GNU
or Intel compilers. In particular, both of these will work:

    ./create_test --machine pm-cpu --compiler intel HOMME_P24.f19_g16_rx1.A
    ./create_test --machine pm-cpu --compiler gnu HOMME_P24.f19_g16_rx1.A

A single cmake file (pm-cpu.cmake, to match CIME machine names) supports
both GNU and Intel without hardcoded netcdf paths.  Also updated test
scripts to set SRUN_CPUS_PER_TASK properly, otherwise OpenMP
performance is terrible.  Removed the assumption that Fortran and C
compiler flags for OpenMP are the same (they are different under PM's
oneAPI compilers).

[BFB]
Functional update to HOMME Perlmutter 'nocuda' machine files, which
previously caused errors when running DCMIP2016 Test 2 on
Perlmutter. Tools to enable building all DCMIP test cases for
standalone HOMME on Perlmutter, and to run DCMIP2016 Test 2 on
Perlmutter.

Updates HOMME Perlmutter 'nocuda' machine files to enable PIO.  Cleans
namelists for DCMIP2016 Test 2.  Adds NERSC jobscript for DCMIP2016
Test 2.  Adds NERSC build scripts for standalone HOMME.
[NML] Formatting only on DCMIP 2016 Test 2 namelists.
[BFB]
@jgfouca
Member

jgfouca commented Jun 10, 2024

@bartgol, I'm trying to move this along. I'm not worried about NMLDIFFs on mappy CIME cases. I see that homme_shoc_cld_p3_rrtmgp_baseline_cmp fails on weaver and mappy, which is likely a diff for this test. Is that expected?

@bartgol
Contributor Author

bartgol commented Jun 10, 2024

Thanks for helping Jim!

I looked at the baseline cmp test, and the failure is odd. CPRNC reports NetCDF: Index exceeds dimension bound. I think that means the number of time slices is different, and I don't know why that would be the case. Perhaps the baselines are corrupted. I will quickly investigate. If that's the case, we can merge.

Edit: re-running the tests, since the folder has since been pruned.

@bartgol bartgol added the "AT: RETEST" label (Force the autotester (AT) to retest the PR) Jun 10, 2024
@bartgol
Contributor Author

bartgol commented Jun 10, 2024

Looking at the console output, I see that the time dim does indeed differ: Dimension time differs 2 /= 1. The first file is the one generated by the PR, while the second is the baseline. I don't know why the PR should do something different from the baselines, but this console snapshot suggests we may need to correct the output yaml file for the test:

Attribute max_snapshots_per_file from file1:          744  does not match that found on file2            1
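
The kind of metadata comparison that surfaces these two messages can be sketched as follows. This is a simplified Python illustration rather than CPRNC itself: the numbers come from the console output quoted in this comment, while the `diff_metadata` helper and the dict layout standing in for the two NetCDF files are hypothetical.

```python
def diff_metadata(file1, file2):
    """Return human-readable differences in dimensions and attributes."""
    problems = []
    for name in sorted(set(file1["dims"]) | set(file2["dims"])):
        a, b = file1["dims"].get(name), file2["dims"].get(name)
        if a != b:
            problems.append(f"dimension {name} differs: {a} /= {b}")
    for name in sorted(set(file1["attrs"]) | set(file2["attrs"])):
        a, b = file1["attrs"].get(name), file2["attrs"].get(name)
        if a != b:
            problems.append(f"attribute {name} differs: {a} /= {b}")
    return problems


# file1 is the PR output, file2 is the baseline (hypothetical dict layout).
pr_output = {"dims": {"time": 2}, "attrs": {"max_snapshots_per_file": 744}}
baseline = {"dims": {"time": 1}, "attrs": {"max_snapshots_per_file": 1}}

problems = diff_metadata(pr_output, baseline)
```

Checking dimensions and global attributes before comparing field values is what turns a low-level "Index exceeds dimension bound" error into a message that points at the output configuration.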

@bartgol
Contributor Author

bartgol commented Jun 10, 2024

@jgfouca A quick inspection showed that the upstream merge somehow brought in an old version of an input yaml file, causing the failure. I fixed it. Since the upstream merge was already a bit old (~20 days), I went ahead and redid the merge from scratch, to get the newest e3sm master. The only diff in the eamxx folder is now a simple stdlib include that was added, which was expected from the kokkos change.

@E3SM-Autotester
Collaborator

Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing.

@E3SM-Autotester
Collaborator

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5514
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS AT: RETEST;AT: AUTOMERGE;EKAT;e3sm-update
PULLREQUESTNUM 2835
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA 14d5958
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 508ac99
TEST_REPO_ALIAS SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 5786
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS AT: RETEST;AT: AUTOMERGE;EKAT;e3sm-update
PULLREQUESTNUM 2835
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA 14d5958
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 508ac99
TEST_REPO_ALIAS SCREAM

Using Repos:

Repo: SCREAM (E3SM-Project/scream)
  • Branch: bartgol/upstream-merge-20240520
  • SHA: 14d5958
  • Mode: TEST_REPO

Pull Request Author: bartgol

@E3SM-Autotester
Collaborator

Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED

Note: Testing will normally be attempted again in approx. 2 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.

Pull Request Auto Testing has FAILED (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5514
  • Status: FAILED

Jenkins Parameters

Parameter Name Value
PR_LABELS AT: RETEST;AT: AUTOMERGE;EKAT;e3sm-update
PULLREQUESTNUM 2835
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA 14d5958
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 508ac99
TEST_REPO_ALIAS SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 5786
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS AT: RETEST;AT: AUTOMERGE;EKAT;e3sm-update
PULLREQUESTNUM 2835
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA 14d5958
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 508ac99
TEST_REPO_ALIAS SCREAM
SCREAM_PullRequest_Autotester_Mappy # 5514 FAILED (click to see last 100 lines of console output)

+ set +x
######################################################
FAILS DETECTED:
  SCREAM V1 TESTING FAILED!
Waiting for tests to finish
NLFAIL ERP_D_Lh4.ne4_ne4.F2010-SCREAMv1.mappy_gnu.scream-output-preset-1 (but otherwise OK) RUN
    Case dir: /ascldap/users/e3sm-jenkins/acme/scratch/ERP_D_Lh4.ne4_ne4.F2010-SCREAMv1.mappy_gnu.scream-output-preset-1.C.20240611_072617_2qpx3m
NLFAIL ERP_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-4 (but otherwise OK) RUN
    Case dir: /ascldap/users/e3sm-jenkins/acme/scratch/ERP_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-4.C.20240611_072617_2qpx3m
NLFAIL ERS_D_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-rad_frequency_2--scream-output-preset-5 (but otherwise OK) RUN
    Case dir: /ascldap/users/e3sm-jenkins/acme/scratch/ERS_D_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-rad_frequency_2--scream-output-preset-5.C.20240611_072617_2qpx3m
NLFAIL ERS_Ln9.ne4_ne4.F2000-SCREAMv1-AQP1.mappy_gnu.scream-output-preset-2 (but otherwise OK) RUN
    Case dir: /ascldap/users/e3sm-jenkins/acme/scratch/ERS_Ln9.ne4_ne4.F2000-SCREAMv1-AQP1.mappy_gnu.scream-output-preset-2.C.20240611_072617_2qpx3m
NLFAIL ERS_P16_Ln22.ne30_ne30.F2010-SCREAMv1-DP-DYCOMSrf01.mappy_gnu (but otherwise OK) RUN
    Case dir: /ascldap/users/e3sm-jenkins/acme/scratch/ERS_P16_Ln22.ne30_ne30.F2010-SCREAMv1-DP-DYCOMSrf01.mappy_gnu.C.20240611_072617_2qpx3m
NLFAIL PET_Ln9_P32x2.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-1 (but otherwise OK) RUN
    Case dir: /ascldap/users/e3sm-jenkins/acme/scratch/PET_Ln9_P32x2.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-1.C.20240611_072617_2qpx3m
NLFAIL SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-optics (but otherwise OK) RUN
    Case dir: /ascldap/users/e3sm-jenkins/acme/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-optics.C.20240611_072617_2qpx3m
NLFAIL SMS_D_Ln9.ne4_ne4.F2010-SCREAMv1-noAero.mappy_gnu.scream-output-preset-3 (but otherwise OK) RUN
    Case dir: /ascldap/users/e3sm-jenkins/acme/scratch/SMS_D_Ln9.ne4_ne4.F2010-SCREAMv1-noAero.mappy_gnu.scream-output-preset-3.C.20240611_072617_2qpx3m
test-scheduler took 783.5733327865601 seconds
######################################################
Build step 'Execute shell' marked build as failure
$ ssh-agent -k
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 43729 killed;
[ssh-agent] Stopped.
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash -le

cd $WORKSPACE/${BUILD_ID}/

./scream/components/eamxx/scripts/jenkins/jenkins_cleanup.sh

We're having issues with some test-launcher jobs hanging forever. So let's make sure we clean all pending test-launcher jobs

squeue -o"%.7i %u %40j" | grep e3sm-jenkins | grep test-launcher | awk '{ print $1 }' | xargs -r scancel

[SCREAM_PullRequest_Autotester_Mappy] $ /bin/bash -le /tmp/jenkins8883144254444740972.sh
POST BUILD TASK : SUCCESS
END OF POST BUILD TASK : 0
Sending e-mails to: [email protected]
Finished: FAILURE
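
Every entry in the Mappy failure list above carries the NLFAIL status with "(but otherwise OK)", which matches the earlier remark that namelist diffs on the mappy CIME cases are not a concern. A quick way to confirm that a log contains only such entries is to split the status lines by keyword. The Python sketch below is an assumption-laden illustration: `split_results` is a hypothetical helper, and treating `FAIL` and `DIFF` as the other non-passing keywords is an assumption about the test-status vocabulary.

```python
def split_results(lines):
    """Separate namelist-only diffs (NLFAIL) from other non-passing statuses."""
    namelist_only, other = [], []
    for line in lines:
        if not line.strip() or line[0].isspace():
            continue  # skip blank lines and indented "Case dir:" lines
        parts = line.split()
        if len(parts) < 2:
            continue
        status, test_name = parts[0], parts[1]
        if status == "NLFAIL":
            namelist_only.append(test_name)
        elif status in ("FAIL", "DIFF"):  # assumed status keywords
            other.append(test_name)
    return namelist_only, other


# Two lines copied from the Mappy console output above.
log = """\
NLFAIL ERS_P16_Ln22.ne30_ne30.F2010-SCREAMv1-DP-DYCOMSrf01.mappy_gnu (but otherwise OK) RUN
    Case dir: /ascldap/users/e3sm-jenkins/acme/scratch/ERS_P16_Ln22.ne30_ne30.F2010-SCREAMv1-DP-DYCOMSrf01.mappy_gnu.C.20240611_072617_2qpx3m
"""

namelist_only, other = split_results(log.splitlines())
```

An empty `other` list means the build was marked as failed purely because of namelist differences.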

SCREAM_PullRequest_Autotester_Weaver # 5786 PASSED (click to see last 100 lines of console output)

        Start 115: shoc_p3_nudging_glob_novert
115/132 Test #115: shoc_p3_nudging_glob_novert .............................   Passed    2.57 sec
        Start 116: homme_shoc_cld_p3_rrtmgp_np1
116/132 Test #116: homme_shoc_cld_p3_rrtmgp_np1 ............................   Passed    6.88 sec
        Start 117: homme_shoc_cld_p3_rrtmgp_baseline_cmp
117/132 Test #117: homme_shoc_cld_p3_rrtmgp_baseline_cmp ...................   Passed    0.10 sec
        Start 118: homme_shoc_cld_p3_rrtmgp_pg2_np1
118/132 Test #118: homme_shoc_cld_p3_rrtmgp_pg2_np1 ........................   Passed    6.51 sec
        Start 119: homme_shoc_cld_p3_rrtmgp_pg2_baseline_cmp
119/132 Test #119: homme_shoc_cld_p3_rrtmgp_pg2_baseline_cmp ...............   Passed    0.10 sec
        Start 120: model_baseline
120/132 Test #120: model_baseline ..........................................   Passed    7.66 sec
        Start 121: model_initial
121/132 Test #121: model_initial ...........................................   Passed    4.54 sec
        Start 122: model_restart
122/132 Test #122: model_restart ...........................................   Passed    5.66 sec
        Start 123: restarted_vs_monolithic_check_np1
123/132 Test #123: restarted_vs_monolithic_check_np1 .......................   Passed    0.10 sec
        Start 124: homme_shoc_cld_spa_p3_rrtmgp_np1
124/132 Test #124: homme_shoc_cld_spa_p3_rrtmgp_np1 ........................   Passed    4.38 sec
        Start 125: homme_shoc_cld_spa_p3_rrtmgp_baseline_cmp
125/132 Test #125: homme_shoc_cld_spa_p3_rrtmgp_baseline_cmp ...............   Passed    0.12 sec
        Start 126: homme_shoc_cld_spa_p3_rrtmgp_128levels_np1
126/132 Test #126: homme_shoc_cld_spa_p3_rrtmgp_128levels_np1 ..............   Passed    9.03 sec
        Start 127: homme_shoc_cld_spa_p3_rrtmgp_128levels_tend_check_np1
127/132 Test #127: homme_shoc_cld_spa_p3_rrtmgp_128levels_tend_check_np1 ...   Passed    1.23 sec
        Start 128: homme_shoc_cld_spa_p3_rrtmgp_128levels_baseline_cmp
128/132 Test #128: homme_shoc_cld_spa_p3_rrtmgp_128levels_baseline_cmp .....   Passed    0.64 sec
        Start 129: homme_shoc_cld_spa_p3_rrtmgp_pg2_dp_np1
129/132 Test #129: homme_shoc_cld_spa_p3_rrtmgp_pg2_dp_np1 .................   Passed   13.56 sec
        Start 130: homme_shoc_cld_spa_p3_rrtmgp_pg2_dp_baseline_cmp
130/132 Test #130: homme_shoc_cld_spa_p3_rrtmgp_pg2_dp_baseline_cmp ........   Passed    0.13 sec
        Start 131: homme_shoc_cld_p3_mam_optics_rrtmgp_np1
131/132 Test #131: homme_shoc_cld_p3_mam_optics_rrtmgp_np1 .................   Passed   10.18 sec
        Start 132: homme_shoc_cld_p3_mam_optics_rrtmgp_baseline_cmp
132/132 Test #132: homme_shoc_cld_p3_mam_optics_rrtmgp_baseline_cmp ........   Passed    0.15 sec

100% tests passed, 0 tests failed out of 132

Label Time Summary:
baseline_cmp = 92.68 sec*proc (16 tests)
baseline_gen = 175.95 sec*proc (18 tests)
bfbhash = 0.69 sec*proc (1 test)
check = 0.70 sec*proc (1 test)
cld = 11.79 sec*proc (3 tests)
cld_fraction = 2.27 sec*proc (1 test)
cxx baseline_cmp = 8.06 sec*proc (2 tests)
diagnostics = 35.34 sec*proc (22 tests)
driver = 28.98 sec*proc (8 tests)
dynamics = 5.20 sec*proc (3 tests)
fail = 33.72 sec*proc (4 tests)
io = 46.91 sec*proc (13 tests)
mam4_optics = 5.72 sec*proc (1 test)
nudging = 9.50 sec*proc (2 tests)
p3 = 141.99 sec*proc (7 tests)
p3_sk = 109.62 sec*proc (2 tests)
physics = 274.50 sec*proc (19 tests)
remap = 4.13 sec*proc (1 test)
rrtmgp = 25.90 sec*proc (9 tests)
shoc = 15.81 sec*proc (7 tests)
spa = 9.07 sec*proc (4 tests)
surface_coupling = 2.70 sec*proc (1 test)

Total Test time (real) = 671.25 sec

Testing '''14d595806787f0f3da2441ba680bfcc2f45a1428''' for test '''full_sp_debug'''

RUN: taskset -c 52-103 sh -c '''SCREAM_BUILD_PARALLEL_LEVEL=52 CTEST_PARALLEL_LEVEL=1 ctest -V --output-on-failure --resource-spec-file /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5786/scream/components/eamxx/ctest-build/full_sp_debug/ctest_resource_file.json -DNO_SUBMIT=True -DBUILD_WORK_DIR=/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5786/scream/components/eamxx/ctest-build/full_sp_debug -DBUILD_NAME_MOD=full_sp_debug -S /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5786/scream/components/eamxx/cmake/ctest_script.cmake -DCTEST_SITE=weaver -DCMAKE_COMMAND="-C /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5786/scream/components/eamxx/cmake/machine-files/weaver.cmake -DNetCDF_Fortran_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-fortran/4.6.0/gcc/11.3.0/openmpi/4.1.4/5ka6asw -DNetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-c/4.9.0/gcc/11.3.0/openmpi/4.1.4/mdd6fth -DPnetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/parallel-netcdf/1.12.3/gcc/11.3.0/openmpi/4.1.4/52dibdr -DCMAKE_BUILD_TYPE=Debug -DEKAT_DEFAULT_BFB=True -DSCREAM_DOUBLE_PRECISION=False -DEKAT_DISABLE_TPL_WARNINGS='''''''''ON''''''''' -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpifort -DSCREAM_DYNAMICS_DYCORE=HOMME -DSCREAM_TEST_MAX_TOTAL_THREADS=1 -DSCREAM_BASELINES_DIR=/home/projects/e3sm/scream/pr-autotester/master-baselines/weaver/full_sp_debug" '''
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5786/scream/components/eamxx/ctest-build/full_sp_debug

Testing '''14d595806787f0f3da2441ba680bfcc2f45a1428''' for test '''release'''

RUN: taskset -c 104-155 sh -c '''SCREAM_BUILD_PARALLEL_LEVEL=52 CTEST_PARALLEL_LEVEL=1 ctest -V --output-on-failure --resource-spec-file /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5786/scream/components/eamxx/ctest-build/release/ctest_resource_file.json -DNO_SUBMIT=True -DBUILD_WORK_DIR=/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5786/scream/components/eamxx/ctest-build/release -DBUILD_NAME_MOD=release -S /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5786/scream/components/eamxx/cmake/ctest_script.cmake -DCTEST_SITE=weaver -DCMAKE_COMMAND="-C /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5786/scream/components/eamxx/cmake/machine-files/weaver.cmake -DNetCDF_Fortran_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-fortran/4.6.0/gcc/11.3.0/openmpi/4.1.4/5ka6asw -DNetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-c/4.9.0/gcc/11.3.0/openmpi/4.1.4/mdd6fth -DPnetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/parallel-netcdf/1.12.3/gcc/11.3.0/openmpi/4.1.4/52dibdr -DCMAKE_BUILD_TYPE=Release -DEKAT_DISABLE_TPL_WARNINGS='''''''''ON''''''''' -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpifort -DSCREAM_DYNAMICS_DYCORE=HOMME -DSCREAM_TEST_MAX_TOTAL_THREADS=1 -DSCREAM_BASELINES_DIR=/home/projects/e3sm/scream/pr-autotester/master-baselines/weaver/release" '''
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5786/scream/components/eamxx/ctest-build/release

Testing '''14d595806787f0f3da2441ba680bfcc2f45a1428''' for test '''full_debug'''

RUN: taskset -c 0-51 sh -c '''SCREAM_BUILD_PARALLEL_LEVEL=52 CTEST_PARALLEL_LEVEL=1 ctest -V --output-on-failure --resource-spec-file /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5786/scream/components/eamxx/ctest-build/full_debug/ctest_resource_file.json -DNO_SUBMIT=True -DBUILD_WORK_DIR=/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5786/scream/components/eamxx/ctest-build/full_debug -DBUILD_NAME_MOD=full_debug -S /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5786/scream/components/eamxx/cmake/ctest_script.cmake -DCTEST_SITE=weaver -DCMAKE_COMMAND="-C /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5786/scream/components/eamxx/cmake/machine-files/weaver.cmake -DNetCDF_Fortran_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-fortran/4.6.0/gcc/11.3.0/openmpi/4.1.4/5ka6asw -DNetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-c/4.9.0/gcc/11.3.0/openmpi/4.1.4/mdd6fth -DPnetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/parallel-netcdf/1.12.3/gcc/11.3.0/openmpi/4.1.4/52dibdr -DCMAKE_BUILD_TYPE=Debug -DEKAT_DEFAULT_BFB=True -DEKAT_DISABLE_TPL_WARNINGS='''''''''ON''''''''' -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpifort -DSCREAM_DYNAMICS_DYCORE=HOMME -DSCREAM_TEST_MAX_TOTAL_THREADS=1 -DSCREAM_BASELINES_DIR=/home/projects/e3sm/scream/pr-autotester/master-baselines/weaver/full_debug" '''
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5786/scream/components/eamxx/ctest-build/full_debug
OVERALL STATUS: PASS
Starting analysis on weaver with cmd: cd /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5786/scream/components/eamxx && source /etc/profile.d/modules.sh && module purge && module load cmake/3.25.1 git/2.39.1 python/3.10.8 py-netcdf4/1.5.8 gcc/11.3.0 cuda/11.8.0 openmpi netcdf-c netcdf-fortran parallel-netcdf netlib-lapack && export HDF5_USE_FILE_LOCKING=FALSE && true && bsub -I -q rhel8 -n 4 -gpu num=4 ./scripts/test-all-scream --baseline-dir AUTO $compiler -p -c EKAT_DISABLE_TPL_WARNINGS=ON -i -m weaver
RUN: cd /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5786/scream/components/eamxx && source /etc/profile.d/modules.sh && module purge && module load cmake/3.25.1 git/2.39.1 python/3.10.8 py-netcdf4/1.5.8 gcc/11.3.0 cuda/11.8.0 openmpi netcdf-c netcdf-fortran parallel-netcdf netlib-lapack && export HDF5_USE_FILE_LOCKING=FALSE && true && bsub -I -q rhel8 -n 4 -gpu num=4 ./scripts/test-all-scream --baseline-dir AUTO $compiler -p -c EKAT_DISABLE_TPL_WARNINGS=ON -i -m weaver
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/5786/scream/components/eamxx
Completed analysis on weaver'

+ [[ 0 != 0 ]]
+ [[ 1 == 0 ]]
+ [[ weaver == \m\a\p\p\y ]]
+ set +x
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash -le

cd $WORKSPACE/${BUILD_ID}/

./scream/components/eamxx/scripts/jenkins/jenkins_cleanup.sh
[SCREAM_PullRequest_Autotester_Weaver] $ /bin/bash -le /tmp/jenkins12817805894474871360.sh
POST BUILD TASK : SUCCESS
END OF POST BUILD TASK : 0
Finished: SUCCESS

@E3SM-Autotester E3SM-Autotester removed the "AT: RETEST" label (Force the autotester (AT) to retest the PR) Jun 11, 2024
@bartgol
Contributor Author

bartgol commented Jun 11, 2024

@jgfouca I think we're good to merge, what do you think?

@jgfouca jgfouca merged commit 6d57c6b into master Jun 11, 2024
15 of 16 checks passed
@jgfouca jgfouca deleted the bartgol/upstream-merge-20240520 branch June 11, 2024 16:02
Labels
AT: AUTOMERGE (Inform the autotester (AT) that it can merge this PR if reviewers approved, and tests pass)
e3sm-update
EKAT