Move Frontier build to gfortran and latest rocm #2969

Merged


trey-ornl
Contributor

Added new cime_config/machines/cmake_macros/craygnu-hipcc.cmake file, which uses the Cray compiler wrappers with Gnu compilers, along with hipcc for AMD GPUs.
Changed Adios2 path to an appropriate OLCF build.
Deleted all Crusher files. Crusher no longer exists.

@trey-ornl
Contributor Author

I used this branch, before the last merge with master, to try to reproduce the E3SM failure on the "bad node" of Frontier, which now has the alias borg106. The run on that node did fail, while runs on other nodes completed without failing. Even among the successful runs, however, diffs of the e3sm.log.* files show the hashes eventually diverging, typically starting with the exxhash output around line 569. Here are the runs:

/lustre/orion/world-shared/cli115/trey/olcfhelp-14845/case/2024-08-*

I gunzip-ed the e3sm.log files for the successful runs.

@trey-ornl
Contributor Author

A two-node run of ne30 ran to completion on Frontier with this branch before the final merge with master.

/lustre/orion/world-shared/cli115/trey/scream-ne30-2node/case

@ambrad
Member

ambrad commented Aug 26, 2024

Am I correct that the current s/w stack can't be run from this branch?

@trey-ornl
Contributor Author

trey-ornl commented Aug 26, 2024

> Am I correct that the current s/w stack can't be run from this branch?

Yes, as written, it replaces the current stack. Shall I modify it so that the previous stack can still be selected? It could key off of the compiler choice.

The change to components/eamxx/CMakeLists.txt could be tricky.

@ambrad
Member

ambrad commented Aug 26, 2024

Tagging @rljacob @ndkeen @jgfouca @PeterCaldwell @AaronDonahue.

Trey, I don't know the right answers to your questions in terms of what everyone wants. But in terms of technical steps:

  1. We can give machines/compiler entries uniquifying names to permit parallel stacks.
  2. The CMakeLists.txt change could probably be handled with slightly better conditionals. In particular, I think querying the Fortran compiler ID might help, e.g., something based on CMAKE_Fortran_COMPILER_ID MATCHES "GNU" (the ID CMake reports for gfortran).

@E3SM-Autotester
Collaborator

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
NO INSPECTION HAS BEEN PERFORMED ON THIS PULL REQUEST! - This PR must be inspected by setting label 'AT: PRE-TEST INSPECTED'.

@ambrad ambrad changed the title Move Frontier build to gfortran and latest rocm [WIP] Move Frontier build to gfortran and latest rocm Aug 26, 2024
@ambrad
Member

ambrad commented Aug 26, 2024

I marked this PR as WIP so that the autotester doesn't run on it until it's ready.

@ambrad
Member

ambrad commented Aug 26, 2024

I've isolated at least one nondeterministic diff to P3 (small-kernels version). I'll first try the deopt that worked for the (differently expressed) issue with P3 using ROCm 5.7. I'll also look at the monolithic version of P3. If nothing works, then I'll have to do some manual work to isolate the problem further.

@ambrad
Member

ambrad commented Aug 26, 2024

@trey-ornl the P3 deoptimization is working. Thus, in your performance testing, if you haven't started already, I suggest adding this:

diff --git a/cime_config/machines/cmake_macros/craygnu-hipcc.cmake b/cime_config/machines/cmake_macros/craygnu-hipcc.cmake
index 8322c5d3a9..6cb79c0146 100644
--- a/cime_config/machines/cmake_macros/craygnu-hipcc.cmake
+++ b/cime_config/machines/cmake_macros/craygnu-hipcc.cmake
@@ -5,7 +5,7 @@ set(SCC "cc")
 set(SCXX "hipcc")
 set(SFC "ftn")

-string(APPEND CPPDEFS " -DLINUX -DFORTRANUNDERSCORE -DNO_R16 -DCPRGNU")
+string(APPEND CPPDEFS " -DLINUX -DFORTRANUNDERSCORE -DNO_R16 -DCPRGNU -DSCREAM_SYSTEM_WORKAROUND_P3_PART2")
 if (COMP_NAME STREQUAL gptl)
     string(APPEND CPPDEFS " -DHAVE_NANOTIME -DBIT64 -DHAVE_SLASHPROC -DHAVE_COMM_F2C -DHAVE_TIMES -DHAVE_GETTIMEOFDAY")
 endif()
diff --git a/components/eamxx/src/physics/p3/disp/p3_main_impl_part2_disp.cpp b/components/eamxx/src/physics/p3/disp/p3_main_impl_part2_disp.cpp
index 2b619d54bf..28445a35c3 100644
--- a/components/eamxx/src/physics/p3/disp/p3_main_impl_part2_disp.cpp
+++ b/components/eamxx/src/physics/p3/disp/p3_main_impl_part2_disp.cpp
@@ -9,7 +9,9 @@ namespace p3 {
  * Implementation of p3 main function. Clients should NOT #include
  * this file, #include p3_functions.hpp instead.
  */
-
+#ifdef SCREAM_SYSTEM_WORKAROUND_P3_PART2
+# pragma clang optimize off
+#endif
 template <>
 void Functions<Real,DefaultDevice>
 ::p3_main_part2_disp(
@@ -130,7 +132,9 @@ void Functions<Real,DefaultDevice>
     if (!hydrometeorsPresent(i)) return;
   });
 }
-
+#ifdef SCREAM_SYSTEM_WORKAROUND_P3_PART2
+# pragma clang optimize on
+#endif
 } // namespace p3
 } // namespace scream

I'll edit this comment with more information as I gather it.

First, with the above, I got 11 passes, 0 fails, of ERS_P8x1_Ln90.ne30pg2_ne30pg2.F2010-SCREAMv1.frontier-scream-gpu_craygnu-hipcc.scream-internal_diagnostics_level. All final bfbhash lines are identical. Without the fix, this test failed every time.

Second, I tried monolithic-kernel P3 and got:

0: exxhash>    1-  0.00000 0 de8745af60d78ad9 (p3-pre-sc-0)
1: Memory access fault by GPU node-9 (Agent handle: 0x18b37710) on address 0x7ffefffff000. Reason: Unknown.
2: Memory access fault by GPU node-6 (Agent handle: 0x18b37710) on address 0x7ffec094c000. Reason: Unknown.
6: Memory access fault by GPU node-4 (Agent handle: 0x18b37710) on address 0x7ffeffff9000. Reason: Unknown.
4: Failed to allocate file: Bad file descriptor
4: GPU core dump failed
0: Failed to allocate file: Bad file descriptor
0: GPU core dump failed
4: Memory access fault by GPU node-10 (Agent handle: 0x18b37710) on address 0x7ffd99999000. Reason: Unknown.

The core file with gdb doesn't give me a stack trace, but the hasher output is enough to know that the crash is in P3, as we would expect given this one change.

Third, I tried to deopt the corresponding "part2" section of code. It either doesn't solve the problem or the pragma doesn't work within a kernel. If I put the pragma outside the kernel invocation (deoptimizing the entire monolithic P3), then the test runs. I'm at 5 passes, 0 fails, all final bfbhash lines identical. The lines are also identical to the small-kernels-P3 case, although that isn't necessary.
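
The pass criterion used in this comment (every run ends on the same hash line) can be checked mechanically. The following is a minimal Python sketch, not the project's tooling. The function names are hypothetical, and it assumes hash lines carry an `N: ` rank prefix and the `exxhash>` marker, as in the log excerpt above; the marker used for bfbhash lines may differ and can be passed in.

```python
import re

# Assumption: log lines look like "<rank>: exxhash> ..." as in the excerpt
# quoted in this thread. The marker is a parameter in case it differs.
RANK_PREFIX = re.compile(r"^\s*\d+:\s*")


def final_hash_line(log_text, marker="exxhash>"):
    """Return the last line containing `marker`, with the rank prefix removed.

    Returns None if the log contains no such line.
    """
    last = None
    for line in log_text.splitlines():
        if marker in line:
            last = RANK_PREFIX.sub("", line).strip()
    return last


def runs_agree(log_texts, marker="exxhash>"):
    """True if every log has a final hash line and all of them are identical."""
    finals = [final_hash_line(text, marker) for text in log_texts]
    return None not in finals and len(set(finals)) == 1
```

Reading each run's uncompressed e3sm.log file into a string and passing the list to `runs_agree` gives a single yes/no answer for a batch of repeated runs.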

@rljacob
Member

rljacob commented Aug 26, 2024

@trey-ornl you should key off the compiler choice but we want to deprecate all machine entries with "scream" in the title. Maybe now would be a good time to do that.

@rljacob
Member

rljacob commented Aug 26, 2024

Also I think the E3SM convention for this compiler option name is "craygnugpu". We have not put the vendor-specific GPU language in the names. Not sure if we should.

@ndkeen
Contributor

ndkeen commented Aug 26, 2024

Why would the compiler name not be just gnugpu?

@trey-ornl
Contributor Author

> Also I think the E3SM convention for this compiler option name is "craygnugpu". We have not put the vendor-specific GPU language in the names. Not sure if we should.

"craygnugpu" says to me that the Cray wrappers will use the Gnu compilers to offload to GPUs. That is not the case here. The intended message for the name to convey is this: "Use Cray wrappers with Gnu compilers for the host, and use the AMD Hip compiler for the GPU (currently named 'hipcc')." Maybe "craygnuamdgpu" or "craygnu-amdgpu"?

@trey-ornl
Contributor Author

> Why would the compiler name not be just gnugpu?

The name "gnugpu" says to me that the Gnu compilers are used directly ("g++", "gfortran"), and that they are used to compile for the GPUs. Neither is true in this case.

@rljacob
Member

rljacob commented Aug 27, 2024

@trey-ornl good point about the GPU compile. I guess craygnuamdgpu is what we should use.

Yes "craygnu" implies cray wrappers while "gnu" is just calling gfortran directly.

Ignoring ones with "scream", we currently have:

amdclang_frontier.cmake       crayclang_frontier.cmake
amdclanggpu_frontier.cmake    crayclanggpu_frontier.cmake
gnu_frontier.cmake            gnugpu_frontier.cmake

So we'd be adding "craygnuamdgpu_frontier.cmake"

@rljacob
Member

rljacob commented Aug 27, 2024

Ignore my comment about editing the "frontier" entry. We'll do that later. Go ahead and add to frontier-scream-gpu

@trey-ornl
Contributor Author

@ambrad, are the tests in components/eamxx/src/physics/p3/tests useful? Do you know if they detect the issue in p3_main_part2?

@ambrad
Member

ambrad commented Aug 29, 2024

I don't use those tests for nondeterminism analysis and have not run them in years. I consider a single-node ne30 ERS test to be the most useful test configuration.

@rljacob
Member

rljacob commented Aug 30, 2024

Why did you merge master into this feature branch? We discourage that unless you have a specific reason.

@trey-ornl
Contributor Author

trey-ornl commented Aug 30, 2024

> Why did you merge master into this feature branch? We discourage that unless you have a specific reason.

Newbie mistake. How do you test your branch with all the latest changes without merging them in first? Is the expectation that we only test our changes relative to where the branch started?

Please let me know if there is TFM that I should R.

@rljacob
Member

rljacob commented Aug 31, 2024

To answer your question, I think CI does a merge to master and tests that. You could also do a merge to master in your local copy, test, then reset.

@mahf708
Contributor

mahf708 commented Sep 3, 2024

> How do you test your branch with all the latest changes without merging them in first? Is the expectation that we only test our changes relative to where the branch started?

Rebase (git fetch --all && git pull --rebase upstream master) and then force-push (git push --force)

@bartgol
Contributor

bartgol commented Sep 5, 2024

If you are collaboratively working with someone else, it's even better to use git push --force-with-lease [...].

From the docs:

--force-with-lease alone, without specifying the details, will protect all remote refs that are going to be updated by requiring their current value to be the same as the remote-tracking branch we have for them.
So if someone else pushed something to the remote branch in the meantime, your force push will fail.

@AaronDonahue
Contributor

@trey-ornl , is this PR ready to be "un-WIP'ed" and tested?

@AaronDonahue AaronDonahue added the AT: RETEST Force the autotester (AT) to retest the PR label Sep 25, 2024
@AaronDonahue
Contributor

@trey-ornl, would you mind manually testing on Frontier with master merged in to make sure everything still works as expected? Once you report back the all-clear we can merge.

@trey-ornl
Contributor Author

> @trey-ornl, would you mind manually testing on Frontier with master merged in to make sure everything still works as expected? Once you report back the all-clear we can merge.

Just built and ran an ne30 performance test from @ndkeen yesterday on Frontier with these changes merged into master. New build (compiler craygnuamdgpu) ran 3% faster than the old (compiler crayclang-scream).

@trey-ornl
Contributor Author

I did a test of the decadal run with the new craygnuamdgpu build, and it ran very very slowly. I tried the libfabric/1.20.1 workaround, and it then ran very (just one very instead of two) slowly. Then I changed the build to load module libfabric/1.15.2.0, and got performance as good as the "old" build. I updated this pull request with the change, along with an updated path to an appropriate build of Adios2 (with the new build modules + libfabric/1.15.2.0).

Timing results from 5-day decadal run on 2048 nodes of Frontier.
Existing crayclang-scream:

TOT Run Time:    3788.487 seconds      757.697 seconds/mday         0.31 myears/wday 

Previous version of new craygnuamdgpu:
Did not even reach the first hash output in a two-hour run.
Previous version of new craygnuamdgpu with FI_MR_CACHE_MONITOR=disabled:

TOT Run Time:    5789.003 seconds     1157.801 seconds/mday         0.20 myears/wday 

Current new craygnuamdgpu with module load libfabric/1.15.2.0:

TOT Run Time:    3753.817 seconds      750.763 seconds/mday         0.32 myears/wday

We will likely want to update the modules again once Frontier has a fixed version of Libfabric.
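
The derived columns in the timing lines above follow from the total run time, which makes the three configurations easy to compare. The following is a small Python sketch of that arithmetic, under stated assumptions: "mday" is read as model day and "wday" as wall-clock day, the run covers 5 model days, and a model year is taken as 365 days. The function names are hypothetical and are not part of the E3SM timing tools.

```python
SECONDS_PER_WALL_DAY = 86400.0
DAYS_PER_MODEL_YEAR = 365.0  # assumption: 365-day model calendar
MODEL_DAYS = 5               # the 5-day run reported above


def seconds_per_model_day(total_seconds, model_days):
    """Wall-clock seconds spent per simulated day."""
    return total_seconds / model_days


def model_years_per_wall_day(total_seconds, model_days):
    """Simulated years completed per wall-clock day at this rate."""
    per_day = seconds_per_model_day(total_seconds, model_days)
    return SECONDS_PER_WALL_DAY / (per_day * DAYS_PER_MODEL_YEAR)


# "TOT Run Time" values copied from the results above.
runs = {
    "crayclang-scream": 3788.487,
    "craygnuamdgpu, FI_MR_CACHE_MONITOR=disabled": 5789.003,
    "craygnuamdgpu, libfabric/1.15.2.0": 3753.817,
}

baseline = runs["crayclang-scream"]
for name, total in runs.items():
    print(
        f"{name}: "
        f"{seconds_per_model_day(total, MODEL_DAYS):.3f} seconds/mday, "
        f"{model_years_per_wall_day(total, MODEL_DAYS):.2f} myears/wday, "
        f"{total / baseline:.3f}x baseline run time"
    )
```

Under these assumptions the formulas reproduce the reported seconds/mday and myears/wday figures. Relative to crayclang-scream, the libfabric/1.15.2.0 build takes about 0.9% less run time, and the FI_MR_CACHE_MONITOR=disabled run takes about 53% more.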

@trey-ornl trey-ornl changed the title [WIP] Move Frontier build to gfortran and latest rocm Move Frontier build to gfortran and latest rocm Sep 30, 2024
@E3SM-Autotester
Collaborator

Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing.

@E3SM-Autotester
Collaborator

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
NO INSPECTION HAS BEEN PERFORMED ON THIS PULL REQUEST! - This PR must be inspected by setting label 'AT: PRE-TEST INSPECTED'.

@ambrad
Member

ambrad commented Oct 1, 2024

Looks good. Visual inspection shows that the current build (crayclang-scream) is available again after the most recent commits to this PR. Trey, have you checked that crayclang-scream in this PR reproduces crayclang-scream on master? One way to check would be to diff the .env_mach_specific.sh files from branch and master configurations. If they're the same, I'm ready to approve. Thanks for adding a GNU configuration; that will be great to have and hopefully make default in future simulation campaigns.
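
The file comparison suggested here can be scripted. The following is a minimal Python sketch, assuming each case directory contains a `.env_mach_specific.sh` file as described in the comment above. `env_diff` is a hypothetical helper name, and the case paths in the usage note are placeholders.

```python
import difflib
from pathlib import Path


def env_diff(master_path, branch_path):
    """Return unified-diff lines between two text files.

    An empty list means the two files are identical.
    """
    master_lines = Path(master_path).read_text().splitlines(keepends=True)
    branch_lines = Path(branch_path).read_text().splitlines(keepends=True)
    return list(
        difflib.unified_diff(
            master_lines,
            branch_lines,
            fromfile=str(master_path),
            tofile=str(branch_path),
        )
    )
```

For example, `env_diff("master_case/.env_mach_specific.sh", "branch_case/.env_mach_specific.sh")` returns an empty list when the two configurations set up the same environment. Any returned lines show which module loads or environment variables differ.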

@mahf708 mahf708 added the AT: PRE-TEST INSPECTED When pre-test inspection is required, set this label to pass the inspection label Oct 1, 2024
@E3SM-Autotester E3SM-Autotester removed the AT: PRE-TEST INSPECTED When pre-test inspection is required, set this label to pass the inspection label Oct 1, 2024
@E3SM-Autotester
Collaborator

Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED by label AT: PRE-TEST INSPECTED! Autotester is Removing Label; this inspection will remain valid until a new commit to source branch is performed.

@E3SM-Autotester
Collaborator

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 6089
  • Status: STARTED

Jenkins Parameters

Parameter Name        Value
PR_LABELS             Machine File;AT: RETEST;Frontier
PULLREQUESTNUM        2969
SCREAM_SOURCE_REPO    https://github.com/trey-ornl/scream
SCREAM_SOURCE_SHA     e6c67b9
SCREAM_TARGET_BRANCH  master
SCREAM_TARGET_REPO    https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA     798bfa6
TEST_REPO_ALIAS       SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5859
  • Status: STARTED

Jenkins Parameters

Parameter Name        Value
PR_LABELS             Machine File;AT: RETEST;Frontier
PULLREQUESTNUM        2969
SCREAM_SOURCE_REPO    https://github.com/trey-ornl/scream
SCREAM_SOURCE_SHA     e6c67b9
SCREAM_TARGET_BRANCH  master
SCREAM_TARGET_REPO    https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA     798bfa6
TEST_REPO_ALIAS       SCREAM

Using Repos:

Repo: SCREAM (trey-ornl/scream)
  • Branch: trey/cime_config/frontier-gnu
  • SHA: e6c67b9
  • Mode: TEST_REPO

Pull Request Author: trey-ornl

@E3SM-Autotester
Collaborator

Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED

Note: Testing will normally be attempted again in approx. 2 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.

Pull Request Auto Testing has FAILED

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 6089
  • Status: PASSED

Jenkins Parameters

Parameter Name        Value
PR_LABELS             Machine File;AT: RETEST;Frontier
PULLREQUESTNUM        2969
SCREAM_SOURCE_REPO    https://github.com/trey-ornl/scream
SCREAM_SOURCE_SHA     e6c67b9
SCREAM_TARGET_BRANCH  master
SCREAM_TARGET_REPO    https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA     798bfa6
TEST_REPO_ALIAS       SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5859
  • Status: FAILED

Jenkins Parameters

Parameter Name        Value
PR_LABELS             Machine File;AT: RETEST;Frontier
PULLREQUESTNUM        2969
SCREAM_SOURCE_REPO    https://github.com/trey-ornl/scream
SCREAM_SOURCE_SHA     e6c67b9
SCREAM_TARGET_BRANCH  master
SCREAM_TARGET_REPO    https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA     798bfa6
TEST_REPO_ALIAS       SCREAM

SCREAM_PullRequest_Autotester_Weaver # 6089 PASSED (last 100 lines of console output)

142/157 Test #142: model_initial .........................................................   Passed    6.06 sec
        Start 143: model_restart
143/157 Test #143: model_restart .........................................................   Passed    7.11 sec
        Start 144: restarted_vs_monolithic_check_np1
144/157 Test #144: restarted_vs_monolithic_check_np1 .....................................   Passed    0.12 sec
        Start 145: homme_shoc_cld_spa_p3_rrtmgp_np1
145/157 Test #145: homme_shoc_cld_spa_p3_rrtmgp_np1 ......................................   Passed   13.96 sec
        Start 146: homme_shoc_cld_spa_p3_rrtmgp_baseline_cmp
146/157 Test #146: homme_shoc_cld_spa_p3_rrtmgp_baseline_cmp .............................   Passed    0.12 sec
        Start 147: homme_shoc_cld_spa_p3_rrtmgp_128levels_np1
147/157 Test #147: homme_shoc_cld_spa_p3_rrtmgp_128levels_np1 ............................   Passed   17.42 sec
        Start 148: homme_shoc_cld_spa_p3_rrtmgp_128levels_tend_check_np1
148/157 Test #148: homme_shoc_cld_spa_p3_rrtmgp_128levels_tend_check_np1 .................   Passed    1.46 sec
        Start 149: homme_shoc_cld_spa_p3_rrtmgp_128levels_baseline_cmp
149/157 Test #149: homme_shoc_cld_spa_p3_rrtmgp_128levels_baseline_cmp ...................   Passed    0.59 sec
        Start 150: homme_shoc_cld_spa_p3_rrtmgp_pg2_dp_np1
150/157 Test #150: homme_shoc_cld_spa_p3_rrtmgp_pg2_dp_np1 ...............................   Passed   13.02 sec
        Start 151: homme_shoc_cld_spa_p3_rrtmgp_pg2_dp_baseline_cmp
151/157 Test #151: homme_shoc_cld_spa_p3_rrtmgp_pg2_dp_baseline_cmp ......................   Passed    0.09 sec
        Start 152: homme_shoc_cld_p3_mam_optics_rrtmgp_np1
152/157 Test #152: homme_shoc_cld_p3_mam_optics_rrtmgp_np1 ...............................   Passed   19.81 sec
        Start 153: homme_shoc_cld_p3_mam_optics_rrtmgp_baseline_cmp
153/157 Test #153: homme_shoc_cld_p3_mam_optics_rrtmgp_baseline_cmp ......................   Passed    0.16 sec
        Start 154: homme_shoc_cld_mam_aci_p3_mam_optics_rrtmgp_mam_drydep_np1
154/157 Test #154: homme_shoc_cld_mam_aci_p3_mam_optics_rrtmgp_mam_drydep_np1 ............   Passed   21.32 sec
        Start 155: homme_shoc_cld_mam_aci_p3_mam_optics_rrtmgp_mam_drydep_baseline_cmp
155/157 Test #155: homme_shoc_cld_mam_aci_p3_mam_optics_rrtmgp_mam_drydep_baseline_cmp ...   Passed    0.15 sec
        Start 156: homme_shoc_cld_spa_p3_rrtmgp_mam4_wetscav_np1
156/157 Test #156: homme_shoc_cld_spa_p3_rrtmgp_mam4_wetscav_np1 .........................   Passed   41.83 sec
        Start 157: homme_shoc_cld_spa_p3_rrtmgp_mam4_wetscav_baseline_cmp
157/157 Test #157: homme_shoc_cld_spa_p3_rrtmgp_mam4_wetscav_baseline_cmp ................   Passed    0.22 sec

100% tests passed, 0 tests failed out of 157

Label Time Summary:
baseline_cmp = 150.76 sec*proc (23 tests)
baseline_gen = 375.04 sec*proc (25 tests)
bfbhash = 0.89 sec*proc (1 test)
check = 0.89 sec*proc (1 test)
cld = 62.53 sec*proc (7 tests)
cld_fraction = 1.17 sec*proc (1 test)
cxx baseline_cmp = 10.97 sec*proc (2 tests)
diagnostics = 52.71 sec*proc (23 tests)
driver = 115.67 sec*proc (16 tests)
dynamics = 5.14 sec*proc (3 tests)
fail = 30.38 sec*proc (5 tests)
io = 52.87 sec*proc (14 tests)
mam4_aci = 40.21 sec*proc (4 tests)
mam4_constituent_fluxes = 8.36 sec*proc (1 test)
mam4_drydep = 3.66 sec*proc (1 test)
mam4_optics = 8.62 sec*proc (1 test)
mam4_srf_online_emiss = 8.36 sec*proc (1 test)
mam4_wetscav = 26.32 sec*proc (2 tests)
nudging = 11.38 sec*proc (2 tests)
p3 = 128.97 sec*proc (12 tests)
p3_sk = 34.77 sec*proc (2 tests)
physics = 211.88 sec*proc (27 tests)
remap = 3.81 sec*proc (1 test)
rrtmgp = 60.48 sec*proc (11 tests)
shoc = 78.34 sec*proc (13 tests)
spa = 12.09 sec*proc (4 tests)
surface_coupling = 1.63 sec*proc (1 test)

Total Test time (real) = 881.08 sec

Testing '''911b7a6469fd40e91219ef4ca68e60d6c8c13a99''' for test '''full_sp_debug'''

RUN: taskset -c 52-103 sh -c '''SCREAM_BUILD_PARALLEL_LEVEL=52 CTEST_PARALLEL_LEVEL=1 ctest -V --output-on-failure --resource-spec-file /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6089/scream/components/eamxx/ctest-build/full_sp_debug/ctest_resource_file.json -DNO_SUBMIT=True -DBUILD_WORK_DIR=/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6089/scream/components/eamxx/ctest-build/full_sp_debug -DBUILD_NAME_MOD=full_sp_debug -S /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6089/scream/components/eamxx/cmake/ctest_script.cmake -DCTEST_SITE=weaver -DCMAKE_COMMAND="-C /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6089/scream/components/eamxx/cmake/machine-files/weaver.cmake -DNetCDF_Fortran_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-fortran/4.6.1/gcc/11.3.0/openmpi/4.1.6/5tv5psl -DNetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-c/4.9.2/gcc/11.3.0/openmpi/4.1.6/pyuuqd3 -DPnetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/parallel-netcdf/1.12.3/gcc/11.3.0/openmpi/4.1.6/2s52shy -DCMAKE_BUILD_TYPE=Debug -DEKAT_DEFAULT_BFB=True -DSCREAM_DOUBLE_PRECISION=False -DEKAT_DISABLE_TPL_WARNINGS='''''''''ON''''''''' -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpifort -DSCREAM_DYNAMICS_DYCORE=HOMME -DSCREAM_TEST_MAX_TOTAL_THREADS=1 -DSCREAM_BASELINES_DIR=/home/projects/e3sm/scream/pr-autotester/master-baselines/weaver/full_sp_debug" '''
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6089/scream/components/eamxx/ctest-build/full_sp_debug

Testing '''911b7a6469fd40e91219ef4ca68e60d6c8c13a99''' for test '''release'''

RUN: taskset -c 104-155 sh -c '''SCREAM_BUILD_PARALLEL_LEVEL=52 CTEST_PARALLEL_LEVEL=1 ctest -V --output-on-failure --resource-spec-file /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6089/scream/components/eamxx/ctest-build/release/ctest_resource_file.json -DNO_SUBMIT=True -DBUILD_WORK_DIR=/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6089/scream/components/eamxx/ctest-build/release -DBUILD_NAME_MOD=release -S /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6089/scream/components/eamxx/cmake/ctest_script.cmake -DCTEST_SITE=weaver -DCMAKE_COMMAND="-C /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6089/scream/components/eamxx/cmake/machine-files/weaver.cmake -DNetCDF_Fortran_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-fortran/4.6.1/gcc/11.3.0/openmpi/4.1.6/5tv5psl -DNetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-c/4.9.2/gcc/11.3.0/openmpi/4.1.6/pyuuqd3 -DPnetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/parallel-netcdf/1.12.3/gcc/11.3.0/openmpi/4.1.6/2s52shy -DCMAKE_BUILD_TYPE=Release -DEKAT_DISABLE_TPL_WARNINGS='''''''''ON''''''''' -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpifort -DSCREAM_DYNAMICS_DYCORE=HOMME -DSCREAM_TEST_MAX_TOTAL_THREADS=1 -DSCREAM_BASELINES_DIR=/home/projects/e3sm/scream/pr-autotester/master-baselines/weaver/release" '''
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6089/scream/components/eamxx/ctest-build/release

Testing '''911b7a6469fd40e91219ef4ca68e60d6c8c13a99''' for test '''full_debug'''

RUN: taskset -c 0-51 sh -c '''SCREAM_BUILD_PARALLEL_LEVEL=52 CTEST_PARALLEL_LEVEL=1 ctest -V --output-on-failure --resource-spec-file /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6089/scream/components/eamxx/ctest-build/full_debug/ctest_resource_file.json -DNO_SUBMIT=True -DBUILD_WORK_DIR=/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6089/scream/components/eamxx/ctest-build/full_debug -DBUILD_NAME_MOD=full_debug -S /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6089/scream/components/eamxx/cmake/ctest_script.cmake -DCTEST_SITE=weaver -DCMAKE_COMMAND="-C /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6089/scream/components/eamxx/cmake/machine-files/weaver.cmake -DNetCDF_Fortran_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-fortran/4.6.1/gcc/11.3.0/openmpi/4.1.6/5tv5psl -DNetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-c/4.9.2/gcc/11.3.0/openmpi/4.1.6/pyuuqd3 -DPnetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/parallel-netcdf/1.12.3/gcc/11.3.0/openmpi/4.1.6/2s52shy -DCMAKE_BUILD_TYPE=Debug -DEKAT_DEFAULT_BFB=True -DKokkos_ENABLE_DEBUG_BOUNDS_CHECK=True -DEKAT_DISABLE_TPL_WARNINGS='''''''''ON''''''''' -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpifort -DSCREAM_DYNAMICS_DYCORE=HOMME -DSCREAM_TEST_MAX_TOTAL_THREADS=1 -DSCREAM_BASELINES_DIR=/home/projects/e3sm/scream/pr-autotester/master-baselines/weaver/full_debug" '''
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6089/scream/components/eamxx/ctest-build/full_debug
OVERALL STATUS: PASS
Starting analysis on weaver with cmd: cd /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6089/scream/components/eamxx && source /etc/profile.d/modules.sh && module purge && module load cmake/3.25.1 git/2.39.1 python/3.10.8 py-netcdf4/1.5.8 gcc/11.3.0 cuda/11.8.0 openmpi netcdf-c netcdf-fortran parallel-netcdf netlib-lapack && export HDF5_USE_FILE_LOCKING=FALSE && true && bsub -I -q rhel8 -n 4 -gpu num=4 ./scripts/test-all-scream --baseline-dir AUTO $compiler -p -c EKAT_DISABLE_TPL_WARNINGS=ON -m weaver
RUN: cd /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6089/scream/components/eamxx && source /etc/profile.d/modules.sh && module purge && module load cmake/3.25.1 git/2.39.1 python/3.10.8 py-netcdf4/1.5.8 gcc/11.3.0 cuda/11.8.0 openmpi netcdf-c netcdf-fortran parallel-netcdf netlib-lapack && export HDF5_USE_FILE_LOCKING=FALSE && true && bsub -I -q rhel8 -n 4 -gpu num=4 ./scripts/test-all-scream --baseline-dir AUTO $compiler -p -c EKAT_DISABLE_TPL_WARNINGS=ON -m weaver
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6089/scream/components/eamxx
Completed analysis on weaver'

  • [[ 0 != 0 ]]
  • [[ 1 == 0 ]]
  • [[ weaver == \m\a\p\p\y ]]
  • set +x
    Performing Post build task...
    Match found for : : True
    Logical operation result is TRUE
    Running script : #!/bin/bash -le

cd $WORKSPACE/${BUILD_ID}/

./scream/components/eamxx/scripts/jenkins/jenkins_cleanup.sh
[SCREAM_PullRequest_Autotester_Weaver] $ /bin/bash -le /tmp/jenkins8217132087345492842.sh
POST BUILD TASK : SUCCESS
END OF POST BUILD TASK : 0
Finished: SUCCESS

SCREAM_PullRequest_Autotester_Mappy # 5859 FAILED (click to see last 100 lines of console output)

+ V1_FAILURES_DETAILS+='Waiting for tests to finish
PASS ERP_D_Lh4.ne4_ne4.F2010-SCREAMv1.mappy_gnu.scream-output-preset-1 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERP_D_Lh4.ne4_ne4.F2010-SCREAMv1.mappy_gnu.scream-output-preset-1.C.20240930_235845_r85ku9
PASS ERP_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-4 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERP_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-4.C.20240930_235845_r85ku9
PASS ERS_D_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-rad_frequency_2--scream-output-preset-5 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_D_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-rad_frequency_2--scream-output-preset-5.C.20240930_235845_r85ku9
PASS ERS_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-small_kernels--scream-output-preset-5 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-small_kernels--scream-output-preset-5.C.20240930_235845_r85ku9
PASS ERS_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-small_kernels_p3--scream-output-preset-5 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-small_kernels_p3--scream-output-preset-5.C.20240930_235845_r85ku9
PASS ERS_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-small_kernels_shoc--scream-output-preset-5 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-small_kernels_shoc--scream-output-preset-5.C.20240930_235845_r85ku9
PASS ERS_Ln9.ne4_ne4.F2000-SCREAMv1-AQP1.mappy_gnu.scream-output-preset-2 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_Ln9.ne4_ne4.F2000-SCREAMv1-AQP1.mappy_gnu.scream-output-preset-2.C.20240930_235845_r85ku9
DIFF ERS_P16_Ln22.ne30pg2_ne30pg2.FIOP-SCREAMv1-DP.mappy_gnu.scream-dpxx-arm97 (phase BASELINE)
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_P16_Ln22.ne30pg2_ne30pg2.FIOP-SCREAMv1-DP.mappy_gnu.scream-dpxx-arm97.C.20240930_235845_r85ku9
DIFF ERS_P16_Ln22.ne30pg2_ne30pg2.FIOP-SCREAMv1-DP.mappy_gnu.scream-dpxx-comble (phase BASELINE)
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_P16_Ln22.ne30pg2_ne30pg2.FIOP-SCREAMv1-DP.mappy_gnu.scream-dpxx-comble.C.20240930_235845_r85ku9
DIFF ERS_P16_Ln22.ne30pg2_ne30pg2.FIOP-SCREAMv1-DP.mappy_gnu.scream-dpxx-dycomsrf01 (phase BASELINE)
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_P16_Ln22.ne30pg2_ne30pg2.FIOP-SCREAMv1-DP.mappy_gnu.scream-dpxx-dycomsrf01.C.20240930_235845_r85ku9
DIFF ERS_P16_Ln22.ne30pg2_ne30pg2.FRCE-SCREAMv1-DP.mappy_gnu (phase BASELINE)
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_P16_Ln22.ne30pg2_ne30pg2.FRCE-SCREAMv1-DP.mappy_gnu.C.20240930_235845_r85ku9
PASS PET_Ln9_P32x2.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-1 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/PET_Ln9_P32x2.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-1.C.20240930_235845_r85ku9
PASS SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-aci RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-aci.C.20240930_235845_r85ku9
PASS SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-drydep RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-drydep.C.20240930_235845_r85ku9
PASS SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-optics RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-optics.C.20240930_235845_r85ku9
PASS SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-wetscav RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-wetscav.C.20240930_235845_r85ku9
PASS SMS_D_Ln9.ne4_ne4.F2010-SCREAMv1-noAero.mappy_gnu.scream-output-preset-3 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/SMS_D_Ln9.ne4_ne4.F2010-SCREAMv1-noAero.mappy_gnu.scream-output-preset-3.C.20240930_235845_r85ku9
test-scheduler took 1977.9384441375732 seconds'
+ set +x
######################################################
FAILS DETECTED:
  SCREAM V1 TESTING FAILED!
Waiting for tests to finish
PASS ERP_D_Lh4.ne4_ne4.F2010-SCREAMv1.mappy_gnu.scream-output-preset-1 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERP_D_Lh4.ne4_ne4.F2010-SCREAMv1.mappy_gnu.scream-output-preset-1.C.20240930_235845_r85ku9
PASS ERP_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-4 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERP_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-4.C.20240930_235845_r85ku9
PASS ERS_D_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-rad_frequency_2--scream-output-preset-5 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_D_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-rad_frequency_2--scream-output-preset-5.C.20240930_235845_r85ku9
PASS ERS_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-small_kernels--scream-output-preset-5 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-small_kernels--scream-output-preset-5.C.20240930_235845_r85ku9
PASS ERS_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-small_kernels_p3--scream-output-preset-5 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-small_kernels_p3--scream-output-preset-5.C.20240930_235845_r85ku9
PASS ERS_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-small_kernels_shoc--scream-output-preset-5 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-small_kernels_shoc--scream-output-preset-5.C.20240930_235845_r85ku9
PASS ERS_Ln9.ne4_ne4.F2000-SCREAMv1-AQP1.mappy_gnu.scream-output-preset-2 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_Ln9.ne4_ne4.F2000-SCREAMv1-AQP1.mappy_gnu.scream-output-preset-2.C.20240930_235845_r85ku9
DIFF ERS_P16_Ln22.ne30pg2_ne30pg2.FIOP-SCREAMv1-DP.mappy_gnu.scream-dpxx-arm97 (phase BASELINE)
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_P16_Ln22.ne30pg2_ne30pg2.FIOP-SCREAMv1-DP.mappy_gnu.scream-dpxx-arm97.C.20240930_235845_r85ku9
DIFF ERS_P16_Ln22.ne30pg2_ne30pg2.FIOP-SCREAMv1-DP.mappy_gnu.scream-dpxx-comble (phase BASELINE)
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_P16_Ln22.ne30pg2_ne30pg2.FIOP-SCREAMv1-DP.mappy_gnu.scream-dpxx-comble.C.20240930_235845_r85ku9
DIFF ERS_P16_Ln22.ne30pg2_ne30pg2.FIOP-SCREAMv1-DP.mappy_gnu.scream-dpxx-dycomsrf01 (phase BASELINE)
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_P16_Ln22.ne30pg2_ne30pg2.FIOP-SCREAMv1-DP.mappy_gnu.scream-dpxx-dycomsrf01.C.20240930_235845_r85ku9
DIFF ERS_P16_Ln22.ne30pg2_ne30pg2.FRCE-SCREAMv1-DP.mappy_gnu (phase BASELINE)
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_P16_Ln22.ne30pg2_ne30pg2.FRCE-SCREAMv1-DP.mappy_gnu.C.20240930_235845_r85ku9
PASS PET_Ln9_P32x2.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-1 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/PET_Ln9_P32x2.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-1.C.20240930_235845_r85ku9
PASS SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-aci RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-aci.C.20240930_235845_r85ku9
PASS SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-drydep RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-drydep.C.20240930_235845_r85ku9
PASS SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-optics RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-optics.C.20240930_235845_r85ku9
PASS SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-wetscav RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-wetscav.C.20240930_235845_r85ku9
PASS SMS_D_Ln9.ne4_ne4.F2010-SCREAMv1-noAero.mappy_gnu.scream-output-preset-3 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/SMS_D_Ln9.ne4_ne4.F2010-SCREAMv1-noAero.mappy_gnu.scream-output-preset-3.C.20240930_235845_r85ku9
test-scheduler took 1977.9384441375732 seconds
######################################################
Build step 'Execute shell' marked build as failure
$ ssh-agent -k
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 2645528 killed;
[ssh-agent] Stopped.
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash -le

cd $WORKSPACE/${BUILD_ID}/

./scream/components/eamxx/scripts/jenkins/jenkins_cleanup.sh

We're having issues with some test-launcher jobs hanging forever. So let's make sure we clean all pending test-launcher jobs

squeue -o"%.7i %u %40j" | grep e3sm-jenkins | grep test-launcher | awk '{ print $1 }' | xargs -r scancel

[SCREAM_PullRequest_Autotester_Mappy] $ /bin/bash -le /tmp/jenkins16963833232449659901.sh
POST BUILD TASK : SUCCESS
END OF POST BUILD TASK : 0
Sending e-mails to: [email protected]
Finished: FAILURE
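
The console output above interleaves status lines (`PASS`, `DIFF`) with indented `Case dir:` lines, which makes it hard to see at a glance that the four `DIFF` results are all `ne30pg2` DP tests in the `BASELINE` phase. A minimal sketch of grouping test names by status is shown below. The function name `summarize` is hypothetical, the sample lines are copied from the log, and treating `FAIL` as a possible keyword is an assumption, since no such line appears in this output.

```python
import re
from collections import defaultdict

# A status line starts with a keyword followed by the test name.
# PASS and DIFF appear in the log above; FAIL is assumed, not observed here.
STATUS_LINE = re.compile(r"^(PASS|DIFF|FAIL)\s+(\S+)")


def summarize(log_text):
    """Group test names by the status keyword that starts each line."""
    by_status = defaultdict(list)
    for line in log_text.splitlines():
        match = STATUS_LINE.match(line)
        if match:  # indented "Case dir:" lines and other text are skipped
            by_status[match.group(1)].append(match.group(2))
    return dict(by_status)


sample = """\
Waiting for tests to finish
PASS ERS_Ln9.ne4_ne4.F2000-SCREAMv1-AQP1.mappy_gnu.scream-output-preset-2 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_Ln9.ne4_ne4.F2000-SCREAMv1-AQP1.mappy_gnu.scream-output-preset-2.C.20240930_235845_r85ku9
DIFF ERS_P16_Ln22.ne30pg2_ne30pg2.FRCE-SCREAMv1-DP.mappy_gnu (phase BASELINE)
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_P16_Ln22.ne30pg2_ne30pg2.FRCE-SCREAMv1-DP.mappy_gnu.C.20240930_235845_r85ku9
"""

result = summarize(sample)
for status, tests in result.items():
    print(status, len(tests), tests)
```

Run against the full list above, a grouping like this would separate the baseline diffs from the passing tests without reading every case directory.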

@E3SM-Autotester E3SM-Autotester removed the AT: RETEST Force the autotester (AT) to retest the PR label Oct 1, 2024
@trey-ornl
Contributor Author

trey-ornl commented Oct 1, 2024

@ambrad I confirmed with the decadal run on Frontier that master and this branch, both using crayclang-scream, generate identical .env_mach_specific.sh files.
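
For anyone repeating this check, a byte-for-byte comparison of the two generated environment files can be sketched as follows. The helper name `env_files_match` and the throwaway file names are hypothetical; in practice the two paths would point at the environment file in each case directory, which this comment does not list.

```python
import filecmp
import os
import tempfile


def env_files_match(path_a, path_b):
    """Return True when the two files have byte-identical contents."""
    # shallow=False forces a content comparison instead of trusting
    # size and modification time from os.stat().
    return filecmp.cmp(path_a, path_b, shallow=False)


# Demo with throwaway files standing in for the two generated env files.
with tempfile.TemporaryDirectory() as tmp:
    master_env = os.path.join(tmp, "master_env.sh")
    branch_env = os.path.join(tmp, "branch_env.sh")
    other_env = os.path.join(tmp, "other_env.sh")
    for path, text in [
        (master_env, "module load a\n"),
        (branch_env, "module load a\n"),
        (other_env, "module load b\n"),
    ]:
        with open(path, "w") as handle:
            handle.write(text)
    same = env_files_match(master_env, branch_env)
    differs = env_files_match(master_env, other_env)

print(same, differs)
```

A plain `diff` of the two files gives the same answer; the Python form is only convenient when the check is scripted alongside other case setup.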

@mahf708 mahf708 added AT: Integrate Without Testing AT: RETEST Force the autotester (AT) to retest the PR AT: AUTOMERGE Inform the autotester (AT) that it can merge this PR if reviewers approved, and tests pass AT: PRE-TEST INSPECTED When pre-test inspection is required, set this label to pass the inspection labels Oct 1, 2024
@mahf708
Contributor

mahf708 commented Oct 1, 2024

Since Andrew approved, and the reported failure is unrelated to this PR as expected, I have set this PR to automerge and skipped testing. Other Frontier work depends on this PR, so hopefully this will let the PR merge sooner rather than later. Thanks @trey-ornl! 🎉

@AaronDonahue
Contributor

I'll go ahead and manually merge now that we have approval and confirmation that everything works as expected on Frontier. Thanks for putting this in, @trey-ornl!

@AaronDonahue AaronDonahue merged commit a269ef9 into E3SM-Project:master Oct 1, 2024
3 of 4 checks passed
@trey-ornl trey-ornl deleted the trey/cime_config/frontier-gnu branch October 3, 2024 22:05
Labels
AT: AUTOMERGE Inform the autotester (AT) that it can merge this PR if reviewers approved, and tests pass AT: Integrate Without Testing AT: PRE-TEST INSPECTED When pre-test inspection is required, set this label to pass the inspection AT: RETEST Force the autotester (AT) to retest the PR Frontier Machine File
8 participants