Flush and close output files that are full #3032

Merged on Oct 17, 2024 (8 commits)

AaronDonahue
Contributor

This commit causes the output manager to flush and close a file if the max number of snapshots has been reached.

This is mainly an issue when the simulation exits abnormally before SCORPIO has flushed a file. The result is that the output file exists, but is empty. Now, the output manager will check if the maximum number of snapshots allowable in the file has been reached, and if so it forces the file to be flushed and closed.

Fixes #3026

This commit addresses an issue with empty output files that should have
been flushed and closed.  If the simulation exits abnormally before a
file has been flushed then the file will be empty.

Before, we left it up to SCORPIO to decide the optimal time to flush
output.  In this commit we force a file that is full, i.e. max_snapshots
has been reached, to be flushed and closed before moving on.  This
should ensure that all full files are written before any chance of an
abnormal exit.
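
A minimal sketch of the behavior described above, for illustration only: once a write fills the file, it is flushed and closed immediately rather than left for the I/O library to handle later. `FakeIO`, `FileState`, and `write_snapshot` are hypothetical names invented for this example; they are not the EAMxx or SCORPIO API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the I/O layer: it only records which files
// were flushed and closed. Not the real EAMxx or SCORPIO interface.
struct FakeIO {
  std::vector<std::string> flushed;
  std::vector<std::string> closed;
  void flush(const std::string& f) { flushed.push_back(f); }
  void close(const std::string& f) { closed.push_back(f); }
};

// Hypothetical per-file bookkeeping.
struct FileState {
  std::string filename;
  int max_snapshots = 1;
  int num_snapshots = 0;
  bool is_open = false;
  bool is_full() const { return num_snapshots >= max_snapshots; }
};

// Write one snapshot. If that write fills the file, flush and close it
// right away so a later abnormal exit cannot leave it empty on disk.
void write_snapshot(FileState& fs, FakeIO& io) {
  assert(!fs.is_full() && "caller must start a new file first");
  fs.is_open = true;
  ++fs.num_snapshots;
  if (fs.is_full()) {
    io.flush(fs.filename);
    io.close(fs.filename);
    fs.is_open = false;
  }
}
```

With `max_snapshots = 2`, the first call leaves the file open and the second call flushes and closes it.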
Contributor Author

AaronDonahue commented Oct 7, 2024

@bartgol I noticed you already had a file_is_full check in the filespecs for IO, but it was commented out. So I revived that and used it here to force the flushing. I tested using my other branch from #3031 and it did indeed populate the file that was empty before.

Since this PR is dependent on #3031 I am labeling it as WIP until that is merged. But in the meantime I wanted to solicit any comments on this approach.

AaronDonahue added the "AT: WIP" label (informs the autotester (AT) that the PR is a work in progress and should not be tested) Oct 7, 2024

github-actions bot commented Oct 7, 2024

PR Preview Action v1.4.8
🚀 Deployed preview to https://E3SM-Project.github.io/scream/pr-preview/pr-3032/
on branch gh-pages at 2024-10-16 21:10 UTC

Contributor

bartgol commented Oct 7, 2024

> @bartgol I noticed you already had a is_file_full check in the filespecs for IO but it was commented out. So I revived that and used it here to force the flushing. I tested using my other branch from #3032 and it did indeed populate the file that was empty before.
>
> Since this PR is dependent on #3032 I am labeling it as WIP until that is merged. But in the meantime I wanted to solicit any comments on this approach.

I'm guessing you wanted to link a different PR? 3032 is this PR...

bartgol previously requested changes Oct 7, 2024
Contributor

bartgol left a comment

I think we need to use snapshot_fits so that this also works with the other storage types.

@@ -86,7 +86,10 @@ struct IOFileSpecs {
    // If positive, flush the output file every these many snapshots
    int flush_frequency = std::numeric_limits<int>::max();

    // bool file_is_full () const { return num_snapshots_in_file>=max_snapshots_in_file; }
    bool file_is_full () const {
      return storage.num_snapshots_in_file>=storage.max_snapshots_in_file;
Contributor

This only works if the storage type is based on max number of snapshots. It will not work in case one chooses "one_month" or "one_year" (the former being quite appealing for single month data).

Contributor

bartgol Oct 7, 2024

The fact that this only works for max-snaps based storage may very well be the reason it was commented out. I believe I switched to using snapshot_fits precisely for this reason.
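
To illustrate the point above, here is a simplified sketch contrasting a count-based fullness check with a `snapshot_fits`-style check that depends on the storage type. The `StorageType`, `Date`, and `Storage` definitions are assumptions made for this example and do not reproduce the real EAMxx storage specs; in particular, how the real code treats the snapshot counters for monthly or yearly storage is not shown in this thread.

```cpp
#include <cassert>

// Hypothetical, simplified types for illustration only.
enum class StorageType { NumSnaps, Monthly, Yearly };

struct Date {
  int year;
  int month;
};

struct Storage {
  StorageType type = StorageType::NumSnaps;
  int num_snapshots_in_file = 0;
  int max_snapshots_in_file = 1;  // assumed meaningful only for NumSnaps
  Date current{0, 1};             // period covered by the open file

  // Count-based check: it ignores the storage type entirely.
  bool file_is_full() const {
    return num_snapshots_in_file >= max_snapshots_in_file;
  }

  // Type-aware check: does a snapshot stamped `t` belong in this file?
  bool snapshot_fits(const Date& t) const {
    assert(t.month >= 1 && t.month <= 12);
    switch (type) {
      case StorageType::NumSnaps:
        return num_snapshots_in_file < max_snapshots_in_file;
      case StorageType::Monthly:
        return t.year == current.year && t.month == current.month;
      case StorageType::Yearly:
        return t.year == current.year;
    }
    return false;
  }
};
```

In this sketch a monthly file keeps accepting snapshots until the timestamp moves to the next month, regardless of what the counters say, which is why the count-based check alone cannot decide when such a file is complete.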

Contributor Author

This is an excellent point. Thank you for pointing this out.

Member

If the simulation writes a restart file and the rpointers are all consistent, and then the simulation fails, will a file that is still open (for a year, for example) still have the risk of containing 0s instead of flushed data in a time period that is prior to the valid restart? I.e., could we still end up with 0 data?

Perhaps this is already done, but it seems to me the only way to guarantee things is by flushing all files prior to writing the rpointer file.

Contributor

@ambrad we also have a separate todo item: every time we write a rhist file, also flush the corresponding output file.

Contributor

(And the other todo item is significantly more important than this fwiw)

Contributor

@ambrad was this just a question to ensure we're not forgetting anything, or do you have a scenario where you think that would/should be the case (so flushing at rhist write would not be enough)?

Contributor Author

Could we have a flush_all type of call that is initiated whenever a restart is written? Wouldn't that remedy the concern of having all 0's if a failure occurs after a restart is written?

Member

@bartgol, right, to ensure we're not forgetting anything. But I agree with Aaron that if the AD has a list of open write-mode files, why not just iterate through the list and flush them all before writing the rpointer file? You'd then skip more file-specific flushing. I don't think there can be an open write-mode file that shouldn't be flushed at a restart write.

Contributor

I don't think the AD has a handle to all files directly. But each output manager has a handle to its output and rhist files, so when the rhist file is written, it can flush the output one too.

And as Naser said, in EAMxx (for some technical reasons) we always write a rhist file, even if there is no "restart data" (e.g., for INSTANT output) and even if we just wrote in the output file. So flushing the .h file when the corresponding .rhist is flushed should cover every output file.
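
The idea discussed in this thread could be sketched roughly as follows: each output stream knows both its output file and its rhist file, so writing the rhist file is a natural point to flush the output file as well. `FlushLog`, `OutputStream`, and `write_rhist` are hypothetical names for this example, not the actual EAMxx classes, and the ordering (output first, then rhist) is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the I/O layer: records flush order only.
struct FlushLog {
  std::vector<std::string> files;
  void flush(const std::string& f) { files.push_back(f); }
};

// Hypothetical stream that owns handles to both of its files.
struct OutputStream {
  std::string output_file;  // e.g. the .h file
  std::string rhist_file;   // the matching restart-history file
  bool output_open = false;

  // Whenever the rhist file is written, also flush the output file,
  // so data preceding a valid restart is already on disk.
  void write_rhist(FlushLog& io) {
    assert(!rhist_file.empty());
    if (output_open) {
      io.flush(output_file);
    }
    io.flush(rhist_file);
  }
};
```

Because a rhist file is always written, tying the output flush to it would cover every output file without a central list of open files.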

@@ -550,6 +550,12 @@ void OutputManager::run(const util::TimeStamp& timestamp)
    if (filespecs.file_needs_flush()) {
      flush_file (filespecs.filename);
    }

    // Check if we have hit the max number of snapshots and need to close the file
    if (filespecs.file_is_full()) {
Contributor

Instead of using this method, we should use

  if (filespecs.storage.snapshot_fits (m_output_control.next_write_ts))

Caveat: you need to ensure that m_output_control.compute_next_write_ts() has been called before you attempt to use next_write_ts. I believe at the point where you added these mods, this will be the case, but you may want to double check.

Contributor Author

yes, just confirmed that compute_next_write_ts() is called before this chunk of code.

Member

@bartgol do you mean the negation of snapshot_fits?

Contributor

Ah, yes, good call. @AaronDonahue see Andrew's comment, so don't just copy-paste the line I wrote.
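
A small sketch of the corrected condition from this thread: the file should be closed when the next snapshot does not fit, and the next write time must have been computed first. For simplicity this example uses integer step counts instead of timestamps; `OutputControl`, `StepStorage`, and `should_close` are hypothetical and only loosely mirror the names used in the discussion.

```cpp
#include <cassert>

// Hypothetical, simplified output control using step counts.
struct OutputControl {
  int frequency = 1;
  int last_write_step = 0;
  int next_write_step = -1;  // -1 means "not computed yet"
  void compute_next_write_step() {
    next_write_step = last_write_step + frequency;
  }
};

// Hypothetical storage spec: the open file covers a half-open step range
// [first_step, first_step + steps_per_file).
struct StepStorage {
  int first_step = 0;
  int steps_per_file = 1;
  bool snapshot_fits(int step) const {
    return step >= first_step && step < first_step + steps_per_file;
  }
};

// True when the open file should be flushed and closed now.
// Note the negation: close when the NEXT snapshot does not fit.
bool should_close(const StepStorage& s, const OutputControl& c) {
  assert(c.next_write_step >= 0 && "compute_next_write_step() must run first");
  return !s.snapshot_fits(c.next_write_step);
}
```

The assertion stands in for the caveat raised above about computing the next write time before it is used.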

Contributor

bartgol commented Oct 7, 2024

One more thing: if you switch to closing the file as soon as it's full, then you can also get rid of these lines

    if (filespecs.is_open and not filespecs.storage.snapshot_fits(snapshot_start)) {
      release_file(filespecs.filename);
      filespecs.close();
    }

since we should never hit this scenario anymore.

@AaronDonahue
Contributor Author

> > @bartgol I noticed you already had a is_file_full check in the filespecs for IO but it was commented out. So I revived that and used it here to force the flushing. I tested using my other branch from #3032 and it did indeed populate the file that was empty before.
> > Since this PR is dependent on #3032 I am labeling it as WIP until that is merged. But in the meantime I wanted to solicit any comments on this approach.
>
> I'm guessing you wanted to link a different PR? 3032 is this PR...

doh! Too many open windows next to each other. Thanks for the correction, I meant 3031

@AaronDonahue
Contributor Author

> One more thing: if you switch to closing the file as soon as it's full, then you can also get rid of these lines
>
>     if (filespecs.is_open and not filespecs.storage.snapshot_fits(snapshot_start)) {
>       release_file(filespecs.filename);
>       filespecs.close();
>     }
>
> since we should never hit this scenario anymore.

@bartgol do I need the filespecs.close() line for my changes?

AaronDonahue removed the "AT: WIP" label (informs the autotester (AT) that the PR is a work in progress and should not be tested) Oct 7, 2024
Contributor

bartgol commented Oct 8, 2024

> > One more thing: if you switch to closing the file as soon as it's full, then you can also get rid of these lines
> >
> >     if (filespecs.is_open and not filespecs.storage.snapshot_fits(snapshot_start)) {
> >       release_file(filespecs.filename);
> >       filespecs.close();
> >     }
> >
> > since we should never hit this scenario anymore.
>
> @bartgol do I need the filespecs.close() line for my changes?

Aren't you closing the file when you first find out it was full?

@E3SM-Autotester
Collaborator

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 6128
  • Status: STARTED

Jenkins Parameters

| Parameter Name | Value |
|---|---|
| PR_LABELS | I/O;bugfix |
| PULLREQUESTNUM | 3032 |
| SCREAM_SOURCE_REPO | https://github.com/E3SM-Project/scream |
| SCREAM_SOURCE_SHA | eabd70f |
| SCREAM_TARGET_BRANCH | master |
| SCREAM_TARGET_REPO | https://github.com/E3SM-Project/scream |
| SCREAM_TARGET_SHA | 9b1b4c7 |
| TEST_REPO_ALIAS | SCREAM |

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5890
  • Status: STARTED

Jenkins Parameters

| Parameter Name | Value |
|---|---|
| PR_LABELS | I/O;bugfix |
| PULLREQUESTNUM | 3032 |
| SCREAM_SOURCE_REPO | https://github.com/E3SM-Project/scream |
| SCREAM_SOURCE_SHA | eabd70f |
| SCREAM_TARGET_BRANCH | master |
| SCREAM_TARGET_REPO | https://github.com/E3SM-Project/scream |
| SCREAM_TARGET_SHA | 9b1b4c7 |
| TEST_REPO_ALIAS | SCREAM |

Using Repos:

Repo: SCREAM (E3SM-Project/scream)
  • Branch: aarondonahue/close_output_when_full
  • SHA: eabd70f
  • Mode: TEST_REPO

Pull Request Author: AaronDonahue

@E3SM-Autotester
Collaborator

Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED

Note: Testing will normally be attempted again in approx. 2 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.

Pull Request Auto Testing has FAILED

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 6128
  • Status: FAILED

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5890
  • Status: FAILED

SCREAM_PullRequest_Autotester_Weaver # 6128 FAILED (last 100 lines of console output)

CMake Error at /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx/cmake/ctest_script.cmake:76 (message):
  Test had fails

===============================================================================
Testing '''49f82de844eac302dc95b7489a657e3174301205''' for test '''full_sp_debug'''

RUN: taskset -c 52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103 sh -c '''SCREAM_BUILD_PARALLEL_LEVEL=52 CTEST_PARALLEL_LEVEL=1 ctest -V --output-on-failure --resource-spec-file /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx/ctest-build/full_sp_debug/ctest_resource_file.json -DNO_SUBMIT=True -DBUILD_WORK_DIR=/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx/ctest-build/full_sp_debug -DBUILD_NAME_MOD=full_sp_debug -S /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx/cmake/ctest_script.cmake -DCTEST_SITE=weaver -DCMAKE_COMMAND="-C /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx/cmake/machine-files/weaver.cmake -DNetCDF_Fortran_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-fortran/4.6.1/gcc/11.3.0/openmpi/4.1.6/5tv5psl -DNetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-c/4.9.2/gcc/11.3.0/openmpi/4.1.6/pyuuqd3 -DPnetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/parallel-netcdf/1.12.3/gcc/11.3.0/openmpi/4.1.6/2s52shy -DCMAKE_BUILD_TYPE=Debug -DEKAT_DEFAULT_BFB=True -DSCREAM_DOUBLE_PRECISION=False -DEKAT_DISABLE_TPL_WARNINGS='''''''''ON''''''''' -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpifort -DSCREAM_DYNAMICS_DYCORE=HOMME -DSCREAM_TEST_MAX_TOTAL_THREADS=1 -DSCREAM_BASELINES_DIR=/home/projects/e3sm/scream/pr-autotester/master-baselines/weaver/full_sp_debug" '''
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx/ctest-build/full_sp_debug

Testing '''49f82de844eac302dc95b7489a657e3174301205''' for test '''release'''

RUN: taskset -c 104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155 sh -c '''SCREAM_BUILD_PARALLEL_LEVEL=52 CTEST_PARALLEL_LEVEL=1 ctest -V --output-on-failure --resource-spec-file /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx/ctest-build/release/ctest_resource_file.json -DNO_SUBMIT=True -DBUILD_WORK_DIR=/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx/ctest-build/release -DBUILD_NAME_MOD=release -S /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx/cmake/ctest_script.cmake -DCTEST_SITE=weaver -DCMAKE_COMMAND="-C /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx/cmake/machine-files/weaver.cmake -DNetCDF_Fortran_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-fortran/4.6.1/gcc/11.3.0/openmpi/4.1.6/5tv5psl -DNetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-c/4.9.2/gcc/11.3.0/openmpi/4.1.6/pyuuqd3 -DPnetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/parallel-netcdf/1.12.3/gcc/11.3.0/openmpi/4.1.6/2s52shy -DCMAKE_BUILD_TYPE=Release -DEKAT_DISABLE_TPL_WARNINGS='''''''''ON''''''''' -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpifort -DSCREAM_DYNAMICS_DYCORE=HOMME -DSCREAM_TEST_MAX_TOTAL_THREADS=1 -DSCREAM_BASELINES_DIR=/home/projects/e3sm/scream/pr-autotester/master-baselines/weaver/release" '''
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx/ctest-build/release

Testing '''49f82de844eac302dc95b7489a657e3174301205''' for test '''full_debug'''

RUN: taskset -c 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51 sh -c '''SCREAM_BUILD_PARALLEL_LEVEL=52 CTEST_PARALLEL_LEVEL=1 ctest -V --output-on-failure --resource-spec-file /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx/ctest-build/full_debug/ctest_resource_file.json -DNO_SUBMIT=True -DBUILD_WORK_DIR=/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx/ctest-build/full_debug -DBUILD_NAME_MOD=full_debug -S /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx/cmake/ctest_script.cmake -DCTEST_SITE=weaver -DCMAKE_COMMAND="-C /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx/cmake/machine-files/weaver.cmake -DNetCDF_Fortran_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-fortran/4.6.1/gcc/11.3.0/openmpi/4.1.6/5tv5psl -DNetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-c/4.9.2/gcc/11.3.0/openmpi/4.1.6/pyuuqd3 -DPnetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/parallel-netcdf/1.12.3/gcc/11.3.0/openmpi/4.1.6/2s52shy -DCMAKE_BUILD_TYPE=Debug -DEKAT_DEFAULT_BFB=True -DKokkos_ENABLE_DEBUG_BOUNDS_CHECK=True -DEKAT_DISABLE_TPL_WARNINGS='''''''''ON''''''''' -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpifort -DSCREAM_DYNAMICS_DYCORE=HOMME -DSCREAM_TEST_MAX_TOTAL_THREADS=1 -DSCREAM_BASELINES_DIR=/home/projects/e3sm/scream/pr-autotester/master-baselines/weaver/full_debug" '''
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx/ctest-build/full_debug
Build type full_debug failed at testing time. Here's a list of failed tests:
4:io_monthly_np1

Build type full_sp_debug failed at testing time. Here's a list of failed tests:
4:io_monthly_np1

Build type release failed at testing time. Here's a list of failed tests:
4:io_monthly_np1

Error(s) occurred during test phase
OVERALL STATUS: FAIL
Starting analysis on weaver with cmd: cd /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx && source /etc/profile.d/modules.sh && module purge && module load cmake/3.25.1 git/2.39.1 python/3.10.8 py-netcdf4/1.5.8 gcc/11.3.0 cuda/11.8.0 openmpi netcdf-c netcdf-fortran parallel-netcdf netlib-lapack && export HDF5_USE_FILE_LOCKING=FALSE && true && bsub -I -q rhel8 -n 4 -gpu num=4 ./scripts/test-all-scream --baseline-dir AUTO $compiler -p -c EKAT_DISABLE_TPL_WARNINGS=ON -m weaver
RUN: cd /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx && source /etc/profile.d/modules.sh && module purge && module load cmake/3.25.1 git/2.39.1 python/3.10.8 py-netcdf4/1.5.8 gcc/11.3.0 cuda/11.8.0 openmpi netcdf-c netcdf-fortran parallel-netcdf netlib-lapack && export HDF5_USE_FILE_LOCKING=FALSE && true && bsub -I -q rhel8 -n 4 -gpu num=4 ./scripts/test-all-scream --baseline-dir AUTO $compiler -p -c EKAT_DISABLE_TPL_WARNINGS=ON -m weaver
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx
weaver failed'

  • [[ 1 == 0 ]]
  • [[ weaver == \m\a\p\p\y ]]
  • set +x
    ######################################################
    FAILS DETECTED:
    SCREAM STANDALONE TESTING FAILED!
    Build type full_debug failed at testing time. Here's a list of failed tests:
    4:io_monthly_np1

Build type full_sp_debug failed at testing time. Here's a list of failed tests:
4:io_monthly_np1

Build type release failed at testing time. Here's a list of failed tests:
4:io_monthly_np1

Error(s) occurred during test phase
OVERALL STATUS: FAIL
Starting analysis on weaver with cmd: cd /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx && source /etc/profile.d/modules.sh && module purge && module load cmake/3.25.1 git/2.39.1 python/3.10.8 py-netcdf4/1.5.8 gcc/11.3.0 cuda/11.8.0 openmpi netcdf-c netcdf-fortran parallel-netcdf netlib-lapack && export HDF5_USE_FILE_LOCKING=FALSE && true && bsub -I -q rhel8 -n 4 -gpu num=4 ./scripts/test-all-scream --baseline-dir AUTO $compiler -p -c EKAT_DISABLE_TPL_WARNINGS=ON -m weaver
RUN: cd /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx && source /etc/profile.d/modules.sh && module purge && module load cmake/3.25.1 git/2.39.1 python/3.10.8 py-netcdf4/1.5.8 gcc/11.3.0 cuda/11.8.0 openmpi netcdf-c netcdf-fortran parallel-netcdf netlib-lapack && export HDF5_USE_FILE_LOCKING=FALSE && true && bsub -I -q rhel8 -n 4 -gpu num=4 ./scripts/test-all-scream --baseline-dir AUTO $compiler -p -c EKAT_DISABLE_TPL_WARNINGS=ON -m weaver
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6128/scream/components/eamxx
weaver failed
######################################################
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash -le

cd $WORKSPACE/${BUILD_ID}/

./scream/components/eamxx/scripts/jenkins/jenkins_cleanup.sh
[SCREAM_PullRequest_Autotester_Weaver] $ /bin/bash -le /tmp/jenkins8600751239369240865.sh
POST BUILD TASK : SUCCESS
END OF POST BUILD TASK : 0
Sending e-mails to: [email protected]
Finished: FAILURE

SCREAM_PullRequest_Autotester_Mappy # 5890 FAILED (last 100 lines of console output)

Starting RUN for test PET_Ln9_P32x2.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-1 with 1 proc on interactive node and 64 procs on compute nodes
Finished RUN for test PET_Ln9_P32x2.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-1 in 1.777873 seconds (PEND). [COMPLETED 16 of 17]
Finished MODEL_BUILD for test ERP_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-4 in 770.873622 seconds (PASS)
Starting RUN for test ERP_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-4 with 1 proc on interactive node and 64 procs on compute nodes
Finished RUN for test ERP_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-4 in 0.733301 seconds (PEND). [COMPLETED 17 of 17]
Waiting for tests to finish
PASS ERP_D_Lh4.ne4_ne4.F2010-SCREAMv1.mappy_gnu.scream-output-preset-1 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERP_D_Lh4.ne4_ne4.F2010-SCREAMv1.mappy_gnu.scream-output-preset-1.C.20241008_064318_zhxn1f
PASS ERP_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-4 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERP_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-4.C.20241008_064318_zhxn1f
PASS ERS_D_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-rad_frequency_2--scream-output-preset-5 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_D_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-rad_frequency_2--scream-output-preset-5.C.20241008_064318_zhxn1f
PASS ERS_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-small_kernels--scream-output-preset-5 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-small_kernels--scream-output-preset-5.C.20241008_064318_zhxn1f
PASS ERS_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-small_kernels_p3--scream-output-preset-5 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-small_kernels_p3--scream-output-preset-5.C.20241008_064318_zhxn1f
PASS ERS_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-small_kernels_shoc--scream-output-preset-5 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_Ln22.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-small_kernels_shoc--scream-output-preset-5.C.20241008_064318_zhxn1f
PASS ERS_Ln9.ne4_ne4.F2000-SCREAMv1-AQP1.mappy_gnu.scream-output-preset-2 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_Ln9.ne4_ne4.F2000-SCREAMv1-AQP1.mappy_gnu.scream-output-preset-2.C.20241008_064318_zhxn1f
PASS ERS_P16_Ln22.ne30pg2_ne30pg2.FIOP-SCREAMv1-DP.mappy_gnu.scream-dpxx-arm97 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_P16_Ln22.ne30pg2_ne30pg2.FIOP-SCREAMv1-DP.mappy_gnu.scream-dpxx-arm97.C.20241008_064318_zhxn1f
PASS ERS_P16_Ln22.ne30pg2_ne30pg2.FIOP-SCREAMv1-DP.mappy_gnu.scream-dpxx-comble RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_P16_Ln22.ne30pg2_ne30pg2.FIOP-SCREAMv1-DP.mappy_gnu.scream-dpxx-comble.C.20241008_064318_zhxn1f
PASS ERS_P16_Ln22.ne30pg2_ne30pg2.FIOP-SCREAMv1-DP.mappy_gnu.scream-dpxx-dycomsrf01 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_P16_Ln22.ne30pg2_ne30pg2.FIOP-SCREAMv1-DP.mappy_gnu.scream-dpxx-dycomsrf01.C.20241008_064318_zhxn1f
PASS ERS_P16_Ln22.ne30pg2_ne30pg2.FRCE-SCREAMv1-DP.mappy_gnu RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/ERS_P16_Ln22.ne30pg2_ne30pg2.FRCE-SCREAMv1-DP.mappy_gnu.C.20241008_064318_zhxn1f
PASS PET_Ln9_P32x2.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-1 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/PET_Ln9_P32x2.ne4pg2_ne4pg2.F2010-SCREAMv1.mappy_gnu.scream-output-preset-1.C.20241008_064318_zhxn1f
PASS SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-aci RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-aci.C.20241008_064318_zhxn1f
PASS SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-drydep RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-drydep.C.20241008_064318_zhxn1f
PASS SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-optics RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-optics.C.20241008_064318_zhxn1f
PASS SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-wetscav RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/SMS_D_Ln5.ne4pg2_oQU480.F2010-SCREAMv1-MPASSI.mappy_gnu.scream-mam4xx-wetscav.C.20241008_064318_zhxn1f
PASS SMS_D_Ln9.ne4_ne4.F2010-SCREAMv1-noAero.mappy_gnu.scream-output-preset-3 RUN
    Case dir: /home/e3sm-jenkins/acme/scratch/SMS_D_Ln9.ne4_ne4.F2010-SCREAMv1-noAero.mappy_gnu.scream-output-preset-3.C.20241008_064318_zhxn1f
test-scheduler took 2090.8968183994293 seconds'
+ [[ 0 != 0 ]]
+ set +x
######################################################
FAILS DETECTED:
  SCREAM STANDALONE TESTING FAILED!
Build type full_debug failed at testing time. Here's a list of failed tests:
10:io_monthly_np1
11:io_monthly_np2
12:io_monthly_np3
13:io_monthly_np4

Build type full_sp_debug failed at testing time. Here's a list of failed tests:
10:io_monthly_np1
11:io_monthly_np2
12:io_monthly_np3
13:io_monthly_np4

Build type debug_nopack_fpe failed at testing time. Here's a list of failed tests:
10:io_monthly_np1
11:io_monthly_np2
12:io_monthly_np3
13:io_monthly_np4

Build type release failed at testing time. Here's a list of failed tests:
10:io_monthly_np1
11:io_monthly_np2
12:io_monthly_np3
13:io_monthly_np4

Error(s) occurred during test phase
OVERALL STATUS: FAIL
Starting analysis on mappy with cmd: cd /home/e3sm-jenkins/jenkins-ws/workspace/SCREAM_PullRequest_Autotester_Mappy/5890/scream/components/eamxx && source /projects/sems/modulefiles/utils/sems-modules-init.sh && module purge && module load sems-cmake/3.27.9 sems-git/2.42.0 sems-gcc/11.4.0 sems-openmpi-no-cuda/4.1.6 sems-netcdf-c/4.9.2 sems-netcdf-cxx/4.2 sems-netcdf-fortran/4.6.1 sems-parallel-netcdf/1.12.3 sems-openblas && export GATOR_INITIAL_MB=4000MB && export OMP_PROC_BIND=spread && true && ./scripts/test-all-scream --baseline-dir AUTO $compiler -p -c EKAT_DISABLE_TPL_WARNINGS=ON -m mappy
RUN: cd /home/e3sm-jenkins/jenkins-ws/workspace/SCREAM_PullRequest_Autotester_Mappy/5890/scream/components/eamxx && source /projects/sems/modulefiles/utils/sems-modules-init.sh && module purge && module load sems-cmake/3.27.9 sems-git/2.42.0 sems-gcc/11.4.0 sems-openmpi-no-cuda/4.1.6 sems-netcdf-c/4.9.2 sems-netcdf-cxx/4.2 sems-netcdf-fortran/4.6.1 sems-parallel-netcdf/1.12.3 sems-openblas && export GATOR_INITIAL_MB=4000MB && export OMP_PROC_BIND=spread && true && ./scripts/test-all-scream --baseline-dir AUTO $compiler -p -c EKAT_DISABLE_TPL_WARNINGS=ON -m mappy
FROM: /home/e3sm-jenkins/jenkins-ws/workspace/SCREAM_PullRequest_Autotester_Mappy/5890/scream/components/eamxx
mappy failed
######################################################
Build step 'Execute shell' marked build as failure
$ ssh-agent -k
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 3743062 killed;
[ssh-agent] Stopped.
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash -le

cd $WORKSPACE/${BUILD_ID}/

./scream/components/eamxx/scripts/jenkins/jenkins_cleanup.sh

We're having issues with some test-launcher jobs hanging forever. So let's make sure we clean up all pending test-launcher jobs

squeue -o"%.7i %u %40j" | grep e3sm-jenkins | grep test-launcher | awk '{ print $1 }' | xargs -r scancel

[SCREAM_PullRequest_Autotester_Mappy] $ /bin/bash -le /tmp/jenkins11948053557837947249.sh
POST BUILD TASK : SUCCESS
END OF POST BUILD TASK : 0
Sending e-mails to: [email protected]
Finished: FAILURE

@AaronDonahue
Contributor Author

One more thing: if you switch to closing the file as soon as it's full, then you can also get rid of these lines

    if (filespecs.is_open and not filespecs.storage.snapshot_fits(snapshot_start)) {
      release_file(filespecs.filename);
      filespecs.close();
    }

since we should never hit this scenario anymore.

@bartgol do I need the filespecs.close() line for my changes?

Aren't you closing the file when you first find out it was full?

Oh duh. Sorry, when I read this line just to delete it I thought it was "closing the FileSpecs" object which I didn't remember doing. But yes, I already have this line...
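
A minimal sketch of the "close as soon as it's full" strategy discussed in this thread. It assumes a much-simplified file record: `FileSpecs` here, along with `ClosedLog`, `flush_and_close`, and `write_snapshot_eager`, are hypothetical stand-ins for illustration and not the actual EAMxx types or SCORPIO calls.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the output manager's file bookkeeping.
struct FileSpecs {
  std::string filename;
  bool is_open = false;
  int num_snapshots = 0;   // snapshots written to the current file
  int max_snapshots = 1;   // capacity of one file

  bool is_full() const { return num_snapshots >= max_snapshots; }
};

// Records which files were flushed and closed, in order (stands in for real I/O).
using ClosedLog = std::vector<std::string>;

// Stand-in for "flush buffered data to disk, then close the file".
inline void flush_and_close(FileSpecs& fs, ClosedLog& closed) {
  closed.push_back(fs.filename);
  fs.is_open = false;
  fs.num_snapshots = 0;
}

// Eager strategy: close the file in the same call that fills it, so a full
// file never stays open (and possibly unflushed) between writes.
inline void write_snapshot_eager(FileSpecs& fs, const std::string& next_name,
                                 ClosedLog& closed) {
  if (!fs.is_open) {
    fs.filename = next_name;  // only used when a new file must be opened
    fs.is_open = true;
  }
  ++fs.num_snapshots;
  if (fs.is_full()) {
    flush_and_close(fs, closed);
  }
}
```

With this ordering, a separate "is the open file too full for the next snapshot?" check before each write becomes redundant for count-based storage, which is the point made above about dropping those lines.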

@AaronDonahue
Contributor Author

Looks like we have a fail. I'll investigate

@E3SM-Autotester
Collaborator

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 6151
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS I/O;bugfix
PULLREQUESTNUM 3032
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA c99a7d9
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5908
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS I/O;bugfix
PULLREQUESTNUM 3032
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA c99a7d9
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Using Repos:

Repo: SCREAM (E3SM-Project/scream)
  • Branch: aarondonahue/close_output_when_full
  • SHA: c99a7d9
  • Mode: TEST_REPO

Pull Request Author: AaronDonahue

@E3SM-Autotester
Collaborator

Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED

Note: Testing will normally be attempted again in approx. 2 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.

Pull Request Auto Testing has FAILED (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 6151
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS I/O;bugfix
PULLREQUESTNUM 3032
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA c99a7d9
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5908
  • Status: FAILED

Jenkins Parameters

Parameter Name Value
PR_LABELS I/O;bugfix
PULLREQUESTNUM 3032
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA c99a7d9
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM
SCREAM_PullRequest_Autotester_Weaver # 6151 PASSED (click to see last 100 lines of console output)

        Start 143: model_restart
143/157 Test #143: model_restart .........................................................   Passed    7.09 sec
        Start 144: restarted_vs_monolithic_check_np1
144/157 Test #144: restarted_vs_monolithic_check_np1 .....................................   Passed    0.13 sec
        Start 145: homme_shoc_cld_spa_p3_rrtmgp_np1
145/157 Test #145: homme_shoc_cld_spa_p3_rrtmgp_np1 ......................................   Passed    6.17 sec
        Start 146: homme_shoc_cld_spa_p3_rrtmgp_baseline_cmp
146/157 Test #146: homme_shoc_cld_spa_p3_rrtmgp_baseline_cmp .............................   Passed    0.12 sec
        Start 147: homme_shoc_cld_spa_p3_rrtmgp_128levels_np1
147/157 Test #147: homme_shoc_cld_spa_p3_rrtmgp_128levels_np1 ............................   Passed    8.70 sec
        Start 148: homme_shoc_cld_spa_p3_rrtmgp_128levels_tend_check_np1
148/157 Test #148: homme_shoc_cld_spa_p3_rrtmgp_128levels_tend_check_np1 .................   Passed    1.42 sec
        Start 149: homme_shoc_cld_spa_p3_rrtmgp_128levels_baseline_cmp
149/157 Test #149: homme_shoc_cld_spa_p3_rrtmgp_128levels_baseline_cmp ...................   Passed    0.65 sec
        Start 150: homme_shoc_cld_spa_p3_rrtmgp_pg2_dp_np1
150/157 Test #150: homme_shoc_cld_spa_p3_rrtmgp_pg2_dp_np1 ...............................   Passed   13.02 sec
        Start 151: homme_shoc_cld_spa_p3_rrtmgp_pg2_dp_baseline_cmp
151/157 Test #151: homme_shoc_cld_spa_p3_rrtmgp_pg2_dp_baseline_cmp ......................   Passed    0.11 sec
        Start 152: homme_shoc_cld_p3_mam_optics_rrtmgp_np1
152/157 Test #152: homme_shoc_cld_p3_mam_optics_rrtmgp_np1 ...............................   Passed   16.52 sec
        Start 153: homme_shoc_cld_p3_mam_optics_rrtmgp_baseline_cmp
153/157 Test #153: homme_shoc_cld_p3_mam_optics_rrtmgp_baseline_cmp ......................   Passed    0.13 sec
        Start 154: homme_shoc_cld_mam_aci_p3_mam_optics_rrtmgp_mam_drydep_np1
154/157 Test #154: homme_shoc_cld_mam_aci_p3_mam_optics_rrtmgp_mam_drydep_np1 ............   Passed   17.64 sec
        Start 155: homme_shoc_cld_mam_aci_p3_mam_optics_rrtmgp_mam_drydep_baseline_cmp
155/157 Test #155: homme_shoc_cld_mam_aci_p3_mam_optics_rrtmgp_mam_drydep_baseline_cmp ...   Passed    0.23 sec
        Start 156: homme_shoc_cld_spa_p3_rrtmgp_mam4_wetscav_np1
156/157 Test #156: homme_shoc_cld_spa_p3_rrtmgp_mam4_wetscav_np1 .........................   Passed   37.38 sec
        Start 157: homme_shoc_cld_spa_p3_rrtmgp_mam4_wetscav_baseline_cmp
157/157 Test #157: homme_shoc_cld_spa_p3_rrtmgp_mam4_wetscav_baseline_cmp ................   Passed    0.14 sec

100% tests passed, 0 tests failed out of 157

Label Time Summary:
baseline_cmp = 139.27 sec*proc (23 tests)
baseline_gen = 337.32 sec*proc (25 tests)
bfbhash = 0.89 sec*proc (1 test)
check = 0.95 sec*proc (1 test)
cld = 44.63 sec*proc (7 tests)
cld_fraction = 4.25 sec*proc (1 test)
cxx baseline_cmp = 10.10 sec*proc (2 tests)
diagnostics = 42.67 sec*proc (23 tests)
driver = 95.77 sec*proc (16 tests)
dynamics = 8.13 sec*proc (3 tests)
fail = 30.03 sec*proc (5 tests)
io = 57.02 sec*proc (14 tests)
mam4_aci = 23.31 sec*proc (4 tests)
mam4_constituent_fluxes = 7.55 sec*proc (1 test)
mam4_drydep = 3.63 sec*proc (1 test)
mam4_optics = 4.06 sec*proc (1 test)
mam4_srf_online_emiss = 7.55 sec*proc (1 test)
mam4_wetscav = 24.65 sec*proc (2 tests)
nudging = 11.94 sec*proc (2 tests)
p3 = 111.84 sec*proc (12 tests)
p3_sk = 31.68 sec*proc (2 tests)
physics = 189.05 sec*proc (27 tests)
remap = 5.68 sec*proc (1 test)
rrtmgp = 43.12 sec*proc (11 tests)
shoc = 59.08 sec*proc (13 tests)
spa = 11.40 sec*proc (4 tests)
surface_coupling = 4.28 sec*proc (1 test)

Total Test time (real) = 816.28 sec

Testing '''296cfb1368a106ccc6f0084ca29f00cb68ae5fa1''' for test '''full_sp_debug'''

RUN: taskset -c 52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103 sh -c '''SCREAM_BUILD_PARALLEL_LEVEL=52 CTEST_PARALLEL_LEVEL=1 ctest -V --output-on-failure --resource-spec-file /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6151/scream/components/eamxx/ctest-build/full_sp_debug/ctest_resource_file.json -DNO_SUBMIT=True -DBUILD_WORK_DIR=/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6151/scream/components/eamxx/ctest-build/full_sp_debug -DBUILD_NAME_MOD=full_sp_debug -S /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6151/scream/components/eamxx/cmake/ctest_script.cmake -DCTEST_SITE=weaver -DCMAKE_COMMAND="-C /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6151/scream/components/eamxx/cmake/machine-files/weaver.cmake -DNetCDF_Fortran_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-fortran/4.6.1/gcc/11.3.0/openmpi/4.1.6/5tv5psl -DNetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-c/4.9.2/gcc/11.3.0/openmpi/4.1.6/pyuuqd3 -DPnetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/parallel-netcdf/1.12.3/gcc/11.3.0/openmpi/4.1.6/2s52shy -DCMAKE_BUILD_TYPE=Debug -DEKAT_DEFAULT_BFB=True -DSCREAM_DOUBLE_PRECISION=False -DEKAT_DISABLE_TPL_WARNINGS='''''''''ON''''''''' -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpifort -DSCREAM_DYNAMICS_DYCORE=HOMME -DSCREAM_TEST_MAX_TOTAL_THREADS=1 -DSCREAM_BASELINES_DIR=/home/projects/e3sm/scream/pr-autotester/master-baselines/weaver/full_sp_debug" '''
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6151/scream/components/eamxx/ctest-build/full_sp_debug

Testing '''296cfb1368a106ccc6f0084ca29f00cb68ae5fa1''' for test '''full_debug'''

RUN: taskset -c 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51 sh -c '''SCREAM_BUILD_PARALLEL_LEVEL=52 CTEST_PARALLEL_LEVEL=1 ctest -V --output-on-failure --resource-spec-file /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6151/scream/components/eamxx/ctest-build/full_debug/ctest_resource_file.json -DNO_SUBMIT=True -DBUILD_WORK_DIR=/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6151/scream/components/eamxx/ctest-build/full_debug -DBUILD_NAME_MOD=full_debug -S /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6151/scream/components/eamxx/cmake/ctest_script.cmake -DCTEST_SITE=weaver -DCMAKE_COMMAND="-C /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6151/scream/components/eamxx/cmake/machine-files/weaver.cmake -DNetCDF_Fortran_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-fortran/4.6.1/gcc/11.3.0/openmpi/4.1.6/5tv5psl -DNetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-c/4.9.2/gcc/11.3.0/openmpi/4.1.6/pyuuqd3 -DPnetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/parallel-netcdf/1.12.3/gcc/11.3.0/openmpi/4.1.6/2s52shy -DCMAKE_BUILD_TYPE=Debug -DEKAT_DEFAULT_BFB=True -DKokkos_ENABLE_DEBUG_BOUNDS_CHECK=True -DEKAT_DISABLE_TPL_WARNINGS='''''''''ON''''''''' -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpifort -DSCREAM_DYNAMICS_DYCORE=HOMME -DSCREAM_TEST_MAX_TOTAL_THREADS=1 -DSCREAM_BASELINES_DIR=/home/projects/e3sm/scream/pr-autotester/master-baselines/weaver/full_debug" '''
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6151/scream/components/eamxx/ctest-build/full_debug

Testing '''296cfb1368a106ccc6f0084ca29f00cb68ae5fa1''' for test '''release'''

RUN: taskset -c 104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155 sh -c '''SCREAM_BUILD_PARALLEL_LEVEL=52 CTEST_PARALLEL_LEVEL=1 ctest -V --output-on-failure --resource-spec-file /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6151/scream/components/eamxx/ctest-build/release/ctest_resource_file.json -DNO_SUBMIT=True -DBUILD_WORK_DIR=/home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6151/scream/components/eamxx/ctest-build/release -DBUILD_NAME_MOD=release -S /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6151/scream/components/eamxx/cmake/ctest_script.cmake -DCTEST_SITE=weaver -DCMAKE_COMMAND="-C /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6151/scream/components/eamxx/cmake/machine-files/weaver.cmake -DNetCDF_Fortran_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-fortran/4.6.1/gcc/11.3.0/openmpi/4.1.6/5tv5psl -DNetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/netcdf-c/4.9.2/gcc/11.3.0/openmpi/4.1.6/pyuuqd3 -DPnetCDF_C_PATH=/projects/ppc64le-pwr9-rhel8/tpls/parallel-netcdf/1.12.3/gcc/11.3.0/openmpi/4.1.6/2s52shy -DCMAKE_BUILD_TYPE=Release -DEKAT_DISABLE_TPL_WARNINGS='''''''''ON''''''''' -DCMAKE_CXX_COMPILER=mpicxx -DCMAKE_C_COMPILER=mpicc -DCMAKE_Fortran_COMPILER=mpifort -DSCREAM_DYNAMICS_DYCORE=HOMME -DSCREAM_TEST_MAX_TOTAL_THREADS=1 -DSCREAM_BASELINES_DIR=/home/projects/e3sm/scream/pr-autotester/master-baselines/weaver/release" '''
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6151/scream/components/eamxx/ctest-build/release
OVERALL STATUS: PASS
Starting analysis on weaver with cmd: cd /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6151/scream/components/eamxx && source /etc/profile.d/modules.sh && module purge && module load cmake/3.25.1 git/2.39.1 python/3.10.8 py-netcdf4/1.5.8 gcc/11.3.0 cuda/11.8.0 openmpi netcdf-c netcdf-fortran parallel-netcdf netlib-lapack && export HDF5_USE_FILE_LOCKING=FALSE && true && bsub -I -q rhel8 -n 4 -gpu num=4 ./scripts/test-all-scream --baseline-dir AUTO $compiler -p -c EKAT_DISABLE_TPL_WARNINGS=ON -m weaver
RUN: cd /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6151/scream/components/eamxx && source /etc/profile.d/modules.sh && module purge && module load cmake/3.25.1 git/2.39.1 python/3.10.8 py-netcdf4/1.5.8 gcc/11.3.0 cuda/11.8.0 openmpi netcdf-c netcdf-fortran parallel-netcdf netlib-lapack && export HDF5_USE_FILE_LOCKING=FALSE && true && bsub -I -q rhel8 -n 4 -gpu num=4 ./scripts/test-all-scream --baseline-dir AUTO $compiler -p -c EKAT_DISABLE_TPL_WARNINGS=ON -m weaver
FROM: /home/e3sm-jenkins/weaver/workspace/SCREAM_PullRequest_Autotester_Weaver/6151/scream/components/eamxx
Completed analysis on weaver'

+ [[ 0 != 0 ]]
+ [[ 1 == 0 ]]
+ [[ weaver == \m\a\p\p\y ]]
+ set +x
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash -le

cd $WORKSPACE/${BUILD_ID}/

./scream/components/eamxx/scripts/jenkins/jenkins_cleanup.sh
[SCREAM_PullRequest_Autotester_Weaver] $ /bin/bash -le /tmp/jenkins1216082531371680173.sh
POST BUILD TASK : SUCCESS
END OF POST BUILD TASK : 0
Sending e-mails to: [email protected]
Finished: SUCCESS

SCREAM_PullRequest_Autotester_Mappy # 5908 FAILED (click to see last 100 lines of console output)

[ 63%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_cloud_rain_acc.cpp.o
[ 63%] Linking CXX static library libnudging.a
[ 63%] Building CXX object src/physics/shoc/CMakeFiles/shoc_sk.dir/disp/shoc_compute_shoc_temperature_disp.cpp.o
[ 63%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_calc_rime_density.cpp.o
[ 64%] Building CXX object src/physics/shoc/CMakeFiles/shoc_sk.dir/disp/shoc_diag_obklen_disp.cpp.o
[ 64%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_cldliq_imm_freezing.cpp.o
[ 64%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_rain_imm_freezing.cpp.o
[ 64%] Building CXX object src/physics/shoc/CMakeFiles/shoc.dir/eti/shoc_pblintd_cldcheck.cpp.o
[ 64%] Built target nudging
[ 64%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_droplet_self_coll.cpp.o
[ 64%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_evaporate_rain.cpp.o
[ 64%] Building CXX object src/physics/shoc/CMakeFiles/shoc.dir/eti/shoc_pblintd_height.cpp.o
[ 64%] Building CXX object src/physics/p3/CMakeFiles/p3_sk.dir/eti/p3_prevent_liq_supersaturation.cpp.o
[ 64%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_impose_max_total_ni.cpp.o
[ 64%] Building CXX object src/physics/shoc/CMakeFiles/shoc_sk.dir/disp/shoc_pblintd_disp.cpp.o
[ 64%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_calc_liq_relaxation_timescale.cpp.o
[ 64%] Building CXX object src/physics/shoc/CMakeFiles/shoc_sk.dir/disp/shoc_length_disp.cpp.o
[ 64%] Building CXX object src/physics/p3/CMakeFiles/p3_sk.dir/disp/p3_check_values_impl_disp.cpp.o
[ 64%] Building CXX object src/physics/p3/CMakeFiles/p3_sk.dir/disp/p3_ice_sed_impl_disp.cpp.o
[ 64%] Building CXX object src/physics/shoc/CMakeFiles/shoc_sk.dir/disp/shoc_tke_disp.cpp.o
[ 64%] Building CXX object src/physics/shoc/CMakeFiles/shoc_sk.dir/disp/shoc_update_prognostics_implicit_disp.cpp.o
[ 64%] Building CXX object src/physics/p3/CMakeFiles/p3_sk.dir/disp/p3_main_impl_part1_disp.cpp.o
[ 64%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_ice_relaxation_timescale.cpp.o
[ 64%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_ice_nucleation.cpp.o
[ 64%] Building CXX object src/physics/p3/CMakeFiles/p3_sk.dir/disp/p3_main_impl_part3_disp.cpp.o
[ 64%] Building CXX object src/physics/shoc/CMakeFiles/shoc_sk.dir/disp/shoc_diag_second_shoc_moments_disp.cpp.o
[ 64%] Building CXX object src/physics/shoc/CMakeFiles/shoc.dir/eti/shoc_pblintd_init_pot.cpp.o
[ 64%] Building CXX object src/physics/shoc/CMakeFiles/shoc_sk.dir/disp/shoc_diag_third_shoc_moments_disp.cpp.o
[ 64%] Building CXX object src/physics/shoc/CMakeFiles/shoc.dir/eti/shoc_pblintd_surf_temp.cpp.o
[ 64%] Building CXX object src/physics/shoc/CMakeFiles/shoc.dir/eti/shoc_tke.cpp.o
[ 64%] Building CXX object src/physics/shoc/CMakeFiles/shoc.dir/eti/shoc_tridiag_solver.cpp.o
[ 64%] Building CXX object src/physics/shoc/CMakeFiles/shoc.dir/eti/shoc_update_host_dse.cpp.o
[ 64%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_ice_cldliq_wet_growth.cpp.o
[ 64%] Building CXX object src/physics/shoc/CMakeFiles/shoc_sk.dir/disp/shoc_assumed_pdf_disp.cpp.o
[ 64%] Building CXX object src/physics/shoc/CMakeFiles/shoc_sk.dir/disp/shoc_update_host_dse_disp.cpp.o
[ 64%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_check_values.cpp.o
[ 64%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_incloud_mixingratios.cpp.o
[ 64%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_subgrid_variance_scaling.cpp.o
[ 65%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_main.cpp.o
[ 65%] Building CXX object src/physics/shoc/CMakeFiles/shoc.dir/eti/shoc_update_prognostics_implicit.cpp.o
[ 65%] Building CXX object src/physics/p3/CMakeFiles/p3_sk.dir/disp/p3_cloud_sed_impl_disp.cpp.o
[ 65%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_main_part1.cpp.o
[ 65%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_main_part2.cpp.o
[ 65%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_main_part3.cpp.o
[ 65%] Building Fortran object src/physics/shoc/CMakeFiles/shoc_sk.dir/home/e3sm-jenkins/jenkins-ws/workspace/SCREAM_PullRequest_Autotester_Mappy/5908/scream/components/eam/src/physics/cam/shoc.F90.o
[ 65%] Building Fortran object src/physics/shoc/CMakeFiles/shoc.dir/home/e3sm-jenkins/jenkins-ws/workspace/SCREAM_PullRequest_Autotester_Mappy/5908/scream/components/eam/src/physics/cam/shoc.F90.o
[ 65%] Building CXX object src/physics/p3/CMakeFiles/p3_sk.dir/disp/p3_main_impl_disp.cpp.o
[ 65%] Building CXX object src/physics/p3/CMakeFiles/p3_sk.dir/disp/p3_main_impl_part2_disp.cpp.o
[ 65%] Building CXX object src/physics/p3/CMakeFiles/p3_sk.dir/disp/p3_rain_sed_impl_disp.cpp.o
[ 65%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_ice_supersat_conservation.cpp.o
[ 66%] Building Fortran object src/physics/p3/CMakeFiles/p3_sk.dir/p3_iso_c.f90.o
[ 66%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_nc_conservation.cpp.o
[ 66%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_nr_conservation.cpp.o
[ 66%] Building Fortran object src/physics/shoc/CMakeFiles/shoc.dir/shoc_iso_c.f90.o
[ 66%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_ni_conservation.cpp.o
[ 66%] Building CXX object src/physics/p3/CMakeFiles/p3.dir/eti/p3_prevent_liq_supersaturation.cpp.o
[ 66%] Building Fortran object src/physics/shoc/CMakeFiles/shoc_sk.dir/shoc_iso_c.f90.o
[ 66%] Building Fortran object src/physics/p3/CMakeFiles/p3.dir/p3_iso_c.f90.o
[ 66%] Linking CXX static library libshoc_sk.a
[ 66%] Linking CXX static library libshoc.a
[ 66%] Built target shoc_sk
[ 66%] Linking CXX static library libp3_sk.a
[ 66%] Linking CXX static library libp3.a
[ 66%] Built target shoc
[ 66%] Built target p3
[ 66%] Built target p3_sk
[ 66%] Linking CXX static library libmam.a
[ 66%] Built target mam
gmake: *** [Makefile:166: all] Error 2

Error(s) occurred during test phase
OVERALL STATUS: FAIL
Starting analysis on mappy with cmd: cd /home/e3sm-jenkins/jenkins-ws/workspace/SCREAM_PullRequest_Autotester_Mappy/5908/scream/components/eamxx && source /projects/sems/modulefiles/utils/sems-modules-init.sh && module purge && module load sems-cmake/3.27.9 sems-git/2.42.0 sems-gcc/11.4.0 sems-openmpi-no-cuda/4.1.6 sems-netcdf-c/4.9.2 sems-netcdf-cxx/4.2 sems-netcdf-fortran/4.6.1 sems-parallel-netcdf/1.12.3 sems-openblas && export GATOR_INITIAL_MB=4000MB && export OMP_PROC_BIND=spread && true && ./scripts/test-all-scream --baseline-dir AUTO $compiler -p -c EKAT_DISABLE_TPL_WARNINGS=ON -m mappy
RUN: cd /home/e3sm-jenkins/jenkins-ws/workspace/SCREAM_PullRequest_Autotester_Mappy/5908/scream/components/eamxx && source /projects/sems/modulefiles/utils/sems-modules-init.sh && module purge && module load sems-cmake/3.27.9 sems-git/2.42.0 sems-gcc/11.4.0 sems-openmpi-no-cuda/4.1.6 sems-netcdf-c/4.9.2 sems-netcdf-cxx/4.2 sems-netcdf-fortran/4.6.1 sems-parallel-netcdf/1.12.3 sems-openblas && export GATOR_INITIAL_MB=4000MB && export OMP_PROC_BIND=spread && true && ./scripts/test-all-scream --baseline-dir AUTO $compiler -p -c EKAT_DISABLE_TPL_WARNINGS=ON -m mappy
FROM: /home/e3sm-jenkins/jenkins-ws/workspace/SCREAM_PullRequest_Autotester_Mappy/5908/scream/components/eamxx
mappy failed
######################################################
Build step 'Execute shell' marked build as failure
$ ssh-agent -k
unset SSH_AUTH_SOCK;
unset SSH_AGENT_PID;
echo Agent pid 921 killed;
[ssh-agent] Stopped.
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash -le

cd $WORKSPACE/${BUILD_ID}/

./scream/components/eamxx/scripts/jenkins/jenkins_cleanup.sh

We're having issues with some test-launcher jobs hanging forever. So let's make sure we clean up all pending test-launcher jobs

squeue -o"%.7i %u %40j" | grep e3sm-jenkins | grep test-launcher | awk '{ print $1 }' | xargs -r scancel

[SCREAM_PullRequest_Autotester_Mappy] $ /bin/bash -le /tmp/jenkins8936001311501249363.sh
POST BUILD TASK : SUCCESS
END OF POST BUILD TASK : 0
Sending e-mails to: [email protected]
Finished: FAILURE

@bartgol bartgol added the AT: RETEST Force the autotester (AT) to retest the PR label Oct 15, 2024
@E3SM-Autotester
Collaborator

Status Flag 'Pull Request AutoTester' - User Requested Retest - Label AT: RETEST will be reset after testing.

@E3SM-Autotester
Collaborator

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 6157
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS I/O;AT: RETEST;bugfix
PULLREQUESTNUM 3032
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA c99a7d9
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5913
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS I/O;AT: RETEST;bugfix
PULLREQUESTNUM 3032
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA c99a7d9
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Using Repos:

Repo: SCREAM (E3SM-Project/scream)
  • Branch: aarondonahue/close_output_when_full
  • SHA: c99a7d9
  • Mode: TEST_REPO

Pull Request Author: AaronDonahue

Contributor

@mahf708 left a comment

Minor comments, mainly to push you to clarify changes/comments.

Also, would you please rebase this PR if more commits are needed? That way we could keep the already-merged changes from showing up here...

components/eamxx/src/share/io/scream_io_control.hpp

void set_dt (const double dt_in) {
  EKAT_REQUIRE_MSG (dt==0 or dt==dt_in,
      "[IOControl::set_dt] Error! Cannot reset dt once it is set.\n");
Contributor

Like above.

// computed next_write_ts=last_write_ts (in terms of date:time, the num_steps is correct).
// This means that at that time we deemed that the next_write_ts definitely fit in the same
// file as last_write_ts (date/time are the same!), which may or may not be true for non NumSnaps
// storage. To fix this, we recompute next_write_ts here, and close the file if it doesn't.
Contributor

minor: doesn't ... what? fit?

Contributor

Overall, this seems a bit too complex? Some questions:

  • when did we open the "currently open file"?
  • why can't we have all the info we need to determine if we can close it after write? If we are deciding to flush and close full files (per title of PR), then can't we deduce that if the number of snaps in file == max number of snaps then close based on that?

Contributor

That's true for storage type "NumSnaps". But with type "one_month" (say), we can't say if the file is full unless we know the time stamp of the next write.
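
To illustrate the distinction, here is a rough sketch assuming a simplified storage record. The name `snapshot_fits` appears earlier in this thread, but the signature, the `TimeStamp` and `StorageSpecs` structs, and the `StorageType` enum below are assumptions made for illustration only, not the EAMxx definitions.

```cpp
// Hypothetical, simplified time stamp: only the fields needed here.
struct TimeStamp {
  int year;
  int month;
};

enum class StorageType { NumSnaps, Monthly };

// Hypothetical stand-in for the per-file storage specs.
struct StorageSpecs {
  StorageType type = StorageType::NumSnaps;
  int max_snapshots = 1;       // used by NumSnaps
  int num_snapshots = 0;       // snapshots already in the file
  TimeStamp file_start{0, 0};  // used by Monthly: the month the file covers

  // Can a snapshot stamped `t` go in the current file?
  bool snapshot_fits(const TimeStamp& t) const {
    if (type == StorageType::NumSnaps) {
      // Decided by the counter alone; `t` is irrelevant.
      return num_snapshots < max_snapshots;
    }
    // Monthly: decided by the calendar, so `t` must be known.
    return t.year == file_start.year && t.month == file_start.month;
  }
};
```

In the count-based branch the answer is available right after a write, whereas the calendar-based branch cannot be evaluated until the time stamp of the next write is known.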

Contributor

So, at a given time step, we know (can calculate) its time stamp, right? Then, why can't we deduce if one_month or one_day is ending here? Do we not know dt?

I understand the logic can be too convoluted, but it is still doable, no?

We don't have to do it now, but trying to understand if it is doable at all (ignoring the fact that we may choose not to make the code super ugly for some corner case)

Contributor

We cannot compute the next time stamp during t0 output. The driver does not have info about dt during the init sequence, which is when the OM is set up. When I originally designed the driver, I wanted to separate concerns as much as possible. In my mind, dt was a "run" time param, not an "init" time param (I didn't even know if dt could in principle change dynamically down the road).

If you want to compute next_write_ts during t0 output (which, again, happens during the init sequence), we need to pass dt to the driver init methods (from the f90 cpl interface). We can of course do that. And all in all, it may make the code simpler. It's a slightly deeper interface change though, so we could do it as a follow up PR.
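
The constraint described above can be sketched as follows, assuming a simplified control struct in which `dt == 0` means "not set yet". The `set_dt` rule mirrors the check quoted earlier in this review; the other members (`frequency_in_steps`, `last_write_time`, `dt_known`, `next_write_time`) and the use of exceptions are hypothetical choices for this illustration, not the actual `IOControl` interface.

```cpp
#include <stdexcept>

// Hypothetical, simplified control struct; times are seconds since run start.
struct IOControl {
  double dt = 0;               // 0 means "not set yet" (unknown during init)
  int frequency_in_steps = 1;  // write every N steps
  double last_write_time = 0;

  bool dt_known() const { return dt > 0; }

  // dt may be set once; setting it again to the same value is allowed.
  void set_dt(const double dt_in) {
    if (dt != 0 && dt != dt_in) {
      throw std::logic_error("[IOControl::set_dt] cannot reset dt once it is set");
    }
    dt = dt_in;
  }

  // Only meaningful after set_dt: before that, the next write time is unknown.
  double next_write_time() const {
    if (!dt_known()) {
      throw std::logic_error("next write time requested before dt was set");
    }
    return last_write_time + frequency_in_steps * dt;
  }
};
```

Under these assumptions, a t0 output issued during init cannot ask for the next write time, which is why the "is the file full?" decision for calendar-based storage has to be deferred until the first run step provides dt.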

Comment on lines +394 to +395
// In case REST_OPT=nsteps, don't count t0 output as one of those steps
// NOTE: for m_output_control, it doesn't matter, since it'll be reset to 0 before we return
Contributor

Hmmmmm.... I see where things get weird! I wonder if shifting the indexing altogether can help?

Contributor

To be clear, this small issue is unrelated to closing the file at the right time. To be honest, I think we could flat out remove the line that updates nsamples_since_last_write for checkpoint control: it is not used anyway! And since, as the comment states, for output control it doesn't matter, we may as well remove this if block altogether...

@@ -118,6 +118,10 @@ class OutputManager
void finalize();

long long res_dep_memory_footprint () const;

// For debug and testing purposes
Contributor

Mind elaborating what debug and testing we mean here?

Contributor

Well, I want to be able to verify the correctness of the control/filespecs structs during unit tests. The comment was meant to say "this is not really needed at runtime".

@E3SM-Autotester
Collaborator

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 6157
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS I/O;AT: RETEST;bugfix
PULLREQUESTNUM 3032
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA c99a7d9
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5913
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS I/O;AT: RETEST;bugfix
PULLREQUESTNUM 3032
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA c99a7d9
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

@E3SM-Autotester E3SM-Autotester removed the AT: RETEST Force the autotester (AT) to retest the PR label Oct 15, 2024
@E3SM-Autotester
Collaborator

Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
THE LAST COMMIT TO THIS PULL REQUEST HAS BEEN REVIEWED, BUT NOT ACCEPTED OR REQUIRES CHANGES!

@E3SM-Autotester
Collaborator

All Jobs Finished; status = PASSED, target_sha=0422f5754bde808b99f738c9dca1af4379634fbe, However Inspection must be performed before merge can occur...

@E3SM-Autotester
Collaborator

The base branch has been updated since the last successful testing.

  • last PASS base branch sha: 0422f57
  • current base branch sha : 75ef2ed
    The AutoTester will discard the last PASS, and re-test the PR from scratch

@E3SM-Autotester
Collaborator

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 6158
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS I/O;bugfix
PULLREQUESTNUM 3032
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA c99a7d9
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5914
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS I/O;bugfix
PULLREQUESTNUM 3032
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA c99a7d9
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Using Repos:

Repo: SCREAM (E3SM-Project/scream)
  • Branch: aarondonahue/close_output_when_full
  • SHA: c99a7d9
  • Mode: TEST_REPO

Pull Request Author: AaronDonahue

@E3SM-Autotester
Collaborator

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 6158
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS I/O;bugfix
PULLREQUESTNUM 3032
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA c99a7d9
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5914
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS I/O;bugfix
PULLREQUESTNUM 3032
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA c99a7d9
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

@E3SM-Autotester
Collaborator

All Jobs Finished; status = PASSED, target_sha=75ef2edca2837f1f61601a83fba2b559b26df85f, However Inspection must be performed before merge can occur...

@E3SM-Autotester
Collaborator

The base branch has been updated since the last successful testing.

  • last PASS base branch sha: 75ef2ed
  • current base branch sha : 10fd3d0
    The AutoTester will discard the last PASS, and re-test the PR from scratch

@E3SM-Autotester
Collaborator

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 6171
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS I/O;bugfix
PULLREQUESTNUM 3032
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA c99a7d9
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5923
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS I/O;bugfix
PULLREQUESTNUM 3032
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA c99a7d9
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Using Repos:

Repo: SCREAM (E3SM-Project/scream)
  • Branch: aarondonahue/close_output_when_full
  • SHA: c99a7d9
  • Mode: TEST_REPO

Pull Request Author: AaronDonahue

@E3SM-Autotester
Collaborator

Status Flag 'Pull Request AutoTester' - Error: Jenkins Jobs - A user has pushed a change to the PR before testing completed. NEW EVENT 'committed', ID C_kwDOCEfuetoAKDAzM2QzMWUxMjg5OGNlN2YwZjdiYjA1NzMyZWEyMzNiYjliOGNhYzM... The Jenkins Jobs will be shutdown; Testing of this PR must occur again.

@E3SM-Autotester
Collaborator

Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED

Note: Testing will normally be attempted again in approx. 2 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run.

Pull Request Auto Testing has FAILED (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 6171
  • Status: ERROR

Jenkins Parameters

Parameter Name Value
PR_LABELS I/O;bugfix
PULLREQUESTNUM 3032
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA c99a7d9
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5923
  • Status: ERROR

Jenkins Parameters

Parameter Name Value
PR_LABELS I/O;bugfix
PULLREQUESTNUM 3032
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA c99a7d9
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM
SCREAM_PullRequest_Autotester_Weaver # 6171 ERROR (click to see last 100 lines of console output)

PYTHON_BIN=/projects/ppc64le-pwr9-rhel8/utilities/python/3.10.8/gcc/8.3.1/base/qmix2fe/bin;
export PYTHON_BIN;
PYTHON_INC=/projects/ppc64le-pwr9-rhel8/utilities/python/3.10.8/gcc/8.3.1/base/qmix2fe/include;
export PYTHON_INC;
PYTHON_LIB=/projects/ppc64le-pwr9-rhel8/utilities/python/3.10.8/gcc/8.3.1/base/qmix2fe/lib;
export PYTHON_LIB;
PYTHON_ROOT=/projects/ppc64le-pwr9-rhel8/utilities/python/3.10.8/gcc/8.3.1/base/qmix2fe;
export PYTHON_ROOT;
PYTHON_VERSION=3.10.8;
export PYTHON_VERSION;
_LMFILES_=/projects/ppc64le-pwr9-rhel8/modulefiles/lmod/utilities/linux-rhel8-ppc64le/Core/python/3.10.8.lua;
export _LMFILES_;
_ModuleTable001_=X01vZHVsZVRhYmxlXyA9IHsKTVR2ZXJzaW9uID0gMywKY19yZWJ1aWxkVGltZSA9IGZhbHNlLApjX3Nob3J0VGltZSA9IGZhbHNlLApkZXB0aFQgPSB7fSwKZmFtaWx5ID0ge30sCm1UID0gewpweXRob24gPSB7CmZuID0gIi9wcm9qZWN0cy9wcGM2NGxlLXB3cjktcmhlbDgvbW9kdWxlZmlsZXMvbG1vZC91dGlsaXRpZXMvbGludXgtcmhlbDgtcHBjNjRsZS9Db3JlL3B5dGhvbi8zLjEwLjgubHVhIiwKZnVsbE5hbWUgPSAicHl0aG9uLzMuMTAuOCIsCmxvYWRPcmRlciA9IDEsCnByb3BUID0ge30sCnN0YWNrRGVwdGggPSAwLApzdGF0dXMgPSAiYWN0aXZlIiwKdXNlck5hbWUgPSAicHl0aG9uLzMuMTAuOCIsCndWID0gIjAwMDAwMDAwMy4wMDAwMDAwMTAuMDAwMDAwMDA4Lip6;
export _ModuleTable001_;
_ModuleTable002_=ZmluYWwiLAp9LAp9LAptcGF0aEEgPSB7CiIvcHJvamVjdHMvcHBjNjRsZS1wd3I5LXJoZWw4L21vZHVsZWZpbGVzL2xtb2QvY29tcGlsZXJzIiwgIi9wcm9qZWN0cy9wcGM2NGxlLXB3cjktcmhlbDgvbW9kdWxlZmlsZXMvbG1vZC91dGlsaXRpZXMvbGludXgtcmhlbDgtcHBjNjRsZS9Db3JlIiwKfSwKc3lzdGVtQmFzZU1QQVRIID0gIi9wcm9qZWN0cy9wcGM2NGxlLXB3cjktcmhlbDgvbW9kdWxlZmlsZXMvbG1vZC9jb21waWxlcnM6L3Byb2plY3RzL3BwYzY0bGUtcHdyOS1yaGVsOC9tb2R1bGVmaWxlcy9sbW9kL3V0aWxpdGllcy9saW51eC1yaGVsOC1wcGM2NGxlL0NvcmUiLAp9Cg==;
export _ModuleTable002_;
_ModuleTable_Sz_=2;
export _ModuleTable_Sz_;'
+++ __LMOD_REF_COUNT_CMAKE_PREFIX_PATH=/projects/ppc64le-pwr9-rhel8/utilities/python/3.10.8/gcc/8.3.1/base/qmix2fe:1
+++ export __LMOD_REF_COUNT_CMAKE_PREFIX_PATH
+++ CMAKE_PREFIX_PATH=/projects/ppc64le-pwr9-rhel8/utilities/python/3.10.8/gcc/8.3.1/base/qmix2fe
+++ export CMAKE_PREFIX_PATH
+++ __LMOD_REF_COUNT_LD_LIBRARY_PATH='/projects/ppc64le-pwr9-rhel8/utilities/python/3.10.8/gcc/8.3.1/base/qmix2fe/lib:1;/opt/lsf/10.1/linux3.10-glibc2.17-ppc64le/lib:1'
+++ export __LMOD_REF_COUNT_LD_LIBRARY_PATH
+++ LD_LIBRARY_PATH=/projects/ppc64le-pwr9-rhel8/utilities/python/3.10.8/gcc/8.3.1/base/qmix2fe/lib:/opt/lsf/10.1/linux3.10-glibc2.17-ppc64le/lib
+++ export LD_LIBRARY_PATH
+++ __LMOD_REF_COUNT_LIBRARY_PATH=/projects/ppc64le-pwr9-rhel8/utilities/python/3.10.8/gcc/8.3.1/base/qmix2fe/lib:1
+++ export __LMOD_REF_COUNT_LIBRARY_PATH
+++ LIBRARY_PATH=/projects/ppc64le-pwr9-rhel8/utilities/python/3.10.8/gcc/8.3.1/base/qmix2fe/lib
+++ export LIBRARY_PATH
+++ LOADEDMODULES=python/3.10.8
+++ export LOADEDMODULES
+++ __LMOD_REF_COUNT_MANPATH='/projects/ppc64le-pwr9-rhel8/utilities/python/3.10.8/gcc/8.3.1/base/qmix2fe/share/man:1;/opt/lsf/10.1/man:1'
+++ export __LMOD_REF_COUNT_MANPATH
+++ MANPATH=/projects/ppc64le-pwr9-rhel8/utilities/python/3.10.8/gcc/8.3.1/base/qmix2fe/share/man:/opt/lsf/10.1/man::
+++ export MANPATH
+++ __LMOD_REF_COUNT_MODULEPATH='/projects/ppc64le-pwr9-rhel8/modulefiles/lmod/compilers:1;/projects/ppc64le-pwr9-rhel8/modulefiles/lmod/utilities/linux-rhel8-ppc64le/Core:1'
+++ export __LMOD_REF_COUNT_MODULEPATH
+++ MODULEPATH=/projects/ppc64le-pwr9-rhel8/modulefiles/lmod/compilers:/projects/ppc64le-pwr9-rhel8/modulefiles/lmod/utilities/linux-rhel8-ppc64le/Core
+++ export MODULEPATH
+++ __LMOD_REF_COUNT_PATH='/projects/ppc64le-pwr9-rhel8/utilities/python/3.10.8/gcc/8.3.1/base/qmix2fe/bin:1;/opt/lsf/10.1/linux3.10-glibc2.17-ppc64le/etc:1;/opt/lsf/10.1/linux3.10-glibc2.17-ppc64le/bin:1;/usr/local/bin:1;/usr/bin:1;/usr/local/sbin:1;/usr/sbin:1;/home/e3sm-jenkins/.local/bin:1;/home/e3sm-jenkins/bin:1'
+++ export __LMOD_REF_COUNT_PATH
+++ PATH=/projects/ppc64le-pwr9-rhel8/utilities/python/3.10.8/gcc/8.3.1/base/qmix2fe/bin:/opt/lsf/10.1/linux3.10-glibc2.17-ppc64le/etc:/opt/lsf/10.1/linux3.10-glibc2.17-ppc64le/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/e3sm-jenkins/.local/bin:/home/e3sm-jenkins/bin
+++ export PATH
+++ __LMOD_REF_COUNT_PKG_CONFIG_PATH=/projects/ppc64le-pwr9-rhel8/utilities/python/3.10.8/gcc/8.3.1/base/qmix2fe/lib/pkgconfig:1
+++ export __LMOD_REF_COUNT_PKG_CONFIG_PATH
+++ PKG_CONFIG_PATH=/projects/ppc64le-pwr9-rhel8/utilities/python/3.10.8/gcc/8.3.1/base/qmix2fe/lib/pkgconfig
+++ export PKG_CONFIG_PATH
+++ PYTHON_BIN=/projects/ppc64le-pwr9-rhel8/utilities/python/3.10.8/gcc/8.3.1/base/qmix2fe/bin
+++ export PYTHON_BIN
+++ PYTHON_INC=/projects/ppc64le-pwr9-rhel8/utilities/python/3.10.8/gcc/8.3.1/base/qmix2fe/include
+++ export PYTHON_INC
+++ PYTHON_LIB=/projects/ppc64le-pwr9-rhel8/utilities/python/3.10.8/gcc/8.3.1/base/qmix2fe/lib
+++ export PYTHON_LIB
+++ PYTHON_ROOT=/projects/ppc64le-pwr9-rhel8/utilities/python/3.10.8/gcc/8.3.1/base/qmix2fe
+++ export PYTHON_ROOT
+++ PYTHON_VERSION=3.10.8
+++ export PYTHON_VERSION
+++ _LMFILES_=/projects/ppc64le-pwr9-rhel8/modulefiles/lmod/utilities/linux-rhel8-ppc64le/Core/python/3.10.8.lua
+++ export _LMFILES_
+++ _ModuleTable001_=X01vZHVsZVRhYmxlXyA9IHsKTVR2ZXJzaW9uID0gMywKY19yZWJ1aWxkVGltZSA9IGZhbHNlLApjX3Nob3J0VGltZSA9IGZhbHNlLApkZXB0aFQgPSB7fSwKZmFtaWx5ID0ge30sCm1UID0gewpweXRob24gPSB7CmZuID0gIi9wcm9qZWN0cy9wcGM2NGxlLXB3cjktcmhlbDgvbW9kdWxlZmlsZXMvbG1vZC91dGlsaXRpZXMvbGludXgtcmhlbDgtcHBjNjRsZS9Db3JlL3B5dGhvbi8zLjEwLjgubHVhIiwKZnVsbE5hbWUgPSAicHl0aG9uLzMuMTAuOCIsCmxvYWRPcmRlciA9IDEsCnByb3BUID0ge30sCnN0YWNrRGVwdGggPSAwLApzdGF0dXMgPSAiYWN0aXZlIiwKdXNlck5hbWUgPSAicHl0aG9uLzMuMTAuOCIsCndWID0gIjAwMDAwMDAwMy4wMDAwMDAwMTAuMDAwMDAwMDA4Lip6
+++ export _ModuleTable001_
+++ _ModuleTable002_=ZmluYWwiLAp9LAp9LAptcGF0aEEgPSB7CiIvcHJvamVjdHMvcHBjNjRsZS1wd3I5LXJoZWw4L21vZHVsZWZpbGVzL2xtb2QvY29tcGlsZXJzIiwgIi9wcm9qZWN0cy9wcGM2NGxlLXB3cjktcmhlbDgvbW9kdWxlZmlsZXMvbG1vZC91dGlsaXRpZXMvbGludXgtcmhlbDgtcHBjNjRsZS9Db3JlIiwKfSwKc3lzdGVtQmFzZU1QQVRIID0gIi9wcm9qZWN0cy9wcGM2NGxlLXB3cjktcmhlbDgvbW9kdWxlZmlsZXMvbG1vZC9jb21waWxlcnM6L3Byb2plY3RzL3BwYzY0bGUtcHdyOS1yaGVsOC9tb2R1bGVmaWxlcy9sbW9kL3V0aWxpdGllcy9saW51eC1yaGVsOC1wcGM2NGxlL0NvcmUiLAp9Cg==
+++ export _ModuleTable002_
+++ _ModuleTable_Sz_=2
+++ export _ModuleTable_Sz_
++ SCREAM_MACHINE=weaver
+ [[ 0 == 1 ]]
+ [[ 0 == 1 ]]
+ [[ 0 == 1 ]]
++ whoami
+ [[ e3sm-jenkins == \e\3\s\m\-\j\e\n\k\i\n\s ]]
+ git config --local user.email [email protected]
+ git config --local user.name 'Jenkins Jenkins'
+ declare -i fails=0
+ BASELINES_DIR=AUTO
+ TAS_ARGS='--baseline-dir AUTO $compiler -p -c EKAT_DISABLE_TPL_WARNINGS=ON -m $machine'
+ [[ weaver == \p\m\-\g\p\u ]]
+ set +e
+ '[' -n 3032 ']'
+ is_at_run=1
+ SA_FAILURES_DETAILS=
+ '[' 1 -eq 1 ']'
++ ./scripts/gather-all-data './scripts/test-all-scream --baseline-dir AUTO $compiler -p -c EKAT_DISABLE_TPL_WARNINGS=ON -m $machine' -l -m weaver
***Forced exclusive execution
<>
<>
Build was aborted
Aborted by Luca Bertagna
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash -le

cd $WORKSPACE/${BUILD_ID}/

./scream/components/eamxx/scripts/jenkins/jenkins_cleanup.sh
[SCREAM_PullRequest_Autotester_Weaver] $ /bin/bash -le /tmp/jenkins7076630350333083630.sh
./scream/components/eamxx/scripts/jenkins/jenkins_common.sh: line 7: 905779 Broken pipe $JENKINS_SCRIPT_DIR/jenkins_common_impl.sh 2>&1
905780 Terminated | tee JENKINS_$DATE_STAMP

SCREAM_PullRequest_Autotester_Mappy # 5923 ERROR (click to see last 100 lines of console output)

   prescribed_wind: no

************** General run info **********************

ncols: 218
nlevs: 72
npacks: 5
league_size: 218
team_size: 1
concurrent teams: 1


P3_INIT (reading/creating look-up tables) ...

Using memory pool. Initial size: 3.90625GB ; Grow size: 3.90625GB.
INFORM: Automatically inserting fence() after every parallel_for
[EAMxx] initialize_atm_procs ... done!
[EAMxx::init] resolution-dependent device memory footprint: 60.849512MB
[EAMxx] initialize_output_managers ...
[EAMxx::output_manager] - Writing model-output:
[EAMxx::output_manager] FILE: homme_shoc_cld_mam_aci_p3_mam_optics_rrtmgp_mam_drydep.INSTANT.nsteps_x4.np1.2021-10-12-45000.nc
[EAMxx::scorpio_output] Writing variables to file
file name: homme_shoc_cld_mam_aci_p3_mam_optics_rrtmgp_mam_drydep.INSTANT.nsteps_x4.np1.2021-10-12-45000.nc
Done! Elapsed time: 0.000000 seconds
[EAMxx::scorpio_output] Writing variables to file
file name: homme_shoc_cld_mam_aci_p3_mam_optics_rrtmgp_mam_drydep.INSTANT.nsteps_x4.np1.2021-10-12-45000.nc
Done! Elapsed time: 0.012000 seconds
[EAMxx::output_manager] - New Output stream
Filename prefix: homme_shoc_cld_mam_aci_p3_mam_optics_rrtmgp_mam_drydep
Run t0: 2021-10-12-45000
Case t0: 2021-10-12-45000
Reference t0: 2021-10-12-45000
Is Restart File ?: NO
Is Restarted Run ?: NO
Averaging Type: INSTANT
Output Frequency: 4 nsteps
File Capacity: 1snapshots
Includes Grid Data ?: YES
[EAMxx] initialize_output_managers ... done!
Start time stepping loop... [ 0%]
Atmosphere step = 0
model start-of-step time = 2021-10-12 12:30:00

WARNING: Failed and repaired post-condition property check.

  • Atmosphere process name: homme

  • Property check name: tracers lower bound check: 0

  • Atmosphere process MPI Rank: 0

  • Message: Check failed.

  • check name: tracers lower bound check: 0

  • field id: tracers[Physics GLL] double:ncol,dim,lev(218,41,72) [1]

  • minimum:

    • value: -3.13818e-53
    • indices (w/ global column index): (54,13,38)
    • lat/lon: (-31.9482, 77.5623)
  • maximum:

    • value: 1.05706e+10
    • indices (w/ global column index): (47,20,70)
    • lat/lon: (44.3197, 12.4377)
  • Iteration 1 completed [ 25%]
    Atmosphere step = 1
    model start-of-step time = 2021-10-12 13:00:00

WARNING: Failed and repaired post-condition property check.

  • Atmosphere process name: homme

  • Property check name: tracers lower bound check: 0

  • Atmosphere process MPI Rank: 0

  • Message: Check failed.

  • check name: tracers lower bound check: 0

  • field id: tracers[Physics GLL] double:ncol,dim,lev(218,41,72) [1]

  • minimum:

    • value: -4.29349e-61
    • indices (w/ global column index): (60,13,34)
    • lat/lon: (0, 77.5623)
  • maximum:

    • value: 1.73157e+10
    • indices (w/ global column index): (47,20,70)
    • lat/lon: (44.3197, 12.4377)
  • Iteration 2 completed [ 50%]
    Atmosphere step = 2
    model start-of-step time = 2021-10-12 13:30:00

srun: forcing job termination
slurmstepd: error: *** STEP 197802.0 ON localhost CANCELLED AT 2024-10-16T15:10:03 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: localhost: task 0: Killed

    Start  41: output_restart_check_AVERAGE_np4

251/550 Test #37: output_restart_check_INSTANT_np4 ...................................... Passed 0.04 sec
Start 454: p3_mam4_wetscav_np3_vs_np1
252/550 Test #41: output_restart_check_AVERAGE_np4 ...................................... Passed 0.04 sec
Start 461: shoc_cldfrac_p3_wetscav_np3_vs_np1'
Terminated
POST BUILD TASK : SUCCESS
END OF POST BUILD TASK : 0
Finished: ABORTED

mahf708
mahf708 previously approved these changes Oct 16, 2024
Contributor

@mahf708 mahf708 left a comment

🎉

@AaronDonahue AaronDonahue dismissed mahf708’s stale review October 16, 2024 21:39

The merge-base changed after approval.

@bartgol bartgol requested a review from mahf708 October 16, 2024 22:35
@bartgol bartgol dismissed their stale review October 16, 2024 22:36

I contributed to the PR, so I won't be a reviewer anymore

mahf708
mahf708 previously approved these changes Oct 16, 2024
Contributor

@mahf708 mahf708 left a comment

by order of the peaky blinders

@AaronDonahue AaronDonahue dismissed mahf708’s stale review October 16, 2024 22:52

The merge-base changed after approval.

@bartgol
Contributor

bartgol commented Oct 16, 2024

I'm not sure what gh is doing with this weird dismissal of the review. If testing passes, we'll just merge manually.

@E3SM-Autotester
Collaborator

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 6174
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS I/O;bugfix
PULLREQUESTNUM 3032
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA 033d31e
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5926
  • Status: STARTED

Jenkins Parameters

Parameter Name Value
PR_LABELS I/O;bugfix
PULLREQUESTNUM 3032
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA 033d31e
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Using Repos:

Repo: SCREAM (E3SM-Project/scream)
  • Branch: aarondonahue/close_output_when_full
  • SHA: 033d31e
  • Mode: TEST_REPO

Pull Request Author: AaronDonahue

@E3SM-Autotester
Collaborator

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED (click to expand)

Build Information

Test Name: SCREAM_PullRequest_Autotester_Weaver

  • Build Num: 6174
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS I/O;bugfix
PULLREQUESTNUM 3032
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA 033d31e
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

Build Information

Test Name: SCREAM_PullRequest_Autotester_Mappy

  • Build Num: 5926
  • Status: PASSED

Jenkins Parameters

Parameter Name Value
PR_LABELS I/O;bugfix
PULLREQUESTNUM 3032
SCREAM_SOURCE_REPO https://github.com/E3SM-Project/scream
SCREAM_SOURCE_SHA 033d31e
SCREAM_TARGET_BRANCH master
SCREAM_TARGET_REPO https://github.com/E3SM-Project/scream
SCREAM_TARGET_SHA 9b1b4c7
TEST_REPO_ALIAS SCREAM

@E3SM-Autotester
Collaborator

Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
THE LAST COMMIT TO THIS PULL REQUEST HAS BEEN REVIEWED, BUT NOT ACCEPTED OR REQUIRES CHANGES!

@E3SM-Autotester
Collaborator

All Jobs Finished; status = PASSED, target_sha=41f563d7ec3e4e1727d25267796e9beac13ffb12, However Inspection must be performed before merge can occur...

mahf708
mahf708 previously approved these changes Oct 17, 2024
Contributor

@mahf708 mahf708 left a comment

by order of the peaky blinders

@AaronDonahue AaronDonahue dismissed mahf708’s stale review October 17, 2024 04:20

The merge-base changed after approval.

@bartgol bartgol merged commit 68935b3 into master Oct 17, 2024
5 of 6 checks passed
@bartgol bartgol deleted the aarondonahue/close_output_when_full branch October 17, 2024 15:06
Development

Successfully merging this pull request may close these issues.

Empty (zero-sized) monthly outputs
5 participants