fix final checkpoint file write (#558)
### Description
This fixes the checkpoint/restart failure identified in
#554 and adds a GitHub Actions workflow that tests whether
checkpoint/restart works correctly for the HydroBlast3D test.

### Related issues
Fixes #554.

### Checklist
_Before this pull request can be reviewed, all of these tasks should be
completed. Denote completed tasks with an `x` inside the square brackets
`[ ]` in the Markdown source below:_
- [x] I have added a description (see above).
- [x] I have added a link to any related issues (see above).
- [x] I have read the [Contributing
Guide](https://github.com/quokka-astro/quokka/blob/development/CONTRIBUTING.md).
- [ ] I have added tests for any new physics that this PR adds to the
code.
- [x] I have tested this PR on my local computer and all tests pass.
- [x] I have manually triggered the GPU tests with the magic comment
`/azp run`.
- [x] I have requested a reviewer for this PR.
BenWibking authored Mar 13, 2024
1 parent fcde1aa commit 66d3b80
Showing 4 changed files with 109 additions and 5 deletions.
64 changes: 64 additions & 0 deletions .github/workflows/checkpoint-restart.yml
@@ -0,0 +1,64 @@
name: CheckpointRestart

on:
  push:
    branches: [ development ]
  pull_request:
    # The branches below must be a subset of the branches above
    branches: [ development ]
  merge_group:
    branches: [ development ]

concurrency:
  group: ${{ github.ref }}-${{ github.head_ref }}-final-checkpoint-restart
  cancel-in-progress: true

env:
  # Customize the CMake build type here (Release, Debug, RelWithDebInfo, etc.)
  BUILD_TYPE: Release

jobs:
  test:
    runs-on: ubuntu-20.04

    steps:
    - uses: actions/checkout@v4
      with:
        submodules: true

    - name: Create Build Environment
      run: cmake -E make_directory ${{runner.workspace}}/build

    - name: Install dependencies
      run: sudo apt-get update && sudo apt-get install gcc-11 g++-11 python3-dev python3-numpy python3-matplotlib python3-pip libopenmpi-dev libhdf5-mpi-dev

    - name: Build PlotfileTools
      shell: bash
      working-directory: ${{github.workspace}}/extern/amrex/Tools/Plotfile
      run: make -j4

    - name: Configure CMake
      shell: bash
      working-directory: ${{runner.workspace}}/build
      run: cmake $GITHUB_WORKSPACE -DCMAKE_BUILD_TYPE=$BUILD_TYPE -DCMAKE_C_COMPILER=gcc-11 -DCMAKE_CXX_COMPILER=g++-11 -DAMReX_SPACEDIM=3

    - name: Build
      working-directory: ${{runner.workspace}}/build
      shell: bash
      # Execute the build. You can specify a specific target with "--target <NAME>"
      run: cmake --build . --config $BUILD_TYPE --parallel 4 --target test_hydro3d_blast

    - name: Checkpoint/Restart Test
      working-directory: ${{github.workspace}}/tests
      shell: bash
      env:
        BUILD_DIR: ${{runner.workspace}}/build
        PLOTFILETOOLS_DIR: ${{github.workspace}}/extern/amrex/Tools/Plotfile
      run: ./checkpoint_restart_test.sh

    - name: Upload output
      if: always()
      uses: actions/upload-artifact@v4
      with:
        name: checkpoint-restart-results
        path: ${{github.workspace}}/tests
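
The "Checkpoint/Restart Test" step passes two environment variables, `BUILD_DIR` and `PLOTFILETOOLS_DIR`, to the test script. When running the script outside CI, both have to be set by hand. A minimal sketch of a pre-flight check follows; the function name `check_test_env` and the placeholder paths are hypothetical and are not part of this commit.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): report the first of the two
# variables used by checkpoint_restart_test.sh that is unset or empty.
check_test_env() {
    for name in BUILD_DIR PLOTFILETOOLS_DIR; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            return 1
        fi
    done
    echo "ok"
}

# Example with placeholder paths; substitute your own build and AMReX locations.
(BUILD_DIR=/path/to/build; PLOTFILETOOLS_DIR=/path/to/amrex/Tools/Plotfile; check_test_env)
```

The check runs in a subshell so the placeholder values do not leak into the calling shell.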
16 changes: 11 additions & 5 deletions src/simulation.hpp
@@ -908,11 +908,15 @@ template <typename problem_t> void AMRSimulation<problem_t>::evolve()
             WritePlotFile();
         }

+        // IMPORTANT: this MUST be written *after* the plotfile to avoid corruption:
+        // https://github.com/quokka-astro/quokka/issues/554
         if (checkpointTimeInterval_ > 0 && next_chk_file_time <= cur_time) {
             next_chk_file_time += checkpointTimeInterval_;
             WriteCheckpointFile();
         }

+        // IMPORTANT: this MUST be written *after* the plotfile to avoid corruption:
+        // https://github.com/quokka-astro/quokka/issues/554
         if (checkpointInterval_ > 0 && (step + 1) % checkpointInterval_ == 0) {
             last_chk_file_step = step + 1;
             WriteCheckpointFile();
@@ -958,11 +962,6 @@ template <typename problem_t> void AMRSimulation<problem_t>::evolve()
     }
     amrex::Print() << '\n';

-    // write final checkpoint
-    if (checkpointInterval_ > 0 && istep[0] > last_chk_file_step) {
-        WriteCheckpointFile();
-    }
-
     // write final plotfile
     if (plotfileInterval_ > 0 && istep[0] > last_plot_file_step) {
         WritePlotFile();
@@ -978,6 +977,13 @@ template <typename problem_t> void AMRSimulation<problem_t>::evolve()
         WriteStatisticsFile();
     }

+    // write final checkpoint
+    // IMPORTANT: this MUST be written *after* the plotfile to avoid corruption:
+    // https://github.com/quokka-astro/quokka/issues/554
+    if (checkpointInterval_ > 0 && istep[0] > last_chk_file_step) {
+        WriteCheckpointFile();
+    }
+
 #ifdef AMREX_USE_ASCENT
     // close Ascent
     ascent_.close();
24 changes: 24 additions & 0 deletions tests/blast_32.in
@@ -0,0 +1,24 @@
# *****************************************************************
# Problem size and geometry
# *****************************************************************
geometry.prob_lo = 0.0 0.0 0.0
geometry.prob_hi = 1.2 1.2 1.2
geometry.is_periodic = 0 0 0

# *****************************************************************
# VERBOSITY
# *****************************************************************
amr.v = 0 # verbosity in Amr

# *****************************************************************
# Resolution and refinement
# *****************************************************************
amr.n_cell = 32 32 32
amr.max_level = 0 # number of levels = max_level + 1
amr.max_grid_size = 16 # at least 128 for GPUs
amr.blocking_factor = 64 # grid size must be divisible by this
amr.n_error_buf = 3 # minimum 3 cell buffer around tagged cells
amr.grid_eff = 0.7 # default

do_reflux = 0
do_subcycle = 0
10 changes: 10 additions & 0 deletions tests/checkpoint_restart_test.sh
@@ -0,0 +1,10 @@
#!/bin/sh
set -x

$BUILD_DIR/src/HydroBlast3D/test_hydro3d_blast blast_32.in max_walltime=0:00:10 plotfile_interval=100 checkpoint_interval=100
$BUILD_DIR/src/HydroBlast3D/test_hydro3d_blast blast_32.in restartfile=last_chk max_timesteps=1 plotfile_interval=100 checkpoint_interval=100

old_plotfile=`ls -1drt plt*.old.* | head -1`
plotfile=${old_plotfile%.old.*}

$PLOTFILETOOLS_DIR/fcompare.gnu.ex $plotfile $old_plotfile
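
The script recovers the name of the freshly written plotfile from the renamed older one by stripping everything from `.old.` onward with the POSIX `${var%pattern}` expansion. This appears to rely on the restarted run moving an existing plotfile directory aside under a `.old.<suffix>` name before writing the new one; that behavior is an assumption here, not something the commit states. A minimal sketch of the expansion in isolation, using a hypothetical helper `strip_old_suffix` and a made-up directory name:

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): remove the shortest trailing
# match of ".old.*" from a name, as ${old_plotfile%.old.*} does in the script.
strip_old_suffix() {
    printf '%s\n' "${1%.old.*}"
}

# Made-up example name; real names depend on the run.
strip_old_suffix "plt00100.old.0000001"   # prints "plt00100"
```

In the script itself, `ls -1drt plt*.old.* | head -1` lists the matching directories oldest first and keeps the first entry, so the comparison uses the oldest renamed plotfile.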
