Using CICE in an S2S configuration in ufs-weather-model causes failures after a large number of CICE file writes (restart and/or history; roughly 500-700) when CICE is compiled with PIO, but not when it is compiled with NetCDF. The failure always occurs on a CICE process. The current workaround for the weather model regression tests has been to set export I_MPI_SHM_HEAP_VSIZE=16384 in the job submission script, but this is not a long-term solution.
To Reproduce:
Compile the weather model with ATM+ICE+OCN on Hera, Gaea, or WCOSS2. Multiple weather model regression test configurations and resolutions (cpld_control_c48, cpld_control_nowave_noaero_p8) and spack-stack versions/Intel compilers (2021 vs. 2023) give similar results.
Either run very long simulations with infrequent output or shorter simulations with high-frequency output.
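Since the failure depends on how many files CICE has written, it can help to count the files in the run directory when a job dies. The following is only a minimal sketch: the helper name `count_written_files`, the `history` and `RESTART` directory names, and the `iceh*.nc` / `iced*.nc` patterns are assumptions for illustration and should be adjusted to the actual run layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ufs-weather-model or CICE):
# print the number of regular files directly inside a directory
# whose names match a glob pattern.
count_written_files() {
  local dir="$1" pattern="$2"
  # -maxdepth 1 keeps the count to the directory itself;
  # tr strips the padding some wc implementations add.
  find "$dir" -maxdepth 1 -type f -name "$pattern" | wc -l | tr -d ' '
}

# Example usage (assumed directory names and file patterns):
#   count_written_files ./history 'iceh*.nc'
#   count_written_files ./RESTART 'iced*.nc'
```

Comparing the counts from several failed runs gives a rough check on whether the failure tracks the total number of writes rather than the simulated time.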
@LarissaReames-NOAA @junwang-noaa We have a proposed fix for this issue now. I reached out to Tony Craig; he was able to reproduce the issue in standalone CICE and quickly zeroed in on the problem and solution. With the fix, he was able to generate 8700 files in standalone testing. I'll make a test branch so that one of us can try it out and confirm it works.
I've tested Tony's fix (https://github.com/DeniseWorthen/CICE/tree/bugfix/manyfiles) using the C48-5deg case on Gaea. I was able to create 1906 hourly history files before hitting the wall-clock limit (8 hours). So I think I have a fix, although the exact implementation may change a bit.
Additional context
Cause of issue first reported in weather model issue 2320
I've also tried all possible options of restart/history_format in ice_in and the failure is always the same.
Output
On Hera the failure looks like:
On WCOSS2 and Gaea the error looks like: