Milestone: CMEPS 0.4
In this milestone, the Community Mediator for Earth Prediction Systems (CMEPS) is used to couple the model components in NOAA's Unified Forecast System (UFS) Subseasonal-to-Seasonal (S2S) application. The model components included are the Global Forecast System (GFS) atmosphere with the Finite Volume Cubed Sphere (FV3) dynamical core (FV3GFS), the Modular Ocean Model (MOM6) and the Los Alamos sea ice model (CICE5).
CMEPS leverages the Earth System Modeling Framework (ESMF) infrastructure, which consists of tools for building and coupling component models, along with the National Unified Operational Prediction Capability (NUOPC) Layer, a set of templates and conventions that increases interoperability in ESMF-based systems. The workflow is provided by the Common Infrastructure for Modeling the Earth (CIME) community workflow software.
This is not an official release of the UFS S2S application, but a prototype system being used to evaluate specific aspects of CMEPS and the CIME workflow.
Changes relative to CMEPS 0.3 are described here.
The FV3GFS atmosphere is discretized on a cubed sphere grid. This grid is based on a decomposition of the sphere into six identical regions, obtained by projecting the sides of a circumscribed cube onto a spherical surface. See more information about the cubed sphere grid here. The cubed sphere grid resolution for this milestone is C384.
The MOM6 ocean and CICE5 sea ice components are discretized on a tripolar grid. This type of grid avoids a singularity at the North Pole by relocating the two northern poles onto the land masses of northern Canada and northern Russia, while a third pole remains at the South Pole. See more information about tripolar grids here. The tripolar grid resolution for this milestone is 1/4 degree.
In ESMF, component model execution is split into initialize, run, and finalize methods, and each method can have multiple phases. The run sequence specifies the order in which model component phases are called by a driver.
To complete a run, a sequential cold start run sequence generates an initial set of surface fluxes using a minimum set of files. The system is then restarted using a warm start run sequence, with CMEPS reading in the initial fluxes and the atmosphere, ocean, and ice components reading in their initial conditions.
The cold start sequence initializes all model components using only a few files and only needs to run for an hour, although it is also possible to use a longer initial run before restarting the model.
The cold start run sequence below shows an outer (slow) loop and an inner (fast) loop each associated with a coupling interval. Normally the inner loop is faster, but for this milestone both loops were set to the same coupling interval, 1800 seconds. In general, the outer loop coupling interval must be a multiple of the inner loop interval.
An arrow (->) indicates that data is being transferred from one component to another during that step. CMEPS is shown as MED, FV3GFS as ATM, CICE5 as ICE, and MOM6 as OCN. Where a component name (e.g. ATM) or a component name and a specific phase (e.g. MED med_phases_prep_atm) appears in the run sequence, that is the point at which the component method or phase is run.
runSeq::
@1800
@1800
MED med_phases_prep_atm
MED -> ATM :remapMethod=redist
ATM
ATM -> MED :remapMethod=redist
MED med_phases_prep_ice
MED -> ICE :remapMethod=redist
ICE
ICE -> MED :remapMethod=redist
MED med_fraction_set
MED med_phases_prep_ocn_map
MED med_phases_aofluxes_run
MED med_phases_prep_ocn_merge
MED med_phases_prep_ocn_accum_fast
MED med_phases_history_write
MED med_phases_profile
@
MED med_phases_prep_ocn_accum_avg
MED -> OCN :remapMethod=redist
OCN
OCN -> MED :remapMethod=redist
MED med_phases_restart_write
@
::
The warm start sequence, shown below, is for the main time integration loop. It is initialized by restart files generated by the cold start sequence. For this milestone the inner and outer coupling intervals are both set to 1800 seconds.
runSeq::
@1800
MED med_phases_prep_ocn_accum_avg
MED -> OCN :remapMethod=redist
OCN
@1800
MED med_phases_prep_atm
MED med_phases_prep_ice
MED -> ATM :remapMethod=redist
MED -> ICE :remapMethod=redist
ATM
ICE
ATM -> MED :remapMethod=redist
ICE -> MED :remapMethod=redist
MED med_fraction_set
MED med_phases_prep_ocn_map
MED med_phases_aofluxes_run
MED med_phases_prep_ocn_merge
MED med_phases_prep_ocn_accum_fast
MED med_phases_history_write
MED med_phases_profile
@
OCN -> MED :remapMethod=redist
MED med_phases_restart_write
@
::
- The NEMS (NOAA Environmental Modeling System) Mediator is not currently fully conservative when remapping surface fluxes between model grids: a special nearest-neighbor fill is used along the coastline to ensure physically realistic values are available in cells where masking mismatches would otherwise leave unmapped values. Although CMEPS has a conservative option and support for fractional surface types, the NEMS nearest-neighbor fills were implemented in CMEPS so that the two Mediators would have similar behavior. In a future release, the fully conservative option will be enabled.
- The current version of the CIME case control system does not support the JULIAN calendar type, and the default calendar is set to NOLEAP for this milestone. In addition, there are inconsistencies among the model components and the mediator in terms of calendar type; this can be observed by checking the NetCDF time attributes of the model outputs. In a future release of the CMEPS modeling system, JULIAN will become the default calendar and the calendar type will be made consistent across the model components and the mediator.
- Namelist options in the ESMF config file that contain uppercase characters are not handled properly by the current version of the CIME case control system, and xmlchange cannot change the values of those particular namelist options. This will be fixed in a future release of the CMEPS modeling system.
The modeling system uses the same versions of the model components as CMEPS 0.3, but with an updated mediator. The tags used for each component, with brief explanations, are as follows:
- CMEPS: cmeps_v0.4.1 (new version of the mediator; to keep the modeling system numerically stable and consistent with the 0.3 release, flux_convergence=0 and flux_max_iteration=2 are used)
- CIME: cmeps_v0.4.2 (includes a fix for the Theia Slurm transition)
- FV3GFS CIME interface: cmeps_v0.4.2 tag (FV3GFS namelist options were compared and differences such as ENS_SPS were fixed)
- FV3GFS: cmeps_v0.3
- MOM CIME interface: cmeps_v0.4
- MOM: cmeps_v0.3
- CICE: cmeps_v0.3.1 (only includes modifications to work with the newer 0.4 version of the mediator)
This milestone is supported on the following platforms:
- Cheyenne/NCAR
- Stampede2/XSEDE
- Theia/NOAA
Software environment used: intel/19.0.2, mpt/2.19, netcdf-mpi/4.6.1, pnetcdf/1.11.0, and an optimized build of ESMF 8.0.0 Beta Snapshot 38 (compiled with intel/19.0.2 and mpt/2.19).
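CIME loads this software environment automatically through config_machines.xml; to reproduce it in an interactive shell on Cheyenne, for example, the same modules could be loaded manually (the module names below are assumed to match the versions listed above):
$ module load intel/19.0.2 mpt/2.19 netcdf-mpi/4.6.1 pnetcdf/1.11.0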
Currently, UFSCOMP, CMEPS, CIME, and the model components MOM6 and CICE are distributed using public repositories, but FV3GFS is a private repository on GitHub and requires an additional step to add the user as a collaborator. Please send a mail here to request access to the private FV3GFS repository.
To download, build, and run:
# Clone UFSCOMP umbrella repository
$ git clone https://github.com/ESCOMP/UFSCOMP.git
$ cd UFSCOMP
# To checkout the tag for this milestone (not yet available):
$ git checkout cmeps_v0.4.1
# Check out all model components and CIME
# Note that a separate Externals.cfg file is no longer needed for Stampede2 and Theia
# checkout_externals will prompt for a username and password four times, only for the private FV3GFS repository
$ ./manage_externals/checkout_externals
# Go to CIME scripts directory
$ cd cime/scripts
# Set the PROJECT environment variable and replace PROJECT ID with an appropriate project number
# Bourne shell (sh, ksh, bash)
$ export PROJECT=[PROJECT ID]
# C shell (csh, tcsh)
$ setenv PROJECT [PROJECT ID]
# Create UFS S2S case using the name "ufs.s2s.c384_t025" (the user can choose any name)
$ ./create_newcase --compset UFS_S2S --res C384_t025 --case ufs.s2s.c384_t025 --driver nuopc --run-unsupported
# Setup
$ cd ufs.s2s.c384_t025 # this is your "case root" directory selected above, and can be whatever name you choose
$ ./case.setup
# Set start time
$ ./xmlchange RUN_REFDATE=2012-01-01
$ ./xmlchange RUN_STARTDATE=2012-01-01
$ ./xmlchange JOB_WALLCLOCK_TIME=00:30:00 # wall-clock time limit for job scheduler (optional)
$ ./xmlchange USER_REQUESTED_WALLTIME=00:30:00 # to be consistent with JOB_WALLCLOCK_TIME
# Turn off short term archiving
$ ./xmlchange DOUT_S=FALSE
# Submit a 1-hour cold start run to generate mediator restart
$ ./xmlchange STOP_OPTION=nhours
$ ./xmlchange STOP_N=1
# To use the correct initial condition for CICE, edit user_nl_cice and add the following line
# ice_ic = "$ENV{UGCSINPUTPATH}/cice5_model.res_2012010100.nc"
# Build and Submit case
$ ./case.build # on Cheyenne: qcmd -- ./case.build
$ ./case.submit
# Output appears in the case run directory:
# On Cheyenne:
$ cd /glade/scratch/<user>/ufs.s2s.c384_t025
# On Stampede2:
$ cd $SCRATCH/ufs.s2s.c384_t025
# On Theia:
$ cd /scratch4/NCEPDEV/nems/noscrub/<user>/cimecases/ufs.s2s.c384_t025
# The model can then be restarted using the restart file generated by the 1-hour simulation
$ ./xmlchange MEDIATOR_READ_RESTART=TRUE
$ ./xmlchange STOP_OPTION=ndays
$ ./xmlchange STOP_N=1
$ ./case.submit
Other resolutions are not supported for this milestone. The only supported model resolution is C384_t025 (atmosphere: C384 cubed sphere, ~25 km; ocean/ice: 1/4 degree tripolar).
The Persistent Execution Threads (PET) layout affects the overall performance of the system. The following command shows the default PET layout of the case:
$ cd /path/to/UFSCOMP/cime/scripts/ufs.s2s.c384_t025
$ ./pelayout
To change the PET layout temporarily, the xmlchange command is used. For example, the following commands are used to assign a custom PET layout for the c384_t025 resolution case:
$ cd /path/to/UFSCOMP/cime/scripts/ufs.s2s.c384_t025
$ ./xmlchange NTASKS_CPL=648
$ ./xmlchange NTASKS_ATM=648
$ ./xmlchange NTASKS_OCN=360
$ ./xmlchange NTASKS_ICE=360
$ ./xmlchange ROOTPE_CPL=0
$ ./xmlchange ROOTPE_ATM=0
$ ./xmlchange ROOTPE_OCN=648
$ ./xmlchange ROOTPE_ICE=1008
This will double the default number of PETs used for the ATM component and will assign 648 cores to ATM and CPL (Mediator), 360 cores to OCN, and 360 cores to ICE. In this setup, all the model components run concurrently and the PETs are distributed as 0-647 for ATM and CPL, 648-1007 for OCN, and 1008-1367 for ICE. Note that the PETs assigned to the ATM component (FV3GFS) also include the IO tasks (write_tasks_per_group), whose default value is 48. The actual number of PETs used for computation by the ATM component is therefore 600 (648-48).
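Depending on the CIME version, the case may need to be re-set-up and rebuilt after changing the task counts; the new layout can be checked with pelayout before submitting. A sketch:
# Verify and apply the new PET layout (case.setup --reset may or may not be required)
$ ./pelayout                  # ATM/CPL should now show 648 tasks, OCN 360, ICE 360
$ ./case.setup --reset        # regenerate the case configuration for the new layout
$ ./case.build --clean-all
$ ./case.build                # on Cheyenne: qcmd -- ./case.build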
To change the default PET layout permanently for a specific case:
# Go to main configuration file used to define PET layout
$ cd /path/to/UFSCOMP/components/fv3/cime_config
# Find the relevant grid in the config_PETs.xml file and edit the ntasks and rootpe elements.
# For example, the a%C384.+oi%tx0.25v1 entry defines the default PET layout for the c384_t025 case.
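One way to locate the relevant block is to search for the grid alias mentioned above, for example:
# List the ntasks/rootpe entries for the C384/0.25 degree grid (pattern taken from the comment above)
$ grep -n -A 20 'tx0.25v1' config_PETs.xml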
The layout namelist option defines the processor decomposition on each tile, and the number of PETs assigned to the ATM component must equal layout(1)*layout(2)*ntiles+write_tasks_per_group*write_groups. Here layout(1) is the number of sub-regions in the X direction and layout(2) the number in the Y direction of the two-dimensional decomposition. For the cubed sphere, ntiles should be 6, one tile for each face of the cubed sphere.
To change the default layout option (6x8):
$ cd /path/to/UFSCOMP/cime/scripts/ufs.s2s.c384_t025
# Add following line to user_nl_fv3gfs
layout = 6 12
This will set layout(1) to 6 and layout(2) to 12, so the total number of PETs used by the FV3GFS model will be 6*12*6 = 432 (excluding the PETs used for IO).
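As a worked example of the formula above (assuming the default write_tasks_per_group of 48 and a single write group), the layout = 6 12 case would require 480 ATM PETs in total:
# 6 * 12 * 6 tiles = 432 compute PETs, plus 48 * 1 IO PETs = 480 PETs in total
$ ./xmlchange NTASKS_ATM=480   # hypothetical setting to match the 6x12 layout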
Also note that if the layout namelist option is not provided by the user in user_nl_fv3gfs, the CIME case control system will automatically calculate an appropriate layout by considering the total number of PETs assigned to the ATM component and to the IO tasks (write_tasks_per_group*write_groups). In this case, CIME will choose a layout that gives as close to a square two-dimensional decomposition as possible.
In the current version of the CMEPS modeling system, FV3GFS namelist options are handled by CIME.
To change the default number of IO tasks (write_tasks_per_group, 48):
$ cd /path/to/UFSCOMP/cime/scripts/ufs.s2s.c384_t025
# Add following line to user_nl_fv3gfs
write_tasks_per_group = 72
This will increase the IO tasks by 50% over the default (48+24=72).
CIME calculates and modifies the FV3GFS layout namelist parameter (input.nml) automatically.
For a new case, pass the --walltime parameter to create_newcase. For example, to set the default job time to 20 minutes, you would use this command:
$ ./create_newcase --compset UFS_S2S --res C384_t025 --case ufs.s2s.c384_t025.tw --driver nuopc --run-unsupported --walltime=00:20:00
Alternatively, the xmlchange command can be used to change the wallclock time and job submission queue for an existing case:
# the following commands set the job time to 20 minutes
$ ./xmlchange JOB_WALLCLOCK_TIME=00:20:00
$ ./xmlchange USER_REQUESTED_WALLTIME=00:20:00
# the following command changes the queue from regular to premium on NCAR's Cheyenne system
$ ./xmlchange JOB_QUEUE=premium
The coupled system should first be run with the cold start run sequence for a forecast period of at least 1 hour (more precisely, two slow coupling time steps) before restarting. To restart the model, re-submit the case after the previous run completes, setting the CONTINUE_RUN XML option to TRUE using xmlchange.
$ cd /path/to/UFSCOMP/cime/scripts/ufs.s2s.c384_t025 # back to your case root
$ ./xmlchange CONTINUE_RUN=TRUE
$ ./xmlchange STOP_OPTION=ndays
$ ./xmlchange STOP_N=1
$ ./case.submit
In this case, the modeling system will use the warm start run sequence which allows concurrency.
To run the system concurrently, an initial set of surface fluxes must be available to the Mediator. This capability was implemented to match the existing protocol used in the UFS. The basic procedure is to (1) run a cold start run sequence for one hour so the Mediator can write out a restart file, and then (2) run the system again with the Mediator set to read this restart file (containing surface fluxes for the first timestep) while the other components read in their original initial conditions. The detailed steps are as follows:
1. Run the model for 1 hour to produce a Mediator restart file.
$ ./create_newcase --compset UFS_S2S --res C384_t025 --case ufs.s2s.c384_t025.20120101 --driver nuopc --run-unsupported
$ cd ufs.s2s.c384_t025.20120101
$ ./case.setup
$ ./case.build
$ ./xmlchange DOUT_S=FALSE # no follow-up job submission for archiving
$ ./xmlchange STOP_N=1
$ ./xmlchange STOP_OPTION=nhours
$ ./case.submit
2. Restart the coupled system with only the Mediator reading in a restart file.
$ ./xmlchange MEDIATOR_READ_RESTART=TRUE
$ ./xmlchange STOP_OPTION=ndays
$ ./xmlchange STOP_N=1
$ ./case.submit
Just as with the CONTINUE_RUN option, the modeling system will use the warm start run sequence when MEDIATOR_READ_RESTART is set to TRUE, so that the model components run concurrently.
The input file directories are set in the XML file /path/to/UFSCOMP/cime/config/cesm/machines/config_machines.xml. Note that this file is divided into sections, one for each supported platform (machine). This file sets the following three environment variables:
- UGCSINPUTPATH: directory containing initial conditions
- UGCSFIXEDFILEPATH: fixed files for FV3GFS such as topography, land-sea mask and land use types for different model resolutions
- UGCSADDONPATH: fixed files for mediator such as grid spec file for desired FV3GFS model resolution
The relevant entries for each directory can be found under the machine XML element. For example, the <machine MACH="cheyenne"> element contains the machine-dependent entries for Cheyenne.
The default directories for Cheyenne, Stampede2 and Theia are as follows:
Cheyenne:
<env name="UGCSINPUTPATH">/glade/work/turuncu/FV3GFS/benchmark-inputs/2012010100/gfs/fcst</env>
<env name="UGCSFIXEDFILEPATH">/glade/work/turuncu/FV3GFS/fix_am</env>
<env name="UGCSADDONPATH">/glade/work/turuncu/FV3GFS/addon</env>
Stampede2:
<env name="UGCSINPUTPATH">/work/06242/tg855414/stampede2/FV3GFS/benchmark-inputs/2012010100/gfs/fcst</env>
<env name="UGCSFIXEDFILEPATH">/work/06242/tg855414/stampede2/FV3GFS/fix_am</env>
<env name="UGCSADDONPATH">/work/06242/tg855414/stampede2/FV3GFS/addon</env>
Theia:
<env name="UGCSINPUTPATH">/scratch4/NCEPDEV/nems/noscrub/Rocky.Dunlap/INPUTDATA/benchmark-inputs/2012010100/gfs/fcst</env>
<env name="UGCSFIXEDFILEPATH">/scratch4/NCEPDEV/nems/noscrub/Rocky.Dunlap/INPUTDATA/fix_am</env>
<env name="UGCSADDONPATH">/scratch4/NCEPDEV/nems/noscrub/Rocky.Dunlap/INPUTDATA/addon</env>
There are four different initial conditions for the C384/0.25 degree resolution available with this release, each based on CFS analyses: 2012-01-01, 2012-04-01, 2012-07-01 and 2012-10-01.
There are several steps to modify the initial condition of the coupled system:
1. Update the UGCSINPUTPATH variable to point to the desired directory containing the initial conditions
- Make the change in /path/to/UFSCOMP/cime/config/cesm/machines/config_machines.xml. This will set the default path used for all new cases created after the change.
- In an existing case, make the change in env_machine_specific.xml in the case root directory. This will only affect that single case.
2. Update the ice namelist to point to the correct initial condition file
- The ice component needs to be manually updated to point to the correct initial conditions file. To do this, modify the user_nl_cice file in the case root and set ice_ic to the full path of the ice initial conditions file. For example, for the 2012010100 initial conditions the user_nl_cice file would look like this:
!----------------------------------------------------------------------------------
! Users should add all user specific namelist changes below in the form of
! namelist_var = new_namelist_value
! Note - that it does not matter what namelist group the namelist_var belongs to
!----------------------------------------------------------------------------------
ice_ic = "$ENV{UGCSINPUTPATH}/cice5_model.res_2012010100.nc"
3. Modify the start date of the simulation
- The start date of the simulation needs to be set to the date of the new initial condition. This is done using xmlchange in your case root directory; CIME will then update the date-related namelist options in model_configure automatically. For example:
$ ./xmlchange RUN_REFDATE=2012-01-01
$ ./xmlchange RUN_STARTDATE=2012-01-01
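Putting the three steps together, switching to the 2012-04-01 initial condition would look roughly as follows (this assumes the April inputs follow the same directory and file naming pattern as the default 2012010100 case; check the actual names under UGCSINPUTPATH):
# (1) Point UGCSINPUTPATH at the new directory in env_machine_specific.xml,
#     e.g. .../benchmark-inputs/2012040100/gfs/fcst
# (2) Point CICE at the matching initial condition in user_nl_cice
$ echo 'ice_ic = "$ENV{UGCSINPUTPATH}/cice5_model.res_2012040100.nc"' >> user_nl_cice
# (3) Set the model start date
$ ./xmlchange RUN_REFDATE=2012-04-01
$ ./xmlchange RUN_STARTDATE=2012-04-01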
The CIME interface of each model component has been extended to allow manual modification of namelist options. This is handled through individual files: user_nl_cice for CICE, user_nl_mom for MOM, user_nl_fv3gfs for FV3GFS, and user_nl_cpl for the Mediator.
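For example, a namelist change can be added to the appropriate user_nl_* file and the resulting namelists checked before submission with the standard CIME preview_namelists tool (the variable shown is only illustrative and repeats the IO example above):
# From the case root: add an entry, then regenerate and inspect the namelists
$ echo 'write_tasks_per_group = 72' >> user_nl_fv3gfs
$ ./preview_namelists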
Running in debug mode will enable compiler checks and turn on additional diagnostic output including the ESMF PET logs (one for each MPI task). Running in debug mode is recommended if you run into an issue or plan to make any code changes.
In your case directory:
$ ./xmlchange DEBUG=TRUE
$ ./case.build --clean-all
$ ./case.build
The mediator fields (or history) can be written to disk to check the fields exchanged among the model components. Due to the large volume of data written to disk, it is best to keep the simulation length relatively short (e.g. a couple of hours or days).
In your case directory:
$ ./xmlchange HIST_OPTION=nsteps
$ ./xmlchange HIST_N=1
$ ./xmlchange STOP_OPTION=nhours
$ ./xmlchange STOP_N=3
This will activate writing history files to disk and limit the simulation length to 3 hours.
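The history files appear in the case run directory (see the run directory paths listed above). A quick way to inspect the exchanged fields is with ncdump; the file name pattern below is an assumption and the actual names may differ:
$ cd /glade/scratch/<user>/ufs.s2s.c384_t025   # Cheyenne example
$ ls *cpl.hi*.nc                               # mediator history files
$ ncdump -h <history file> | less              # list the exchanged fields and their attributes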
After a run completes, the timing information is copied into the timing directory under your case root. The timing summary file is named cesm_timing.CASE-NAME.XXX.YYY. Detailed timing of individual parts of the system is available in the cesm.ESMF_Profile.summary.XXX.YYY file.
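For example, from the case root (XXX.YYY stands for the job id and timestamp suffixes mentioned above):
$ cd timing
$ less cesm_timing.ufs.s2s.c384_t025.*   # overall run time and throughput summary
$ less cesm.ESMF_Profile.summary.*       # per-component and per-phase ESMF timing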