Introduction to YAML

Cory Martin edited this page Aug 18, 2021 · 1 revision

YAML stands for ‘YAML Ain’t Markup Language’. It is a human-readable data serialization language that, like XML, allows configuration to be written in a controlled format in plain text files. JEDI uses YAML for all of its configuration files, and the format is becoming increasingly important for atmospheric science applications.

Information about YAML file format and some useful websites/references:

https://blog.stackpath.com/yaml/

https://docs.ansible.com/ansible/latest/reference_appendices/YAMLSyntax.html

https://yaml.org/

http://www.yamllint.com/ - a YAML web validator (check syntax)
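
As a quick orientation before the JEDI files, the fragment below sketches the handful of YAML constructs that appear in them: key/value pairs, nested mappings defined by indentation, lists, and comments. All key names and values here are made up for illustration and are not JEDI settings.

```yaml
# A comment: everything after '#' on a line is ignored
experiment name: demo                  # key: value (keys may contain spaces)
start time: '2018-04-15T00:00:00Z'     # quotes keep the value a plain string
output:                                # a nested mapping, defined by indentation
  directory: Data/out
  overwrite: true
variables: [u, v, T]                   # a list written inline (flow style)
stations:                              # a list written in block style, one '-' per item
- name: first
  height: 10.0
- name: second
  height: 2.5
```

Indentation must use spaces, not tabs, and entries at the same level must line up; a validator such as the one linked above will flag most indentation mistakes.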

Dissect Sample JEDI YAML Files

In the next two sections, we will take a look at a sample YAML file for a test from two different components and briefly explain what each line means/represents. This is by no means an exhaustive discussion, but will serve to show how to work with the YAML files, and how there are similarities and differences between different applications.

UFO Example

First, let’s start with something simple. One of the shortest YAML test files is for the aircraft observation operator tests, fv3-bundle/ufo/test/testinput/aircraft.yaml.

1  window begin: '2018-04-14T20:30:00Z'
2  window end: '2018-04-15T03:30:00Z'
3  
4  observations:
5  - obs space:
6      name: Aircraft
7      obsdatain:
8        obsfile: Data/ioda/testinput_tier_1/aircraft_obs_2018041500_m.nc4
9      simulated variables: [air_temperature,specific_humidity]
10     #simulated variables: [eastward_wind, northward_wind]
11   obs operator:
12     name: VertInterp
13   linear obs operator test:
14     coef TL: 0.1
15     tolerance TL: 1.0e-13
16     tolerance AD: 1.0e-11
17   geovals:
18     filename: Data/ufo/testinput_tier_1/aircraft_geoval_2018041500_m.nc4
19   vector ref: GsiHofX
20   tolerance: 1.0e-6

The above lines are numbered, and each will be described below:

  1. window begin: the beginning timestamp of the assimilation window, given as a string
  2. window end: the ending timestamp of the assimilation window, given as a string
  4. observations: the top-level section that describes all of the observation types to be used
  5. In YAML, a ‘-’ indicates an item in a list, so this is listing each observation type. For this file, there is only one in the list
  6. name of the obs space to reference in memory (can really be anything you want)
  7. obsdatain: this section describes the input
  8. path to the input IODA observation file
  9. list of variables/observations to simulate
  10. a commented-out alternative list of variables (YAML ignores everything after a ‘#’)
  11. obs operator: this section sets up the UFO that you wish to use
  12. name of the UFO; this must match what is in the source code/factory
  13. linear obs operator test: this section defines parameters for the linear operator tests
  14. coefficient of the perturbation to apply in the tangent linear
  15. tolerance of the tangent linear test
  16. tolerance of the adjoint test
  17. geovals: this section is for the pre-interpolated model fields for these unit tests
  18. path to the interpolated GeoVaLs
  19. vector ref: this tells the test executable what to use as a reference for the H(x) calculation
  20. tolerance of the test comparing output from UFO H(x) and the GSI H(x)
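
The file above uses both of YAML's list styles: the ‘-’ item under observations (block style) and the bracketed list of simulated variables (flow style). As a sketch of how the two relate, the bracketed list could equally be written in block style. A standard YAML parser reads both forms as the same two-element list; whether a given JEDI application is ever written this way in practice is not something the test file shows.

```yaml
# Block style, equivalent to the inline form
#   simulated variables: [air_temperature, specific_humidity]
simulated variables:
- air_temperature
- specific_humidity
```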

This YAML file is used by the following ctests: test_ufo_vertinterp_aircraft_opr and test_ufo_linopr_vertinterp_aircraft.

Note that this is a very simple YAML configuration file, and that most will be longer and have more components. Thus, we will look at an example from FV3-JEDI to see a different perspective. The practicals later in this document will feature more comprehensive YAML files.

FV3-JEDI Example

Building on the previous example, which runs the vertical interpolation observation operator for aircraft observations, let us look at an example FV3-JEDI YAML file that runs the fv3jedi_hofx_nomodel.x application to produce H(x): fv3-jedi/test/testinput/hofx_nomodel.yaml.

1  window begin: '2018-04-14T21:00:00Z'
2  window length: PT6H
3  forecast length: PT6H
4  geometry:
5    nml_file_mpp: Data/fv3files/fmsmpp.nml
6    trc_file: Data/fv3files/field_table
7    akbk: Data/fv3files/akbk64.nc4
8    # input.nml
9    layout: [1,1]
10   io_layout: [1,1]
11   npx: 13
12   npy: 13
13   npz: 64
14   ntiles: 6
15   fieldsets:
16     - fieldset: Data/fieldsets/dynamics.yaml
17     - fieldset: Data/fieldsets/ufo.yaml
18 forecasts:
19   #state:
20   filetype: gfs
21   datapath: Data/inputs/gfs_c12/bkg/
22   filename_core: 20180415.000000.fv_core.res.nc
23   filename_trcr: 20180415.000000.fv_tracer.res.nc
24   filename_sfcd: 20180415.000000.sfc_data.nc
25   filename_sfcw: 20180415.000000.fv_srf_wnd.res.nc
26   filename_cplr: 20180415.000000.coupler.res
27   state variables: [u,v,ua,va,T,DELP,sphum,ice_wat,liq_wat,o3mr,phis,
28                     slmsk,sheleg,tsea,vtype,stype,vfrac,stc,smc,snwdph,
29                     u_srf,v_srf,f10m,sss]
30 observations:
31 - obs space:
32     name: Aircraft
33     obsdatain:
34       obsfile: Data/obs/testinput_tier_1/aircraft_obs_2018041500_m.nc4
35     obsdataout:
36       obsfile: Data/hofx/aircraft_hofx_gfs_2018041500_m.nc4
37     simulated variables: [eastward_wind,northward_wind,air_temperature]
38   obs operator:
39     name: VertInterp
40 - obs space:
41     name: AMSUA-NOAA19
42     obsdatain:
43       obsfile: Data/obs/testinput_tier_1/amsua_n19_obs_2018041500_m.nc4
44     obsdataout:
45       obsfile: Data/hofx/amsua_n19_hofx_gfs_2018041500_m.nc4
46     simulated variables: [brightness_temperature]
47     channels: 1-15
48   obs operator:
49     name: CRTM
50     Absorbers: [H2O,O3,CO2]
51     Clouds: [Water, Ice]
52     Cloud_Fraction: 1.0
53     obs options:
54       Sensor_ID: amsua_n19
55       EndianType: little_endian
56       CoefficientPath: Data/crtm/
57 prints:
58   frequency: PT3H

Again, the above lines are numbered, and will be summarized below.

  1. Window begin: as in the UFO example, the start of the assimilation window

  2. Window length: this time we specify a 6 hour assimilation window (PT6H is an ISO 8601 duration)

  3. Forecast length: the background used as input is FH006

  4-17. The model geometry for FV3 is defined here.

    5 - Path to the FMS/MPP namelist file

    6 - Path to the FV3 field table file

    7 - Path to the FV3 AK/BK hybrid coordinate netCDF input file

    9 - the MPI layout for each FV3 cubed-sphere tile

    10 - the IO layout for each FV3 cubed-sphere tile

    11-14 - FV3 grid size definitions

    15-17 - List of fieldsets (YAML files that describe which input FV3 tile file contains each model field, the field’s units, etc.)

  18-29. This section describes the input model forecast.

    20 - file type of the input: gfs or geos

    21 - path to input model restart files

    22-26 - list the filenames for the core, trcr, sfcd, sfcw, cplr restart files

    27-29 - list of the state variables to read in from the restart files

  30-56. Like before, this is the observations section, defining all observations.

    31-39 - Aircraft observations are defined here. This is very similar to the UFO example, with a few exceptions. Lines 35-36 are obsdataout/obsfile, the path to an output file to which the H(x) values (and other things) will be written. The lines from the UFO example that are absent here (the tolerances, geovals, etc.) are needed only for the simple unit tests, not for a ‘real’ case such as this one that computes H(x).

    40-56 - This is an example section of how to simulate AMSU-A brightness temperatures. Line 47 specifies the channels (can be a range like 1-15 or something like 1-15,17-20,22,24-30). Unlike the vertical interpolation observation operator, the CRTM operator requires some configuration options to be set (lines 50-56). Things like the list of absorbers (50), clouds (51), and the parameters for the correct CRTM coefficients (54-56) must be defined.

  57-58. This just specifies that output is printed every 3 hours of model time (not really used if no model is being integrated and there is no First-Guess at Appropriate Time (FGAT)).
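
The two channel forms mentioned for line 47 can be sketched side by side. The first is the value used in hofx_nomodel.yaml; the second is the non-contiguous example quoted in the description, shown commented out purely to illustrate the syntax and not taken from any test file.

```yaml
# Contiguous range, as in hofx_nomodel.yaml
channels: 1-15
# Mixed ranges and single channels (illustrative only)
#channels: 1-15,17-20,22,24-30
```

In both cases YAML itself sees the value as a single plain string; splitting it into individual channel numbers is left to the application reading the file.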