updates to docs and defaults
allibco committed Jan 26, 2024
1 parent 3a4d5c7 commit ebfe2a2
Showing 8 changed files with 192 additions and 243 deletions.
78 changes: 29 additions & 49 deletions docs/source/pyEnsSum.rst
@@ -15,15 +15,15 @@ $CESMDATAROOT/inputdata/validation/uf_ensembles

Alternatively, pyEnsSum.py can be used to create a summary file for CAM-ECT or
UF-CAM-ECT, given the location of appropriate ensemble history files (which should
be generated via CIME, https://github.com/ESMCI/cime)
be generated via CESM, https://github.com/ESCOMP/CESM).

(Note: to generate a summary file for POP-ECT, you must use pyEnsSumPop.py,
which has its own corresponding instructions)
(Note: to generate a summary file for POP-ECT or MPAS-ECT, you must use pyEnsSumPop.py
or pyEnsSumMPAS.py, respectively, each of which has its own corresponding instructions.)

To use pyEnsSum:
--------------------

1. On NCAR's Cheyenne machine:
1. On NCAR's Derecho machine:

An example script is given in ``test_pyEnsSum.sh``. Modify as needed and do:

@@ -36,79 +36,58 @@ To use pyEnsSum:
2. Otherwise you need these packages (see ``requirements.txt``):
* numpy
* scipy
* netcdf4
* mpi4py
* scipy
* netcdf4
* mpi4py
3. To see all options (and defaults):
``python pyEnsSum.py -h``::

Creates the summary file for an ensemble of CAM data.

Args for pyEnsSum :

pyEnsSum.py
-h : prints out this usage message
--verbose : prints out in verbose mode (off by default)
--sumfile <ofile> : the output summary data file (default = ens.summary.nc)
--indir <path> : directory containing all of the ensemble runs (default = ./)
--esize <num> : Number of ensemble members (default = 350)
--tag <name> : Tag name used in metadata (default = cesm2_0)
--compset <name> : Compset used in metadata (default = F2000climo)
--res <name> : Resolution used in metadata (default = f19_f19)
--tslice <num> : the index into the time dimension (default = 1)
--mach <name> : Machine name used in the metadata (default = cheyenne)
--jsonfile <fname> : Jsonfile to provide that a list of variables that will be excluded
(default = exclude_empty.json)
--mpi_disable : Disable mpi mode to run in serial (off by default)
--fIndex <num> : Use this to start at ensemble member <num> instead of 000 (so
ensembles with numbers less than <num> are excluded from summary file)


Notes:
------------------

1. CAM-ECT uses yearly average files, which by default (in the ensemble.py
generation script in CIME) also contains the initial conditions. Therefore,
generation script in CESM) also contains the initial conditions. Therefore,
one typically needs to set ``--tslice 1`` to use the yearly average (because
slice 0 is the initial conditions.)

2. UF-CAM-ECT uses timestep nine. By default (in the ensemble.py
generation script in CIME) the ouput file also contains the initial conditions.
Therefore, one typically needs to set ``--tslice 1`` to use time step nine (because
slice 0 is the initial conditions.)

2. UF-CAM-ECT uses an early timestep such as 7 or 9. By default (in the ensemble.py
generation script in CESM) the output file no longer contains the initial conditions.
Therefore, one typically needs to set ``--tslice 0``, assuming that only one timestep
is written to the file.
3. There is no need to indicate UF-CAM-ECT vs. CAM-ECT to this routine. It
simply creates statistics for the supplied history files at the specified
time slice. For example, if you want to look at monthly files, simply
supply their location. Monthly files typically do not contain an initial
condition and would require ``--tslice 0``.

4. The ``--esize`` (the ensemble size) can be less than or equal to the number of files
in ``--indir``. Ensembles numbered 000-(esize-1) will be included unless ``--fIndex``
is specified. UF-CAM-ECT typically uses at least 350 members (the default),
in ``--indir``. Ensembles numbered 0000-(esize-1) will be included unless ``--fIndex``
is specified. UF-CAM-ECT typically uses at least 350 members,
whereas CAM-ECT does not require as many.

5. Note that ``--res``, ``--tag``, ``--compset``, and ``--mach``
parameters only affect the metadata in the summary file.
parameters only affect the metadata written to the summary file.

6. When running in parallel, the recommended number of cores to use is one
for each 3D variable. The default is to run in paralalel (recommended).
for each 3D variable. The default is to run in parallel (recommended).

7. You must specify a json file (via ``--jsonfile``) that indicates
the variables in the ensemble
output files that you want to exclude from the summary file
statistics (see the example json files). The pyEnsSum routine
will let you know if you have not
listed variables that need to be excluded (see next note). Keep in mind that
you must have *fewer* variables included than ensemble members.

statistics (see the example json files). The default is the provided
empty_excluded.json, which does not contain any variables.
The pyEnsSum routine will let you know if you have not
listed variables that need to be excluded (see more in next note).
8. *IMPORTANT:* If there are variables that need to be excluded (that are not in
the .json file already) for the summary to be generated, pyEnsSum will list these
variables in the output. These variables will also be added to a copy of
your exclude variable list (prefixed with "NEW.") for future reference and use.
The summary file will be geberated with all listed variables excluded.
The summary file will be generated with all listed variables excluded.
Note that the following types of variables will be removed: any variables that
are constant across the ensemble, are not floating-point (e.g., integer),
are linearly dependent, or have very few (< 3%) unique values.
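
The screening described in the last note can be illustrated with a small, standard-library-only sketch. This is not the pyEnsSum implementation: the function name ``screen_variables``, the sample variable names, and the reading of "very few unique values" as "fewer than 3% of all values are distinct" are assumptions made for illustration, and the linear-dependence check is left out entirely.

```python
from typing import Dict, List, Sequence


def screen_variables(
    ensemble: Dict[str, List[Sequence[object]]],
    min_unique_fraction: float = 0.03,  # assumed reading of the "< 3%" rule
) -> Dict[str, str]:
    """Return {variable name: reason} for variables that would be flagged.

    ensemble[name] holds one flat sequence of values per ensemble member.
    Hypothetical helper; the linear-dependence check is not sketched here.
    """
    excluded: Dict[str, str] = {}
    for name, members in ensemble.items():
        values = [v for member in members for v in member]
        if not values:
            continue  # nothing to judge for an empty variable
        if not all(isinstance(v, float) for v in values):
            excluded[name] = "not floating-point"
        elif all(list(member) == list(members[0]) for member in members):
            excluded[name] = "constant across the ensemble"
        elif len(set(values)) / len(values) < min_unique_fraction:
            excluded[name] = "very few unique values"
    return excluded


# Made-up sample data: two ensemble members per variable.
sample = {
    "T": [[280.1, 281.5], [280.3, 281.2]],            # kept
    "nstep": [[9, 9], [9, 9]],                        # integer-valued
    "P0": [[1000.0, 1000.0], [1000.0, 1000.0]],       # identical in every member
    "FLAG": [[0.0] * 50, [1.0] * 50],                 # 2 distinct values out of 100
}
flagged = screen_variables(sample)
for name, reason in flagged.items():
    print(f"{name}: {reason}")
```

In this sketch the flagged names play the role of the variables that pyEnsSum would report and add to the "NEW."-prefixed copy of the exclude list.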
@@ -120,28 +99,29 @@ Example:

*To generate a summary file for 350 UF-CAM-ECT simulation runs (time step nine):*

* we specify the size (this is optional since 350 is the default) and data location:
* we specify the size and data location:

``--esize 350``

``--indir /glade/p/cisl/asap/pycect_sample_data/cam_c1.2.2.1/uf_cam_ens_files``
``--indir /glade/campaign/cisl/asap/pycect_sample_data/cam_c1.2.2.1/uf_cam_ens_files``

* We also specify the name of file to create for the summary:

``--sumfile uf.ens.c1.2.2.1_fc5.ne30.nc``

* Since the ensemble files contain the intial conditions as well as the values at time step 9 (this is optional as 1 is the default), we set:
* Since the ensemble files contain the initial conditions as well as the time slice that
contains the desired values at time step 9, we set:

``--tslice 1``

* We also specify the CESM tag, compset and resolution and machine of our ensemble data so that it can be written to the metadata of the summary file:

``--tag cesm1.2.2.1 --compset FC5 --res ne30_ne30 --mach cheyenne``

* We can exclude or include some variables from the analysis by specifying them in a json file:
* We can exclude variables from the analysis by specifying them in a json file:

``--jsonfile excluded_varlist.json``

* This yields the following command for your job submission script:

``python pyCECT.py --esize 350 --indir /glade/p/cisl/asap/pycect_sample_data/cam_c1.2.2.1/uf_cam_ens_files --sumfile uf.ens.c1.2.2.1_fc5.ne30.nc --tslice 1 --tag cesm1.2.2.1 --compset FC5 --res ne30_ne30 --jsonfile excluded_varlist.json``
``python pyEnsSum.py --esize 350 --indir /glade/campaign/cisl/asap/pycect_sample_data/cam_c1.2.2.1/uf_cam_ens_files --sumfile uf.ens.c1.2.2.1_fc5.ne30.nc --tslice 1 --tag cesm1.2.2.1 --compset FC5 --res ne30_ne30 --jsonfile excluded_varlist.json``
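
For job scripts that are generated programmatically, the options from this example can be collected in one place and joined into the command line. The sketch below is only one possible way to do this; it assumes the script being invoked is pyEnsSum.py (the tool this page documents) and uses the option values listed in the bullets above. The ``options`` dictionary is an illustrative helper, not part of pyEnsSum.

```python
import shlex

# Option values taken from the example above; the dict itself is illustrative.
options = {
    "--esize": "350",
    "--indir": "/glade/campaign/cisl/asap/pycect_sample_data/cam_c1.2.2.1/uf_cam_ens_files",
    "--sumfile": "uf.ens.c1.2.2.1_fc5.ne30.nc",
    "--tslice": "1",
    "--tag": "cesm1.2.2.1",
    "--compset": "FC5",
    "--res": "ne30_ne30",
    "--jsonfile": "excluded_varlist.json",
}

# Assumption: the summary is created with pyEnsSum.py in the current directory.
argv = ["python", "pyEnsSum.py"]
for flag, value in options.items():
    argv += [flag, value]

# shlex.join (Python 3.8+) quotes any argument that needs it for a POSIX shell.
command = shlex.join(argv)
print(command)
```

Keeping the arguments as a list until the last step avoids quoting mistakes if a path ever contains spaces.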
110 changes: 110 additions & 0 deletions docs/source/pyEnsSumMPAS.rst
@@ -0,0 +1,110 @@

pyEnsSumMPAS
==============

The verification tools in the CECT suite all require an *ensemble
summary file*, which contains statistics describing the ensemble distribution.
pyEnsSumMPAS can be used to create an MPAS (atmospheric component) ensemble summary file.

Note that ensemble summary files for existing MPAS tags are not yet available, as this
functionality is new. Therefore, pyEnsSumMPAS.py can be used to create a summary file for MPAS-ECT,
given the location of appropriate ensemble history files (which should be generated
via MPAS-A, https://github.com/MPAS-Dev/MPAS-Model).

(Note: to generate a summary file for UF-CAM-ECT/CAM-ECT or POP-ECT, you must use pyEnsSum.py
or pyEnsSumPop.py, respectively, each of which has its own corresponding instructions.)


To use pyEnsSumMPAS:
--------------------

1. On NCAR's Derecho machine:

An example script is given in ``test_pyEnsSumMPAS.sh``. Modify as needed and do:

``qsub test_pyEnsSumMPAS.sh``

Note that the python environment is loaded in the script:
``module load conda``
``conda activate npl``

2. Otherwise you need these packages (see ``requirements.txt``):
* numpy
* scipy
* netcdf4
* mpi4py
3. To see all options (and defaults):
``python pyEnsSumMPAS.py -h``


Notes:
------------------

1. MPAS-ECT typically uses data after several timesteps, and the output file may contain
multiple time slices and may or may not contain initial conditions. Therefore, make sure
that the time slice chosen to generate the summary is the same time slice that is used
for testing with pyCECT. Specify the time slice with ``--tslice 0``, for example.
2. The ``--esize`` (the ensemble size) can be less than or equal to the number of files
in ``--indir``. Ensembles numbered 0000-(esize-1) will be included unless ``--fIndex``
is specified. MPAS-ECT typically uses at least 200 members.

3. Note that ``--core``, ``--tag``, ``--mesh``, ``--model``, and ``--mach``
parameters only affect the metadata written to the summary file.

4. When running in parallel, the recommended number of cores to use is one
for each 3D variable. The default is to run in parallel (recommended).

5. You must specify a json file (via ``--jsonfile``) that indicates
the variables in the ensemble output files that you want to exclude from the summary file
statistics (see the example json files). The default is the provided
empty_excluded.json, which does not contain any variables.
The pyEnsSumMPAS routine will let you know if you have not
listed variables that need to be excluded (see more in next note).

6. *IMPORTANT:* If there are variables that need to be excluded (that are not in
the .json file already) for the summary to be generated, pyEnsSumMPAS will list these
variables in the output. These variables will also be added to a copy of
your exclude variable list (prefixed with "NEW.") for future reference and use.
The summary file will be generated with all listed variables excluded.
Note that the following types of variables will be removed: any variables that
are constant across the ensemble, are not floating-point (e.g., integer),
are linearly dependent, or have very few (< 3%) unique values.


Example:
--------------------------------------
(Note: This example is in test_pyEnsSumMPAS.sh)

*To generate a summary file for 200 MPAS-ECT simulation runs (from time slice 3 in the file):*

* we specify the size and data location:

``--esize 200``

``--indir /glade/campaign/cisl/asap/pycect_sample_data/mpas_a.v7.3/mpas_ens_files``

* We also specify the name of file to create for the summary:

``--sumfile mpas_sum.nc``

* Since the ensemble files could contain more than one time step (in this example,
starting at 3 and output every 3), we specify the time slice corresponding to timestep 12 with:

``--tslice 3``

* We can also specify the MPAS tag, model, mesh, core and machine of our ensemble data so that it can be written to the metadata of the summary file:

``--tag v7.3 --model mpas --mach cheyenne``

* We can exclude variables from the analysis by specifying them in a json file:

``--jsonfile empty_excluded.json``

* This yields the following command for your job submission script:

``python pyEnsSumMPAS.py --esize 200 --indir /glade/campaign/cisl/asap/pycect_sample_data/mpas_a.v7.3/mpas_ens_files --sumfile mpas_sum.nc --tslice 3 --tag v7.3 --model mpas --mach cheyenne --verbose --jsonfile empty_excluded.json``
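
The relation between the time slice index and the model timestep in this example is simple arithmetic, sketched below. The helper ``timestep_for_slice`` is hypothetical and not part of pyEnsSumMPAS; it assumes zero-based slice indices and output written at a fixed interval, with the first written timestep and the interval both equal to 3 as in the example above.

```python
def timestep_for_slice(tslice: int, first_step: int = 3, interval: int = 3) -> int:
    """Map a zero-based time slice index to the model timestep it holds.

    Hypothetical helper: assumes output starts at `first_step` and is then
    written every `interval` timesteps.
    """
    if tslice < 0:
        raise ValueError("tslice must be non-negative")
    return first_step + tslice * interval


# Slices 0..3 hold timesteps 3, 6, 9 and 12 under these assumptions.
steps = [timestep_for_slice(i) for i in range(4)]
print(steps)
```

Under these assumptions ``--tslice 3`` selects timestep 12, which is the slice that must also be used when testing with pyCECT.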
47 changes: 12 additions & 35 deletions docs/source/pyEnsSumPop.rst
@@ -17,16 +17,16 @@ Alternatively, pyEnsSumPop can be used to create a summary file for POP-ECT
given the location of appropriate ensemble history files (which should
be generated in CIME via $CIME/tools/statistical_ensemble_test/ensemble.py).

(Note: to generate a summary file for UF-CAM-ECT or CAM-ECT, you must use
pyEnsSum.py, which has its own corresponding instructions.)
(Note: to generate a summary file for UF-CAM-ECT/CAM-ECT or MPAS-ECT, you must use
pyEnsSum.py or pyEnsSumMPAS.py, each of which has its own corresponding instructions.)


To use pyEnsSumPop:
--------------------------

1. On NCAR's Cheyenne machine:
1. On NCAR's Derecho machine:

An example script is given in ``test_pyEnsSum.sh``. Modify as needed and do:
An example script is given in ``test_pyEnsSumPop.sh``. Modify as needed and do:

``qsub test_pyEnsSumPop.sh``

@@ -43,43 +43,20 @@ To use pyEnsSumPop:
3. To see all options (and defaults):
``python pyEnsSumPop.py -h``::

Creates the summary file for an ensemble of POP data.


Args for pyEnsSumPop :

pyEnsSumPop.py
-h : prints out this usage message
--verbose : prints out in verbose mode (off by default)
--sumfile <ofile> : the output summary data file (default = pop.ens.summary.nc)
--indir <path> : directory containing all of the ensemble runs (default = ./)
--esize <num> : Number of ensemble members (default = 40)
--tag <name> : Tag name used in metadata (default = cesm2_0_0)
--compset <name> : Compset used in metadata (default = G)
--res <name> : Resolution (used in metadata) (default = T62_g17)
--mach <num> : Machine name used in the metadata (default = cheyenne)
--tslice <num> : the time slice of the variable that we will use (default = 0)
--nyear <num> : Number of years (default = 1)
--nmonth <num> : Number of months (default = 12)
--jsonfile <fname> : Jsonfile to provide that a list of variables that will be included
(RECOMMENDED: default = pop_ensemble.json)
--mpi_disable : Disable mpi mode to run in serial (off by default)

``python pyEnsSumPop.py -h``


Notes:
----------------

1. POP-ECT uses monthly average files. Therefore, one typically needs
to set ``--tslice 0`` (which is the default).
to set ``--tslice 0``.

2. Note that ``--res``, ``--tag``, ``--compset``, and ``--mach`` only affect the
metadata in the summary file.

3. The sample script test_pyEnsSumPop.sh gives a recommended parallel
configuration for Cheyenne. We recommend one core per month (and make
configuration for Derecho. We recommend one core per month (and make
sure each core has sufficient memory).

4. The json file indicates variables from the output files that you want
@@ -94,17 +71,17 @@ Example:

*To generate a summary file for 40 POP-ECT simulation runs (1 year of monthly output):*

* We specify the size (this is optional since 40 is the default) and data location:
* We specify the size and data location:

``--esize 40``

``--indir /glade/p/cisl/iowa/pop_verification/cesm2_0_beta10/ensembles``
``--indir /glade/campaign/cisl/asap/pop_verification/cesm2_0_beta10/ensembles``

* We also specify the name of file to create for the summary:

``--sumfile pop.ens.sum.cesm2.0.nc``

* Since these are monthly average files, we set (optional as 0 is the default):
* Since these are monthly average files:

``--tslice 0``

@@ -121,7 +98,7 @@ Example:

``--res T62_g16``

``--mach cheyenne``
``--mach derecho``

``--compset G``

@@ -133,4 +110,4 @@ Example:

* This yields the following command for your job submission script:

``python pyEnsSumPop.py --indir /glade/p/cisl/asap/pycect_sample_data/pop_c2.0.b10/pop_ens_files --sumfile pop.cesm2.0.b10.nc --tslice 0 --nyear 1 --nmonth 12 --esize 40 --jsonfile pop_ensemble.json --mach cheyenne --compset G --tag cesm2_0_beta10 --res T62_g17``
``python pyEnsSumPop.py --indir /glade/campaign/cisl/asap/pycect_sample_data/pop_c2.0.b10/pop_ens_files --sumfile pop.cesm2.0.b10.nc --tslice 0 --nyear 1 --nmonth 12 --esize 40 --jsonfile pop_ensemble.json --mach derecho --compset G --tag cesm2_0_beta10 --res T62_g17``