updates to docs and defaults

NCAR · Jan 26, 2024 · ebfe2a2
1 parent 3a4d5c7

Showing 8 changed files with 192 additions and 243 deletions.
diff --git a/docs/source/pyEnsSum.rst b/docs/source/pyEnsSum.rst
@@ -15,15 +15,15 @@ $CESMDATAROOT/inputdata/validation/uf_ensembles
 
 Alternatively, pyEnsSum.py can be used to create a summary file for CAM-ECT or
 UF-CAM-ECT, given the location of appropriate ensemble history files (which should
-be generated via CIME,  https://github.com/ESMCI/cime)
+be generated via CESM,  https://github.com/ESCOMP/CESM).
 
-(Note: to generate a summary file for POP-ECT, you must use pyEnsSumPop.py,
-which has its own corresponding instructions)
+(Note: to generate a summary file for POP-ECT or MPAS-ECT, you must use pyEnsSumPop.py
+or pyEnsSumMPAS.py, respectively, each of which has its own corresponding instructions.)
 
 To use pyEnsSum:
 --------------------
 
-1. On NCAR's Cheyenne machine:
+1. On NCAR's Derecho machine:
 
    An example script is given in ``test_pyEnsSum.sh``.  Modify as needed and do:
 
@@ -36,79 +36,58 @@ To use pyEnsSum:
 2.  Otherwise you need these packages (see ``requirements.txt``):
 
          * numpy
-	 * scipy
-	 * netcdf4
-	 * mpi4py
+         * scipy
+         * netcdf4
+         * mpi4py
 
 3. To see all options (and defaults):
 
    ``python pyEnsSum.py -h``
 
-       Creates the summary file for an ensemble of CAM data.
-
-       Args for pyEnsSum :
-
-       pyEnsSum.py
-       -h                   : prints out this usage message
-       --verbose            : prints out in verbose mode (off by default)
-       --sumfile <ofile>    : the output summary data file (default = ens.summary.nc)
-       --indir <path>       : directory containing all of the ensemble runs (default = ./)
-       --esize  <num>       : Number of ensemble members (default = 350)
-       --tag <name>         : Tag name used in metadata (default = cesm2_0)
-       --compset <name>     : Compset used in metadata (default = F2000climo)
-       --res <name>         : Resolution used in metadata (default = f19_f19)
-       --tslice <num>       : the index into the time dimension (default = 1)
-       --mach <name>        : Machine name used in the metadata (default = cheyenne)
-       --jsonfile <fname>   : Jsonfile to provide that a list of variables that will be excluded
-                               (default = exclude_empty.json)
-       --mpi_disable        : Disable mpi mode to run in serial (off by default)
-       --fIndex <num>       : Use this to start at ensemble member <num> instead of 000 (so
-                              ensembles with numbers less than <num> are excluded from summary file)
-
 
 Notes:
 ------------------
 
 1. CAM-ECT uses yearly average files, which by default (in the ensemble.py
-   generation script in CIME) also contains the initial conditions.  Therefore,
+   generation script in CESM) also contains the initial conditions.  Therefore,
    one typically needs to set ``--tslice 1`` to use the yearly average (because
    slice 0 is the initial conditions.)
 
-2.  UF-CAM-ECT uses timestep nine.  By default (in the ensemble.py
-    generation script in CIME) the ouput file also contains the initial conditions.
-    Therefore, one typically needs to set ``--tslice 1`` to use time step nine (because
-    slice 0 is the initial conditions.)
-
+2.  UF-CAM-ECT uses an early timestep such as 7 or 9.  By default (in the ensemble.py
+    generation script in CESM) the output file no longer contains the initial conditions.
+    Therefore, one typically needs to set ``--tslice 0``, assuming that only one timestep
+    is written to the file.
+   
 3. There is no need to indicate UF-CAM-ECT vs. CAM-ECT to this routine.  It
    simply creates statistics for the supplied history files at the specified
    time slice. For example, if you want to look at monthly files, simply
    supply their location.  Monthly files typically do not contain an initial
    condition and would require ``--tslice 0``.
 
 4. The ``--esize``  (the ensemble size) can be less than or equal to the number of files
-   in ``--indir``.  Ensembles numbered 000-(esize-1) will be included unless ``--fIndex``
-   is specified.  UF-CAM-ECT typically uses at least 350 members (the default),
+   in ``--indir``.  Ensembles numbered 0000-(esize-1) will be included unless ``--fIndex``
+   is specified.  UF-CAM-ECT typically uses at least 350 members,
    whereas CAM-ECT does not require as many.
 
 5. Note that ``--res``, ``--tag``, ``--compset``, and ``--mach``
-   parameters only affect the metadata in the summary file.
+   parameters only affect the metadata written to the summary file.
 
 6. When running in parallel, the recommended number of cores to use is one
-   for each 3D variable. The default is to run in paralalel (recommended).
+   for each 3D variable. The default is to run in parallel (recommended).
 
 7. You must specify a json file (via ``--jsonfile``) that indicates
    the variables in the ensemble
    output files that you want to exclude from the summary file
-   statistics (see the example json files).  The pyEnsSum routine
-   will let you know if you have not
-   listed variables that need to be excluded (see next note).  Keep in mind that
-   you must have *fewer* variables included than ensemble members.
-
+   statistics (see the example json files).  The default is the provided
+   ``empty_excluded.json``, which does not contain any variables.
+   The pyEnsSum routine will let you know if you have not
+   listed variables that need to be excluded (see more in next note).
+   
 8. *IMPORTANT:* If there are variables that need to be excluded (that are not in
    the .json file  already) for the summary to be generated, pyEnsSum will list these
    variables in the output.  These variables will also be added to a copy of
    your exclude variable list (prefixed with "NEW.") for future reference and use.
-   The summary file will be geberated with all listed variables excluded.
+   The summary file will be generated with all listed variables excluded.
    Note that the following types of variables will be removed:  any variables that
    are constant across the ensemble, are not floating-point (e.g., integer),
    are linearly dependent, or have very few (< 3%) unique values.
@@ -120,28 +99,29 @@ Example:
 
 *To generate a summary file for 350 UF-CAM-ECT simulation runs (time step nine):*
 
-* we specify the size (this is optional since 350 is the default) and data location:
+* we specify the size and data location:
 
   ``--esize 350``
 
-  ``--indir /glade/p/cisl/asap/pycect_sample_data/cam_c1.2.2.1/uf_cam_ens_files``
+  ``--indir /glade/campaign/cisl/asap/pycect_sample_data/cam_c1.2.2.1/uf_cam_ens_files``
 
 * We also specify the name of file to create for the summary:
 
   ``--sumfile uf.ens.c1.2.2.1_fc5.ne30.nc``
 
-* Since the ensemble files contain the intial conditions  as well as the values at time step 9 (this is optional as 1 is the default), we set:
+* Since the ensemble files contain the initial conditions as well as the time slice that
+  contains the desired values at time step 9, we set:
 
   ``--tslice 1``
 
 * We also specify the CESM tag, compset and resolution and machine of our ensemble data so that it can be written to the metadata of the summary file:
 
   ``--tag cesm1.2.2.1 --compset FC5 --res ne30_ne30 --mach cheyenne``
 
-* We can exclude or include some variables from the analysis by specifying them in a json file:
+* We can exclude variables from the analysis by specifying them in a json file:
 
   ``--jsonfile excluded_varlist.json``
 
 * This yields the following command for your job submission script:
 
-  ``python pyCECT.py --esize 350 --indir /glade/p/cisl/asap/pycect_sample_data/cam_c1.2.2.1/uf_cam_ens_files  --sumfile uf.ens.c1.2.2.1_fc5.ne30.nc  --tslice 1 --tag cesm1.2.2.1 --compset FC5 --res ne30_ne30 --jsonfile excluded_varlist.json``
+  ``python pyEnsSum.py --esize 350 --indir /glade/campaign/cisl/asap/pycect_sample_data/cam_c1.2.2.1/uf_cam_ens_files --sumfile uf.ens.c1.2.2.1_fc5.ne30.nc --tslice 1 --tag cesm1.2.2.1 --compset FC5 --res ne30_ne30 --mach cheyenne --jsonfile excluded_varlist.json``
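Note 8 above lists the kinds of variables that pyEnsSum removes before computing summary statistics. The Python sketch below illustrates those rules on in-memory arrays. It is not pyEnsSum's implementation. Several details are assumptions made for illustration:

* the hypothetical function name `flag_excluded`;
* the `(esize, npoints)` array layout;
* reading "< 3% unique values" as the unique count relative to the total number of values;
* testing linear dependence on centered per-member global means.

```python
import numpy as np


def flag_excluded(ens, unique_frac=0.03):
    """Hypothetical helper: return {name: reason} for variables to exclude.

    Assumes `ens` maps each variable name to an array of shape
    (esize, npoints), one row of values per ensemble member.
    """
    flagged = {}
    kept = []  # variables that pass the per-variable checks
    for name, data in ens.items():
        arr = np.asarray(data)
        if not np.issubdtype(arr.dtype, np.floating):
            flagged[name] = "not floating-point"
        elif np.all(arr == arr[0]):
            # every member matches member 0 at every point
            flagged[name] = "constant across the ensemble"
        elif np.unique(arr).size < unique_frac * arr.size:
            flagged[name] = "too few unique values"
        else:
            kept.append(name)

    # Greedy linear-dependence check (an assumption): reduce each variable
    # to one global mean per member, center it, and flag the variable if
    # its column does not increase the rank of the columns kept so far.
    cols = []
    for name in kept:
        means = np.asarray(ens[name], dtype=float).mean(axis=1)
        candidate = cols + [means - means.mean()]
        if np.linalg.matrix_rank(np.column_stack(candidate)) < len(candidate):
            flagged[name] = "linearly dependent"
        else:
            cols = candidate
    return flagged


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal(size=(10, 50))
    sparse = np.zeros((10, 50))
    sparse[0, 0] = 1.0  # differs across members but has only 2 unique values
    ens = {
        "A": a,
        "B": np.arange(500).reshape(10, 50),  # integer data
        "C": np.ones((10, 50)),               # identical in every member
        "D": sparse,
        "E": 2.0 * a,                         # exactly twice A
        "F": rng.normal(size=(10, 50)),
    }
    for name, reason in flag_excluded(ens).items():
        print(name, reason)
```

Variables reported by a check like this are the ones you would add to the exclusion json file passed with ``--jsonfile``.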
diff --git a/docs/source/pyEnsSumMPAS.rst b/docs/source/pyEnsSumMPAS.rst
@@ -0,0 +1,110 @@
+
+pyEnsSumMPAS
+==============
+
+The verification tools in the CECT suite all require an *ensemble
+summary file*, which contains statistics describing the ensemble distribution.
+pyEnsSumMPAS can be used to create an MPAS (atmospheric component) ensemble summary file.
+
+Note that ensemble summary files for existing MPAS tags are not yet available, as this
+functionality is new.  Therefore, pyEnsSumMPAS.py must be used to create a summary file for MPAS-ECT,
+given the location of appropriate ensemble history files (which should be generated
+via MPAS-A, https://github.com/MPAS-Dev/MPAS-Model).
+
+(Note: to generate a summary file for CAM-ECT/UF-CAM-ECT or POP-ECT, you must use pyEnsSum.py
+or pyEnsSumPop.py, respectively, each of which has its own corresponding instructions.)
+
+
+To use pyEnsSumMPAS:
+--------------------
+
+1. On NCAR's Derecho machine:
+
+   An example script is given in ``test_pyEnsSumMPAS.sh``.  Modify as needed and do:
+
+   ``qsub test_pyEnsSumMPAS.sh``
+
+   Note that the Python environment is loaded in the script with ``module load conda``
+   followed by ``conda activate npl``.
+
+2.  Otherwise you need these packages (see ``requirements.txt``):
+
+         * numpy
+         * scipy
+         * netcdf4
+         * mpi4py
+
+3. To see all options (and defaults):
+
+   ``python pyEnsSumMPAS.py -h``
+
+
+Notes:
+------------------
+
+1. MPAS-ECT typically uses data after several timesteps, and the output file may contain
+   multiple time slices and may or may not contain the initial conditions.  Therefore, when
+   choosing which time slice to use to generate the summary, be aware that the same time
+   slice should be used for testing with pyCECT.  Specify the time slice with ``--tslice``
+   (for example, ``--tslice 0``).
+
+2. The ``--esize``  (the ensemble size) can be less than or equal to the number of files
+   in ``--indir``.  Ensembles numbered 0000-(esize-1) will be included unless ``--fIndex``
+   is specified.  MPAS-ECT typically uses at least 200 members.
+
+3. Note that ``--core``, ``--tag``, ``--mesh``, ``--model``, and ``--mach``
+   parameters only affect the metadata written to the summary file.
+
+4. When running in parallel, the recommended number of cores to use is one
+   for each 3D variable. The default is to run in parallel (recommended).
+
+5. You must specify a json file (via ``--jsonfile``) that indicates
+   the variables in the ensemble output files that you want to exclude from the summary file
+   statistics (see the example json files).  The default is the provided
+   ``empty_excluded.json``, which does not contain any variables.
+   The pyEnsSumMPAS routine will let you know if you have not
+   listed variables that need to be excluded (see more in next note).
+
+6. *IMPORTANT:* If there are variables that need to be excluded (that are not in
+   the .json file  already) for the summary to be generated, pyEnsSumMPAS will list these
+   variables in the output.  These variables will also be added to a copy of
+   your exclude variable list (prefixed with "NEW.") for future reference and use.
+   The summary file will be generated with all listed variables excluded.
+   Note that the following types of variables will be removed:  any variables that
+   are constant across the ensemble, are not floating-point (e.g., integer),
+   are linearly dependent, or have very few (< 3%) unique values.
+
+
+Example:
+--------------------------------------
+(Note: This example is in test_pyEnsSumMPAS.sh)
+
+*To generate a summary file for 200 MPAS-ECT simulation runs (from time slice 3 in the file):*
+
+* we specify the size and data location:
+
+  ``--esize 200``
+
+  ``--indir /glade/campaign/cisl/asap/pycect_sample_data/mpas_a.v7.3/mpas_ens_files``
+
+* We also specify the name of file to create for the summary:
+
+  ``--sumfile mpas_sum.nc``
+
+* Since the ensemble files could contain more than one time step (in this example, output
+  starts at timestep 3 and is written every 3 timesteps), we select timestep 12, which is
+  time slice 3 (counting from 0), with:
+
+  ``--tslice 3``
+
+* We can also specify the MPAS tag, model, mesh, core and machine of our ensemble data so that it can be written to the metadata of the summary file:
+
+  ``--tag v7.3 --model mpas --mach cheyenne``
+
+* We can exclude variables from the analysis by specifying them in a json file:
+
+  ``--jsonfile empty_excluded.json``
+
+* This yields the following command for your job submission script:
+
+  ``python pyEnsSumMPAS.py --esize 200 --indir /glade/campaign/cisl/asap/pycect_sample_data/mpas_a.v7.3/mpas_ens_files  --sumfile mpas_sum.nc --tslice 3 --tag v7.3 --model mpas  --mach cheyenne --verbose --jsonfile empty_excluded.json``
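The MPAS example above relies on simple arithmetic between model timesteps and 0-based time slices. Below is a minimal Python sketch of that mapping. It assumes output starts at `first_step` and is written every `interval` timesteps. The helper names and the `has_initial_conditions` offset (one extra leading slice when the file also holds the initial state) are hypothetical. They are not options of pyEnsSumMPAS.py.

```python
def timestep_for_slice(tslice, first_step, interval, has_initial_conditions=False):
    """Hypothetical helper: model timestep stored at 0-based slice `tslice`."""
    if tslice < 0:
        raise ValueError("tslice must be non-negative")
    if has_initial_conditions:
        if tslice == 0:
            return 0  # assumption: slice 0 holds the initial state
        tslice -= 1
    return first_step + tslice * interval


def slice_for_timestep(step, first_step, interval, has_initial_conditions=False):
    """Hypothetical helper: 0-based --tslice value that holds timestep `step`."""
    offset = step - first_step
    if offset < 0 or offset % interval != 0:
        raise ValueError(f"timestep {step} is not written to the file")
    return offset // interval + (1 if has_initial_conditions else 0)


if __name__ == "__main__":
    # Example from the text: output starts at timestep 3, every 3 timesteps.
    print(slice_for_timestep(12, first_step=3, interval=3))  # 3
    print(timestep_for_slice(3, first_step=3, interval=3))   # 12
```

Whichever slice is chosen, Note 1 above still applies: the same time slice should be used when testing with pyCECT.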
diff --git a/docs/source/pyEnsSumPop.rst b/docs/source/pyEnsSumPop.rst
@@ -17,16 +17,16 @@ Alternatively, pyEnsSumPop can be used to create a summary file for POP-ECT
 given the location of appropriate ensemble history files (which should
 be generated in CIME via $CIME/tools/statistical_ensemble_test/ensemble.py).
 
-(Note: to generate a summary file for UF-CAM-ECT or CAM-ECT, you must use
-pyEnsSum.py, which has its own corresponding instructions.)
+(Note: to generate a summary file for UF-CAM-ECT/CAM-ECT or MPAS-ECT, you must use
+pyEnsSum.py or pyEnsSumMPAS.py, respectively, each of which has its own corresponding instructions.)
 
 
 To use pyEnsSumPop:
 --------------------------
 
-1. On NCAR's Cheyenne machine:
+1. On NCAR's Derecho machine:
 
-   An example script is given in ``test_pyEnsSum.sh``.  Modify as needed and do:
+   An example script is given in ``test_pyEnsSumPop.sh``.  Modify as needed and do:
 
    ``qsub test_pyEnsSumPop.sh``
 
@@ -43,43 +43,20 @@ To use pyEnsSumPop:
 
 3. To see all options (and defaults):
 
-   ``python pyEnsSumPop.py -h``::
-
-       Creates the summary file for an ensemble of POP data.
-
-
-       Args for pyEnsSumPop :
-
-       pyEnsSumPop.py
-       -h                   : prints out this usage message
-       --verbose            : prints out in verbose mode (off by default)
-       --sumfile  <ofile>   : the output summary data file (default = pop.ens.summary.nc)
-       --indir    <path>    : directory containing all of the ensemble runs (default = ./)
-       --esize <num>        : Number of ensemble members (default = 40)
-       --tag <name>         : Tag name used in metadata (default = cesm2_0_0)
-       --compset <name>     : Compset used in metadata (default = G)
-       --res <name>         : Resolution (used in metadata) (default = T62_g17)
-       --mach <num>         : Machine name used in the metadata (default = cheyenne)
-       --tslice <num>       : the time slice of the variable that we will use (default = 0)
-       --nyear  <num>       : Number of years (default = 1)
-       --nmonth  <num>      : Number of months (default = 12)
-       --jsonfile <fname>   : Jsonfile to provide that a list of variables that will be included
-                              (RECOMMENDED: default = pop_ensemble.json)
-       --mpi_disable        : Disable mpi mode to run in serial (off by default)
-
+   ``python pyEnsSumPop.py -h``
 
 
 Notes:
 ----------------
 
 1. POP-ECT uses monthly average files. Therefore, one typically needs
-    to set ``--tslice 0`` (which is the default).
+    to set ``--tslice 0``.
 
 2.  Note that ``--res``, ``--tag``, ``--compset``, and ``--mach`` only affect the
     metadata in the summary file.
 
 3.  The sample script test_pyEnsSumPop.sh gives a recommended parallel
-    configuration for Cheyenne.  We recommend one core per month (and make
+    configuration for Derecho.  We recommend one core per month (and make
     sure each core has sufficient memory).
 
 4.  The json file indicates variables from the output files that you want
@@ -94,17 +71,17 @@ Example:
 
 *To generate a summary file for 40 POP-ECT simulation runs (1 year of monthly output):*
 
-* We specify the size (this is optional since 40 is the default) and data location:
+* We specify the size and data location:
 
   ``--esize 40``
 
-  ``--indir /glade/p/cisl/iowa/pop_verification/cesm2_0_beta10/ensembles``
+  ``--indir /glade/campaign/cisl/asap/pop_verification/cesm2_0_beta10/ensembles``
 
 *  We also specify the name of file to create for the summary:
 
    ``--sumfile pop.ens.sum.cesm2.0.nc``
 
-* Since these are monthly average files, we set (optional as 0 is the default):
+* Since these are monthly average files, we set:
 
   ``--tslice 0``
 
@@ -121,7 +98,7 @@ Example:
 
    ``--res T62_g16``
 
-   ``--mach cheyenne``
+   ``--mach derecho``
 
    ``--compset G``
 
@@ -133,4 +110,4 @@ Example:
 
  * This yields the following command for your job submission script:
 
- ``python pyEnsSumPop.py  --indir  /glade/p/cisl/asap/pycect_sample_data/pop_c2.0.b10/pop_ens_files  --sumfile pop.cesm2.0.b10.nc --tslice 0 --nyear 1 --nmonth 12 --esize 40 --jsonfile pop_ensemble.json   --mach cheyenne --compset G --tag cesm2_0_beta10 --res T62_g17``
+ ``python pyEnsSumPop.py  --indir  /glade/campaign/cisl/asap/pycect_sample_data/pop_c2.0.b10/pop_ens_files  --sumfile pop.cesm2.0.b10.nc --tslice 0 --nyear 1 --nmonth 12 --esize 40 --jsonfile pop_ensemble.json   --mach derecho --compset G --tag cesm2_0_beta10 --res T62_g17``