Modify WRF-DART Tutorial scripting for Derecho #627

Closed
braczka opened this issue Jan 24, 2024 · 7 comments
Labels: Derecho (issues related to running on NCAR's new supercomputer), wrf (Weather Research & Forecasting Model)

Comments

@braczka
Contributor

braczka commented Jan 24, 2024

Use case

Modify WRF-DART tutorial code to work with Derecho.

Is your feature request related to a problem?

The scripting is currently compatible with the decommissioned Cheyenne system; however, both systems use the PBS queueing system, so the modifications should be minimal.

Describe your preferred solution

  1. Adapt queuing system options for Derecho
  2. Adapt MPI run commands
  3. Adapt WRF and DART processor layout
  4. Document the system environment that works with precompiled, Derecho-compatible WRF executables, and any other
    steps that depart from Cheyenne
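As a rough illustration of item 1, the sketch below prints a PBS header for a csh job script on Derecho. It is written in POSIX sh rather than csh so it can be tested, and every value it emits is an assumption, not taken from the tutorial scripts: the "main" queue, 128 CPU cores per node, one MPI rank per core, and a placeholder project code.

```shell
#!/bin/sh
# pbs_header NAME ACCOUNT NODES WALLTIME
# Prints PBS directives for a csh job script on Derecho.
# ASSUMPTIONS (not from the tutorial scripts): queue "main", 128 CPU cores
# per node, and one MPI rank per core. ACCOUNT is your project code.
pbs_header() {
  name=$1; account=$2; nodes=$3; walltime=$4
  cores_per_node=128   # assumed Derecho CPU-node core count
  printf '#!/bin/csh\n'
  printf '#PBS -N %s\n' "$name"
  printf '#PBS -A %s\n' "$account"
  printf '#PBS -q main\n'
  printf '#PBS -l select=%s:ncpus=%s:mpiprocs=%s\n' \
    "$nodes" "$cores_per_node" "$cores_per_node"
  printf '#PBS -l walltime=%s\n' "$walltime"
}

# Hypothetical usage (job name and project code are placeholders):
#   pbs_header wrf_adv UABC0001 1 00:20:00 > wrf_job.csh
```

In the job body, executables would then be launched with Derecho's mpiexec rather than Cheyenne's mpiexec_mpt; the exact launcher options should be checked against CISL's Derecho documentation.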

Describe any alternatives you have considered

None.

braczka added the Derecho and wrf labels Jan 24, 2024
braczka self-assigned this Jan 24, 2024
@braczka
Contributor Author

braczka commented Jan 24, 2024

Initial testing suggests that modifying the system environment through module load commands in C-shell scripts does not work on Derecho. The initial workaround was to set up the environment in the login .tcshrc script instead.

Initial testing of the PBS scripting with a WRF job produces separate output and error files for each task within a single ensemble member, whereas prior submissions on Cheyenne produced a single output/error file. This may be a non-issue, since the WRF simulations complete successfully.

@braczka
Contributor Author

braczka commented Jan 25, 2024

Update: The tutorial works OK until this step:

./driver.csh 2017042706 param.csh >& run.out &

The DART filter step completes successfully; however, the subsequent WRF model advance after the update fails for some ensemble members. I am still investigating the cause. Possible reasons include the use of a newer WRF version (4.0) and the use of a precompiled WRFDA executable; the WRF-DART tutorial was designed with WRF 3.9.1.

@braczka
Contributor Author

braczka commented Jan 31, 2024

Csh script submission errors: inserting the command 'source /etc/profile.d/z00_modules.csh' into a csh script within a PBS submission allows the module load command to work. CISL Help recommended avoiding module load in any non-PBS csh scripts for now. Both issues should be addressed during the next downtime (Feb 5-7).

To isolate the cause of the WRFv4.0 model-advance failure, I switched back to WRFv3.9.1, which has been used and recommended for running the tutorial. The source code for the WRF, WPS and WRFDA builds on Derecho is located here:

WRF_DM_SRC_DIR = /glade/work/bmraczka/WRF/WRFV3.9.1.1.TAR.gz
WPS_SRC_DIR = /glade/work/bmraczka/WRF/WPSV3.9.1.TAR.gz
VAR_SRC_DIR = /glade/work/bmraczka/WRF/WRFDA_V3.9.1.tar.gz

Following guidance from CISL Help, I switched from the standard Intel compiler to GNU to compile WRF on Derecho. The newer GCC version triggers compile errors (such as rank mismatches between actual arguments) that require extra compiler flags as a workaround, as documented here: https://forum.mmm.ucar.edu/threads/how-to-fix-rank-mismatch-between-actual-argument-at-1-and-actual-argument-at-2-scalar-and-rank-1.14995/.

To successfully build the required WRF, WPS and WRFDA executables I did the following:

>> module --force purge
>> module load ncarenv/23.09 gcc/12.2.0 udunits/2.2.28 ncview/2.1.9 ncarcompilers/1.0.0 craype/2.7.23 cray-mpich/8.1.27 hdf5-mpi/1.12.2 netcdf-mpi/4.9.2
>> cd {WRF_directory}
>> ./configure    # Choose the GNU dmpar option (34), then nesting option 1 (basic) to generate configure.wrf

! Edits to configure.wrf file
 FCBASEOPTS = $(FCBASEOPTS_NO_G) $(FCDEBUG) -fallow-argument-mismatch  -fallow-invalid-boz
...
LDFLAGS = $(OMP) $(FCFLAGS) $(LDFLAGS_LOCAL) -ltirpc
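The configure-file edits above (and the similar ones for WPS and WRFDA later in this comment) can also be applied by script instead of by hand, which helps when rebuilding. The helper below is hypothetical, not part of WRF or the tutorial: it appends flags to the first line that assigns a given make variable, skipping flags already present. It assumes the assignment sits on a single line, as the FCBASEOPTS and LDFLAGS lines shown here do.

```shell
#!/bin/sh
# append_flags FILE VAR FLAG...
# Appends each FLAG to the first "VAR = ..." line of FILE (hypothetical helper).
# A flag is skipped if its text already appears on that line (substring match,
# so avoid short flags like -O that could match -O2).
append_flags() {
  file=$1; var=$2; shift 2
  tmp=$(mktemp) || return 1
  awk -v var="$var" -v flags="$*" '
    # Match "VAR =" exactly, so FCBASEOPTS does not match FCBASEOPTS_NO_G.
    !done && $0 ~ ("^[ \t]*" var "[ \t]*=") {
      n = split(flags, f, " ")
      for (i = 1; i <= n; i++)
        if (index($0, f[i]) == 0) $0 = $0 " " f[i]
      done = 1
    }
    { print }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Hypothetical usage, from {WRF_directory} after ./configure:
#   append_flags configure.wrf FCBASEOPTS -fallow-argument-mismatch -fallow-invalid-boz
#   append_flags configure.wrf LDFLAGS -ltirpc
```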

>> ./compile em_real  >& compile.log
>> cd {WPS_directory}
>> ./configure    # Choose option 1 for gfortran (serial)

! Edits to configure.wps
FFLAGS              = -ffree-form -O -fconvert=big-endian -frecord-marker=4 -fallow-argument-mismatch -fallow-invalid-boz
F77FLAGS            = -ffixed-form -O -fconvert=big-endian -frecord-marker=4 -fallow-argument-mismatch -fallow-invalid-boz

Edit {WPS_directory}/ungrib/src/ngl/g2/intmath.f

A solution is posted on GitHub here:
(https://github.com/wrf-model/WPS/pull/119/files)
This resolves the error: Arguments of 'iand' have different kind type parameters

>> ./compile >& compile.log

>> cd {WRFDA_directory}
>> ./configure wrfda    # Choose option 34, (dmpar) GNU (gfortran/gcc)

Edit configure.wrf file

  FCBASEOPTS = $(FCBASEOPTS_NO_G) $(FCDEBUG) -fallow-argument-mismatch -fallow-invalid-boz
  FCOPTIM = -O2 -ftree-vectorize -funroll-loops -fallow-argument-mismatch

Make a manual edit in {WRFDA_directory}/var/da/da_monitor/da_rad_diags.f90: declare instid, file_prefix, start_date and end_date before the namelist statement, and comment out their original declarations further down. This avoids the "Symbol ... must be declared before the namelist is declared" error:

  integer                                :: nproc, cycle_period
  integer, parameter                     :: maxnum = 20
  character(len=20), dimension(maxnum)   :: instid
  character(len=6)                       :: file_prefix
  character(len=10)                      :: start_date, end_date

  namelist /record1/ nproc, instid, file_prefix, start_date, end_date, cycle_period
          ! nproc: number of processsors used when writing out inv files
          ! instid, eg dmsp-16-ssmis
          ! file_prefix, inv or oma
          ! start_date, end_date, eg 2006100100, 2006102800
          ! cycle_period (hours) between dates, eg 6 or 12
  integer, parameter                     :: maxlvl = 100
  integer                                :: nml_unit = 87
  integer                                :: nlev, ilev, ich
  integer                                :: nlev_rtm, nlev_mdl
  ! Original declarations, now moved above the namelist statement:
!  character(len=20), dimension(maxnum)   :: instid
!  character(len=6)                       :: file_prefix
!  character(len=10)                      :: start_date, end_date

./compile all_wrfvar >& compile.log

Check that all executables were created at the end of the compile step, following this documentation: (https://www2.mmm.ucar.edu/wrf/users/docs/user_guide_V3/user_guide_V3.9/users_guide_chap2.htm#_Required_Compilers_and_1)
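A small shell helper (hypothetical, not part of WRF) can make this check repeatable across rebuilds. The expected names in the usage comments are assumptions to verify against the linked documentation: an em_real WRF build should leave wrf.exe, real.exe, ndown.exe and tc.exe in main/, WPS should leave geogrid.exe, ungrib.exe and metgrid.exe in its top directory, and WRFDA should at least produce var/build/da_wrfvar.exe among its many executables.

```shell
#!/bin/sh
# check_exes DIR EXE...
# Prints "MISSING: DIR/EXE" for each EXE that is absent or not executable
# (a broken symlink also counts as missing). Returns non-zero if any are missing.
check_exes() {
  dir=$1; shift
  missing=0
  for exe in "$@"; do
    if [ ! -x "$dir/$exe" ]; then
      echo "MISSING: $dir/$exe"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Hypothetical usage (directory placeholders as in this issue):
#   check_exes {WRF_directory}/main wrf.exe real.exe ndown.exe tc.exe
#   check_exes {WPS_directory} geogrid.exe ungrib.exe metgrid.exe
#   check_exes {WRFDA_directory}/var/build da_wrfvar.exe
```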

@braczka
Contributor Author

braczka commented Feb 1, 2024

Successfully ran the full WRF-DART Tutorial on Derecho. All output statistics/diagnostics looked nearly identical to the previous Cheyenne Intel-compiler example provided in the WRF-DART web diagnostics section. I used the gfortran compiler for the WRF executables (as described in previous comments) and for the DART build. For the DART build I used mkmf.template.gfortran as the template and edited the following line:

FFLAGS = -O2 -ffree-line-length-none -fallow-argument-mismatch -fallow-invalid-boz $(INCS)
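To script this DART edit, one option is to copy the template and overwrite the FFLAGS assignment. The helper below is a hypothetical sketch, not part of DART: it assumes FFLAGS is assigned on a single line with no backslash continuation, and because awk -v interprets backslash escapes, VALUE must not contain backslashes. The copy step in the usage comment reflects the usual DART build_templates workflow but should be checked against your DART version.

```shell
#!/bin/sh
# set_make_var FILE VAR VALUE
# Replaces the first "VAR = ..." line in FILE with "VAR = VALUE" (hypothetical helper).
# All other lines, including ones that merely reference $(VAR), are left untouched.
set_make_var() {
  file=$1; var=$2; value=$3
  tmp=$(mktemp) || return 1
  awk -v var="$var" -v value="$value" '
    !done && $0 ~ ("^[ \t]*" var "[ \t]*=") {
      print var " = " value   # rewrite the assignment
      done = 1
      next
    }
    { print }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Hypothetical usage, from DART's build_templates directory:
#   cp mkmf.template.gfortran mkmf.template
#   set_make_var mkmf.template FFLAGS \
#     '-O2 -ffree-line-length-none -fallow-argument-mismatch -fallow-invalid-boz $(INCS)'
```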

Because the tutorial code often uses NCO and NCL commands, and the current Derecho environment makes it difficult to load these modules from csh scripts, I had to insert:

   source /etc/profile.d/z00_modules.csh
   module load nco
   module load ncl 

within the PBS portion of the init_ensemble_var.csh script. Because driver.csh also requires NCO commands outside of PBS, I also added module load nco and module load ncl commands to the .tcshrc in my home directory to set up the proper environment.
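Since driver.csh runs NCO commands outside of PBS, a quick preflight check can confirm the environment before launching it. The sketch below is a hypothetical helper that only checks whether commands resolve on PATH; using ncks as a representative NCO tool and ncl as the NCL executable is an assumption about which commands the tutorial calls.

```shell
#!/bin/sh
# require_cmds CMD...
# Reports each CMD not found on PATH to stderr; returns non-zero if any are missing.
require_cmds() {
  rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "not found on PATH: $cmd (is the right module loaded?)" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Hypothetical usage before starting the cycling driver:
#   require_cmds ncks ncl || exit 1
```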

Based on this, I can generate a PR to update the WRF-DART tutorial csh scripting itself, and also provide improved documentation on how to set up the correct environment on Derecho. I'm not sure whether to wait before issuing the PR, given that the system will be undergoing changes during the Feb 5-7 downtime. I will probably issue a draft PR and wait until the system is more stable before trying to merge.

WRF source code for this build and simulation is located here:

WRF_DM_SRC_DIR    = /glade/work/bmraczka/WRF/WRFv3.9.1.1       
WPS_SRC_DIR       = /glade/work/bmraczka/WRF/WPSv3.9.1                   
VAR_SRC_DIR       = /glade/work/bmraczka/WRF/WRFDAv3.9.1          

My WRF-DART tutorial example (Derecho, gfortran, WRFv3.9.1) is located here:

/glade/derecho/scratch/bmraczka/WRFv3.9.1_DART_Tutorial/

My prior example (Derecho, precompiled Intel executables, WRFv4.0) is located here:

/glade/derecho/scratch/bmraczka/WRFv3.9.1_DART_Tutorial/

I am circling back to the WRFv4.0 case to figure out why it failed on Derecho. Possible causes: the newer WRF version, the hybrid vertical coordinate, or an Intel compiler issue.

@braczka
Contributor Author

braczka commented Feb 1, 2024

Typo fix:

Prior example (Derecho, precompiled Intel executables, WRFv4.0) located here:

/glade/derecho/scratch/bmraczka/WRF_DART_Tutorial/

@braczka
Contributor Author

braczka commented Feb 10, 2024

The csh module load issues were resolved during the Feb 5-7 downtime. Module loads now work directly in csh scripts as well as in PBS submissions, so I will not include the temporary csh-related fixes mentioned earlier in this issue in the subsequent PR.

@hkershaw-brown
Member

fixed by #636
