
Feature/mesoscale mpmd g2g #670

Merged


PerryShafran-NOAA
Contributor

Note to developers: You must use this PR template!

Description of Changes

Please include a summary of the changes and the related GitHub issue(s). Please also include relevant motivation and context.

Relating to issue #557 :

This PR completes the fix for Bugzilla 1547, ensuring that scripts that use MPMD do their work in child processes with their own working directories. This PR adds the child processes for the NAM/RAP mesoscale precip and snowfall stats. The rest of the mesoscale component, stats and plots, was completed earlier in PR #619.
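
As a rough illustration of the idea behind this fix (not the actual EVS code), the sketch below gives each MPMD task its own working directory under a shared root, so that tasks no longer write into the same place. The function name `make_job_work_dirs`, the `job_work_dir/jobN` layout, and the scratch directory standing in for `DATA` are all assumptions made for the example.

```shell
#!/bin/bash
# Hypothetical sketch: one private working directory per MPMD task.
# The directory layout and names are illustrative, not taken from EVS.

make_job_work_dirs() {
    local data_root=$1   # shared root, e.g. the job's DATA directory
    local njobs=$2       # number of MPMD tasks
    local n work_dir
    for n in $(seq 1 "$njobs"); do
        work_dir="${data_root}/job_work_dir/job${n}"
        mkdir -p "$work_dir"
        echo "$work_dir"
    done
}

# Example: create three task directories under a scratch root.
DATA=$(mktemp -d)
make_job_work_dirs "$DATA" 3
```

Each task would then `cd` into its own directory before doing any work, which keeps intermediate files from different tasks apart.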

Developer Questions and Checklist

  • Is this a high priority PR? If so, why and is there a date it needs to be merged by?

Yes, it is a Bugzilla fix.

  • Do you have any planned upcoming annual leave/PTO?
    No
  • Are there any changes needed in the times when the jobs are supposed to run/kick-off?
    No
  • [x] The code changes follow NCO's EE2 Standards.
  • [x] Developer's name is removed throughout the code and ${USER} is used where necessary throughout the code.
  • [x] References to the feature branch for HOMEevs are removed from the code.
  • [x] J-Job environment variables, COMIN and COMOUT directories, and output follow what has been defined for EVS.
  • [x] Jobs over 15 minutes in runtime have restart capability.
  • [x] If applicable, changes in the dev/drivers/scripts or dev/modulefiles have been made in the corresponding ecf/scripts and ecf/defs/evs-nco.def?
  • [x] Jobs contain the appropriate file checking and don't run METplus for any missing data.
  • [x] Code is using METplus wrappers structure and not calling MET executables directly.
  • [x] Log is free of any ERRORs or WARNINGs.

Testing Instructions

Please include testing instructions for the PR assignee. Include all relevant input datasets needed to run the tests.

  1. Clone my fork https://github.com/PerryShafran-NOAA/EVS.git
  2. Checkout the branch feature/mesoscale_mpmd_g2g
  3. Link to the fix directory.
  4. cd to directory dev/drivers/scripts/stats/mesoscale.
  5. In the precip and snowfall scripts for NAM and RAP, set the working directory and also set COMIN to the emc.vpppg directory.
  6. The precip scripts are run hourly. Please run jevs_mesoscale_nam_precip_stats.sh and jevs_mesoscale_rap_precip_stats.sh using qsub -v vhr=${vhr}, where ${vhr}=1 to 22. When all 22 runs have completed, run again using qsub -v vhr=23 to do the gather step.
  7. The snowfall scripts are run every 6 hours. Please run jevs_mesoscale_nam_snowfall_stats.sh and jevs_mesoscale_rap_snowfall_stats.sh using qsub -v vhr=${vhr}, where ${vhr}=00, 06, 12. When these three runs have completed, run again using qsub -v vhr=18 to do the gather step.
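
The submission sequence in steps 6 and 7 can be sketched as a dry run that only prints the qsub commands, so the list can be reviewed before anything is submitted. The helper name `print_submissions` and the two-digit zero-padding of vhr are assumptions for this example; the gather runs (vhr=23 and vhr=18) are printed last and should only be submitted once the earlier runs have finished.

```shell
#!/bin/bash
# Dry run: prints the qsub commands from the testing steps instead of
# submitting them. Zero-padding vhr to two digits is an assumption.

print_submissions() {
    local script=$1
    shift
    local vhr
    for vhr in "$@"; do
        echo "qsub -v vhr=${vhr} ${script}"
    done
}

# Hourly precip runs (vhr 01-22), followed by the vhr=23 gather run.
precip_vhrs=$(for h in $(seq 1 22); do printf '%02d ' "$h"; done)
for model in nam rap; do
    # $precip_vhrs is left unquoted on purpose so it splits into words.
    print_submissions "jevs_mesoscale_${model}_precip_stats.sh" $precip_vhrs 23
done

# Six-hourly snowfall runs (00, 06, 12), followed by the vhr=18 gather run.
for model in nam rap; do
    print_submissions "jevs_mesoscale_${model}_snowfall_stats.sh" 00 06 12 18
done
```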

@malloryprow
Contributor

Testing just jevs_mesoscale_nam_precip_stats.sh with vhr=01 to make sure everything looks good with the MPMD.

1. jevs_mesoscale_nam_precip_stats.sh (vhr=01)

Log File: /lfs/h2/emc/vpppg/noscrub/mallory.row/verification/EVS_PRs/pr670/EVS/dev/drivers/scripts/stats/mesoscale/jevs_mesoscale_nam_precip_stats_00.o181623068
DATA: /lfs/h2/emc/stmp/mallory.row/evs_test/prod/tmp/jevs_mesoscale_nam_precip_stats_00.181623068.cbqs01
COMOUT: /lfs/h2/emc/vpppg/noscrub/mallory.row/verification/EVS_PRs/pr670/evs/v2.0/stats/mesoscale

It looks like the code needs to be updated to account for the change of the CCPA file output path to OBS_PCP_COMBINE_OUTPUT_DIR. The source path in the cp command below is where the file was being written before the last commit.

7 + cp -v /lfs/h2/emc/stmp/mallory.row/evs_test/prod/tmp/jevs_mesoscale_nam_precip_stats_00.181623068.cbqs01/atmos.20250223/nam/precip/pcp_combine_ccpa_accum01hr_valid2025022301.nc /lfs/h2/emc/vpppg/noscrub/mallory.row/verification/EVS_PRs/pr670/evs/v2.0/stats/mesoscale/atmos.20250223/nam/precip/pcp_combine_ccpa_accum01hr_valid2025022301.nc
cp: cannot stat '/lfs/h2/emc/stmp/mallory.row/evs_test/prod/tmp/jevs_mesoscale_nam_precip_stats_00.181623068.cbqs01/atmos.20250223/nam/precip/pcp_combine_ccpa_accum01hr_valid2025022301.nc': No such file or directory
CFP RANK 8 CFP TASK NUMBER: 0009 FAILED. USER COMMAND: /lfs/h2/emc/stmp/mallory.row/evs_test/prod/tmp/jevs_mesoscale_nam_precip_stats_00.181623068.cbqs01/jobs/assemble_data/job9
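
The failure above is a `cp` whose source file does not exist. Correcting the source path is the real fix, but a defensive pattern is to test for the file before copying so the task reports the problem cleanly. The helper below is a minimal sketch of that pattern; the name `copy_if_present` and the scratch paths are hypothetical and are not taken from the EVS scripts.

```shell
#!/bin/bash
# Hypothetical helper: copy a file to its destination only if the source
# exists and is non-empty; otherwise report it and return non-zero.

copy_if_present() {
    local src=$1
    local dest=$2
    if [ -s "$src" ]; then
        mkdir -p "$(dirname "$dest")"
        cp -v "$src" "$dest"
    else
        echo "NOTE: ${src} does not exist or is empty; skipping copy" >&2
        return 1
    fi
}

# Example with scratch paths (not the real EVS directories).
scratch=$(mktemp -d)
echo "data" > "${scratch}/obs.nc"
copy_if_present "${scratch}/obs.nc" "${scratch}/comout/obs.nc"
copy_if_present "${scratch}/missing.nc" "${scratch}/comout/missing.nc" || true
```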

@PerryShafran-NOAA
Contributor Author

The CCPA observation file is now linked into the job_num_work_dir and then copied to the COMOUT directory, as it should be. You can now test again.

@malloryprow
Contributor

I have manually submitted the NAM and RAP precip jobs for vhr=00-12, and a cron job is set up to continue running the rest of the vhrs for today. I can't run for VDATE=PDYm3 because the MRMS data isn't kept around that long. I ran into that problem yesterday.

@PerryShafran-NOAA
Contributor Author

That sounds good; cron is the best way to get all the hours in there. Then I can look at the results tomorrow.

@PerryShafran-NOAA
Contributor Author

@malloryprow Can you send me the path to where the .o files are located, so I can at least follow along there to check for errors? And the path to the working directories so I can check to see that the job subdirectories are working as intended?

@malloryprow
Contributor

Yes, the log files are in /lfs/h2/emc/vpppg/noscrub/mallory.row/verification/EVS_PRs/pr670/EVS/dev/drivers/scripts/stats/mesoscale.

I ran the snowfall jobs too for vhr=00,06,12, but I am waiting until 18Z to do vhr=18.

@PerryShafran-NOAA
Contributor Author

@malloryprow Everything looks good from what I've seen thus far.

@malloryprow
Contributor

COMOUT is /lfs/h2/emc/vpppg/noscrub/mallory.row/verification/EVS_PRs/pr670/evs/v2.0/stats/mesoscale.

1. jevs_mesoscale_nam_snowfall_stats.sh

Log Files: /lfs/h2/emc/vpppg/noscrub/mallory.row/verification/EVS_PRs/pr670/EVS/dev/drivers/scripts/stats/mesoscale/jevs_mesoscale_nam_snowfall_stats_00.o*
DATA: /lfs/h2/emc/stmp/mallory.row/evs_test/prod/tmp/jevs_mesoscale_nam_snowfall_stats_00.*

2. jevs_mesoscale_rap_snowfall_stats.sh

Log Files: /lfs/h2/emc/vpppg/noscrub/mallory.row/verification/EVS_PRs/pr670/EVS/dev/drivers/scripts/stats/mesoscale/jevs_mesoscale_rap_snowfall_stats_00.o*
DATA: /lfs/h2/emc/stmp/mallory.row/evs_test/prod/tmp/jevs_mesoscale_rap_snowfall_stats_00.*

@PerryShafran-NOAA
Contributor Author

Snowfall is good! Final stats files match what is in emc.vpppg. All the individual job directories are being used. I think we're good with the snowfall.
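
A check like "the final stats files match what is in emc.vpppg" can be done by comparing each file in the test output with the file of the same relative path in a reference directory. The sketch below is one way to do that; the function name `compare_stat_dirs` and the scratch directories are hypothetical, and a byte-for-byte `cmp` is assumed to be an acceptable definition of "match".

```shell
#!/bin/bash
# Hypothetical check: compare every file under a test directory with the
# file of the same relative path under a reference directory.

compare_stat_dirs() {
    local test_dir=$1
    local ref_dir=$2
    local report
    report=$(
        find "$test_dir" -type f | sort | while IFS= read -r f; do
            rel=${f#"$test_dir"/}
            if [ ! -f "${ref_dir}/${rel}" ]; then
                echo "MISSING ${rel}"
            elif cmp -s "$f" "${ref_dir}/${rel}"; then
                echo "MATCH   ${rel}"
            else
                echo "DIFFER  ${rel}"
            fi
        done
    )
    if [ -n "$report" ]; then
        echo "$report"
    fi
    # Succeed only if no line reports a problem.
    if echo "$report" | grep -Eq '^(MISSING|DIFFER)'; then
        return 1
    fi
    return 0
}

# Example with scratch directories standing in for COMOUT and a reference.
test_out=$(mktemp -d)
ref_out=$(mktemp -d)
echo "sample line" > "${test_out}/sample.stat"
echo "sample line" > "${ref_out}/sample.stat"
compare_stat_dirs "$test_out" "$ref_out" || echo "differences found"
```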

@malloryprow
Contributor

Agreed! I just got done looking through everything myself.

@PerryShafran-NOAA
Contributor Author

@malloryprow I noticed that the gather job failed, and it was due to a missing curly bracket in the StatAnalysis config file. I have corrected that. Please re-run the vhr=23 precip jobs for VDATE=20250224.
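
A missing curly bracket like this one can sometimes be caught before submission with a quick count of opening and closing braces. The sketch below is a rough check rather than a parser, so it cannot say where the brace is missing. The function name `check_braces` and the sample config lines are made up for illustration and are not from the actual StatAnalysis config.

```shell
#!/bin/bash
# Rough check, not a parser: counts '{' and '}' characters in a file and
# reports whether the totals agree.

check_braces() {
    local file=$1
    local opens closes
    opens=$(tr -cd '{' < "$file" | wc -c | tr -d ' ')
    closes=$(tr -cd '}' < "$file" | wc -c | tr -d ' ')
    if [ "$opens" -eq "$closes" ]; then
        echo "OK: ${file} has ${opens} '{' and ${closes} '}'"
    else
        echo "MISMATCH: ${file} has ${opens} '{' and ${closes} '}'"
        return 1
    fi
}

# Example: an illustrative config with one unclosed brace on the second line.
conf=$(mktemp)
printf '%s\n' 'SAMPLE_DIR = {SAMPLE_BASE}/stats' \
              'SAMPLE_FILE = {SAMPLE_DIR/out.stat' > "$conf"
check_braces "$conf" || true
```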

@malloryprow
Contributor

Rerun!

@PerryShafran-NOAA
Contributor Author

Ah, excellent! Precip is good. The precip stats files match what is in emc.vpppg. The job subdirectories look good as well.

@malloryprow
Contributor

Yup, looked good to me too!

Contributor

@malloryprow malloryprow left a comment


Code changes are good and testing successful.

Thank you @PerryShafran-NOAA!

@PerryShafran-NOAA
Contributor Author

@AndrewBenjamin-NOAA Would you please review the code so we can get this PR merged?


@AndrewBenjamin-NOAA AndrewBenjamin-NOAA left a comment


I have reviewed the changes and approve this PR.

@malloryprow malloryprow merged commit 565f641 into NOAA-EMC:develop Feb 27, 2025
Development

Successfully merging this pull request may close these issues.

mesoscale - det: Address Bugzilla 1547 - MPMD processes share the same working directory
3 participants