Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature #1019 USCRN #3049

Open
wants to merge 25 commits into
base: develop
Choose a base branch
from
Open

Feature #1019 USCRN #3049

wants to merge 25 commits into from

Conversation

JohnHalleyGotway
Copy link
Collaborator

@JohnHalleyGotway JohnHalleyGotway commented Jan 13, 2025

This PR changes about 86 different source code files, however the vast majority of them are unrelated and modified only to drive down the overall number of SonarQube code smells. Here's a description of these modified files:

  1. Documentation (1):
  • docs/Users_Guide/reformat_point.rst has minor ascii2nc updates to the User's Guide.
  1. Testing (1):
  • internal/test_unit/xml/unit_ascii2nc.xml adds 1 new unit test for -format uscrn.
  1. Application Code (7):
  • src/tools/other/ascii2nc/Makefile.am and .in to compile newly added USCRN source code.
  • src/tools/other/ascii2nc/ascii2nc.cc to support new -format uscrn option.
  • src/tools/other/ascii2nc/file_handler.h and .cc parent class to track observation unit and description strings.
  • src/tools/other/ascii2nc/uscrn_handler.h and .cc are NEW FILES which contain the meat of the work, defining how to read 7 different variations of this new data source!
  1. Library Code (13):
  • src/basic/vx_log/string_array.h and .cc add a new StringArray::all_empty() member function to check for an array of all empty strings which is used when deciding to write observation units and descriptions.
  • src/basic/vx_util/data_line.h and .cc adds new DataLine::AllowEmptyColumns variable that used when parsing .csv input files. Also remove unneeded Offset vector which is a holdover from prior to the switch to using STL.
  • src/basic/vx_util/observation.h and .cc and nc_points_obs_out.nc add members to store observation unit and description strings, needed to enhance ascii2nc to write them to the output
  • src/basic/vx_nc_obs/nc_obs_util.h and .cc update logic for writing observation unit and description strings.
  • src/libcode/vx_summary/summary_key.h and .cc and src/libcode/vx_summary/summary_obs.h and .cc update to store observation unit and description strings.
  1. Everything else (86 - 22 = 64):
  • Minor changes from using string::push_back() to string::emplace_back(), as recommended by SonarQube.
  • Changes of some class members from protected to private, as recommended by SonarQube.

Expected Differences

  • Do these changes introduce new tools, command line arguments, or configuration file options? [Yes]

    If yes, please describe:

  • Adds support to ascii2nc for the new -format uscrn option. There are actually 7 different flavors of USCRN data, and all 7 are supported based on the naming conventions of the filename prefix and suffix.

  • Do these changes modify the structure of existing or add new output data types (e.g. statistic line types or NetCDF variables)? [Yes]

    If yes, please describe:

  • This is small and subtle change. Previously, ascii2nc had the ability to write obs_units to the output NetCDF file, but this is only enabled for -format airnow data. When the obs_units variable is written to the output, then an obs_desc variable is also written for "descriptions". However, airnow doesn't populate those strings. This PR tweaks that logic, so that obs_units and obs_desc variables are only written if they are defined by non-empty strings. And that logic is handled independently for each. Note that -format airnow will now only write obs_units while -format uscrn will write obs_units and obs_desc. And this is the source of the unit test diffs described below.

Pull Request Testing

  • Describe testing already performed for these changes:

  • Ran manually several times on the command line and inspected the NetCDF output.

  • Recommend testing for the reviewer(s) to perform, including the location of input datasets, and any additional instructions:

  • @j-opatz please review the doc updates, new unit test output, and consider doing more throughly using seneca:/d1/projects/MET/MET_pull_requests/met-12.1.0/beta1/MET-feature_1019_USCRN. Also confirm the observation unit and description changes appearing the output.

  • @georgemccabe please focus on the actual code changes and consider impacts to any METplus Use Cases that use AirNow input data.

  • @anewman89 please validate the metadata added to uscrn_handler.cc (e.g. variable naming conventions, units, and descriptions) against the readme.txt files from the NCEI product subdirectories (https://www.ncei.noaa.gov/pub/data/uscrn/products/).

  • Do these changes include sufficient documentation updates, ensuring that no errors or warnings exist in the build of the documentation? [Yes]
    I updated the ascii2nc chapter.

  • Do these changes include sufficient testing updates? [Yes]
    I added a test to unit_ascii2nc.xml to demonstrate. It reads input for the Boulder location for all 7 supported data types and uses the -valid_beg/-valid_end options to extract observations only for 20240801 at 00Z.

  • Will this PR result in changes to the MET test suite? [Yes]

    If yes, describe the new output and/or changes to the existing output:

  • Adds new ascii2nc/USCRN_Boulder_20240801.nc unit test output file.

  • Modifies the following files by removing the obs_desc variable that only contains empty strings:

    • airnow/HourlyAQObs_20220312.nc
    • airnow/HourlyData_20220312.nc
    • airnow/daily_data_v2_20220312.nc
  • Will this PR result in changes to existing METplus Use Cases? [Maybe]

    If yes, create a new Update Truth METplus issue to describe them.

  • Differences will be flagged for any use cases that run ascii2nc with Airnow inputs.

  • Do these changes introduce new SonarQube findings? [Yes or No]

    If yes, please describe:

  • While some SonarQube code smells in "new code" are flagged, changes for this PR reduce the overall number of code smells from 18,060 down to 18,037. I did review the "new code" ones and fixed all the (relatively) easy ones.

  • Please complete this pull request review by [Fri 1/17/25].

Pull Request Checklist

See the METplus Workflow for details.

  • Review the source issue metadata (required labels, projects, and milestone).
  • Complete the PR definition above.
  • Ensure the PR title matches the feature or bugfix branch name.
  • Define the PR metadata, as permissions allow.
    Select: Reviewer(s) and Development issue
    Select: Milestone as the version that will include these changes
    Select: Coordinated METplus-X.Y Support project for bugfix releases or MET-X.Y.Z Development project for official releases
  • After submitting the PR, select the ⚙️ icon in the Development section of the right hand sidebar. Search for the issue that this PR will close and select it, if it is not already selected.
  • After the PR is approved, merge your changes. If permissions do not allow this, request that the reviewer do the merge.
  • Close the linked issue and delete your feature or bugfix branch from GitHub.

…t I still need to make it work for the variety of USCRN inputs.
…input files. Need to complete support for other format types and handle the unit strings
…efore it's actually read so that an error in parsing the data will indicate which file caused it.
…g .csv files. Get rid of the unneeded Offsets vector. Add AllowEmptyColumns option to the DataLine class so that multiple delimiters in a row will be treated as separate columns. Since the default delim is whitespace, it makes sense that you'd want to parse multiple delims in a group. But for .csv files, each comma indicates a new column.
… including .csv files. This required updates to the DataLine and LineDataFile classes to parse the .csv data properly. Still need to enhance ascii2nc to write units
…s for all the other ascii file types as well.
…a list of all empty strings. This is used in ascii2nc to determine if observation units and descriptions should be written.
…oint observation descriptions. Previously, if units were present then descriptions (usually empty ones) were added. Now, units and descriptions and handled independently.
…o make this work. Seems like we should ADD these numbers where needed rather than subtracting them everywhere else!
…icated the logic for ignoring the first line from csv files.
…les, just skip any lines where the station ID begins with 'WBAN'. That'll handle files being concatenated together and is simpler logic.
@JohnHalleyGotway JohnHalleyGotway added this to the MET-12.1.0 milestone Jan 13, 2025
@JohnHalleyGotway JohnHalleyGotway linked an issue Jan 13, 2025 that may be closed by this pull request
… USCRN files are used to the determine the specific format.
@JohnHalleyGotway JohnHalleyGotway added the pull request: MODIFIES OUTPUT Changes that add new or modify existing output formats label Jan 14, 2025
… value since it conflicts with the initialization. While the GHA compiler is fine with it, the SonarQube one is not. These changes should enble to SonarQube build to complete.
@JohnHalleyGotway
Copy link
Collaborator Author

Needed some updates, but the SonarQube run is now working as it should. It did flag 41 "new" code smells and a total of 18,090 code smells overall, compared to 18,060 in the develop branch. So I'll do some work to reduce the overall number of code smells < 18,060.

@JohnHalleyGotway JohnHalleyGotway marked this pull request as ready for review January 15, 2025 15:59
Copy link
Collaborator

@georgemccabe georgemccabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of changes here, but they look good as far as I can tell. In addition to adding support for the uscrn obs, I see many improvements as suggested by SonarQube.

I will keep an eye out for differences in METplus use case output based on these changes, but as far as I know there are no existing use cases that use airnow data.

Copy link
Contributor

@j-opatz j-opatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation changes are good, along with the general idea of the PR. I reviewed the differences from GHA and confirmed they were as expected. Checked output from ascii2nc run using USCRN data and confirmed that both obs_unit and obs_desc are both present in the output netCDF file. I approve the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pull request: MODIFIES OUTPUT Changes that add new or modify existing output formats
Projects
Status: 🔎 In review
Development

Successfully merging this pull request may close these issues.

Enhance ASCII2NC to read USCRN point observations.
3 participants