Feature #1019 USCRN #3049
base: develop
…t I still need to make it work for the variety of USCRN inputs.
…input files. Need to complete support for other format types and handle the unit strings
…efore it's actually read so that an error in parsing the data will indicate which file caused it.
…g .csv files. Get rid of the unneeded Offsets vector. Add AllowEmptyColumns option to the DataLine class so that multiple delimiters in a row will be treated as separate columns. Since the default delim is whitespace, it makes sense that you'd want to parse multiple delims in a group. But for .csv files, each comma indicates a new column.
… including .csv files. This required updates to the DataLine and LineDataFile classes to parse the .csv data properly. Still need to enhance ascii2nc to write units
…s for all the other ascii file types as well.
…a list of all empty strings. This is used in ascii2nc to determine if observation units and descriptions should be written.
…oint observation descriptions. Previously, if units were present then descriptions (usually empty ones) were added. Now, units and descriptions are handled independently.
…he USCRN website.
…o make this work. Seems like we should ADD these numbers where needed rather than subtracting them everywhere else!
…icated the logic for ignoring the first line from csv files.
…les, just skip any lines where the station ID begins with 'WBAN'. That'll handle files being concatenated together and is simpler logic.
… USCRN files are used to determine the specific format.
… value since it conflicts with the initialization. While the GHA compiler is fine with it, the SonarQube one is not. These changes should enable the SonarQube build to complete.
Needed some updates, but the SonarQube run is now working as it should. It did flag 41 "new" code smells and a total of 18,090 code smells overall, compared to 18,060 in the develop branch. So I'll do some work to bring the overall number of code smells below 18,060.
…place_back() which SonarQube prefers for efficiency.
…he overall number of them lower than what's in the develop branch.
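The commit notes above describe two parsing rules: with the default whitespace delimiter, a run of delimiters counts as one separator, while in .csv files each comma starts a new column, and csv lines whose station ID begins with 'WBAN' are skipped as headers. The following is a minimal sketch of those two rules, not the actual DataLine or USCRN handler code. The names split_columns and is_header_line are hypothetical and chosen only for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not the MET DataLine API: split a line on a set of
// delimiter characters. When allow_empty_columns is false, a run of
// delimiters collapses into a single separator (whitespace-style parsing).
// When it is true, every delimiter starts a new column (csv-style parsing).
std::vector<std::string> split_columns(const std::string &line,
                                       const std::string &delims,
                                       bool allow_empty_columns) {
   std::vector<std::string> cols;
   std::string cur;
   for (char c : line) {
      if (delims.find(c) != std::string::npos) {
         // Close the current column; keep it only if it is non-empty
         // or empty columns are allowed.
         if (allow_empty_columns || !cur.empty()) cols.emplace_back(cur);
         cur.clear();
      } else {
         cur += c;
      }
   }
   // Handle the final column after the last delimiter.
   if (allow_empty_columns || !cur.empty()) cols.emplace_back(cur);
   return cols;
}

// Hypothetical check mirroring the commit note: treat any line whose first
// column (the station ID) begins with "WBAN" as a header line to skip.
bool is_header_line(const std::vector<std::string> &cols) {
   return !cols.empty() && cols[0].compare(0, 4, "WBAN") == 0;
}
```

Under these assumptions, splitting "a,,b" on "," with empty columns allowed yields three columns, the middle one empty, while splitting "a  b" on " " without empty columns yields two. Checking every line for the header prefix, rather than only the first, is what lets concatenated files parse cleanly.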
There are a lot of changes here, but they look good as far as I can tell. In addition to adding support for the uscrn obs, I see many improvements as suggested by SonarQube.
I will keep an eye out for differences in METplus use case output based on these changes, but as far as I know there are no existing use cases that use airnow data.
Documentation changes are good, along with the general idea of the PR. I reviewed the differences from GHA and confirmed they were as expected. Checked output from an ascii2nc run using USCRN data and confirmed that obs_unit and obs_desc are both present in the output netCDF file. I approve the PR.
This PR changes about 86 different source code files, but the vast majority of them are unrelated and modified only to drive down the overall number of SonarQube code smells. Here's a description of the modified files:
docs/Users_Guide/reformat_point.rst has minor ascii2nc updates to the User's Guide.
internal/test_unit/xml/unit_ascii2nc.xml adds 1 new unit test for -format uscrn.
src/tools/other/ascii2nc/Makefile.am and .in are updated to compile the newly added USCRN source code.
src/tools/other/ascii2nc/ascii2nc.cc is updated to support the new -format uscrn option.
src/tools/other/ascii2nc/file_handler.h and .cc update the parent class to track observation unit and description strings.
src/tools/other/ascii2nc/uscrn_handler.h and .cc are NEW FILES which contain the meat of the work, defining how to read 7 different variations of this new data source!
src/basic/vx_log/string_array.h and .cc add a new StringArray::all_empty() member function to check for an array of all empty strings, which is used when deciding whether to write observation units and descriptions.
src/basic/vx_util/data_line.h and .cc add a new DataLine::AllowEmptyColumns variable that is used when parsing .csv input files. They also remove the unneeded Offset vector, which is a holdover from before the switch to using STL.
src/basic/vx_util/observation.h and .cc and nc_points_obs_out.nc add members to store observation unit and description strings, needed to enhance ascii2nc to write them to the output.
src/basic/vx_nc_obs/nc_obs_util.h and .cc update the logic for writing observation unit and description strings.
src/libcode/vx_summary/summary_key.h and .cc and src/libcode/vx_summary/summary_obs.h and .cc are updated to store observation unit and description strings.
Other files change string::push_back() to string::emplace_back(), as recommended by SonarQube.
Other files change protected to private, as recommended by SonarQube.
Expected Differences
Do these changes introduce new tools, command line arguments, or configuration file options? [Yes]
If yes, please describe:
Adds support to ascii2nc for the new -format uscrn option. There are actually 7 different flavors of USCRN data, and all 7 are supported based on the naming conventions of the filename prefix and suffix.
Do these changes modify the structure of existing or add new output data types (e.g. statistic line types or NetCDF variables)? [Yes]
If yes, please describe:
This is a small and subtle change. Previously, ascii2nc had the ability to write obs_units to the output NetCDF file, but this was only enabled for -format airnow data. When the obs_units variable was written to the output, an obs_desc variable was also written for "descriptions". However, airnow doesn't populate those strings. This PR tweaks that logic so that the obs_units and obs_desc variables are only written if they are defined by non-empty strings, and that logic is handled independently for each. Note that -format airnow will now only write obs_units while -format uscrn will write obs_units and obs_desc. This is the source of the unit test diffs described below.
Pull Request Testing
Describe testing already performed for these changes:
Ran manually several times on the command line and inspected the NetCDF output.
Recommend testing for the reviewer(s) to perform, including the location of input datasets, and any additional instructions:
@j-opatz please review the doc updates and new unit test output, and consider testing more thoroughly using seneca:/d1/projects/MET/MET_pull_requests/met-12.1.0/beta1/MET-feature_1019_USCRN. Also confirm the observation unit and description changes appearing in the output.
@georgemccabe please focus on the actual code changes and consider impacts to any METplus Use Cases that use AirNow input data.
@anewman89 please validate the metadata added to uscrn_handler.cc (e.g. variable naming conventions, units, and descriptions) against the readme.txt files from the NCEI product subdirectories (https://www.ncei.noaa.gov/pub/data/uscrn/products/).
Do these changes include sufficient documentation updates, ensuring that no errors or warnings exist in the build of the documentation? [Yes]
I updated the ascii2nc chapter.
Do these changes include sufficient testing updates? [Yes]
I added a test to unit_ascii2nc.xml to demonstrate. It reads input for the Boulder location for all 7 supported data types and uses the -valid_beg/-valid_end options to extract observations only for 20240801 at 00Z.
Will this PR result in changes to the MET test suite? [Yes]
If yes, describe the new output and/or changes to the existing output:
Adds the new ascii2nc/USCRN_Boulder_20240801.nc unit test output file.
Modifies the following files by removing the obs_desc variable that only contains empty strings:
airnow/HourlyAQObs_20220312.nc
airnow/HourlyData_20220312.nc
airnow/daily_data_v2_20220312.nc
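The removal of obs_desc from these files follows from the rule described in this PR: units and descriptions are each written only when at least one string is non-empty, and the two decisions are made independently. Here is a minimal sketch of that rule, assuming plain std::vector<std::string> inputs rather than MET's StringArray class. The names all_empty (as a free function), OutputPlan, and plan_output are hypothetical and are not taken from the MET source.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the all-empty check described in the PR:
// returns true when every string is empty (or the list itself is empty).
bool all_empty(const std::vector<std::string> &values) {
   return std::all_of(values.begin(), values.end(),
                      [](const std::string &s) { return s.empty(); });
}

// Hypothetical record of which optional variables would be written.
struct OutputPlan {
   bool write_units;
   bool write_descs;
};

// Decide independently whether units and descriptions should be written.
// Neither decision depends on the other.
OutputPlan plan_output(const std::vector<std::string> &units,
                       const std::vector<std::string> &descs) {
   return OutputPlan{ !all_empty(units), !all_empty(descs) };
}
```

With inputs shaped like the AirNow case (some units populated, all descriptions empty), this sketch would write units but not descriptions. With inputs where both lists hold non-empty strings, as described for USCRN, both would be written.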
Will this PR result in changes to existing METplus Use Cases? [Maybe]
If yes, create a new Update Truth METplus issue to describe them.
Differences will be flagged for any use cases that run ascii2nc with AirNow inputs.
Do these changes introduce new SonarQube findings? [Yes or No]
If yes, please describe:
While some SonarQube code smells in "new code" are flagged, changes for this PR reduce the overall number of code smells from 18,060 down to 18,037. I did review the "new code" ones and fixed all the (relatively) easy ones.
Please complete this pull request review by [Fri 1/17/25].
Pull Request Checklist
See the METplus Workflow for details.
Select: Reviewer(s) and Development issue
Select: Milestone as the version that will include these changes
Select: Coordinated METplus-X.Y Support project for bugfix releases or MET-X.Y.Z Development project for official releases