
Commit

Merge pull request nationalparkservice#135 from RobLBaker/main
expand test_missing_data acceptable missing data codes
RobLBaker authored Feb 6, 2024
2 parents 984cc6e + 7ea5143 commit 6c294ec
Showing 10 changed files with 70 additions and 138 deletions.
NEWS.md: 1 change (1 addition, 0 deletions)
@@ -2,6 +2,7 @@
 
 2024-02-05
 * Fix bug in `test_date_range()` that was adding UTC to temporalCoverage
+* `test_missing_data()` now also handles the missing data codes "blank" and "empty".
 * Update `test_missing_data()` to default to flagging whole files rather than each column that has undocumented missing data. This condenses the error output when running `run_congruence_checks()`. When troubleshooting and attempting to pinpoint data with undocumented missing values, `test_missing_data()` can be run with the parameter detail_level = "columns".
 2024-01-26
 * Bugfixes for `test_dates_parse()` and `test_date_range()`: they now ignore files that have times but no dates or date times.
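The NEWS entry contrasts two reporting levels for `test_missing_data()`: one message per file (the new default) or one message per affected column (`detail_level = "columns"`). As a minimal, hypothetical sketch of that condensing step: `summarize_undocumented()`, its input format (a named list mapping file names to flagged columns), and the message text are all assumptions for illustration, not DPchecker's actual code or output.

```r
# Hypothetical helper, not part of DPchecker: condense a per-file list of
# columns with undocumented missing values into messages at either
# file level or column level.
summarize_undocumented <- function(flagged, detail_level = c("files", "columns")) {
  detail_level <- match.arg(detail_level)
  # Keep only files that have at least one flagged column
  flagged <- flagged[lengths(flagged) > 0]
  if (length(flagged) == 0) {
    return(character(0))
  }
  if (detail_level == "files") {
    # One message per file, however many columns are affected
    return(paste0(names(flagged), " contains undocumented missing data."))
  }
  # One message per (file, column) pair
  unlist(lapply(names(flagged), function(f) {
    paste0(f, ": column '", flagged[[f]], "' contains undocumented missing data.")
  }), use.names = FALSE)
}

# Example input: file name -> columns with undocumented missing values
flagged <- list(
  "sites.csv"   = c("elevation", "slope"),
  "visits.csv"  = character(0),
  "species.csv" = "count"
)

summarize_undocumented(flagged)             # 2 messages, one per file
summarize_undocumented(flagged, "columns")  # 3 messages, one per column
```

The file-level form keeps the output short when many columns are affected, while the column-level form trades length for the detail needed to locate each problem.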
R/tabular_data_congruence.R: 10 changes (6 additions, 4 deletions)
@@ -591,11 +591,11 @@ test_fields_match <- function(directory = here::here(), metadata = load_metadata
 
 #' Looks for undocumented missing data (NAs)
 #'
-#' @description `test_missing_data` scans the data package for common missing data (blanks). If there are no blanks or if missing data coded as NA is documented as missing data in the metadata, the test passes. If missing data (blanks or NA) are found but not documented in the metadata the test fails with an error.
+#' @description `test_missing_data` scans the data package for common missing data (blanks/empty cells or NA in a cell). If there are no blanks or NAs, the test passes. If missing data are found and properly documented (missingValueCode is either "NA", "empty", or "blank"), the test passes. If any missing data are detected but not properly documented in the metadata, the test fails with an error.
 #'
-#' Commonly, R will interpret blank cells as missing and fill in NA. To pass this test, you will need to either delete columns with missing data (if they are completely blank) or add NA as a missing data code during metadata creation.
+#' Commonly, R will interpret blank cells as missing and fill in NA. To pass this test, you will need to either delete columns or tables with missing data (if they are completely blank), or add the appropriate missing data code during metadata creation (in the corresponding attributes.txt file).
 #'
-#' This is a fairly simple test and ONLY checks for NA. Although there are many common missing data codes (-99999, "Missing", "NaN" etc) we cannot anticipate all of them.
+#' This is a fairly simple test and ONLY checks for NA and blanks. Although there are many common missing data codes (-99999, "Missing", "NaN" etc) we cannot anticipate all of them.
 #'
 #' When running `test_missing_data()` via `run_congruence_checks()`, the default for "detail_level" will be used and only file-level information about undocumented missing values will be reported to condense the error message output. When attempting to identify specifically which data have undocumented missing values, it may be helpful to run `test_missing_data()` with the parameter "detail_level" set to "columns". This will output a list of all columns within each file with undocumented missing data.
 #'
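The roxygen text notes that R commonly reads blank cells as NA, which is why a single `is.na()` count can cover both blanks and literal "NA" strings. A minimal sketch of that behavior, using base R's `read.csv()` only to avoid a package dependency (the function itself reads files with `readr::read_csv()`, whose `na` argument defaults to `c("", "NA")`); the CSV content is made up for illustration.

```r
# Sketch: blank cells in numeric columns are read as NA, so counting
# is.na() per column catches both blanks and literal "NA" strings.
csv_text <- "site,count,depth\nA,3,\nB,,1.5\nC,NA,2.0"
dat <- read.csv(text = csv_text)

# Missing values per column, analogous to sum(is.na(dat[,j])) in the diff
na_counts <- colSums(is.na(dat))
print(na_counts)

# Columns that would need a documented missingValueCode
needs_code <- names(na_counts)[na_counts > 0]
print(needs_code)
```

Note that base `read.csv()` converts blanks to NA only in logical, integer, numeric, and complex columns; a blank in a character column stays `""`, whereas `readr::read_csv()` treats it as NA by default.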
@@ -638,6 +638,8 @@ test_missing_data <- function(directory = here::here(),
 
 #load files and test for NAs
 error_log <- NULL
+#acceptable missing data codes if NA (or blank) cells found:
+missing_types <- c("NA", "blank", "empty")
 for (i in seq_along(data_files)) {
 #load each file
 dat <- suppressMessages(readr::read_csv(paste0(directory,
@@ -649,7 +651,7 @@ test_missing_data <- function(directory = here::here(),
 #look for NAs; if NAs found, look for correct missing data codes
 if (sum(is.na(dat[,j])) > 0) {
 missing <- data_tbl[[i]][["attributeList"]][["attribute"]][[j]][["missingValueCode"]][["code"]]
-if(is.null(missing) || ("NA" != missing)) {
+if(is.null(missing) || sum(missing != missing_types) < 1) {
 #file level error message output:
 if (detail_level == "files") {
 error_log <- append(error_log,
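Judging from the diff lines alone, the new condition `sum(missing != missing_types) < 1` is `TRUE` only when `missing` equals every element of `missing_types` at once, which a single code cannot do; a column documented with an unlisted code such as `-99999` would therefore not be flagged. The sketch below compares that expression with a set-membership test using `%in%`. The function names are hypothetical, and the surrounding DPchecker code (not shown in this diff) may handle such cases elsewhere.

```r
# Acceptable codes, as defined in the diff
missing_types <- c("NA", "blank", "empty")

# The condition as written in the diff (TRUE = flag as undocumented).
# For a single code, `code != missing_types` has at least two TRUE values,
# so the sum is never below 1 and only a NULL code is flagged.
flag_as_written <- function(code) {
  is.null(code) || sum(code != missing_types) < 1
}

# Hypothetical alternative: flag when no code is documented or when none
# of the documented codes is in the acceptable set
flag_membership <- function(code) {
  is.null(code) || !any(code %in% missing_types)
}

codes <- list(NULL, "NA", "blank", "-99999")
print(sapply(codes, flag_as_written))   # TRUE FALSE FALSE FALSE
print(sapply(codes, flag_membership))   # TRUE FALSE FALSE TRUE
```

Using `%in%` also avoids the element-wise recycling that `!=` performs when the documented code vector and `missing_types` differ in length.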
docs/news/index.html: 2 changes (1 addition, 1 deletion)

Generated file; diff not rendered by default.

docs/pkgdown.yml: 2 changes (1 addition, 1 deletion)
@@ -3,5 +3,5 @@ pkgdown: 2.0.7
 pkgdown_sha: ~
 articles:
   DPchecker: DPchecker.html
-last_built: 2024-02-06T18:38Z
+last_built: 2024-02-06T19:16Z