---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# imdccal
<!-- badges: start -->
<!-- badges: end -->
The goal of imdccal is to extract data from [CCAL](https://ccal.oregonstate.edu/) water chemistry lab deliverables, convert it to a machine-readable format, and begin processing it into the EQuIS EDD format.
## Installation
You can install the development version of imdccal from [its GitHub repository](https://github.com/nationalparkservice/imd-ccal) with:
``` r
# install.packages("remotes")
remotes::install_github("nationalparkservice/imd-ccal")
```
## Example Data
To demonstrate the functionality of imdccal, we created two fictitious CCAL deliverables that are installed with the package.
The story behind the data is that it is 2099 and the NPS now has an Outer Space Network with monitoring locations on the Moon, Mars, Mercury, and Saturn.
Users can find the file paths to the example data by running the following code chunk.
Since the example data is installed with the package, you can run any of the code in this README on your own computer.
```{r, eval = FALSE}
use_example_data(file_names = use_example_data())
```
## Example: Creating Machine-Readable CCAL Data
Read and tidy the data and work with it in R, without writing the data to any files.
```{r}
# Load package
library(imdccal)
# Create tidied CCAL data from demo data stored in the imdccal package
tidy_ccal <- getCCALData(use_example_data(file_names = "SPAC_080199.xlsx"))
data <- tidy_ccal$`SPAC_080199.xlsx`$data # Get the data for a single set of lab results
meta <- tidy_ccal$`SPAC_080199.xlsx`$metadata # Get the metadata for the same set of results
```
Write the tidied data that is currently stored in the R environment to file.
Note that destination_folder must already exist.
```{r, eval = FALSE}
# Write data stored in environment to file
write_data(all_data = tidy_ccal,
           format = "xlsx",                   # alternatively, "csv"
           destination_folder = "ccal_tidy",  # must already exist
           overwrite = TRUE,
           suffix = "_tidy",
           num_tables = 4)
```
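Because the destination folder must already exist, it can help to create it first. The following is a minimal base R sketch rather than part of imdccal; it builds the folder under `tempdir()`, which is an assumption made here so the example does not write into your project.

``` r
# Build the path to the output folder. Using tempdir() is an assumption made
# for this sketch; in practice you would likely use a path such as
# "ccal_tidy" relative to your working directory.
destination_folder <- file.path(tempdir(), "ccal_tidy")

# Create the folder only if it is missing. recursive = TRUE also creates
# any missing parent folders.
if (!dir.exists(destination_folder)) {
  dir.create(destination_folder, recursive = TRUE)
}

# Confirm the folder is ready before calling write_data()
dir.exists(destination_folder)
```

The resulting path can then be passed as the `destination_folder` argument.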
Instead of the two-step process of creating the tidied R object and then writing it to file, users may use the `machineReadableCCAL()` function to create and write the tidied data in one step.
Here is an example of reading data from a single file of CCAL lab data and writing the machine-readable version to an Excel file or set of CSV files.
```{r, eval=FALSE}
# Write tidied CCAL data to file from demo data stored in imdccal package
machineReadableCCAL(use_example_data(file_names = "SPAC_080199.xlsx"),
                    destination_folder = "ccal_tidy")  # Write tidied data to a new .xlsx
machineReadableCCAL(use_example_data(file_names = "SPAC_081599.xlsx"),
                    format = "csv", destination_folder = "ccal_tidy")  # Write tidied data to a folder of CSV files
```
All of these functions also work when supplied with a vector of file paths to multiple CCAL deliverables.
Here is an example of reading data from *multiple* files of CCAL lab data and writing the machine-readable versions to Excel files or sets of CSV files.
```{r, eval=FALSE}
# Get file paths
all_files <- use_example_data(file_names = use_example_data())
# Write to xlsx
machineReadableCCAL(all_files, destination_folder = "ccal_tidy") # Write one file of tidied data per input file
# Write to csv
machineReadableCCAL(all_files, format = "csv", destination_folder = "ccal_tidy") # Write one folder of tidied CSV data per input file
```
By default, these functions create separate tables and files for each CCAL deliverable supplied as input.
To concatenate the results together, set the `concat` argument in `getCCALData()` or `machineReadableCCAL()` to TRUE.
```{r, eval = FALSE}
machineReadableCCAL(all_files, destination_folder = "ccal_tidy", concat = TRUE)
```
## Example: Creating the Results Table of the EQuIS EDD
In addition to converting CCAL lab deliverables to a machine-readable format, imdccal also provides functionality to begin processing the machine-readable data into the EQuIS EDD format.
Specifically, the `format_results()` function begins to format the data into the Results table for the EDD.
**Uses:**
- Process data into one of the tables in the format accepted by EQuIS
- Censor values less than or equal to the MDL
- Raise the J-R flag for observations greater than the MDL but less than or equal to the LQL
**Limitations:**
- Users still need to create the Activities table for every deliverable, and the Projects and Locations tables when they require edits.
- Within the Results table, users still need to define Activity_ID, modify Result_Status (set to Pre-Cert since additional QC is needed), add flags, and conduct the rest of their own QC processes.
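
The censoring and flagging rules listed under Uses can be illustrated with a small base R sketch. This is not the package's implementation: the data frame, its column names, and its values are all hypothetical, and only the two comparison rules come from the list above.

``` r
# Toy results table. Column names and values are hypothetical.
toy <- data.frame(
  analyte = "NO3",
  value   = c(1, 2, 4, 10),
  mdl     = 2,  # MDL for this analyte
  lql     = 5   # LQL for this analyte
)

# Censor values less than or equal to the MDL
toy$censored <- toy$value <= toy$mdl

# Raise the J-R flag for values greater than the MDL but
# less than or equal to the LQL
toy$flag <- ifelse(toy$value > toy$mdl & toy$value <= toy$lql,
                   "J-R", NA_character_)

toy
```

In this toy table the first two values are censored, the third receives the J-R flag, and the fourth is reported without a flag.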
Here we demonstrate the usage of this function.
Read the data, create the Results table of the EDD, and work with it in R without writing the data to any files:
```{r}
# Create results table from demo data stored in the imdccal package
results_incomplete <- format_results(use_example_data(file_names = "SPAC_080199.xlsx"))
```
If you inspect the table created above, you will notice that so many columns lack data that the result is not useful.
That's because our code relies on a table containing detection limits for each analyte.
To accommodate changes to detection limits over time, the table includes StartDate and EndDate columns.
The issue above is that none of the limits apply after December 31, 2024, and our example data is from 2099.
By default, `format_results()` uses the version of the table that is stored within the package.
However, users may provide their own version of the table as an argument in the function if that better suits their needs.
This may occur, for example, if detection limits change or if a user needs to include an analyte that is not yet in the table.
To provide a different version of the table, start with the version stored in the package, and add rows or edit columns as you see fit.
It may be helpful to review the documentation of the limits table (run `?imdccal::limits`) as you make your edits.
For example, we edit the EndDate for all relevant rows to December 31, 2099 in the code below.
```{r}
# Edit limits table
limits <- imdccal::limits |>
  dplyr::mutate(EndDate = dplyr::if_else(EndDate == "2024-12-31",
                                         lubridate::ymd("2099-12-31"),
                                         EndDate))
```
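To see why the end date matters, here is a minimal base R sketch of date-window matching. It assumes a limit applies when the sample date falls between StartDate and EndDate, which is consistent with the description above but is not taken from the package source. The toy table, its `analyte` and `MDL` columns, the helper `applicable_limits()`, and the sample dates are all hypothetical.

``` r
# Toy limits table. StartDate and EndDate mirror the columns described above;
# the other columns and all values are hypothetical.
toy_limits <- data.frame(
  analyte   = c("NO3", "NO3"),
  MDL       = c(3, 2),
  StartDate = as.Date(c("2010-01-01", "2020-01-01")),
  EndDate   = as.Date(c("2019-12-31", "2024-12-31"))
)

# Hypothetical helper: keep the rows whose date window contains the sample date
applicable_limits <- function(limits, sample_date) {
  limits[limits$StartDate <= sample_date & sample_date <= limits$EndDate, ]
}

# A 2022 sample falls inside the second window
nrow(applicable_limits(toy_limits, as.Date("2022-06-15")))

# A 2099 sample falls outside every window, so no limit is found
nrow(applicable_limits(toy_limits, as.Date("2099-08-01")))

# Extending the latest EndDate, as in the chunk above, lets the 2099 sample match
toy_limits$EndDate[toy_limits$EndDate == as.Date("2024-12-31")] <- as.Date("2099-12-31")
nrow(applicable_limits(toy_limits, as.Date("2099-08-01")))
```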
Users may also save the table to a csv or xlsx file and make changes directly to a spreadsheet.
To use one's updated table in `format_results()`, simply supply the function with the modified table (as an R object) as the limits argument.
We do so with our example data below.
```{r}
# Create results table from demo data stored in the imdccal package
results_complete <- format_results(use_example_data(file_names = "SPAC_080199.xlsx"),
                                   limits = limits)
```
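For the spreadsheet route mentioned above, a round trip through CSV might look like the following base R sketch. The toy table stands in for the real limits table, and the file name is hypothetical. The sketch also assumes the date columns should be of class Date when the table is passed back in, which is worth checking against `?imdccal::limits`.

``` r
# Toy stand-in for the limits table. The analyte column and all values
# are hypothetical.
toy_limits <- data.frame(
  analyte   = "NO3",
  StartDate = as.Date("2020-01-01"),
  EndDate   = as.Date("2024-12-31")
)

# Save to CSV for editing in a spreadsheet (a temporary file is used here)
csv_path <- file.path(tempdir(), "limits_edit.csv")
write.csv(toy_limits, csv_path, row.names = FALSE)

# Read the edited file back. read.csv() returns the dates as character strings.
edited <- read.csv(csv_path, stringsAsFactors = FALSE)

# Convert the date columns back to Date before passing the table on
edited$StartDate <- as.Date(edited$StartDate)
edited$EndDate   <- as.Date(edited$EndDate)

str(edited)
```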
Users may also provide their own version of the qualifiers table, which contains the lookup_code and remark columns from [NPS_EQuIS_WQX_Reference_Values](https://doimspp.sharepoint.com/:x:/r/sites/nps-nrss-wrdiv/_layouts/15/Doc.aspx?sourcedoc=%7B897FC8B3-2F68-4353-BB79-C0FEE9C45991%7D&file=NPS_EQuIS_WQX_Reference_Values.xlsx&action=default&mobileredirect=true) after filtering for rows where the #lookup_type is "Result_Qualifier".
In this package the only flag we raise is "J-R", so unless this flag's description changes, the user has no need to use this argument.
However, it is worth familiarizing oneself with the table as it is useful for further data processing.
Run `?imdccal::qualifiers` to see the table's documentation.
In addition, users may benefit from reading through the [NPS EQuIS Resources website](https://doimspp.sharepoint.com/sites/nps-nrss-wrdiv/SitePages/DMEQuIS.aspx?xsdata=MDV8MDJ8fDE1NTcyM2EwNmEyODQ3NThhZDM5MDhkYzUzNDVjZjUwfDA2OTNiNWJhNGIxODRkN2I5MzQxZjMyZjQwMGE1NDk0fDB8MHw2Mzg0NzY4MDY0NzU3ODc1Nzd8VW5rbm93bnxWR1ZoYlhOVFpXTjFjbWwwZVZObGNuWnBZMlY4ZXlKV0lqb2lNQzR3TGpBd01EQWlMQ0pRSWpvaVYybHVNeklpTENKQlRpSTZJazkwYUdWeUlpd2lWMVFpT2pFeGZRPT18MXxMMk5vWVhSekx6RTVPbTFsWlhScGJtZGZUbXBuTVU5RWF6UlBWR2QwV21wck5FMXBNREJPZWtreVRGUm9iRTFxWjNSYVZFWnBUMFJWTTAxNmJHaFBSRUpxUUhSb2NtVmhaQzUyTWk5dFpYTnpZV2RsY3k4eE56RXlNRGd6T0RRMk1qVXd8MDU3NTdhODQyYWI2NDI5MGFkMzkwOGRjNTM0NWNmNTB8Yjg5ZDU1YmI3ZjVhNDNhNmJlODQwMzU5NDgyNTM1ZmM%3D&sdata=TjVsUVVHQS84VDQ2NUlRZmNzMlE3R2hRRWI0MDdlaVRIL0hkNnAxMmRSQT0%3D&ovuser=0693b5ba-4b18-4d7b-9341-f32f400a5494%2Cavolk%40nps.gov&OR=Teams-HL&CT=1719255259520&clickparams=eyJBcHBOYW1lIjoiVGVhbXMtRGVza3RvcCIsIkFwcFZlcnNpb24iOiI0OS8yNDA1MTYyMjIyMyIsIkhhc0ZlZGVyYXRlZFVzZXIiOmZhbHNlfQ%3D%3D), which provides comprehensive guidelines on formatting data for EQuIS.
As with the machine-readable data, we can write the results table from our R environment to file with the `write_data()` function.
```{r, eval = FALSE}
# Write data stored in environment to file
write_data(all_data = results_complete,
           format = "xlsx",                   # alternatively, "csv"
           destination_folder = "ccal_tidy",  # must already exist
           overwrite = TRUE,
           suffix = "_edd_results",
           num_tables = 1)
```
Likewise, instead of the two-step process of creating the results table as an R object and then writing it to file, users may use the `write_results()` function to create and write the results table in one step.
Here is an example of reading data from a single file of CCAL lab data and writing the results table to an Excel or CSV file.
```{r, eval=FALSE}
# Write results table to xlsx from demo data stored in imdccal package
write_results(files = use_example_data(file_names = "SPAC_080199.xlsx"),
              limits = limits,
              format = "xlsx",
              destination_folder = "ccal_tidy",
              overwrite = TRUE)
# Write results table to csv from demo data stored in imdccal package
write_results(files = use_example_data(file_names = "SPAC_081599.xlsx"),
              limits = limits,
              format = "csv",
              destination_folder = "ccal_tidy",
              overwrite = TRUE)
```
As before, all of these functions work when supplied with a vector of file paths to multiple CCAL deliverables.
Here is an example of reading data from *multiple* files of CCAL lab data and writing the results tables to Excel or CSV files.
```{r, eval=FALSE}
# Get file paths
all_files <- use_example_data(file_names = use_example_data())
# Write to xlsx
write_results(files = all_files,
              limits = limits,
              destination_folder = "ccal_tidy",
              overwrite = TRUE)
# Write to csv
write_results(files = all_files,
              limits = limits,
              format = "csv",
              destination_folder = "ccal_tidy",
              overwrite = TRUE)
```
By default, these functions create separate tables and files for each CCAL deliverable supplied as input.
To concatenate the results together, set the `concat` argument in `format_results()` or `write_results()` to TRUE.
```{r, eval = FALSE}
write_results(files = all_files,
              limits = limits,
              destination_folder = "ccal_tidy",
              overwrite = TRUE,
              concat = TRUE)
```