TADA_CreateParamRef() and TADA_CreateParamUseRef() #555

Open
wants to merge 62 commits into
base: develop

Conversation

wokenny13
Collaborator

Drafts of the first two reference-file functions have been pushed for testing.

I am still working on the Mod 3 vignettes. In the meantime, the R documentation explains the goals of the two functions in detail and is ready for review.
 

  • Check whether setting excel = TRUE creates the myfileRef spreadsheet in your Downloads folder.
  • Try different argument inputs and test on other datasets.
  • Pass invalid inputs to trigger the warning and error messages and confirm there are no additional bugs.
  • Evaluate the general usability and user interface of the two functions.

First 2 reference files draft functions have been pushed through.
@wokenny13
Collaborator Author

Working on addressing some check issues that were found.

Collaborator

@hillarymarler hillarymarler left a comment

Comments about requested changes are in-line or discussed in our working session call.

if (n > 100) {
message(paste0("There are ", n, " unique TADA.ComparableDataIdentifier names in your TADA data frame.
This may result in slow runtime for TADA_CreateParamRef() if you are generating an excel spreadsheet.
Consider filtering your .data TADA dataframe to a smaller subset of TADA.ComparableDataIdentifier first."))
Collaborator

This might belong in the documentation rather than in this message, but some suggestions on how to create a smaller subset of TADA.ComparableDataIdentifier values would be helpful for users. For example, suggest working through characteristicType groups one at a time when setting up these ref files for the first time.
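
A minimal sketch of what such a suggestion could look like, using base R on a toy data frame. The `CharacteristicGroup` column name and the identifier values are assumptions made up for illustration; the actual grouping column in a TADA data frame may be named differently.

```r
# Toy stand-in for a TADA data frame. `CharacteristicGroup` is a hypothetical
# column name and the identifier values are invented for this example.
tada_df <- data.frame(
  CharacteristicGroup = c("Nutrient", "Nutrient", "Metal", "Metal", "Physical"),
  TADA.ComparableDataIdentifier = c(
    "NITRATE_UNFILTERED_AS N_MG/L",
    "PHOSPHORUS_UNFILTERED_AS P_MG/L",
    "LEAD_FILTERED_NA_UG/L",
    "ZINC_FILTERED_NA_UG/L",
    "PH_NA_NA_STD UNITS"
  ),
  stringsAsFactors = FALSE
)

# Split into one smaller data frame per characteristic group.
by_group <- split(tada_df, tada_df$CharacteristicGroup)

# Count unique identifiers per group to see which subsets are manageable.
n_ids <- vapply(
  by_group,
  function(d) length(unique(d$TADA.ComparableDataIdentifier)),
  integer(1)
)
print(n_ids)
```

Each element of `by_group` could then be passed to TADA_CreateParamRef() in turn, so that no single call has to handle more than 100 identifiers.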

#' with the corresponding unique list of TADA/WQP Characteristic Names/TADA.ComparableDataIdentifier.
#'
#' If an ATTAINS parameter name is not listed as a prior domain value for your org from prior
#' ATTAINS assessment cycles, users should consider contacting the ATTAINS team to add this to the domain list.
Collaborator

This is not correct. The ATTAINS team can't add a parameter to an org. Only the org can do that by associating that parameter with a particular assessment unit/designated use. Any org can use any parameter in ATTAINS (even if it was requested by a different org). Users should only contact the ATTAINS team to add a parameter name if there is not an acceptable/appropriate parameter name in ATTAINS.

As far as ATTAINS is concerned, there is no issue with an org using any ATTAINS-accepted parameter for the first time. So the ATTAINS team does not need to do anything in this situation.

#' Otherwise, users can still proceed by overriding the data validation by value pasting.
#' Users will be warned in the ATTAINS.FlagParameterName column if they choose to include an
#' ATTAINS.ParameterName that was not named in prior ATTAINS assessment cycles as:
#' 'Suspect: parameter name is not found as a prior parameter name for this organization'
Collaborator

I think the flag should differentiate between "This is an accepted ATTAINS parameter that has not been previously used by this organization" and "This is not an accepted ATTAINS parameter, contact the ATTAINS team to inquire about adding it", as from an ATTAINS perspective these are two very different scenarios.

Is there still an option for the validation in Excel to allow ALL ATTAINS parameters, not just the ones from a specific org? This may be an important option for users, especially for orgs (tribes? territories?) that are in the process of growing their assessment programs and may not yet have many assessments completed. In these cases, they might prefer access to the whole list of possible ATTAINS parameters rather than being limited to the handful that were listed as causes in a small number of assessments.

They may also find that newer parameter names are more appropriate than ones used in the past if there have been changes to assessment methodologies or more specific parameter names have been added (for example: specific aquatic plant species vs. a more general plant parameter, or a specific cyanotoxin vs. a more general algal parameter). So while I think using previous causes as a starting point for the parameter crosswalk is helpful and makes the process less overwhelming, we shouldn't make it too difficult for users to apply other ATTAINS parameter names to their crosswalks if they want or need to.
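
A rough sketch of how the two scenarios could be told apart, assuming the function has both the org's prior parameter names and the full ATTAINS parameter list available. The helper name, the flag wording, and the parameter values below are hypothetical and are not taken from the PR.

```r
# Hypothetical helper: the function name, flag text, and inputs are
# illustrative only and do not reflect the PR's implementation.
flag_parameter_name <- function(param, org_prior_params, all_attains_params) {
  ifelse(
    is.na(param),
    NA_character_,
    ifelse(
      param %in% org_prior_params,
      "Pass: previously used by this organization",
      ifelse(
        param %in% all_attains_params,
        "Pass: accepted ATTAINS parameter, not previously used by this organization",
        "Suspect: not an accepted ATTAINS parameter"
      )
    )
  )
}

# Toy values standing in for the real domain lists.
all_attains <- c("PH, HIGH", "PH, LOW", "NITROGEN, TOTAL", "MICROCYSTIN")
org_prior <- c("PH, HIGH", "NITROGEN, TOTAL")

flags <- flag_parameter_name(
  c("PH, HIGH", "MICROCYSTIN", "MADE-UP NAME", NA),
  org_prior_params = org_prior,
  all_attains_params = all_attains
)
print(flags)
```

Only the last category would prompt the user to contact the ATTAINS team; the middle category needs no action from ATTAINS.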

#'
#' If an ATTAINS parameter name is not listed as a prior domain value for your org from prior
#' ATTAINS assessment cycles, users should consider contacting the ATTAINS team to add this to the domain list.
#' Otherwise, users can still proceed by overriding the data validation by value pasting.
Collaborator

If the only way to override the data validation is by pasting (not just typing in a different entry), could we provide a sheet in the Excel file that lists all the current ATTAINS parameter names (without filtering them by organization)?

#' if there is trouble locating the file. The file will be named "myfileRef.xlsx".
#' The excel spreadsheet will highlight the cells in which users should input information. Users may need to
#' insert additional rows to capture certain ATTAINS.ParameterName that correspond with multiple TADA.CharacteristicName.
#' Example: If your organization defines pH as "pH, High" and "pH, Low" that correspond to the TADA.CharacteristicName 'PH'.
Collaborator

I think this example shows the opposite of the scenario initially described: it maps multiple ATTAINS parameter names to one TADA.CharacteristicName.

#' # After running TADA_CreateParamRef() users will provide this as a function input. First example is only for the org Utah
#' paramUseRef_UT <- TADA_CreateParamUseRef(Data_Nutrients_UT, paramRef = paramRef_UT3, org_names = c("Utah"), excel = FALSE)
#'
#' # Users can include the EPA304a standards by itself or compared to their org(s)
Collaborator

I like this example to show both org/304a and just 304a - really useful

}

if (is.null(org_names)) {
print("No organization name provided, users should provide a list of ATTAINS domain organization state or tribal name that pertains to their dataframe. Attempting to pull in organization names found in the TADA data frame.")
Collaborator

Inconsistencies in printed message format. List function name first or not?

}

if (sum(!is.na(paramRef$ATTAINS.ParameterName)) == 0) {
stop("No values were found in ATTAINS.ParameterName. Please ensure that you have inputted all field values of interest in the ATTAINS.ParameterName column generated from TADA_CreateParamRef() function")
Collaborator

State more explicitly that the function cannot continue without some ATTAINS.ParameterName inputs?

Collaborator Author

I might want to change this to a warning. I am considering the case in which a user is only interested in the EPA304a standards. In this scenario, there is not a need for a user to provide a crosswalk between ATTAINS.ParameterName and TADA.ComparableDataIdentifier as it will only leverage the EPA304aPollutantName.
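
One possible shape for that change, sketched with a hypothetical helper (the function name and message wording are assumptions, not the PR's code): warn when the user asked only for EPA304a, and stop otherwise.

```r
# Hypothetical check; the function name and messages are illustrative only.
check_param_names <- function(paramRef, org_names) {
  no_params <- all(is.na(paramRef$ATTAINS.ParameterName))
  # identical() returns a single TRUE/FALSE even when org_names has
  # several elements, so it is safe to combine with &&.
  only_304a <- identical(org_names, "EPA304a")

  if (no_params && only_304a) {
    warning("No values were found in ATTAINS.ParameterName; continuing with EPA304a only.")
  } else if (no_params) {
    stop("No values were found in ATTAINS.ParameterName; the function cannot continue.")
  }
  invisible(TRUE)
}

# Toy paramRef with an empty ATTAINS.ParameterName column.
ref <- data.frame(ATTAINS.ParameterName = c(NA, NA), stringsAsFactors = FALSE)

res_w <- tryCatch(check_param_names(ref, "EPA304a"), warning = function(w) "warning")
res_e <- tryCatch(check_param_names(ref, c("Utah")), error = function(e) "error")
print(c(res_w, res_e))
```

Using `identical()` rather than `org_names == "EPA304a"` also sidesteps the length check on `&&`, which is an error in R 4.3.0 and later when either side has more than one element.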

}

if (sum(is.na(paramRef$ATTAINS.ParameterName)) > 1) {
print("NAs were found in ATTAINS.ParameterName. Please ensure that you have inputted all field values of interest in the ATTAINS.ParameterName column generated from TADA_CreateParamRef() function")
Collaborator

In addition to NA, could a "No corresponding ATTAINS.ParameterName" option be added to the ATTAINS.ParameterName drop-down list? Then a user who has thoroughly reviewed the param list and provided some input for each entry in the ATTAINS.ParameterName field would not get this message.
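
A small sketch of how the check could treat such a sentinel value, assuming the drop-down stores it as a plain string. The variable names and parameter values are made up for illustration.

```r
# Sentinel string a user would pick to mean "reviewed, nothing applies".
# The exact wording is an assumption for this example.
no_match <- "No corresponding ATTAINS.ParameterName"

# Toy ATTAINS.ParameterName column after user review.
param_names <- c("PH, HIGH", no_match, NA, "NITROGEN, TOTAL")

# Only true NAs count as unreviewed rows.
unreviewed <- is.na(param_names)

# Rows with a usable crosswalk exclude both NAs and the sentinel.
mapped <- !unreviewed & param_names != no_match

if (any(unreviewed)) {
  message(sum(unreviewed), " row(s) in ATTAINS.ParameterName have not been reviewed.")
}
```

With this split, the NA message fires only for rows the user never touched, while sentinel rows are silently dropped from the crosswalk.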

}

if (sum(is.na(paramRef$EPA304A.PollutantName)) > 1 && org_names == "EPA304a") {
print("NAs were found in EPA304A.PollutantName. Please ensure that you have inputted all field values of interest in the EPA304A.PollutantName column generated from TADA_CreateParamRef() function if you are interested in using the 304a recommended standards")
Collaborator

Do users input this? Or is it done automatically? Or a combination? It might be worth clarifying in the message and documentation that not all 304a pollutant names can be automatically crosswalked, so user review/input is required.

Collaborator Author

This is a good question. I included these lines during earlier development to consider letting users edit EPA304A.PollutantName as well, but I was unsure whether we want to allow these values to be changed.

If we do proceed with allowing users to edit the crosswalk we built, in cases where an org feels a different pollutant name is more appropriate, it would be good to make clear that 'not all 304a pollutant names can be automatically crosswalked, so user review/input is required'.

Added a tab for org_name filtered parameter and use names from ATTAINS

Modified return values of ATTAINS.FlagParameterName
@wokenny13
Collaborator Author

@cristinamullin @hillarymarler There are still some edits and formatting I would like to make, but I think this is in a good spot to review now. I will be out of the office tomorrow but will be back on Friday the 20th.

#' TADA_CreateParamUseRef() as the basis for the pulling in prior ATTAINS
#' parameter names and use name by organization name. This helps to filter
#' selections of drop down values and creating parameter and use combination
#' summaries that will need to be defined.
Collaborator

I think you could remove "that will need to be defined"

#' ATTAINS Parameter and Use Name by Organization Reference Key
#'
#' Function downloads and returns the newest available ATTAINS domain values
#' reference dataframe summarized by parameter, use and organization.
Collaborator

Add clarifying language about parameters (listed as cause by org in previous assessments)

#' TADA_CreateParamUseRef() as the basis for the pulling in EPA304a recommended
#' pollutant name and use_name for assessment under the CWA.
#'
#' (Currently only numeric priority characteristic in TADA are the focus.)
Collaborator

Add info on where to find list of TADA priority chars?

wokenny13 and others added 30 commits January 2, 2025 15:22
Mod 3 notes, additional comments, and a minor fix in removing duplicate rows for dataframe output in CreateParamRef()
CreateAUIDRef() is in development and was not meant to be included in the prior commit
- update WQXCharacteristicRef
- Generate new example data Data_WV_Mod1_Output for Mod 3 vignette (Mod 1 output)
- Update TADA_AutoFilter (needs more work)
- Fix documentation issues
- Switch default to clean = FALSE for method flag function
- Update threshold functions (incomplete). Still need to add new flags once WQX QAQC domain table issues are addressed
- add way to standardize pH units to bottom of TADA_ConvertResultUnits
- remove old notes from autoclean
- Update TADA_RunKeyFlagFunctions (needs more work)
- add new test for pH unit harmonization to test-Utilities.R
- update mod 1 and 3 vignette
CST update to include unit name
Incorporated the TADA_CreateAURef() function. This is in active development.

Users will be responsible for deciding which method they use to map ML to AU, whether that is TADA_GetATTAINS() or their own method.

Users will not need to provide this crosswalk if they are only interested in summaries at the monitoring location level.
The next PR will focus on the AU crosswalk and the Defining Magnitude Summary functions.

This branch and new updates reflect some modification to mod 3 vignette workflow and some edits to TADA_CreateParamUseRef()
flags MOLE/L and ph
Modified return value of a Flag output in CreateParamRef()

print out datatable output in mod 3 vignette for working group
This button, when clicked, allows you to download the dataframe as csv, xlsx or pdf.

3 participants