Merge pull request #424 from stitam/issue422
Update function URLs to fix a number of tests (SRS, ChemSpider)
stitam authored Dec 23, 2024
2 parents 308054c + 8449b52 commit 2e875ec
Showing 21 changed files with 444 additions and 105 deletions.
4 changes: 2 additions & 2 deletions DESCRIPTION
@@ -50,8 +50,8 @@ Suggests:
 rmarkdown,
 plot.matrix,
 usethis,
-vcr
-RoxygenNote: 7.2.3
+vcr (>= 0.6.0)
+RoxygenNote: 7.3.2
 VignetteBuilder: knitr
 Config/testthat/edition: 3
 Config/testthat/parallel: true
2 changes: 2 additions & 0 deletions NEWS.md
@@ -3,6 +3,8 @@
 ## BUG FIXES

 * `pc_prop()` returned `NA` without much further explanation if any of the queries were not positive integers. The updated function attempts to coerce queries to positive integers, only progresses valid queries, and prints informative messages along the way if verbose messages are enabled.
+* `srs_query()` broke because the URL was no longer working. We have updated the URL.
+* `is.inchikey(type = "chemspider")` broke because the URL was no longer working. We have updated the URL but the function now requires an API key like all other ChemSpider functions.

 # webchem 1.3.0

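The `pc_prop()` entry describes coercing queries to positive integers and passing only the valid ones on. The code for that change is not part of this diff, so the base-R sketch below only illustrates the idea; `coerce_cids()` is a hypothetical helper name, not a webchem function.

```r
# Hypothetical helper illustrating the pc_prop() fix described in NEWS.md:
# coerce queries to positive integers and keep only the valid ones.
coerce_cids <- function(query, verbose = TRUE) {
  # Non-numeric strings become NA; silence the coercion warning.
  num <- suppressWarnings(as.numeric(query))
  # Valid: finite, whole, positive and small enough to fit in an integer.
  valid <- is.finite(num) & num == round(num) & num > 0 &
    num <= .Machine$integer.max
  if (verbose && any(!valid)) {
    message("Skipping invalid queries: ",
            paste(query[!valid], collapse = ", "))
  }
  as.integer(num[valid])
}

coerce_cids(c("2244", "-1", "abc", "3.5", "702"))
```

With these inputs, `2244` and `702` are kept and `-1`, `abc` and `3.5` are reported as skipped.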
20 changes: 20 additions & 0 deletions R/jagst.R
@@ -0,0 +1,20 @@
+#' Organic plant protection products in the river Jagst / Germany in 2013
+#'
+#' This dataset comprises environmental monitoring data of organic plant protection products
+#' in the year 2013 in the river Jagst, Germany.
+#' The data is publicly available and can be retrieved from the
+#' LUBW Landesanstalt für Umwelt, Messungen und Naturschutz Baden-Württemberg.
+#' It has been preprocessed and comprises measurements of 34 substances.
+#' Substances without detects have been removed.
+#' on 13 sampling occasions.
+#' Values are given in ug/L.
+#'
+#' @format A data frame with 442 rows and 4 variables:
+#' \describe{
+#' \item{date}{sampling data}
+#' \item{substance}{substance names}
+#' \item{value}{concentration in ug/L}
+#' \item{qual}{qualifier, indicating values < LOQ}
+#' }
+#' @source \url{https://udo.lubw.baden-wuerttemberg.de/public/pages/home/index.xhtml}
+"jagst"
15 changes: 15 additions & 0 deletions R/lc50.R
@@ -0,0 +1,15 @@
+#' Acute toxicity data from U.S. EPA ECOTOX
+#'
+#' This dataset comprises acute ecotoxicity data of 124 insecticides.
+#' The data is publicly available and can be retrieved from the EPA ECOTOX database
+#' (\url{https://cfpub.epa.gov/ecotox/})
+#' It comprises acute toxicity data (D. magna, 48h, Laboratory, 48h) and has been
+#' preprocessed (remove non-insecticides, aggregate multiple value, keep only numeric data etc).
+#'
+#' @format A data frame with 124 rows and 2 variables:
+#' \describe{
+#' \item{cas}{CAS registry number}
+#' \item{value}{LC50value}
+#' }
+#' @source \url{https://cfpub.epa.gov/ecotox/}
+"lc50"
20 changes: 10 additions & 10 deletions R/pubchem.R
@@ -25,19 +25,19 @@
 #' \code{<xref>}, \code{"sourceid/<source id>"} or \code{"sourceall"}.}
 #' \item{\code{assay}: \code{"aid"}, \code{<assay target>}.}
 #' }
-#' @details <structure search> is assembled as "{\code{substructure} |
-#' \code{superstructure} | \code{similarity} | \code{identity}} / {\code{smiles}
-#' | \code{inchi} | \code{sdf} | \code{cid}}", e.g.
+#' @details <structure search> is assembled as "(\code{substructure} |
+#' \code{superstructure} | \code{similarity} | \code{identity}) / (\code{smiles}
+#' | \code{inchi} | \code{sdf} | \code{cid})", e.g.
 #' \code{from = "substructure/smiles"}.
-#' @details \code{<xref>} is assembled as "\code{xref}/\{\code{RegistryID} |
+#' @details \code{<xref>} is assembled as "\code{xref}/(\code{RegistryID} |
 #' \code{RN} | \code{PubMedID} | \code{MMDBID} | \code{ProteinGI},
 #' \code{NucleotideGI} | \code{TaxonomyID} | \code{MIMID} | \code{GeneID} |
-#' \code{ProbeID} | \code{PatentID}\}", e.g. \code{from = "xref/RN"} will query
+#' \code{ProbeID} | \code{PatentID})", e.g. \code{from = "xref/RN"} will query
 #' by CAS RN.
 #' @details <fast search> is either \code{fastformula} or it is assembled as
-#' "{\code{fastidentity} | \code{fastsimilarity_2d} | \code{fastsimilarity_3d} |
-#' \code{fastsubstructure} | \code{fastsuperstructure}}/{\code{smiles} |
-#' \code{smarts} | \code{inchi} | \code{sdf} | \code{cid}}", e.g.
+#' "(\code{fastidentity} | \code{fastsimilarity_2d} | \code{fastsimilarity_3d} |
+#' \code{fastsubstructure} | \code{fastsuperstructure})/(\code{smiles} |
+#' \code{smarts} | \code{inchi} | \code{sdf} | \code{cid})", e.g.
 #' \code{from = "fastidentity/smiles"}.
 #' @details \code{<source id>} is any valid PubChem Data Source ID. When
 #' \code{from = "sourceid/<source id>"}, the query is the ID of the substance in
@@ -46,8 +46,8 @@
 #' depositor names. Depositor names are not case sensitive.
 #' @details Depositor names and Data Source IDs can be found at
 #' \url{https://pubchem.ncbi.nlm.nih.gov/sources/}.
-#' @details \code{<assay target>} is assembled as "\code{target}/\{\code{gi} |
-#' \code{proteinname} | \code{geneid} | \code{genesymbol} | \code{accession}\}",
+#' @details \code{<assay target>} is assembled as "\code{target}/(\code{gi} |
+#' \code{proteinname} | \code{geneid} | \code{genesymbol} | \code{accession})",
 #' e.g. \code{from = "target/geneid"} will query by GeneID.
 #' @references Wang, Y., J. Xiao, T. O. Suzek, et al. 2009 PubChem: A Public
 #' Information System for
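The updated `@details` text describes each `from` value as one choice from a first group, a `/`, and one choice from a second group. The base-R sketch below expands the `<structure search>` and `<fast search>` groups into the concrete strings that grammar describes. `expand_from()` is a hypothetical helper, and the sketch only enumerates the documented combinations; it does not check which ones PubChem actually accepts.

```r
# Expand a "(a | b | ...)/(x | y | ...)" grammar into concrete strings.
# Hypothetical helper: enumerates combinations only, no validation.
expand_from <- function(searches, inputs) {
  grid <- expand.grid(search = searches, input = inputs,
                      stringsAsFactors = FALSE)
  paste(grid$search, grid$input, sep = "/")
}

# <structure search>: 4 search types x 4 input types
structure_search <- expand_from(
  c("substructure", "superstructure", "similarity", "identity"),
  c("smiles", "inchi", "sdf", "cid")
)

# <fast search>: "fastformula" on its own, or 5 search types x 5 input types
fast_search <- c(
  "fastformula",
  expand_from(
    c("fastidentity", "fastsimilarity_2d", "fastsimilarity_3d",
      "fastsubstructure", "fastsuperstructure"),
    c("smiles", "smarts", "inchi", "sdf", "cid")
  )
)

head(structure_search)
length(fast_search)
```

This yields 16 structure-search strings and 26 fast-search strings, including the documented examples `"substructure/smiles"` and `"fastidentity/smiles"`.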
6 changes: 3 additions & 3 deletions R/srs.R
@@ -32,7 +32,7 @@ srs_query <-
 if (!ping_service("srs")) stop(webchem_message("service_down"))
 names(query) <- query
 from <- match.arg(from)
-entity_url <- "https://cdxnodengn.epa.gov/cdx-srs-rest/"
+entity_url <- "https://cdxapps.epa.gov/oms-substance-registry-services/rest-api"
 if (from == "cas"){
 query <- as.cas(query, verbose = verbose)
 }
@@ -55,12 +55,12 @@ srs_query <-
 }
 if (verbose) message(httr::message_for_status(response))
 if (response$status_code == 200) {
-text_content <- httr::content(response, "text")
+text_content <- httr::content(response, "text", encoding = "utf-8")
 if (text_content == "[]") {
 if (verbose) webchem_message("not_available")
 return(NA)
 } else {
-jsonlite::fromJSON(text_content)
+tibble::as_tibble(jsonlite::fromJSON(text_content))
 }
 } else {
 return(NA)
64 changes: 44 additions & 20 deletions R/utils.R
@@ -11,6 +11,9 @@
 #' @param x character; input InChIKey
 #' @param type character; How should be checked? Either, by format (see above)
 #' ('format') or by ChemSpider ('chemspider').
+#' @param apikey character; your API key. If NULL (default),
+#' \code{cs_check_key()} will look for it in .Renviron or .Rprofile. Only
+#' used when `type = "chemspider"`.
 #' @param verbose logical; print messages during processing to console?
 #' @return a logical
 #'
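`type = "format"` checks the string's layout locally, without any web request. The helper it calls, `is.inchikey_format()`, is not part of this diff, so the sketch below is an independent base-R approximation built on the standard InChIKey layout: 14 uppercase letters, a hyphen, 8 uppercase letters followed by a standard/non-standard flag (`S` or `N`) and a version letter (`A` for InChI version 1), a hyphen, and one protonation letter. `looks_like_inchikey()` is a hypothetical name, and webchem's own checks may differ.

```r
# Hypothetical approximation of an offline InChIKey format check.
# Assumed layout: 14 letters - 8 letters + flag (S|N) + version (A) - 1 letter
looks_like_inchikey <- function(x) {
  pattern <- "^[A-Z]{14}-[A-Z]{8}[SN]A-[A-Z]$"
  grepl(pattern, x)
}

looks_like_inchikey(c(
  "BQJCRHHNABKAKU-KBQPJGBKSA-N",  # well-formed standard key
  "BQJCRHHNABKAKU/KBQPJGBKSA/N",  # wrong separators
  "BQJCRHHNABKAKU-KBQPJGBKXA-N",  # invalid flag character
  "BQJCRHHNABKAKU-KBQPJGBKSB-N"   # unexpected version character
))
```

Only the first of the four keys from the package's examples passes this pattern.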
@@ -31,24 +34,32 @@
 #' is.inchikey('BQJCRHHNABKAKU/KBQPJGBKSA/N')
 #' is.inchikey('BQJCRHHNABKAKU-KBQPJGBKXA-N')
 #' is.inchikey('BQJCRHHNABKAKU-KBQPJGBKSB-N')
-is.inchikey = function(x, type = c('format', 'chemspider'),
-verbose = getOption("verbose")) {
+is.inchikey = function(
+x,
+type = c('format', 'chemspider'),
+apikey = NULL,
+verbose = getOption("verbose")
+) {
 # x <- 'BQJCRHHNABKAKU-KBQPJGBKSA-N'
 if (length(x) > 1) {
 stop('Cannot handle multiple input strings.')
 }

 type <- match.arg(type)
-out <- switch(type,
-format = is.inchikey_format(x, verbose = verbose),
-chemspider = is.inchikey_cs(x, verbose = verbose))
+out <- switch(
+type,
+format = is.inchikey_format(x, verbose = verbose),
+chemspider = is.inchikey_cs(x, apikey = apikey, verbose = verbose)
+)
 return(out)
 }


 #' Check if input is a valid inchikey using ChemSpider API
 #'
 #' @param x character; input string
+#' @param apikey character; your API key. If NULL (default),
+#' \code{cs_check_key()} will look for it in .Renviron or .Rprofile.
 #' @param verbose logical; print messages during processing to console?
 #' @return a logical
 #'
@@ -65,9 +76,15 @@ is.inchikey = function(x, type = c('format', 'chemspider'),
 #' is.inchikey_cs('BQJCRHHNABKAKU-KBQPJGBKXA-N')
 #' is.inchikey_cs('BQJCRHHNABKAKU-KBQPJGBKSB-N')
 #' }
-is.inchikey_cs <- function(x, verbose = getOption("verbose")){
-
-if (!ping_service("cs_web")) stop(webchem_message("service_down"))
+is.inchikey_cs <- function(
+x,
+apikey = NULL,
+verbose = getOption("verbose")
+){
+if (is.null(apikey)) {
+apikey <- cs_check_key()
+}
+if (!ping_service("cs")) stop(webchem_message("service_down"))

 if (length(x) > 1) {
 stop('Cannot handle multiple input strings.')
@@ -76,13 +93,20 @@ is.inchikey_cs <- function(x, verbose = getOption("verbose")){
 if (verbose) webchem_message("na")
 return(NA)
 }
-baseurl <- 'http://www.chemspider.com/InChI.asmx/IsValidInChIKey?'
-qurl <- paste0(baseurl, 'inchi_key=', x)
-webchem_sleep(type = 'scrape')
+qurl <- 'https://api.rsc.org/compounds/v1/tools/validate/inchikey'
+headers <- c(
+"Accept" = "application/json",
+"Content-Type" = "application/json",
+"apikey" = apikey
+)
+body <- list("inchikey" = x) |> jsonlite::toJSON(auto_unbox = TRUE)
+webchem_sleep(type = 'API')
 if (verbose) webchem_message("query", x, appendLF = FALSE)
-res <- try(httr::RETRY("GET",
-qurl,
-httr::user_agent(webchem_url()),
+res <- try(httr::RETRY("POST",
+url = qurl,
+httr::add_headers(.headers = headers),
+body = body,
+encode = "json",
 terminate_on = 404,
 quiet = TRUE), silent = TRUE)
 if (inherits(res, "try-error")) {
@@ -91,13 +115,13 @@ is.inchikey_cs <- function(x, verbose = getOption("verbose")){
 }
 if (verbose) message(httr::message_for_status(res))
 if (res$status_code == 200){
-h <- xml2::read_xml(res)
-out <- as.logical(xml_text(h))
-return(out)
-}
-else {
-return(NA)
+out <- as.logical(httr::content(res))
+} else if (res$status_code == 400) {
+out <- FALSE
+} else {
+out <- NA
 }
+return(out)
 }
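The rewritten `is.inchikey_cs()` POSTs a JSON body to the RSC Compounds API and maps the response to a three-valued result: the parsed body coerced with `as.logical()` on HTTP 200, `FALSE` on HTTP 400, and `NA` for any other status. The base-R sketch below isolates that mapping so it can be exercised without a network connection or API key. `map_validation_response()` is a hypothetical name, and passing a plain `TRUE` as the parsed 200 body is an assumption about the shape of the RSC response, not something this diff documents.

```r
# Hypothetical, offline sketch of the status-code handling in the new
# is.inchikey_cs(): 200 -> parsed body as logical, 400 -> FALSE, else NA.
map_validation_response <- function(status_code, parsed_body = NULL) {
  if (status_code == 200) {
    # Mirrors as.logical(httr::content(res)) in the diff.
    as.logical(parsed_body)
  } else if (status_code == 400) {
    # The API rejected the key as invalid.
    FALSE
  } else {
    # Any other status: validity unknown.
    NA
  }
}

map_validation_response(200, TRUE)
map_validation_response(400)
map_validation_response(503)
```

Keeping a single `return(out)` at the end, as the new code does, means every branch only has to assign `out`.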
43 changes: 1 addition & 42 deletions R/webchem-package.R
@@ -4,48 +4,7 @@
#' of web APIs for chemical information.
#'
#' @docType package
#' @name webchem
#' @importFrom methods is
#' @importFrom utils globalVariables
if (getRversion() >= "2.15.1")
globalVariables(c("."))
"_PACKAGE"



#' Organic plant protection products in the river Jagst / Germany in 2013
#'
#' This dataset comprises environmental monitoring data of organic plant protection products
#' in the year 2013 in the river Jagst, Germany.
#' The data is publicly available and can be retrieved from the
#' LUBW Landesanstalt für Umwelt, Messungen und Naturschutz Baden-Württemberg.
#' It has been preprocessed and comprises measurements of 34 substances.
#' Substances without detects have been removed.
#' on 13 sampling occasions.
#' Values are given in ug/L.
#'
#' @format A data frame with 442 rows and 4 variables:
#' \describe{
#' \item{date}{sampling data}
#' \item{substance}{substance names}
#' \item{value}{concentration in ug/L}
#' \item{qual}{qualifier, indicating values < LOQ}
#' }
#' @source \url{https://udo.lubw.baden-wuerttemberg.de/public/pages/home/index.xhtml}
"jagst"


#' Acute toxicity data from U.S. EPA ECOTOX
#'
#' This dataset comprises acute ecotoxicity data of 124 insecticides.
#' The data is publicly available and can be retrieved from the EPA ECOTOX database
#' (\url{https://cfpub.epa.gov/ecotox/})
#' It comprises acute toxicity data (D. magna, 48h, Laboratory, 48h) and has been
#' preprocessed (remove non-insecticides, aggregate multiple value, keep only numeric data etc).
#'
#' @format A data frame with 124 rows and 2 variables:
#' \describe{
#' \item{cas}{CAS registry number}
#' \item{value}{LC50value}
#' }
#' @source \url{https://cfpub.epa.gov/ecotox/}
"lc50"
1 change: 1 addition & 0 deletions R/zzz.R
@@ -0,0 +1 @@
+if (getRversion() >= "2.15.1") utils::globalVariables(c("."))
20 changes: 10 additions & 10 deletions man/get_cid.Rd

5 changes: 5 additions & 0 deletions man/is.inchikey.Rd

5 changes: 4 additions & 1 deletion man/is.inchikey_cs.Rd

2 changes: 1 addition & 1 deletion man/jagst.Rd

2 changes: 1 addition & 1 deletion man/lc50.Rd
