Release V0.1.0 (as submitted to CRAN)
0 parents
commit f96b0ba
Showing 23 changed files with 2,469 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
Package: ECOTOXr | ||
Type: Package | ||
Title: Download and Extract Data from US EPA's ECOTOX Database | ||
Version: 0.1.0 | ||
Date: 2021-10-03 | ||
Authors@R: c(person("Pepijn", "de Vries", role = c("aut", "cre", "dtc"), | ||
email = "[email protected]")) | ||
Author: | ||
Pepijn de Vries [aut, cre, dtc] | ||
Maintainer: Pepijn de Vries <[email protected]> | ||
Description: The US EPA ECOTOX database is a freely available database | ||
with a treasure of aquatic and terrestrial ecotoxicological data. | ||
As the online search interface doesn't come with an API, this | ||
package provides the means to easily access and search the database | ||
in R. To this end, all raw tables are downloaded from the EPA website | ||
and stored in a local SQLite database. | ||
Depends: | ||
R (>= 3.5.0), | ||
RSQLite | ||
Imports: | ||
crayon, | ||
dplyr, | ||
rappdirs, | ||
readr, | ||
rvest, | ||
stringr, | ||
utils | ||
Suggests: | ||
testthat (>= 3.0.0), | ||
webchem | ||
URL: https://github.com/pepijn-devries/ECOTOXr | ||
BugReports: https://github.com/pepijn-devries/ECOTOXr/issues | ||
License: GPL (>= 3) | ||
Encoding: UTF-8 | ||
LazyData: true | ||
RoxygenNote: 7.1.2 | ||
Config/testthat/edition: 3 |
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
# Generated by roxygen2: do not edit by hand | ||
|
||
export(build_ecotox_sqlite) | ||
export(check_ecotox_availability) | ||
export(cite_ecotox) | ||
export(dbConnectEcotox) | ||
export(dbDisconnectEcotox) | ||
export(download_ecotox_data) | ||
export(get_ecotox_info) | ||
export(get_ecotox_path) | ||
export(get_ecotox_sqlite_file) | ||
export(list_ecotox_fields) | ||
export(search_ecotox) | ||
export(search_query_ecotox) | ||
importFrom(RSQLite,dbConnect) | ||
importFrom(RSQLite,dbDisconnect) | ||
importFrom(RSQLite,dbExecute) | ||
importFrom(RSQLite,dbWriteTable) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
ECOTOXr v0.1.0 (Release date: 2021-10-03) | ||
============= | ||
|
||
* Initial release which can: | ||
|
||
* Download raw ECOTOX database tables from the EPA website | ||
* Build an SQLite database from those files | ||
* Search and extract data from the created local database |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
#' Package description | ||
#' | ||
#' Everything you need to know when you start using the ECOTOXr package. | ||
#' | ||
#' The ECOTOXr package provides the means to efficiently search, extract and analyse \href{https://www.epa.gov/}{US EPA} | ||
#' \href{https://cfpub.epa.gov/ecotox/}{ECOTOX} data, with a focus on reproducible results. Although the package | ||
#' creator/maintainer is confident in the quality of this software, it is the end user's sole responsibility to | ||
#' assure the quality of his or her work while using this software. As per the provided license terms the package | ||
#' maintainer is not liable for any damage resulting from its usage. That being said, below we present some tips | ||
#' for generating reproducible results with this package. | ||
#' | ||
#' @section How do I get started?: | ||
#' Installing this package is only the first step to get things started. You need to perform the following steps | ||
#' in order to use the package to its full capacity. | ||
#' | ||
#' \itemize{ | ||
#' \item{ | ||
#' First download a copy of the complete EPA database. This can be done by calling \code{\link{download_ecotox_data}}. | ||
#' This may not always work on all machines as R does not always accept the website SSL certificate from the EPA. | ||
#' In those cases the zipped archive with the database files can be downloaded manually with a different (more | ||
#' forgiving) browser. The files from the zip archive can be extracted to a location of choice. | ||
#' } | ||
#' \item{ | ||
#' Next, an SQLite database needs to be built from the downloaded files. This is done automatically when | ||
#' you have used \code{\link{download_ecotox_data}} in the previous step. When you have manually downloaded the files | ||
#' you can call \code{\link{build_ecotox_sqlite}} to build the database locally. | ||
#' } | ||
#' \item{ | ||
#' When the previous steps have been performed successfully, you can now search the database by calling | ||
#' \code{\link{search_ecotox}}. You can also use \code{\link{dbConnectEcotox}} to open a connection to the | ||
#' database. You can query the database using this connection and any of the methods provided by the | ||
#' \link[DBI:DBI]{DBI} or \link[RSQLite:RSQLite]{RSQLite} packages. | ||
#' } | ||
#' } | ||
#' | ||
#' @section How do I obtain reproducible results?: | ||
#' Each individual user is responsible for evaluating the reproducibility of his or her work. Although | ||
#' this package offers instruments to achieve reproducibility, it is not guaranteed. In order to increase the | ||
#' chances of generating reproducible results, one should adhere at least to the following rules: | ||
#' \itemize{ | ||
#' \item{ | ||
#' Always use an official release from CRAN, and cite the version used in your analyses (\code{citation("ECOTOXr")}). | ||
#' Different versions may produce different end results (although we will strive for backward compatibility). | ||
#' } | ||
#' \item{ | ||
#' Make sure you are working with a clean (unaltered) version of the database. When in doubt, download and build | ||
#' a fresh copy of the database (\code{\link{download_ecotox_data}}). Also cite the (release) version of the downloaded | ||
#' database (\code{\link{cite_ecotox}}), and the operating system on which the local database was built | ||
#' (\code{\link{get_ecotox_info}}). Or, just make sure that you never modify the database (e.g., write data to it, delete | ||
#' data from it, etc.). | ||
#' } | ||
#' \item{ | ||
#' In order to avoid platform dependencies it is advised to only include non-accented alphanumeric characters in | ||
#' search terms. See also \link{search_ecotox} and \link{build_ecotox_sqlite}. | ||
#' } | ||
#' \item{ | ||
#' When trying to reproduce database extractions from earlier database releases, filter out additions after | ||
#' that specific release. This can be done by adding output fields 'tests.modified_date', 'tests.created_date' and | ||
#' 'tests.published_date' to your search and comparing those with the release date of the database you are trying to | ||
#' reproduce results from. | ||
#' } | ||
#' } | ||
#' | ||
#' @section Why isn't the database included in the package?: | ||
#' This package doesn't come bundled with a copy of the database; it needs to be downloaded the first time the | ||
#' package is used. Why is this? There are several reasons: | ||
#' \itemize{ | ||
#' \item{ | ||
#' The database is maintained and updated by the \href{https://www.epa.gov/}{US EPA}. This process is and should be | ||
#' outside the sphere of influence of the package maintainer. | ||
#' } | ||
#' \item{ | ||
#' Packages on CRAN are not allowed to contain large amounts of data. Publication on CRAN is key to control | ||
#' the quality of this package and therefore outweighs the convenience of having the data bundled with the package. | ||
#' } | ||
#' \item{ | ||
#' The user has full control over the release version of the database that is being used. | ||
#' } | ||
#' } | ||
#' | ||
#' @section Why doesn't this package search the online ECOTOX database?: | ||
#' Although this is possible, there are several reasons why we opted for creating a local copy: | ||
#' \itemize{ | ||
#' \item{ | ||
#' The user would be restricted to the search options provided on the website (\href{https://cfpub.epa.gov/ecotox/}{ECOTOX}). | ||
#' } | ||
#' \item{ | ||
#' The online database doesn't come with an API that would allow for a convenient interface. | ||
#' } | ||
#' \item{ | ||
#' The user is not limited by an internet connection and its bandwidth. | ||
#' } | ||
#' \item{ | ||
#' Not all database fields can be retrieved from the online interface. | ||
#' } | ||
#' } | ||
#' @docType package | ||
#' @name ECOTOXr | ||
#' @author Pepijn de Vries | ||
#' @references | ||
#' Official US EPA ECOTOX website: | ||
#' \url{https://cfpub.epa.gov/ecotox/} | ||
NULL |
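The getting-started steps in the package documentation above can be sketched roughly as follows. This is a minimal sketch rather than the package's documented workflow: the helper name `list_local_ecotox_tables` is hypothetical, it assumes the ECOTOXr and DBI packages are installed, and it assumes `check_ecotox_availability()` can be called without arguments (as it is in the package's `.onAttach` hook). When no local database is found it returns `NULL` instead of starting a download.

```r
# Hypothetical helper: list the tables in the local ECOTOX database, if any.
list_local_ecotox_tables <- function() {
  # Bail out quietly when the required packages are not installed.
  if (!requireNamespace("ECOTOXr", quietly = TRUE) ||
      !requireNamespace("DBI", quietly = TRUE)) {
    message("ECOTOXr and/or DBI not installed; nothing to do.")
    return(NULL)
  }
  # Steps 1 and 2 (download and build) must have been performed earlier,
  # e.g. with ECOTOXr::download_ecotox_data().
  if (!ECOTOXr::check_ecotox_availability()) {
    message("No local ECOTOX database found; run download_ecotox_data() first.")
    return(NULL)
  }
  # Step 3: open a connection and make sure it is closed again on exit.
  con <- ECOTOXr::dbConnectEcotox()
  on.exit(ECOTOXr::dbDisconnectEcotox(con), add = TRUE)
  DBI::dbListTables(con)
}

tables <- list_local_ecotox_tables()
```

Wrapping the connection in a function with `on.exit()` means the connection is closed even when a query fails halfway through.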
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,171 @@ | ||
#' @rdname get_path | ||
#' @name get_ecotox_sqlite_file | ||
#' @export | ||
get_ecotox_sqlite_file <- function(path = get_ecotox_path(), version) { | ||
if (missing(version)) { | ||
version <- NULL | ||
} else { | ||
if (length(version) != 1) stop("Argument 'version' should hold a single element!") | ||
version <- as.Date(version, format = "%m_%d_%Y") | ||
} | ||
files <- attributes(.fail_on_missing(path))$files | ||
results <- nrow(files) | ||
files <- files[which(files$date == (if (is.null(version)) max(files$date) else version)),] | ||
if (results > 1 && is.null(version)) { | ||
warning(sprintf("Multiple versions of the database found and none specified. Using the most recent version (%s)", | ||
format(files$date, "%Y-%m-%d"))) | ||
} | ||
return(file.path(files$path, files$database)) | ||
} | ||
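The version handling in `get_ecotox_sqlite_file()` can be illustrated in isolation with base R. The sketch below is not the package's code: `parse_release`, `pick_release` and the dates in `available` are hypothetical stand-ins for the release dates of locally stored database copies. It also shows why `if ... else` is a safer choice than `ifelse()` when working with dates.

```r
# Parse a release tag in the "%m_%d_%Y" format used by the 'version' argument.
parse_release <- function(version) {
  stopifnot(length(version) == 1)
  as.Date(version, format = "%m_%d_%Y")
}

# Hypothetical release dates of database copies found on disk.
available <- as.Date(c("2021-06-10", "2021-09-14"))

# Pick the requested release, or fall back on the most recent one.
pick_release <- function(available, version = NULL) {
  target <- if (is.null(version)) max(available) else parse_release(version)
  available[available == target]
}

pick_release(available)                # selects the 2021-09-14 copy
pick_release(available, "06_10_2021")  # selects the 2021-06-10 copy

# ifelse() drops the Date class and returns the underlying day count:
class(ifelse(TRUE, max(available), NA))  # "numeric"
```

Comparing a `Date` vector with that bare number still works, which is why the `ifelse()` form does not fail, but keeping the `Date` class makes the intent explicit.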
|
||
#' Open or close a connection to the local ECOTOX database | ||
#' | ||
#' Wrappers for \code{\link[RSQLite:SQLite]{dbConnect}} and \code{\link[RSQLite:SQLite]{dbDisconnect}} methods. | ||
#' | ||
#' Open or close a connection to the local ECOTOX database. These functions are only required when you want | ||
#' to send custom queries to the database. For most searches the \code{\link{search_ecotox}} function | ||
#' will be adequate. | ||
#' | ||
#' @param path A \code{character} string with the path to the location of the local database (default is | ||
#' \code{\link{get_ecotox_path}()}). | ||
#' @param version A \code{character} string referring to the release version of the database you wish to locate. | ||
#' It should have the same format as the date in the EPA download link, which is month, day, year, separated by | ||
#' underscores ("\%m_\%d_\%Y"). When missing, the most recent available copy is selected automatically. | ||
#' @param conn An open connection to the ECOTOX database that needs to be closed. | ||
#' @param ... Arguments that are passed to the \code{\link[RSQLite:SQLite]{dbConnect}} method | ||
#' or the \code{\link[RSQLite:SQLite]{dbDisconnect}} method. | ||
#' @return A database connection in the form of a \code{\link[DBI]{DBIConnection-class}} object. | ||
#' The object is tagged with: a time stamp; the package version used; and the | ||
#' file path of the SQLite database used in the connection. These tags are added as attributes | ||
#' to the object. | ||
#' @rdname dbConnectEcotox | ||
#' @name dbConnectEcotox | ||
#' @examples | ||
#' \dontrun{ | ||
#' ## This will only work when a copy of the database exists: | ||
#' con <- dbConnectEcotox() | ||
#' | ||
#' ## check if the connection works by listing the tables in the database: | ||
#' dbListTables(con) | ||
#' | ||
#' ## Close the connection to the database when you are done: | ||
#' dbDisconnectEcotox(con) | ||
#' } | ||
#' @author Pepijn de Vries | ||
#' @export | ||
dbConnectEcotox <- function(path = get_ecotox_path(), version, ...) { | ||
f <- get_ecotox_sqlite_file(path, version) | ||
return(.add_tags(RSQLite::dbConnect(RSQLite::SQLite(), f, ...), f)) | ||
} | ||
|
||
#' @rdname dbConnectEcotox | ||
#' @name dbDisconnectEcotox | ||
#' @export | ||
dbDisconnectEcotox <- function(conn, ...) { | ||
RSQLite::dbDisconnect(conn, ...) | ||
} | ||
|
||
#' Cite the downloaded copy of the ECOTOX database | ||
#' | ||
#' Cite the downloaded copy of the ECOTOX database and this package for reproducible results. | ||
#' | ||
#' When you download a copy of the EPA ECOTOX database using \code{\link{download_ecotox_data}()}, a BibTeX file | ||
#' is stored that registers the database release version and the access (= download) date. Use this function | ||
#' to obtain a citation to that specific download. | ||
#' | ||
#' In order for others to reproduce your results, it is key to cite the data source as accurately as possible. | ||
#' @param path A \code{character} string with the path to the location of the local database (default is | ||
#' \code{\link{get_ecotox_path}()}). | ||
#' @param version A \code{character} string referring to the release version of the database you wish to locate. | ||
#' It should have the same format as the date in the EPA download link, which is month, day, year, separated by | ||
#' underscores ("\%m_\%d_\%Y"). When missing, the most recent available copy is selected automatically. | ||
#' @return Returns a \code{vector} of \code{\link{bibentry}} objects, containing a reference to the downloaded database | ||
#' and this package. | ||
#' @rdname cite_ecotox | ||
#' @name cite_ecotox | ||
#' @examples | ||
#' \dontrun{ | ||
#' ## In order to cite downloaded database and this package: | ||
#' cite_ecotox() | ||
#' } | ||
#' @author Pepijn de Vries | ||
#' @export | ||
cite_ecotox <- function(path = get_ecotox_path(), version) { | ||
db <- get_ecotox_sqlite_file(path, version) | ||
bib <- gsub(".sqlite", "_cit.txt", db, fixed = TRUE) | ||
if (!file.exists(bib)) stop("No bibentry reference to database download found!") | ||
result <- utils::readCitationFile(bib) | ||
return(c(result, utils::citation("ECOTOXr"))) | ||
} | ||
|
||
#' Get information on the local ECOTOX database when available | ||
#' | ||
#' Get information on how and when the local ECOTOX database was built. | ||
#' | ||
#' Get information on how and when the local ECOTOX database was built. This information is retrieved | ||
#' from the log-file that is (optionally) stored with the local database when calling \code{\link{download_ecotox_data}} | ||
#' or \code{\link{build_ecotox_sqlite}}. | ||
#' @param path A \code{character} string with the path to the location of the local database (default is | ||
#' \code{\link{get_ecotox_path}()}). | ||
#' @param version A \code{character} string referring to the release version of the database you wish to locate. | ||
#' It should have the same format as the date in the EPA download link, which is month, day, year, separated by | ||
#' underscores ("\%m_\%d_\%Y"). When missing, the most recent available copy is selected automatically. | ||
#' @return Returns a \code{vector} of \code{character}s, containing information on the selected local ECOTOX database. | ||
#' @rdname get_ecotox_info | ||
#' @name get_ecotox_info | ||
#' @examples | ||
#' \dontrun{ | ||
#' ## Show info on the current database (only works when one is downloaded and built): | ||
#' get_ecotox_info() | ||
#' } | ||
#' @author Pepijn de Vries | ||
#' @export | ||
get_ecotox_info <- function(path = get_ecotox_path(), version) { | ||
default <- "No information available\n" | ||
inf <- tryCatch({ | ||
db <- get_ecotox_sqlite_file(path, version) | ||
gsub(".sqlite", ".log", db, fixed = TRUE) | ||
}, error = function(e) return(default)) | ||
if (file.exists(inf)) { | ||
inf <- readLines(inf) | ||
} else { | ||
inf <- default | ||
} | ||
cat(paste(inf, collapse = "\n")) | ||
return(invisible(inf)) | ||
} | ||
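Both `cite_ecotox()` and `get_ecotox_info()` locate a companion file by swapping the `.sqlite` extension of the database path. A minimal base R sketch of that idea is given below; the helper `companion_file` and the file name are hypothetical. Unlike the fixed-string `gsub()` used above, this variant anchors the pattern to the end of the path, so a folder that happens to contain ".sqlite" in its name is left untouched.

```r
# Hypothetical helper: derive a companion file name from the database path.
companion_file <- function(db, suffix) {
  # "\\.sqlite$" only matches the extension at the very end of the path.
  sub("\\.sqlite$", suffix, db)
}

# Made-up path, for illustration only.
db <- file.path("some", "folder", "ecotox_example.sqlite")

companion_file(db, "_cit.txt")  # "some/folder/ecotox_example_cit.txt"
companion_file(db, ".log")      # "some/folder/ecotox_example.log"
```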
|
||
#' List the field names that are available from the ECOTOX database | ||
#' | ||
#' List the field names (table headers) that are available from the ECOTOX database | ||
#' | ||
#' This can be useful when specifying a \code{\link{search_ecotox}}, to identify which fields | ||
#' are available from the database, for searching and output. | ||
#' @param which A \code{character} string that specifies which fields to return. Can be any of: | ||
#' '\code{default}': returns default output field names; '\code{all}': returns all fields; or | ||
#' '\code{full}': returns all except fields from table 'dose_response_details'. | ||
#' @param include_table A \code{logical} value indicating whether the table name should be included | ||
#' as prefix. Default is \code{TRUE}. | ||
#' @return Returns a \code{vector} of type \code{character} containing the field names from the ECOTOX database. | ||
#' @rdname list_ecotox_fields | ||
#' @name list_ecotox_fields | ||
#' @examples | ||
#' ## Fields that are included in search results by default: | ||
#' list_ecotox_fields("default") | ||
#' | ||
#' ## All fields that are available from the ECOTOX database: | ||
#' list_ecotox_fields("all") | ||
#' | ||
#' ## All except fields from the table 'dose_response_details' | ||
#' ## that are available from the ECOTOX database: | ||
#' list_ecotox_fields("full") | ||
#' @author Pepijn de Vries | ||
#' @export | ||
list_ecotox_fields <- function(which = c("default", "full", "all"), include_table = TRUE) { | ||
which <- match.arg(which) | ||
result <- .db_specs$field_name | ||
if (include_table) result <- paste(.db_specs$table, result, sep = ".") | ||
if (which == "default") result <- result[.db_specs$default_output] | ||
if (which == "full") result <- result[.db_specs$table != "dose_response_details"] | ||
return(result) | ||
} |
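The difference between the three `which` options of `list_ecotox_fields()` can be made concrete with a toy specification table. The sketch below assumes a data frame with the same three columns the function reads from the internal `.db_specs` object; the rows, the field names and the helper `list_fields` are made up for illustration.

```r
# Toy stand-in for the internal '.db_specs' table (contents are made up).
specs <- data.frame(
  table          = c("tests", "tests", "dose_response_details"),
  field_name     = c("field_a", "field_b", "field_c"),
  default_output = c(TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Same selection logic as above, written against the toy table.
list_fields <- function(specs, which = c("default", "full", "all"),
                        include_table = TRUE) {
  which  <- match.arg(which)
  result <- specs$field_name
  if (include_table) result <- paste(specs$table, result, sep = ".")
  switch(which,
         default = result[specs$default_output],
         full    = result[specs$table != "dose_response_details"],
         all     = result)
}

list_fields(specs, "default")  # "tests.field_a"
list_fields(specs, "full")     # "tests.field_a" "tests.field_b"
list_fields(specs, "all")      # all three fields, including the dose response one
```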
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
.add_tags <- function(x, sqlite) { | ||
if (missing(sqlite)) sqlite <- attributes(x)$database_file | ||
attributes(x)$date_created <- Sys.Date() | ||
attributes(x)$created_with <- sprintf("Package ECOTOXr v%s", utils::packageVersion("ECOTOXr")) | ||
attributes(x)$database_file <- sqlite | ||
return(x) | ||
} |
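The tagging done by `.add_tags()` amounts to attaching three attributes to an object. The base R sketch below mimics this on a plain data frame; the function name `add_tags`, the hard-coded version string and the file name are assumptions made for illustration (the package itself looks the version up with `utils::packageVersion()`).

```r
# Illustrative re-implementation: tag an object with provenance attributes.
add_tags <- function(x, sqlite) {
  attr(x, "date_created")  <- Sys.Date()
  attr(x, "created_with")  <- "Package ECOTOXr v0.1.0"  # hard-coded for the sketch
  attr(x, "database_file") <- sqlite
  x
}

# Tag a made-up result and read the tags back.
result <- add_tags(data.frame(id = 1:2), "ecotox_example.sqlite")
attr(result, "database_file")  # "ecotox_example.sqlite"
attr(result, "created_with")   # "Package ECOTOXr v0.1.0"
```

Storing the database file and package version with each result makes it possible to trace afterwards which copy of the database an extraction came from.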
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
.onAttach <- function(libname, pkgname){ | ||
packageStartupMessage({ | ||
if (check_ecotox_availability()) { | ||
crayon::green("ECOTOX database file located, you are ready to go!\n") | ||
} else { | ||
crayon::red("ECOTOX database file not present! Invoke download and database build using 'download_ecotox_data()'\n") | ||
} | ||
}) | ||
} | ||
|
||
#' @importFrom RSQLite dbExecute dbConnect dbDisconnect dbWriteTable | ||
NULL |