Merge pull request #153 from ropensci/152-support-entities
Support entity lists (datasets)
florianm authored Mar 15, 2024
2 parents e2d452d + 4b15682 commit d04a25b
Showing 29 changed files with 1,569 additions and 166 deletions.
29 changes: 22 additions & 7 deletions CONTRIBUTING.md
@@ -36,26 +36,41 @@ Git and GitHub.
For more general info about contributing to `ruODK`, see the
[Resources](#resources) at the end of this document.

### Naming conventions
ruODK names functions after ODK Central endpoints. If there are aliases, such as
"Dataset" and "Entity List", choose the alias that is shown to Central users
(here, choose "Entity List") over internally used terms.

Function names combine the object name (`project`, `form`, `submission`,
`attachment`, `entitylist`, `entity`, etc.) with the action (`list`, `detail`,
`patch`) in snake case, e.g. `project_list()`.
In case of any uncertainty, discussion is welcome.

In contrast, `pyODK` uses a class-based approach with the pluralised object name
separated from the action, e.g. `client.entity_lists.list()`.

Documentation should capitalise ODK Central object names: Project, Form,
Submission, Entity.
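
As a rough illustration of the naming convention above, the following sketch
composes function names from an object name and an action. The helper
`ru_fn_name()` is hypothetical and not part of `ruODK`; it only demonstrates
the pattern.

```r
# Hypothetical helper (not part of ruODK): join an object name and an action
# into a snake case function name, as described in the naming conventions.
ru_fn_name <- function(object, action) {
  paste(object, action, sep = "_")
}

ru_fn_name("project", "list")       # "project_list"
ru_fn_name("entitylist", "detail")  # "entitylist_detail"
```

Note that the user-facing alias "Entity List" becomes the object name
`entitylist`, not the internal term `dataset`.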

### Prerequisites
To test the package, you will need valid credentials for the ODK Central instance
used as a test server.
Create an [account request issue](https://github.com/ropensci/ruODK/issues/new/choose).
To test the package, you will need valid credentials for an existing ODK Central
instance to be used as a test server.

Before you do a pull request, you should always file an issue and make sure
the maintainers agree that it is a problem, and are happy with your basic proposal
for fixing it.
If you have found a bug, follow the issue template to create a minimal
[reprex](https://www.tidyverse.org/help/#reprex).
[reprex](https://www.tidyverse.org/help/#reprex) if you can do so without
revealing sensitive information. Never include credentials in your reprex.

### Checklists
Some changes have intricate internal and external dependencies, which are easy
to miss and break. These checklists aim to avoid these pitfalls.

Test and update reverse dependencies (wastdr, etlTurtleNesting, etc.).

#### Adding a dependency
* Update DESCRIPTION
* Update GH Actions install workflows - do R package deps have system deps? Can GHA install them in all environments?
* Update GH Actions install workflows - do R package deps have system deps?
Can GHA install them in all environments?
* Update Dockerfile
* Update binder install.R
* Update installation instructions
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Type: Package
Package: ruODK
Title: An R Client for the ODK Central API
Version: 1.4.2
Version: 1.4.9.9002
Authors@R:
c(person(given = c("Florian", "W."),
family = "Mayer",
4 changes: 4 additions & 0 deletions NAMESPACE
@@ -11,6 +11,10 @@ export(encryption_key_list)
export(enexpr)
export(enquo)
export(ensym)
export(entitylist_detail)
export(entitylist_download)
export(entitylist_list)
export(entitylist_update)
export(expr)
export(exprs)
export(form_detail)
10 changes: 7 additions & 3 deletions NEWS.md
@@ -1,3 +1,7 @@
# ruODK 1.5.0
## Major changes
* Support Entities and Entity Lists (Datasets) (#152)

# ruODK 1.4.2
This release migrates the `ruODK` test suite to a new test server
`ruodk.getodk.cloud` which was generously sponsored by GetODK.
@@ -16,9 +20,9 @@ This release fixes a few compatibility issues and bumps dependencies to R (4.1)
and imported/suggested packages.
Upgrade carefully and revert to 1.3.12 if things go awry.

* Update to new tidyselect syntax: Using vectors of names to select makes
tidyselect complain (WARN, soon ERROR). We wrap all programmatic selections of
variable names in `dplyr::all_of()` where we expect a single variable to be
* Update to new `tidyselect` syntax: Using vectors of names to select makes
`tidyselect` complain (WARN, soon ERROR). We wrap all programmatic selections
of variable names in `dplyr::all_of()` where we expect a single variable to be
selected, and `dplyr::any_of()` where we select using fuzzy matching
(e.g. `dplyr::starts_with()`). (#146)
* Make `ruODK::form_list()` robust against `reviewState` missing from outdated
93 changes: 93 additions & 0 deletions R/entitylist_detail.R
@@ -0,0 +1,93 @@
#' Show Entity List details.
#'
#' `r lifecycle::badge("maturing")`
#'
#' An Entity List is a named collection of Entities that have the same
#' properties.
#' An Entity List can be linked to Forms as an Attachment.
#' This will make it available to clients as an automatically-updating CSV.
#'
#' This function is supported from ODK Central v2022.3 and will warn if the
#' given odkc_version is lower.
#'
#' @template param-pid
#' @template param-did
#' @template param-url
#' @template param-auth
#' @template param-retries
#' @template param-odkcv
#' @template param-orders
#' @template param-tz
#' @return A list of lists following the exact format and naming of the API
#' response. Because this list is deeply nested and irregularly shaped,
#' it is not trivial to rectangle the result into a tibble.
# nolint start
#' @seealso \url{https://docs.getodk.org/central-api-dataset-management/#datasets}
# nolint end
#' @family entity-management
#' @export
#' @examples
#' \dontrun{
#' # See vignette("setup") for setup and authentication options
#' # ruODK::ru_setup(svc = "....svc", un = "[email protected]", pw = "...")
#'
#' ds <- entitylist_list(pid = get_default_pid())
#' ds1 <- entitylist_detail(pid = get_default_pid(), did = ds$name[1])
#'
#' ds1 |> listviewer::jsonedit()
#' ds1$linkedForms |>
#' purrr::list_transpose() |>
#' tibble::as_tibble()
#' ds1$sourceForms |>
#' purrr::list_transpose() |>
#' tibble::as_tibble()
#' ds1$properties |>
#' purrr::list_transpose() |>
#' tibble::as_tibble()
#' }
entitylist_detail <- function(pid = get_default_pid(),
did = NULL,
url = get_default_url(),
un = get_default_un(),
pw = get_default_pw(),
retries = get_retries(),
odkc_version = get_default_odkc_version(),
orders = c(
"YmdHMS",
"YmdHMSz",
"Ymd HMS",
"Ymd HMSz",
"Ymd",
"ymd"
),
tz = get_default_tz()) {
yell_if_missing(url, un, pw, pid = pid)

if (is.null(did)) {
ru_msg_abort("entitylist_detail requires the Entity List name as 'did=\"name\"'.")
}

if (odkc_version |> semver_lt("2022.3")) {
ru_msg_warn("entitylist_detail is supported from v2022.3")
}

ds <- httr::RETRY(
"GET",
httr::modify_url(url,
path = glue::glue(
"v1/projects/{pid}/datasets/",
"{utils::URLencode(did, reserved = TRUE)}"
)
),
httr::add_headers(
"Accept" = "application/json",
"X-Extended-Metadata" = "true"
),
httr::authenticate(un, pw),
times = retries
) |>
yell_if_error(url, un, pw) |>
httr::content(encoding = "utf-8")
}

# usethis::use_test("entitylist_detail") # nolint
213 changes: 213 additions & 0 deletions R/entitylist_download.R
@@ -0,0 +1,213 @@
#' Download an Entity List as CSV.
#'
#' `r lifecycle::badge("maturing")`
#'
#' The downloaded CSV file is named after the Entity List name.
#' The download location defaults to `here::here()`, but can be modified
#' to a different folder path which will be created if it doesn't exist.
#'
#' An Entity List is a named collection of Entities that have the same
#' properties.
#' An Entity List can be linked to Forms as an Attachment.
#' This will make it available to clients as an automatically-updating CSV.
#'
#' Entity Lists can be used as Attachments in other Forms, but they can also be
#' downloaded directly as a CSV file.
#' The CSV format closely matches the OData Dataset (Entity List) Service
#' format, with columns for system properties such as `__id` (the Entity UUID),
#' `__createdAt`, `__creatorName`, etc., the Entity Label, and the
#' Dataset (Entity List)/Entity Properties themselves.
#' If any Property for a given Entity is blank (e.g. it was not captured by
#' that Form or was left blank), that field of the CSV is blank.
#'
#' The ODK Central `$filter` querystring parameter can be used to filter on
#' system-level properties, similar to how filtering in the OData Dataset
#' (Entity List) Service works.
#' Of the [OData filter specs](https://docs.oasis-open.org/odata/odata/v4.01/odata-v4.01-part1-protocol.html#_Toc31358948)
#' ODK Central implements a [growing set of features
#' ](https://docs.getodk.org/central-api-odata-endpoints/#data-document).
#' `ruODK` provides the parameter `filter` (str) which, if set, will be passed
#' on to the ODK Central endpoint as is.
#'
#' The ODK Central endpoint supports the [`ETag` header
#' ](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ETag), which can
#' be used to avoid downloading the same content more than once.
#' When an API consumer calls this endpoint, the endpoint returns a value in
#' the `ETag` header.
#' If you pass that value in the [`If-None-Match` header
#' ](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-None-Match)
#' of a subsequent request,
#' then if the Entity List has not been changed since the previous request,
#' you will receive a 304 Not Modified response; otherwise you'll get the new
#' data.
#' `ruODK` provides the parameter `etag` which can be set from the output of
#' a previous call to `entitylist_download()`. `ruODK` strips the `W/\"` and
#' `\"` from the returned etag and expects the stripped etag as parameter.
#'
#' @template param-pid
#' @template param-did
#' @template param-url
#' @template param-auth
#' @param local_dir The local folder to save the downloaded files to,
#' default: \code{here::here}.
#' If the folder does not exist it will be created.
#' @param etag (str) The etag value from a previous call to
#' `entitylist_download()`. The value must be stripped of the `W/\"` and `\"`,
#' which is the format of the etag returned by `entitylist_download()`.
#' If provided and the Entity List has not changed since that call, the
#' endpoint returns HTTP 304 (Not Modified) and no Entities.
#' If the same `local_dir` is chosen and `overwrite` is set to `TRUE`,
#' the downloaded CSV will also be overwritten, losing the Entities downloaded
#' earlier.
#' Default: NULL (no filtering, all entities returned).
#' @param filter (str) A valid filter string.
#' Default: NULL (no filtering, all entities returned).
#' @param overwrite Whether to overwrite a previously downloaded file,
#' default: TRUE
#' @template param-retries
#' @template param-odkcv
#' @template param-orders
#' @template param-tz
#' @template param-verbose
#' @return A list of five items:
#' - entities (tbl_df) The Entity List as tibble
#' - http_status (int) The HTTP status code of the response.
#' 200 if OK, 304 if a given etag finds no new entities created.
#' - etag (str) The ETag to use in subsequent calls to `entitylist_download()`
#' - downloaded_to (fs_path) The path to the downloaded CSV file
#' - downloaded_on (POSIXct) The time of download in the local timezone
# nolint start
#' @seealso \url{https://docs.getodk.org/central-api-dataset-management/#datasets}
# nolint end
#' @family entity-management
#' @export
#' @examples
#' \dontrun{
#' # See vignette("setup") for setup and authentication options
#' # ruODK::ru_setup(svc = "....svc", un = "[email protected]", pw = "...")
#'
#' ds <- entitylist_list(pid = get_default_pid())
#' ds1 <- entitylist_download(pid = get_default_pid(), did = ds$name[1])
#' # ds1$entities
#' # ds1$etag
#' # ds1$downloaded_to
#' # ds1$downloaded_on
#'
#' ds2 <- entitylist_download(
#' pid = get_default_pid(),
#' did = ds$name[1],
#' etag = ds1$etag
#' )
#' # ds2$http_status == 304
#'
#' newest_entity_date <- as.Date(max(ds1$entities$`__createdAt`))
#' ds3 <- entitylist_download(
#' pid = get_default_pid(),
#' did = ds$name[1],
#' filter = glue::glue("__createdAt le {newest_entity_date}")
#' )
#' }
entitylist_download <- function(pid = get_default_pid(),
did = NULL,
url = get_default_url(),
un = get_default_un(),
pw = get_default_pw(),
local_dir = here::here(),
filter = NULL,
etag = NULL,
overwrite = TRUE,
retries = get_retries(),
odkc_version = get_default_odkc_version(),
orders = c(
"YmdHMS",
"YmdHMSz",
"Ymd HMS",
"Ymd HMSz",
"Ymd",
"ymd"
),
tz = get_default_tz(),
verbose = get_ru_verbose()) {
# Gatecheck params
yell_if_missing(url, un, pw, pid = pid)

if (is.null(did)) {
ru_msg_abort(
"entitylist_download requires the Entity List name as 'did=\"name\"'."
)
}

# Gatecheck ODKC version
if (odkc_version |> semver_lt("2022.3")) {
ru_msg_warn("entitylist_download is supported from v2022.3")
}

# Download file destination directory
if (!fs::dir_exists(local_dir)) {
fs::dir_create(local_dir)
}

# Downloaded file path
pth <- fs::path(local_dir, glue::glue("{did}.csv"))

# Emit message
if (fs::file_exists(pth)) {
if (isTRUE(overwrite)) {
"Overwriting previous entity list: \"{pth}\"" %>%
glue::glue() %>%
ru_msg_success(verbose = verbose)
} else {
"Keeping previous entity list: \"{pth}\"" %>%
glue::glue() %>%
ru_msg_success(verbose = verbose)
}
} else {
"Downloading entity list \"{did}\" to {pth}" %>%
glue::glue() %>%
ru_msg_success(verbose = verbose)
}

# Headers: accept CSV, set ETag if given
headers <- c(Accept = "text/csv; charset=utf-8")
if (!is.null(etag)) {
if (odkc_version |> semver_lt("2023.3")) {
ru_msg_warn("entitylist_download ETag is supported from v2023.3")
}
headers <- c(headers, c("If-None-Match" = etag))
}

# Query: filter
query <- NULL
if (!is.null(filter)) {
query <- list("$filter" = utils::URLencode(filter, reserved = TRUE))
}

res <- httr::RETRY(
"GET",
httr::modify_url(
url,
path = glue::glue(
"v1/projects/{pid}/datasets/",
"{utils::URLencode(did, reserved = TRUE)}/entities.csv"
),
query = query
),
httr::add_headers(.headers = headers),
httr::authenticate(un, pw),
httr::write_disk(pth, overwrite = overwrite),
times = retries
)
# yell_if_error(url, un, pw) # disabled to allow HTTP 304 (no new Entities)

list(
entities = httr::content(res, encoding = "utf-8"),
etag = res$headers$etag |>
stringr::str_remove_all(stringr::fixed("W/\"")) |>
stringr::str_remove_all(stringr::fixed("\"")),
http_status = res$status_code,
downloaded_to = pth,
downloaded_on = isodt_to_local(res$date, orders = orders, tz = tz)
)
}


# usethis::use_test("entitylist_download") # nolint
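
The ETag clean-up that `entitylist_download()` applies to the response header
can be illustrated in isolation. This is a minimal sketch using base R instead
of the `stringr` calls in the function body; the helper `normalise_etag()` and
the sample header value are made up for illustration and are not part of
`ruODK`.

```r
# Hypothetical helper (not part of ruODK): strip the weak validator prefix
# `W/"` and any remaining double quotes from an ETag header value.
normalise_etag <- function(x) {
  x <- gsub("W/\"", "", x, fixed = TRUE)
  gsub("\"", "", x, fixed = TRUE)
}

raw_etag <- "W/\"abc123\""  # made-up header value
normalise_etag(raw_etag)    # "abc123"
```

The stripped value is the form expected by the `etag` parameter of a
subsequent call.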