Merge changes in Rc/v2.1.0 to main branch #7

Open
wants to merge 37 commits into
base: main
Changes from 26 commits
532aaf5
Refactor code in the R/ folder
mingstat Oct 17, 2024
ff5d202
Update examples
mingstat Oct 18, 2024
432e835
Print the directory path and file names
mingstat Oct 18, 2024
9ec6f69
Add an argument to print file paths if requested
mingstat Oct 21, 2024
bda40b4
Do not export get_base_dir()
mingstat Oct 21, 2024
b155b1a
Add get_nfs_path() function back
mingstat Oct 21, 2024
f7601bf
Keep the metadata format unchanged
mingstat Oct 21, 2024
6570461
Update README
mingstat Oct 22, 2024
488d679
Update package vignettes
mingstat Oct 22, 2024
1f256bc
Add sample data for docs and tests
mingstat Oct 22, 2024
088feb5
Update Rd files
mingstat Oct 23, 2024
30619ab
Update tests
mingstat Oct 23, 2024
8298ec2
Update test setup file
mingstat Oct 23, 2024
9cb51ac
Update integration guide
mingstat Oct 23, 2024
01de803
Make get_base_dir internal function
mingstat Oct 23, 2024
b111f9a
Export get_file_paths and load_data_files
mingstat Oct 23, 2024
a480124
Update DESCRIPTION file
mingstat Oct 23, 2024
8722c94
Update changelog
mingstat Oct 23, 2024
8bcefd3
Update roxygen examples
mingstat Oct 23, 2024
5b719dd
Fix lintr issues
mingstat Oct 23, 2024
62545e9
Fix styler issues
mingstat Oct 23, 2024
2bbf397
Try to identify issue from tests
mingstat Oct 23, 2024
a02950e
Use pattern matching to find files when no file extension is provided
mingstat Oct 23, 2024
51ba3ce
Fix styler issues
mingstat Oct 23, 2024
6e4a18a
Save demo data to a temp dir
mingstat Nov 1, 2024
89413a7
Remove pharmaverseadam data from package
mingstat Nov 1, 2024
4665a23
Remove comments inside functions
mingstat Nov 6, 2024
19ccc3d
Remove get_base_dir() and keep get_nfs_path()
mingstat Nov 6, 2024
d164c10
Update function docs
mingstat Nov 6, 2024
912957f
Update examples in README
mingstat Nov 6, 2024
52d058b
Update example in vignettes
mingstat Nov 6, 2024
c06a26a
Update R document files
mingstat Nov 6, 2024
b8d3bc3
Fix styler issues
mingstat Nov 6, 2024
abe10d6
Check argument print_file_paths
mingstat Nov 13, 2024
cf48a94
Remove unused code
mingstat Nov 13, 2024
6dc6e0e
Updated changelog
mingstat Nov 13, 2024
a14c500
Update Rd file
mingstat Nov 13, 2024
9 changes: 6 additions & 3 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: dv.loader
Type: Package
Title: Data loading module
Version: 2.0.0
Version: 2.1.0
Authors@R: c(
person( "Boehringer-Ingelheim Pharma GmbH & Co.KG", role = c("cph", "fnd")),
person( given = "Ming", family = "Yang", role = c("aut", "cre"), email = "[email protected]"),
@@ -13,10 +13,13 @@ License: Apache License (>= 2)
Encoding: UTF-8
LazyData: true
Depends: R (>= 3.5.0)
Imports: haven
Imports:
checkmate,
haven
Suggests:
testthat,
testthat (>= 3.0.0),
knitr,
rmarkdown
RoxygenNote: 7.3.0
VignetteBuilder: knitr
Config/testthat/edition: 3
2 changes: 2 additions & 0 deletions NAMESPACE
@@ -1,5 +1,7 @@
# Generated by roxygen2: do not edit by hand

export(get_cre_path)
export(get_file_paths)
export(get_nfs_path)
export(load_data)
export(load_data_files)
8 changes: 8 additions & 0 deletions NEWS.md
@@ -1,3 +1,11 @@
# dv.loader 2.1.0

- Refactored code to improve readability and maintainability.

- Fixed issue of partial matching when the `file_names` argument contains no file extensions.

- Added arguments `env_var` and `print_file_paths` in `load_data()` function to provide more flexibility and control.

# dv.loader 2.0.0

- GitHub release with QC report
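
The partial-matching fix listed in the changelog can be illustrated with a small base R sketch. It is not the package's implementation (`get_file_paths()` is not shown in this view); the directory, the file names `adsl` and `adsl2`, and the regular expression are assumptions chosen for the demonstration. It only shows why an unanchored pattern picks up extra files when a name is given without an extension.

```r
# Illustrative only: NOT the dv.loader implementation.
# Create a throwaway directory holding two files with similar names.
demo_dir <- tempfile("partial_match_demo")
dir.create(demo_dir)
saveRDS(data.frame(x = 1), file.path(demo_dir, "adsl.rds"))
saveRDS(data.frame(x = 2), file.path(demo_dir, "adsl2.rds"))

# Unanchored pattern: "adsl" also matches "adsl2.rds".
loose <- list.files(demo_dir, pattern = "adsl")

# Anchored pattern: the name must be followed directly by a known extension.
strict <- list.files(
  demo_dir,
  pattern = "^adsl\\.(rds|sas7bdat)$",
  ignore.case = TRUE
)

print(loose)
print(strict)
```

Anchoring both ends of the pattern is one way to make a request for `adsl` resolve to a single file; the PR may solve it differently.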
171 changes: 124 additions & 47 deletions R/dvloader.R
@@ -1,61 +1,138 @@
#' gets the NFS base path from an env var
#' It assumes there is an env var
#' called RXD_DATA which holds the path suffix.
#' @return the NFS base path
#' @export
get_nfs_path <- function() {
base_path <- Sys.getenv("RXD_DATA")
# check that RXD_DATA is set
if (base_path == "") {
stop("Usage: get_nfs_path: RXD_DATA must be set")
#' Get Base Directory Path
#'
#' This function retrieves the base directory path from a specified environment variable.
#' It checks if the environment variable is set and if the directory exists.
#'
#' @param env_var [character(1)] The name of the environment variable containing the base directory path.
#'
#' @return [character(1)] The normalized path to the base directory.
#'
#' @examples
#' # Create a temporary directory
#' temp_dir <- tempdir()
#'
#' # Set the BASE_DIR environment variable
#' Sys.setenv(BASE_DIR = temp_dir)
#'
#' # Get the base directory path
#' dv.loader:::get_base_dir("BASE_DIR")
#'
#' @keywords internal
get_base_dir <- function(env_var) {
# Ensure env_var is a single character string
checkmate::assert_character(env_var, len = 1)

# Get the value of the environment variable
base_dir <- Sys.getenv(env_var)

# Stop if the environment variable is not set
if (base_dir == "") {
stop("Environment variable ", env_var, " is not set")
}
return(base_path)

# Ensure the directory exists
checkmate::assert_directory_exists(base_dir)

# Return the normalized path
return(normalizePath(base_dir))
}

#' gets the NFS base path from an env var
#' alias for get_nfs_path to maintain backwards compatibility
#' Get NFS Path
#'
#' This function retrieves the path to the NFS (Network File System) directory.
#'
#' @param env_var [character(1)] The environment variable name for the base directory. Default is "RXD_DATA".
#'
#' @return [character(1)] The path to the NFS directory.
#'
#' @export
get_cre_path <- get_nfs_path

#' Loads data into memory based on study directory and one or more file_names.
#' @param sub_dir A relative directory/folder that will be appended to a base path defined by `Sys.getenv("RXD_DATA")`.
#' If the argument is left as NULL, the function will load data from the working directory `getwd()`.
#' @param file_names Study file or file_names name(s) - can be a vector of strings.
#' This is the only required argument.
#' @param use_wd for "use working directory" - a flag used when importing local files
#' not on NFS - default value is FALSE
#' @param prefer_sas if set to TRUE, imports sas7bdat files first before looking for
#' RDS files (the opposite of default behavior)
#' @return a list of dataframes
get_nfs_path <- function(env_var = "RXD_DATA") {
get_base_dir(env_var = env_var)
}


#' Get CRE Path
#'
#' This function retrieves the path to the CRE (Clinical Research Environment) directory.
#' It uses the "RXD_DATA" environment variable as the base directory.
#'
#' @return [character(1)] The path to the CRE directory.
#'
#' @export
get_cre_path <- function() {
get_base_dir(env_var = "RXD_DATA")
}


#' Load Data Files
#'
#' This function loads data files from a specified directory or the current working directory.
#' It supports loading both RDS and SAS7BDAT files.
#'
#' @param sub_dir [character(1)] Optional character string specifying a subdirectory. Default is NULL.
#' @param file_names [character(1+)] Character vector of file names to load (without extension).
#' @param use_wd [logical(1)] Logical indicating whether to use the current working directory. Default is FALSE.
#' @param prefer_sas [logical(1)] Logical indicating whether to prefer SAS7BDAT files over RDS. Default is FALSE.
#' @param env_var [character(1)] The environment variable name for the base directory. Default is "RXD_DATA".
#' @param print_file_paths [logical(1)] Logical indicating whether to print the directory path and file names.
#' Default is FALSE.
#'
#' @return A named list of data frames, where each name corresponds to a loaded file.
#'
#' @examples
#' \dontrun{
#' test_data_path <- "../inst/extdata/"
#' data_list <- load_data(
#' sub_dir = test_data_path,
#' file_names = "dummyads2",
#' use_wd = TRUE
#' )
#' }
load_data <- function(sub_dir = NULL, file_names, use_wd = FALSE, prefer_sas = FALSE) {
if (is.null(file_names)) {
stop("Usage: load_data: file_names: Must supply at least one file name")
#' # Get the current value of the RXD_DATA environment variable
#' base_dir <- Sys.getenv("RXD_DATA")
#'
#' # Set the RXD_DATA environment variable to the path of the haven package
#' Sys.setenv(RXD_DATA = find.package("haven"))
#'
#' data_list <- load_data(sub_dir = "examples", file_names = c("iris.sas7bdat"))
#' str(data_list)
#'
#' # Reset the RXD_DATA environment variable to its original value
#' Sys.setenv(RXD_DATA = base_dir)
#'
#' @export
load_data <- function(
sub_dir = NULL,
file_names,
use_wd = FALSE,
prefer_sas = FALSE,
env_var = "RXD_DATA",
print_file_paths = FALSE) {
# Input validation
checkmate::assert_character(sub_dir, len = 1, null.ok = TRUE)
checkmate::assert_character(file_names, min.len = 1)
checkmate::assert_logical(use_wd, len = 1)
checkmate::assert_logical(prefer_sas, len = 1)
checkmate::assert_character(env_var, len = 1)

# Determine the base directory
if (use_wd) {
base_dir <- getwd()
} else {
base_dir <- get_base_dir(env_var = env_var)
}

study_path <- "" # will be built using args
# Construct the full directory path
dir_path <- if (is.null(sub_dir)) base_dir else file.path(base_dir, sub_dir)

if (is.null(sub_dir)) {
study_path <- getwd()
} else {
if (use_wd) {
study_path <- file.path(getwd(), sub_dir)
} else {
study_path <- file.path(get_cre_path(), sub_dir)
}
# Determine the file extension based on preference
file_ext <- if (prefer_sas) "sas7bdat" else "rds"

This variable is not used anywhere. I think it can be removed.

Contributor Author

Good catch. I've removed the unused file_ext variable since it's now handled in the get_file_paths() function.

# Get the full file paths
file_paths <- get_file_paths(dir_path = dir_path, file_names = file_names, prefer_sas = prefer_sas)

# Print the directory path and file names if requested
if (isTRUE(print_file_paths)) {
cat("Loading data from", dir_path, "\n")
cat("Loading data file(s):", basename(file_paths), "\n")
}

# create the output
data_list <- create_data_list(study_path, file_names, prefer_sas) # nolint
# Load the data files
data_list <- load_data_files(file_paths)

names(data_list) <- file_names

return(data_list)
}
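
The directory resolution in the refactored `load_data()` can be tried in isolation with the following sketch. It mirrors the logic visible in the diff using base R only (no `checkmate`); the helper name `resolve_dir()` and the environment variable `DEMO_DATA_DIR` are hypothetical and exist only for this example.

```r
# Hypothetical helper: a base R approximation of the directory logic in the
# diff above, written for illustration and not part of dv.loader.
resolve_dir <- function(sub_dir = NULL, use_wd = FALSE, env_var = "RXD_DATA") {
  if (use_wd) {
    base_dir <- getwd()
  } else {
    base_dir <- Sys.getenv(env_var)
    if (base_dir == "") {
      stop("Environment variable ", env_var, " is not set")
    }
    if (!dir.exists(base_dir)) {
      stop("Directory does not exist: ", base_dir)
    }
    base_dir <- normalizePath(base_dir)
  }
  # Append the subdirectory only when one is supplied.
  if (is.null(sub_dir)) base_dir else file.path(base_dir, sub_dir)
}

# Point a demo environment variable at a temporary directory tree.
demo_root <- tempfile("demo_root")
dir.create(file.path(demo_root, "study1"), recursive = TRUE)
Sys.setenv(DEMO_DATA_DIR = demo_root)

resolved <- resolve_dir(sub_dir = "study1", env_var = "DEMO_DATA_DIR")
print(resolved)

# Clean up the demo variable.
Sys.unsetenv("DEMO_DATA_DIR")
```

As the diff reads, one behavioural difference seems worth confirming in review: the earlier code fell back to `getwd()` whenever `sub_dir` was NULL, whereas the new code appears to use the environment variable's directory unless `use_wd = TRUE`.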