-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME.Rmd
112 lines (83 loc) · 3.3 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/"
)
options(digits = 4, width = 120)
```
# datadict: Data dictionary tools for the OCA data sharing platform
<!-- badges: start -->
[![Lifecycle: experimental](https://img.shields.io/badge/lifecycle-experimental-orange.svg)](https://www.tidyverse.org/lifecycle/#experimental)
[![R-CMD-check](https://github.com/epicentre-msf/datadict/workflows/R-CMD-check/badge.svg)](https://github.com/epicentre-msf/datadict/actions)
[![Codecov test coverage](https://codecov.io/gh/epicentre-msf/datadict/branch/main/graph/badge.svg)](https://codecov.io/gh/epicentre-msf/datadict?branch=main)
<!-- badges: end -->
### Installation
Install from GitHub with:
```{r, eval=FALSE}
# install.packages("remotes")
remotes::install_github("epicentre-msf/datadict")
```
### Example usage
#### Generate data dictionary from ODK template
The `dict_from_odk()` function can be used to generate an OCA-style data
dictionary from an ODK template (both the 'survey' and 'options' sheets of the
ODK template are required as inputs).
```{r}
library(datadict)
library(readxl)
# path to example ODK template (a WHO mortality survey)
path_data <- system.file("extdata", package = "datadict")
path_odk_template <- file.path(path_data, "WHOVA2016_v1_5_3_ODK.xlsx")
# read 'survey' sheet and 'choices' sheet
odk_survey <- readxl::read_xlsx(path_odk_template, sheet = "survey")
odk_choices <- readxl::read_xlsx(path_odk_template, sheet = "choices")
# derive OCA-style data dictionary
dict <- dict_from_odk(odk_survey, odk_choices)
# examine first few rows/cols
dict[1:5,1:4]
```
#### Generate data dictionary from REDCap template
The `dict_from_redcap()` function can be used to generate an OCA-style data
dictionary from a REDCap data dictionary. The input dictionary can be exported
directly from a REDCap project website or fetched via the API using e.g. the R
package [redcap](https://github.com/epicentre-msf/redcap).
```{r}
# path to example REDCap template
path_data <- system.file("extdata", package = "datadict")
path_redcap_dict <- file.path(path_data, "REDCapDataDictionaryDemo.csv")
# read dictionary
redcap_dict <- read.csv(path_redcap_dict)
# derive OCA-style data dictionary
dict <- dict_from_redcap(redcap_dict)
# examine first few rows/cols
dict[1:5,1:5]
```
#### Generate data dictionary template from a dataset
The `dict_from_data()` function can be used to generate a template OCA-style
data dictionary (which may require further processing) from a dataset. Data types
are based on the class of each column within in the input dataset, e.g.:
| Column class in R | Dictionary data type |
| ------------------|----------------------|
| Date | Date |
| POSIX | Datetime |
| logical | Logical |
| integer | Numeric |
| numeric | Numeric |
| factor | Coded list |
| character | Coded list or Free text (see argument `factor_threshold`) |
```{r}
# path to example dataset
path_data <- system.file("extdata", package = "datadict")
path_linelist <- file.path(path_data, "linelist_cleaned.xlsx")
# read data
dat <- readxl::read_xlsx(path_linelist)
# derive OCA-style data dictionary template
dict <- dict_from_data(dat)
# examine first few rows/cols
dict[1:7,1:5]
```