-
Notifications
You must be signed in to change notification settings - Fork 3
/
Copy pathpepr.Rmd
227 lines (166 loc) · 9.13 KB
/
pepr.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
---
title: "DeGAUSS PEPR Guide"
---
<a href='https://degauss-org.github.io/DeGAUSS/'><img src='figs/degauss_hex.png' align="right" height="138.5" /></a>
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)
```
```{r}
library(tidyverse)
library(knitr)
library(kableExtra)
```
This is an example of the workflow a PEPR study site might use to add geomarkers to their data with [DeGAUSS](https://degauss.org/).
If you have used DeGAUSS, would you mind providing us some feedback and completing a short [survey](https://redcap.link/gvhbxfjd)?
In steps 2 through 6:
+ A DeGAUSS container is used to add geomarker(s) to the input file (columns added in each step are highlighted in gray).
+ The input file is the CSV created in the previous step, and the output file will be the input in the next step.
# Step 0: Install Docker
See [Installing Docker](https://degauss.org/installing_docker).
> <font size="3.5"> **_Note about Docker Settings:_** </font> <br> <font size="2.75"> After installing Docker, but before running containers, go to **Docker Settings > Advanced** and change **memory** to greater than 4000 MB (or 4 GiB) <br> <center> <img width=50% src="figs/docker_settings_memory.PNG"> </center> <br> If you are using a Windows computer, also set **CPUs** to 1. <br> <center> <img width=50% src="figs/docker_settings_cpu.png"> </center> Click **Apply** and wait for Docker to restart. </font>
# Step 1: Preparing Your Input File
The input file must be a CSV file with a column called `address` containing an address string. Other columns may be present and will be returned in the output file, but should be kept to a minimum to reduce file size.
An example input CSV file (called `my_address_file.csv`) might look like:
```{r}
my_adds_geo <- read_csv("example_data/my_address_file_geocoded.csv")
sample_ids <- c("11200020024", "54000600136", "13100070229")
my_adds_geo <- filter(my_adds_geo, id %in% sample_ids)
my_adds_geo %>%
select(-(bad_address:dep_index)) %>%
kable() %>%
kable_styling(full_width = FALSE, bootstrap_options = "striped")
```
Refer to the DeGAUSS [geocoding wiki](https://degauss.org/geocoding) for more information about the input file and address string formatting.
# Step 2: Geocoding and Deprivation Index
Open a shell (i.e., terminal on Mac or CMD on Windows). We will use this shell for the rest of the steps in this example.
Navigate to the directory where the CSV file to be geocoded is located. See [here](http://linuxcommand.org/lc3_lts0020.php) for help on navigating a filesystem using the command line.
For those unfamiliar with the command line, the simplest approach might be to put the file to be geocoded on the desktop and then navigate to your desktop folder after starting the Docker Quickstart Terminal with `cd Desktop`.
Example call:
```
docker run --rm=TRUE -v "$PWD":/tmp degauss/cchmc_batch_geocoder my_address_file.csv
```
Replace `my_address_file.csv` with the name of the CSV file to be geocoded and run the call in the shell.
<br>
> **_Note for Windows Users:_** <br> In this and all following docker calls in this example, replace `"$PWD"` with `"%cd%"`. Refer to the DeGAUSS [Windows Troubleshooting](https://degauss.org/windows_troubleshooting) page for more information.
The **output file** is written to the same directory and in our example, will be called `my_address_file_geocoded.csv`.
Example output:
```{r}
my_adds_geo %>%
kable() %>%
kable_styling(full_width = FALSE, bootstrap_options = "striped") %>%
column_spec(3:17, bold = T, color = "white", background = "gray") %>%
scroll_box(width = "900px", height = "300px")
```
# Step 3: Roadways
> <font size="3"> **_Note:_** Prior to November 2019, this step involved running a different container called "Distance to Major Roadway". The old version added one new geomarker column, but this version adds 4 new columns. The new column called "dist_to_1100" should be very similar to the "dist_to_major_road" column produced by the old version. </font>
Example call:
```
docker run --rm=TRUE -v "$PWD":/tmp degauss/pepr_roadways:0.2 my_address_file_geocoded.csv
```
Replace `my_address_file_geocoded.csv` with the name of the geocoded CSV file created in Step 2 and run.
**_Note:_** This container could take longer than what users may be used to with other DeGAUSS containers due to the large size of the S1200 roadways shapefile. For example, using Docker to run this container for 193 geocoded addresses took 20-30 minutes on an HP laptop running Windows 10.
The **output file** is written to the same directory and, in our example, will be called `my_address_file_geocoded_pepr_roads_300m_buffer.csv`.
Example output:
```{r}
dist_road <- read_csv("example_data/my_address_file_geocoded_pepr_roads_300m_buffer.csv") %>%
filter(id %in% sample_ids)
dist_road%>%
kable() %>%
kable_styling(full_width = FALSE, bootstrap_options = "striped") %>%
column_spec(18:21, bold = T, color = "white", background = "gray") %>%
scroll_box(width = "900px", height = "300px")
```
# Step 4: Drive Time and Distance to Care Center
Example call:
```
docker run --rm -v "$PWD":/tmp degauss/pepr_drivetime:0.6 my_address_file_geocoded_pepr_roads_300m_buffer.csv cchmc
```
Replace `my_address_file_geocoded_pepr_roads_300m_buffer.csv` with the name of the CSV file created in Step 3, and replace `cchmc` with the abbrevation for your care center from this list:
| **Name** | **Abbreviation** |
|--------------------|-------------------|
Children's Hospital of Philadelphia | `chop`
Riley Hospital for Children, Indiana University | `riley`
Seattle Children's Hospital | `seattle`
Children's Mercy Hospital | `mercy`
Emory University | `emory`
Johns Hopkins University | `jhu`
Cleveland Clinic | `cc`
Levine Children's | `levine`
St. Louis Children's Hospital | `stl`
Oregon Health and Science University | `ohsu`
University of Michigan Health System | `umich`
Children's Hospital of Alabama | `al`
Cincinnati Children's Hospital Medical Center | `cchmc`
Nationwide Children's Hospital | `nat`
University of California, Los Angeles | `ucla`
Boston Children's Hospital | `bch`
Medical College of Wisconsin | `mcw`
St. Jude's Children's Hospital | `stj`
Martha Eliot Health Center | `mehc`
Ann & Lurie Children's / Northwestern | `nwu`
Lurie Children's Center in Northbrook | `lccn`
Lurie Children's Center in Lincoln Park | `lcclp`
Lurie Children's Center in Uptown | `lccu`
Dr. Lio's and Dr. Aggarwal's Clinics | `lac`
Recruited from Eczema Expo 2018 | `expo`
The **output file** is written to the same directory and in our example, will be called `my_address_file_geocoded_pepr_roads_300m_buffer_pepr_drivetime_cchmc.csv`.
Example output:
```{r}
drivetime <- read_csv("example_data/my_address_file_geocoded_pepr_drivetime_cchmc.csv") %>%
filter(id %in% sample_ids) %>%
select(id, drive_time, distance)
drivetime <- dist_road %>%
right_join(drivetime, by="id")
drivetime %>%
kable() %>%
kable_styling(full_width = FALSE, bootstrap_options = "striped") %>%
column_spec(22:23, bold = T, color = "white", background = "gray") %>%
scroll_box(width = "900px", height = "300px")
```
# Step 5: Greenspace
Example call:
```
docker run --rm -v "$PWD":/tmp degauss/pepr_greenspace:0.1 my_address_file_geocoded_pepr_roads_300m_buffer_pepr_drivetime_cchmc.csv
```
Replace `my_address_file_geocoded_pepr_roads_300m_buffer_pepr_drivetime_cchmc.csv` with the name of the CSV file created in Step 4 and run.
The **output file** is written to the same directory and in our example, will be called `my_address_file_geocoded_pepr_roads_300m_buffer_pepr_drivetime_cchmc_pepr_greenspace.csv`.
Example output:
```{r}
greenspace <- read_csv("example_data/my_address_file_geocoded_pepr_greenspace.csv") %>%
filter(id %in% sample_ids) %>%
select(id, evi_500:evi_2500)
greenspace <- drivetime %>%
right_join(greenspace, by="id")
greenspace %>%
kable() %>%
kable_styling(full_width = FALSE, bootstrap_options = "striped") %>%
column_spec(24:26, bold = T, color = "white", background = "gray") %>%
scroll_box(width = "900px", height = "300px")
```
# Step 6: Crime
Example call:
```
docker run --rm -v "$PWD":/tmp degauss/pepr_crime:0.1 my_address_file_geocoded_pepr_roads_300m_buffer_pepr_drivetime_cchmc_pepr_greenspace.csv
```
Replace `my_address_file_geocoded_pepr_roads_300m_buffer_pepr_drivetime_cchmc_pepr_greenspace.csv` with the name of the CSV file created in Step 5 and run.
The **output file** is written to the same directory and in our example, will be called `my_address_file_geocoded_pepr_roads_300m_buffer_pepr_drivetime_cchmc_pepr_greenspace_pepr_crime.csv`.
Example output:
```{r}
crime <- read_csv("example_data/my_address_file_geocoded_pepr_crime.csv") %>%
filter(id %in% sample_ids) %>%
select(id, fips_block_group_id:motor_vehicle_theft)
crime <- greenspace %>%
right_join(crime, by="id")
crime %>%
kable() %>%
kable_styling(full_width = FALSE, bootstrap_options = "striped") %>%
column_spec(27:37, bold = T, color = "white", background = "gray") %>%
scroll_box(width = "900px", height = "300px")
```
# Step 7: Removing PHI
Before sharing your data, remove the following columns:
+ `address`
+ `lat`
+ `lon`
+ `fips_tract_id`
+ `fips_block_group_id`