-
Notifications
You must be signed in to change notification settings - Fork 0
/
create-map.Rmd
233 lines (166 loc) · 10.4 KB
/
create-map.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
---
title: Create simple maps in R with ggplot and sf
author: Amanda Devine
date: 29 April 2020
output:
html_notebook:
toc: true
toc_depth: 2
toc_float: true
collapsed: false
theme: cosmo
---
This R Notebook demonstrates how to work with the R packages `sf` and `ggplot2` to create basic maps. For this demo, we will create two maps showing the distribution of yellowjacket occurrences across the contiguous 48 United States.
Disclaimer: I'm still very new to working with geospatial data. Take this demo with a grain of salt, and consult more detailed tutorials and GIS courses to get a more thorough udnerstanding of principles. My aim with this demo is to demonstrate two common tasks I've encountered working with museum data: plotting latitude/longitude points on a map, and creating heatmaps to show how counts vary across states or countries.
## Import packages
```{r}
library(tidyverse)
library(data.table) # Read/write large tabular data files
library(sf) # Classes and functions for vector mapping data
```
## Geographic data and R objects
For this demo, we will be working with two geographic datasets: a shapefile containing data on the US states, and a CSV containing georeferenced occurrence data for yellowjackets in the United States.
A **shapefile** is a geospatial vector data format commonly used with GIS software (like ArcGIS). For this demo, we can think of it as a set of related files containing data on set of spatial features (geographic places) and their associated geometrical representations (points, lines, polygons).
There are different packages and different R objects for working with geospatial data. We will be using a package called `sf`, which creates a type of R object called a **simple feature** object. This object acts much like a data frame, where every row contains data on a geographic place (or **feature**). In addition, each row has an associated **geometry**, which contains the geometric representation of that place on a map (as a point, line, or polygon). Simple feature objects can be read into R from existing geospatial data sources (like shapefiles), or created from other file types containing geospatial data (like CSV files).
To plot our maps, we will use the `ggplot2`package and its built-in geom `geom_sf()`. This geom is designed to plot sf object data.
### Coordinate Reference Systems (CRS)
**Coordinate reference systems (CRS)** are systems of equations and visual representations cartographers use to represent the 3D spherical Earth as 2D images. For this demo, we will use a CRS called **WGS 84**. We need to make sure that all our spatial data are using the WGS 84 CRS, or else our different datasets will not align properly.
WGS 84 is represented with the EPSG code 4326.
```{r}
wgs84 <- 4326 # WGS 84, used in GPS. https://epsg.io/4326
```
For a good explanation of CRSes:
- https://www.earthdatascience.org/courses/earth-analytics/spatial-data-r/intro-to-coordinate-reference-systems/
- https://www.earthdatascience.org/courses/earth-analytics/spatial-data-r/geographic-vs-projected-coordinate-reference-systems-UTM/
## United States state geographic data
The US Census Bureau publishes shapefiles containing spatial data for different United States regions. These shapefiles are available here: https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html
For this demo, we will be using the file **cb_2018_us_state_20m.zip**, which contains the names and low-resolution geographic boundaries for the US states and territories.
This file was unzipped and placed in a folder called `data`. The shapefile itself uses the file extension `.shp`, but all the files in the zip archive are required in order to successfully access the data.
### Read shapefile data as sf object
We can use the sf function `st_read()` to read the data in as an sf object.
```{r}
us_states <- st_read('data/cb_2018_us_state_20m.shp')
```
When we read the shapefile, we see that the current CRS (+datum) is NAD83, which is a projected coordinate system that centers on North America. We will convert the `us_states` data to the WGS 84 instead.
```{r}
us_states <- st_transform(us_states, crs = wgs84)
```
### Plot state data
We can use ggplot2 to do a quick plot to see how our state data are looking. When we plot sf data, we will use a geom called `geom_sf()`, which plots the geometry for each row in our data.
```{r}
ggplot()+ geom_sf(data = us_states)
```
For this demo, we will focus on just the contiguous 48 states. We can simply filter our sf object as we would a normal data frame.
```{r}
non_contiguous <- c('Alaska', 'Hawaii', 'Puerto Rico')
us_states_con48 <- filter(us_states, !(NAME %in% non_contiguous))
```
We can take a look again to see our filtered US states data.
```{r}
ggplot()+ geom_sf(data = us_states_con48)
```
We can also create an object containing some theme settings for our map. We can then include this object for future ggplot calls.
```{r}
map_theme <- theme_bw() +
theme(axis.line = element_blank(), axis.text.x = element_blank(),
axis.text.y = element_blank(), axis.ticks = element_blank(),
axis.title.x = element_blank(), axis.title.y = element_blank(),
panel.border = element_blank(), panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
text = element_text(size = 16), legend.text = element_text(size = 16),
legend.title = element_text(face = "bold"))
```
```{r}
ggplot() +
geom_sf(data = us_states_con48) +
map_theme
```
## GBIF Occurrence data
Our data on yellowjacket occurrences were downloaded from the Global Biodiversity Information Facility (GBIF). Each occurrence record contains taxonomic information, occurrence type information, and a georeferenced coordinate.
> Dataset citation: GBIF.org (29 April 2020) GBIF Occurrence Download https://doi.org/10.15468/dl.44t6at
### Import GBIF data
We can import the GBIF data as a standard data frame. I like to use the `data.table` package (and its `fread()` function), which works really well for parsing large files. I will also filter out records where `decimalLatitude` or `decimalLongitude` were not supplied.
```{r}
gbif <- fread('data/gbif_yellowjackets.csv') %>%
filter(!is.na(decimalLatitude),
!is.na(decimalLongitude))
```
### Convert GBIF data to sf
We next need to convert this data frame to a geospatial `sf` object. The `st_as_sf()` function requires that the coordinate columns be specified (x = longitude, y = latitude). The CRS can also be specified. Most GBIF data is assumed to use the WGS 84 coordinate system.
```{r}
gbif_sf <- st_as_sf(gbif,
coords = c("decimalLongitude", "decimalLatitude"),
crs = wgs84,
remove = FALSE) # retains original lat/long columns
gbif_sf
```
### Plot GBIF data
We can now plot our GBIF data. We'll use two `geom_sf()` calls - the first will plot our US state data, and then second will plot our GBIF data on top of it.
```{r}
ggplot() +
geom_sf(data = us_states_con48) +
geom_sf(data = gbif_sf) +
map_theme
```
We can see some data points that seem to be occurring in Alaska and Hawaii. Since we're limiting our map to the contiguous 48 states, we'd like to remove these data points.
## Geospatially join sf objects to filter features
We can use the `us_states_48con` sf object to filter our GBIF data, returning only the GBIF points that occur within one of the contiguous 48 states. We can do this with a function called `st_join()`, which joins the data from two sf objects according to their geometries and retains the geometry for the first sf object. Here, we take each GBIF point and join the corresponding US state data in which it occurs. `left = FALSE` specifies that we are performing an inner join, so any point that does not find a corresponding state is removed.
```{r}
gbif_con48 <- st_join(gbif_sf, us_states_con48, left = FALSE)
```
```{r}
ggplot() +
geom_sf(data = us_states_con48) +
geom_sf(data = gbif_con48) +
map_theme
```
## Aesthetic mapping
Like in any ggplot figure, we can use aesthetic mapping to provide additional information. For example, we could color our GBIF data by species of yellowjacket.
```{r}
species_map <- ggplot() +
geom_sf(data = us_states_con48, fill = "white") +
geom_sf(data = gbif_con48,
mapping = aes(color = species),
alpha = 0.4,
size = 2,
show.legend = FALSE) +
map_theme +
labs(title = 'Yellowjacket records by species',
subtitle = "GBIF data for Vespula and Dolichovespula, April 2020")
species_map
```
## Create heatmap
We can also create heatmaps (or choropleth maps) of our states, coloring each state according to one of its corresponding counts. For example, we might be interested in the number of occurrences found in each state.
To do this, we can again join our US state and GBIF sf objects. This time, we will join them in the reverse order, retaining the geographic information for our states and attaching GBIF data for each corresponding record. Then we can group and summarize (as we would with a normal data frame) to get the occurrence counts for each area.
I created two summary columns - records (the count of records), and the square root of record count, which we can use for plotting.
```{r}
state_records <- st_join(us_states_con48, gbif_sf) %>%
group_by(STUSPS) %>%
summarize(records = n(),
records_sqrt = sqrt(records))
```
This time when we plot our map, we will only use one `geom_sf()` call, because we only need to plot our state data. We will also call `geom_sf_text()`, which will allow us to put labels on each of our states.
```{r}
state_heatmap <- ggplot(state_records, aes(fill = records_sqrt, label = records)) +
geom_sf(show.legend = FALSE) +
geom_sf_text(size = 4) +
map_theme +
scale_fill_gradient(low = 'white', high = 'goldenrod1') +
labs(title = "Yellowjacket records by US state",
subtitle = "GBIF data for Vespula and Dolichovespula, April 2020")
state_heatmap
```
## Save maps as files
Once we are happy with our maps, we can save them as image files with ggplot's `ggsave()`.
```{r}
ggsave(
'figures/records_species_map.png',
plot = species_map,
width = 12, height = 7, units = 'in')
```
```{r}
ggsave(
'figures/state_counts_map.png',
plot = state_heatmap,
width = 12, height = 7, units = 'in')
```