-
Notifications
You must be signed in to change notification settings - Fork 1
/
rgeo_workshop.Rmd
671 lines (441 loc) · 35.5 KB
/
rgeo_workshop.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
---
title: "Introduction to geospatial data analysis in R"
author: "Philippe Marchand, Université du Québec en Abitibi-Témiscamingue"
date: August 18, 2019
output:
html_document:
theme: united
toc: true
toc_float: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, collapse = TRUE)
```
---
This workshop provides an overview of tools available in R for the analysis of geolocated data. This type of data is becoming increasingly common in various fields (e.g. aerial photos, satellite imagery, census data, geolocated posts on social networks, etc.). There are two broad categories of geospatial data:
* raster data represent variables defined at every point of a grid covering the full extent of the data (as in satellite images);
* vector data associate variables (sometimes also called attributes) to discrete geometrical objects located in space (such as the position of cities and highways on a road map).
At first, using programming commands to process geographical data can seem less intuitive, compared with the graphical user interface of GIS software. Here are a few advantages of scripted analyses:
* It is easy to repeat the analysis for new data by re-running the script.
* It is easy for other researchers to reproduce the methods if they have access to the same programming language.
* When using R specifically, the spatial data can be extracted and merged with other datasets for statistical analyses in a single programming environment.
## Objectives
* Become familiar with the main packages for processing and simple visualization of vector and raster data in R (the ***sf*** and ***stars*** packages, respectively).
* Perform common data transformation operations using functions from those packages.
* Create more complex static maps (with ***ggplot2***) and interactive maps (with ***mapview***).
### Note on packages
The set of packages available for spatial analysis in R has evolved rapidly. A few years ago, the ***sp*** and ***raster*** packages were the main tools for vector and raster data processing, respectively. ***sf*** and ***stars*** are part of a recent initiative to overhaul R spatial tools ([https://www.r-spatial.org/](https://www.r-spatial.org/)). The ***sf*** package represents spatial data frames with a standard format based on open-source geodatabases, and integrates well with popular R packages for data manipulation and visualization (such as ***dplyr*** and ***ggplot2***). The ***stars*** package is newer but already shows a few improvements over ***raster***. A [previous version](https://pmarchand1.github.io/atelier_rgeo/rgeo_workshop.html) of this workshop used ***raster*** instead of ***stars***.
## Contents
* [Explore a vector dataset](#vect)
* [Coordinate reference systems and transformations](#crs)
* [Customize maps with ggplot2](#geomsf)
* [Geometric operations on vector data](#vectop)
* [Raster datasets](#rast)
* [Interactive maps with *mapview*](#mapview)
* [Additional references](#ref)
* [Data sources](#data)
* [Exercise solutions](#sol)
---
## Explore a vector dataset {#vect}
All datasets used in this workshop can be found in the *data* folder. The *mrc* dataset contains information on Québec regional county municipalities (MRCs) in a *ESRI shapefile* format. Note that a single shapefile dataset is spread across multiple files, which share a name but differ in their file extension (*mrc.dbf*, *mrc.prj*, *mrc.shp* and *mrc.shx*).
To load this dataset in R, we call the `st_read` function from ***sf***. (All ***sf*** package functions start with the prefix `st_`, standing for spatiotemporal.) The first argument to `st_read` is a path to the data; here it is sufficient to specify the *.shp* file name. The argument `stringsAsFactors = FALSE` prevents R from converting character variables to factors in the resulting data frame.
```{r read_mrc}
library(sf)
mrc <- st_read("data/mrc.shp", stringsAsFactors = FALSE)
```
The output text indicates the main properties of the loaded dataset, including the geometric type (MULTIPOLYGON), the spatial extent of the data (*bbox*) and the coordinate reference system (CRS) in use. That CRS is described in two formats: a EPSG code and a *proj4string* character string. We will discuss coordinate systems in the next section. For now, we note that "+proj=longlat" in the *proj4string* means that the data is in longitude and latitude coordinates (with units of decimal degrees).
The bounding box and CRS can be accessed separately using the `st_bbox` and `st_crs` functions:
```{r prop_mrc}
st_bbox(mrc)
st_crs(mrc)
```
Let us look at the first few rows of the data:
```{r class_head}
class(mrc)
head(mrc)
```
An `sf` object is a specialized `data.frame` where each line contains data associated with a geometry element (or *simple feature*, hence the package name), which is described in the `geometry` column. The most common feature types are:
* POINT: Coordinates (*x*, *y*) of a point.
* LINESTRING: Sequence of points connected by length segments.
* POLYGON: Sequence of points creating a closed simple polygon.
* MULTIPOINT, MULTILINESTRING ou MULTIPOLYGON: Dataset where each feature can be composed of multiple points, linestrings or polygons.
The `plot` function applied to an `sf` object creates a map for each field in the dataset.
```{r plot_sf}
plot(mrc)
```
To plot a single variable, you need to select the corresponding column. To show a map without data variables, you can select the *geometry* column. The `axes = TRUE` parameter tells R to show coordinate axes.
```{r plot_geom}
plot(mrc["geometry"], axes = TRUE)
```
You can select a subset of rows or columns of an `sf` object just like a regular data frame.
```{r sf_subset}
# Select the 5th row
mrc[5, ]
# Select the MRC name and population columns for MRCs with a population over 200,000
mrc[mrc$pop2016 > 200000, c("mrc_name", "pop2016")]
```
Note that the column containing the spatial information (*geometry*) is always retained, even if not explicitly selected. To discard that column and convert the `sf` object to a regular (non-spatial) data frame, you can use the function `st_drop_geometry`.
### Exercise 1 {#retour1}
Select the MRCs in the Bas-St-Laurent (*reg_id*: 01) and Gaspesie (*reg_id*: 11) regions, then display their 2016 population on a map.
*Hint*: The operator `%in%` can check if a variable has one value within a set, for example `x %in% c(1, 3)` returns TRUE if *x* is equal to 1 or 3.
[Solution](#sol1)
### Integration with the dplyr package
The data manipulation functions from the ***dplyr*** package work on `sf` objects as well. For example, we could rewrite the example above (select the name and population of MRCs with populations over 200,000) using `filter` and `select`.
```{r sf_dplyr}
library(dplyr)
filter(mrc, pop2016 > 200000) %>%
select(mrc_name, pop2016)
```
When performing a grouped summary, the individual features are also aggregated in a single feature by group. For example, let us aggregate the MRCs and their population by region.
```{r sf_groupby}
regions <- group_by(mrc, reg_name) %>%
summarize(pop2016 = sum(pop2016))
head(regions)
plot(regions["pop2016"])
```
### Create a spatial object from a data frame
The `plots.csv` file contains data from forest inventory plots of the Québec Department of Forests, Wildlife and Parks (MFFP), including the plot ID, latitude and longitude, survey date, cover type (deciduous, mixed or coniferous) and canopy height class.
```{r load_plots}
plots <- read.csv("data/plots.csv", stringsAsFactors = FALSE)
head(plots)
```
We can convert this data to an `sf` object with `st_as_sf`. The `coords` argument specifies which columns hold the X and Y coordinates, while the `crs` argument defines the coordinate reference system (here, it is set to the same CRS as the MRC dataset).
```{r df_to_sf}
plots <- st_as_sf(plots, coords = c("long", "lat"), crs = st_crs(mrc))
plot(plots["geometry"])
```
### Review
* Spatial vector data associate data fields to localized geometric features such as points, lines and polygons. The ***sf*** package can be used to work with those datasets in R.
* To read a vector dataset: `st_read`.
* To convert a regular `data.frame` into a spatial object: `st_as_sf`.
* All basic `data.frame` operations, as well as ***dplyr*** package operations, also apply to `sf` objects.
* The `plot` function applied to an `sf` object displays one or many data fields on a map.
---
## Coordinate reference systems and transformations {#crs}
Until now, we worked with data using a geographic coordinate system, with positions described as degrees of longitude and latitude. Those coordinates are based on a model that approximates the irregular surface of the Earth's mean sea level (the geoid) as an ellipsoid (a slightly flattened sphere). That model is specified as a *datum* in the CRS description. The `mrc` shapefile uses the NAD83 (North American) datum, whereas many world maps are based on the WGS84 datum.
```{r st_crs}
st_crs(mrc)
```
A projection converts geographical coordinates in cartesian or rectangular (X, Y) coordinates. Since it is impossible to provide an exact representation of a curved surface on a plane, specialized projections were developed for different regions of the world and different analytical applications.
For example, the images below show how identical circular areas appear at different points of the Earth under a Mercator projection (which preserves shapes) and a Lambert equal-area projection (which preserves areas).
![Mercator projection](images/Mercator_distortion.png) ![Lambert equal-area projection](images/Lambert_distortion.png)
We will convert the `mrc` polygons into a Lambert conical conformal projection centered on Quebec ([EPSG:6622](https://epsg.io/6622)), using `st_transform`.
```{r transform}
mrc_proj <- st_transform(mrc, crs = 6622)
st_crs(mrc_proj)
```
EPSG codes are useful to quickly specify a projection, whereas the *proj4string* is more informative, describing the type of projection (here, *lcc* stands for Lambert conical conformal) and its detailed parameters. Note that the projected coordinates are expressed in metres ("+units=m" in the *proj4string*). These coordinates are relative to a point of origin specific to the projection.
```{r plot_geom_proj}
plot(mrc_proj["geometry"], axes = TRUE)
```
To create a map with latitude and longitude lines superposed on projected data, we can specify a geographical coordinate system (here, based on the original data) to the `graticule` argument:
```{r plot_graticule}
plot(mrc_proj["geometry"], axes = TRUE, graticule = st_crs(mrc))
```
It is important to always use `st_transform` to convert datasets between different coordinate systems. A common error consists in modifying the coordinate system of a dataset (for example, with `st_crs`) without transforming the data themselves.
### Review
* Geographic coordinate systems are based on a *datum* (model of the Earth's shape) and describe the position in terms of spherical coordinates (longitude, latitude) measured in degrees.
* Projected coordinate systems convert spherical coordinates into rectangular coordinates (*x*, *y*) measured in a unit of length such as metres.
* `st_crs` returns the coordinate system of an `sf` object; `st_transform` converts the data from one coordinate system to another.
---
## Customize maps with ggplot2 {#geomsf}
While the `plot` function is useful to get an overview of a spatial dataset, other packages provide more customization options to create publication-quality maps. In this section, we will see how a widely-used R graphics package, ***ggplot2***, also supports mapping of spatial datasets.
For those not familiar with ***ggplot2***, a comprehensive introduction can be found in the [Data Visualisation](https://r4ds.had.co.nz/data-visualisation.html) chapter of *R for Data Science* by Wickham and Grolemund. Producing any type of graph with ***ggplot2*** requires a similar sequence of steps:
* Specify the dataset as well as aesthetic mappings (`aes`), which associate variables in the data to graphical elements (*x* and *y* axes, color or size scale, etc.);
* Add `geom_` layers, which specify the type of graph;
* Optionally, specify additional customization options such as axis names and limits, color themes, and more.
For example, the following code creates a bar plot (`geom_bar`) from the forest inventory plots data (`data = plots`), showing the number of forest plots by height class (*x* axis) and by cover type (different fill colors of the bars). The `labs` function defines custom labels for the title, axes and legend.
```{r ggplot_bar}
library(ggplot2)
ggplot(data = plots, aes(x = height_cls, fill = cover_type)) +
geom_bar() +
labs(title = "Forest inventory plots", x = "Height class",
y = "Count", fill = "Cover type")
```
When plotting a vector dataset from an `sf` object, we use the `geom_sf` layer to display the spatial features on a map. It is not necessary to specify *x* and *y* mappings in `aes`, since these are defined by the `sf` object itself. The graticule lines are also automatically drawn.
```{r geom_sf}
ggplot(data = mrc_proj) +
geom_sf()
```
To add multiple spatial layers to the same map, we simply add more `geom_sf` layers, which can be based on different datasets (specifying the `data` argument in each `geom`). In the code below, we add a point layer for the forest inventory plots, assigning the color aesthetic to cover type. We also change from the default grey theme to the black and white theme, `theme_bw`.
```{r geom_sf_mult}
ggplot() +
geom_sf(data = mrc_proj) +
geom_sf(data = plots, aes(color = cover_type), size = 1) +
theme_bw()
```
When a graphical element is set to a constant outside of the `aes` function, it is applied to the whole layer; therefore `size = 1` means that all points will be of size 1.
Notice that the two spatial datasets plotted above use different CRS:
```{r}
st_crs(mrc_proj)
st_crs(plots)
```
In this case, ***ggplot2*** automatically transforms all layers to the same CRS before plotting; by default, it is the CRS of the first dataset, but a different CRS can be specified with the `coord_sf` function.
```{r ggplot_crs}
ggplot(data = plots) +
geom_sf() +
coord_sf(crs = 6622)
```
In addition, `coord_sf` can be used to set coordinate axis limits and zoom in a portion of the map.
```{r coord_sf}
ggplot(data = regions) +
geom_sf(aes(fill = pop2016)) +
geom_sf_label(aes(label = reg_name)) +
coord_sf(xlim = c(-75, -70), ylim = c(45, 47))
```
This last example introduced a new geom `geom_sf_label`, which adds a text label to each feature based on a value defined by the `label` aesthetic. The geom `geom_sf_text` works the same way, but does not draw a white box around the text label.
### Exercise 2 {#retour2}
Create a map of the MRCs with different fill colors for each region.
[Solution](#sol2)
### Review
* A ***ggplot2*** graph starts with a call to the `ggplot()` function, followed by specific `geom` defining each graph layer, followed by optional customization functions.
* The `data` argument specifies the dataset to plot and the `aes` function associates variables in that dataset to graphical elements. These can be defined in the `ggplot` function (if they apply to all layers) or in specific `geom` layers.
* The `geom_sf` layer plots an `sf` object on a map.
* The `geom_sf_text` or `geom_sf_label` layers can be used to add textual data to each spatial feature on a map.
* The `coord_sf` function defines axis limits and the CRS to use, transforming all spatial features to that CRS. By default, the CRS of the first plotted spatial dataset is used.
### Other mapping packages
The ***tmap*** package (see [this tutorial](https://cran.r-project.org/web/packages/tmap/vignettes/tmap-getstarted.html)) is another option for producing maps in R. It was developed before ***ggplot2*** supported `sf` objects and functions with a similar layering logic.
---
## Geometric operations on vector data {#vectop}
The ***sf*** package includes a number of geometric operations for vector data, which are similar to those found in geodatabases or GIS software. These operations can be grouped into three classes:
* predicates, or tests which output TRUE or FALSE (e.g. is geometry A inside B?);
* measures, which produce a scalar quantity (e.g. length of a line, area of a polygon, distance between two geometries);
* geometry-generating functions which produce output geometries based on input (e.g. distance buffer around a point, centroid of a polygon, intersection of two lines or polygons).
In this workshop, we will present a few examples of each class. For a more detailed presentation, see Chapter 5 of the *Spatial Data Science* book listed in the [additional references](#ref).
First, we use `st_area` to calculate the area of each MRC in the original dataset.
```{r st_area}
areas <- st_area(mrc)
head(areas)
```
Note that the answer is in square metres, even though `mrc` uses geographical coordinates. Three measure functions: `st_area`, `st_length` and `st_distance` implement geodetic measures which take into account the curvature of the Earth. This is not the case for other operations, as we will see below.
To make it easier to read the results, we can convert them to a different unit.
```{r units_areas}
units(areas) <- "km^2"
head(areas)
```
As an example of a spatial predicate, let us now find where the points in `plots` and the polygons in `mrc` *intersect*, i.e. which individual features have points in common.
```{r st_intersects}
inters <- st_intersects(plots, mrc)
inters[1:3] # look at the first 3 elements in the output
```
When comparing two spatial objects with a predicate like `st_intersects`, the result is a list of the same length as the first object (here, `plots`). Each element of that list contains the indices of the features in the second object for which the predicate is true. In this example, the first list element (`[[1]]`) indicates that plot 1 intersects with MRC 94, the third element indicates that plot 3 intersects with MRC 53, etc. Here each plot intersects with a single MRC, but in general an element of the intersection could be empty (if that feature in the first object has no intersection with the second object) or contain many indices (if that feature overlaps with multiple ones in object 2).
The warning text:
"although coordinates are longitude/latitude, st_intersects assumes that they are planar",
indicates that this function treats geographical coordinates as if they were X-Y coordinates on a plane. In particular, the boundaries of a polygon are not the shortest lines between its vertices, since they ignore the curvature of the Earth. The difference is usually minor unless the line segments are very long, if they are near a pole or the international date line (where longitude jumps from -180 to 180 degrees).
From the results of `st_intersects` above, we could look up the indices to find the name of the MRC in which each inventory plot is located. Fortunately, there is a spatial join function `st_join` that automates this process, by appending to one `sf` object the data fields from a second `sf` object where the features intersect.
```{r st_join}
plots_mrc <- st_join(plots, mrc)
head(plots_mrc)
```
By default, `st_join` performs a "left" join, meaning that it keeps all rows in the first dataset, and adds *NA* values to the extra fields when there is no match in the second dataset. We can see this by joining the `plots` data with the subset of MRC for regions 01 and 11 (see Exercise 1).
```{r st_join_left}
mrc_01_11 <- mrc[mrc$reg_id %in% c("01", "11"), ]
plots_01_11 <- st_join(plots, mrc_01_11)
head(plots_01_11)
```
With the optional argument `left = FALSE`, we can keep only the plots located in the two target regions.
```{r st_join_inner}
mrc_01_11 <- mrc[mrc$reg_id %in% c("01", "11"), ]
plots_01_11 <- st_join(plots, mrc_01_11, left = FALSE)
ggplot() +
geom_sf(data = mrc_01_11) +
geom_sf(data = plots_01_11) +
theme_bw()
```
### Exercise 3 {#retour3}
The shapefile *data/tbe2016_gaspe.shp* contains a map of areas defoliated by the spruce budworm in the Bas-St-Laurent and Gaspesie regions in 2016. The defoliation level is represented by an integer: 1 = Light, 2 = Moderate and 3 = Severe.
a) How many forest inventory plots in these regions are affected at each defoliation level? *Hint*: The `table` function could be useful to get counts of each value in a column.
b) Plot the defoliated areas located in the MRC of Kamouraska, along with the MRC border.
[Solution](#sol3)
Finally, we consider a few geometry-generating functions. The `st_buffer` function creates a buffer at a set distance from each geometry in an object. For example, we can define a 5 km radius around each point in `plots_01_11`. This function does not work with geographical coordinates (longitude and latitude), so we first project the plots in EPSG 6622.
```{r st_buffer}
plots_proj <- st_transform(plots_01_11, crs = 6622)
plots_buffer <- st_buffer(plots_proj, dist = 5000)
ggplot() +
geom_sf(data = plots_buffer, linetype = "dotted", fill = NA) +
geom_sf(data = plots_proj) +
theme_bw()
```
*Notes*
* The buffer distance is set in the units of the CRS, in this case metres.
* If the original feature is a polygon, a negative buffer distance would create a buffer inside the polygon.
Next, we will see three functions based on set operations. If *A* and *B* are geometric features, their union is the area covered by A or B, their intersection is the area covered by both A and B, and the difference (A - B) is the area covered by A, but not B. In `sf`, these operations are implemented by `st_union`, `st_intersection` and `st_difference`. If they are applied to two `sf` objects (each of them containing multiple features in a column), then the function calculates the union, intersection or difference between all possible pairs of one feature from A and one feature from B.
When applied to a single `sf` object, the `st_union` function merges all features in that object. In the following example, `buffer_union` is a single geometric object, a multipolygon that covers all areas included in one of the single-plot buffers. The variables, or attributes, associated with individual features are lost in the merge.
```{r st_union}
buffer_union <- st_union(plots_buffer)
ggplot(buffer_union) +
geom_sf()
```
We can use `st_intersection` to extract portions of the `mrc_01_11` polygons within 5 km of a forest inventory plot, i.e. within the merged buffer created above, and use `st_difference` to extract areas of the MRC polygons outside that buffer.
```{r st_inters_diff}
mrc_01_11_proj <- st_transform(mrc_01_11, crs = 6622)
mrc_inters <- st_intersection(mrc_01_11_proj, buffer_union)
mrc_diff <- st_difference(mrc_01_11_proj, buffer_union)
ggplot(mrc_inters) +
geom_sf() +
theme_bw()
ggplot(mrc_diff) +
geom_sf() +
theme_bw()
```
Note that `st_intersection` and `st_difference` copy the data fields from the original datasets (here, only from `mrc_01_11_proj`, since `buffer_union` has no associated data). The warning ("attribute variables are assumed to be spatially constant") reminds us that those variables might not match the new geometries. For example, the *pop2016* variable in `mrc_inters` refers to the original MRC, not the portion extracted by `st_intersection`.
### Review
* The ***sf*** package includes *measure* functions for the area of polygons (`st_area`), the length of a line (`st_length`) or the distance between pairs of geometric features (`st_distance`). Those functions work with either geographic (long, lat) or projected coordinate systems.
* All other geometric operations in ***sf*** are based on planar geometry. They treat longitude and latitude as if they were perpendicular axes (*x*, *y*).
* `st_intersects(A, B)` is an example of a spatial *predicate*: for each element in *A*, the function returns the indices of elements with *B* that intersect with it.
* `st_join(A, B)` takes an `sf` object *A* and appends the data fields from a second object *B* for each case where the feature in *A* intersects with a feature in *B*. Contrary to `st_intersection` below, the geometric features themselves do not change; the result retains the features from *A*.
* `st_intersection(A, B)` produces a dataset containing all regions where features in *A* and *B* overlap.
* `st_difference(A, B)` produces a dataset containing the set differences (portion of *A* not in *B*) for each pair of features in *A* and *B*.
* `st_union(A, B)` produces a dataset containing the unions (area covered by either *A* or *B*) for each pair of features in *A* and *B*. When given a single `sf` object as input, `st_union` merges all features from that object within a single one.
* `st_buffer` produces new geometric features that buffer the input features by a given distance.
---
## Raster datasets {#rast}
The *data* folder contains a raster file covering sections 022B and 022C from the [Canadian Digital Elevation Model](https://ouvert.canada.ca/data/fr/dataset/7f245e4d-76c2-4caa-951a-45d1d2051333) or CDEM.
The CDEM is a raster dataset; the area of Canada is covered by a regular grid and the model associates an elevation value (in metres) to each pixel on that grid. This type of data is analogous to a digital image (rectangular array of pixels), to which metadata is added (resolution, spatial extent and coordinate system) so that each pixel can be associated to geographical coordinates.
The base resolution of the CDEM is 1/4800th of a degree and data are available in sections of 2 degrees of longitude by 1 degree of latitude. The data file we use in this workshop contains two sections (4 degrees of longitude by 1 degree of latitude), but was aggregated to a lower resolution (1/1200th of a degree, or 3 arc-seconds) to reduce processing time.
We first load the CDEM file with the `read_stars` function of the ***stars*** package (the name is an acronym for **s**patio**t**emporal **ar**ray**s**). This function associates the dataset to a `stars` object. Typing the object's name at the command line shows a summary of the data and metadata.
```{r read_stars}
library(stars)
cdem <- read_stars("data/cdem_022BC_3s.tif")
cdem
```
The CDEM data has one *attribute* (variable), the elevation, which ranges from 2 to 1179 m for this particular section. The *dimensions* table includes a row for each array dimension, here *x* and *y*. The *from* and *to* columns indicate the range of indices in each dimension (4801 cells in *x* by 1201 cells in *y*), the *offset* column contains the coordinates of the top-left corner of the raster (70 degrees West, 49 degrees North), the *delta* column indicates the size of each raster cell (1/1200 $\approx$ 0.000833 degree), and the *refsys* column describes the coordinate reference system.
*Notes*
* The negative *delta* for *y* means that latitude decreases from the top (north) to the bottom (south) of the raster.
* While we limit ourselves to two-dimensional rasters in this workshop, one advantage of the ***stars*** package is that it easily incorporates additional dimensions beyond the two spatial dimensions, such as time or spectral band, which are common in remote sensing data.
We can also determine the extent and coordinate system of the object with the same methods we used for `sf` objects.
```{r bbox_crs_stars}
st_bbox(cdem)
st_crs(cdem)
```
We can extract the elevation values as a regular R array with `cdem[[1]]`, which pulls the first (in this case the only) variable in the raster file. However, this removes the associated metadata.
```{r}
elev <- cdem[[1]]
str(elev)
```
### Plot a raster
The `plot` function creates a quick 2D image (or heat map) of the raster data.
```{r plot_stars}
plot(cdem)
```
We can also display a `stars` object in ***ggplot2*** with the `geom_stars` function. Because rasters can contain many more pixels than are visible at once on the screen, it is useful to apply a downsampling factor to speed up plotting. Here, `downsample = 5` means that 1 of every 5 pixels is shown.
```{r geom_stars}
ggplot() +
geom_stars(data = cdem, downsample = 5) +
geom_sf(data = mrc_01_11, color = "white", fill = NA) +
scale_fill_viridis_c() +
coord_sf(xlim = c(-70, -66), ylim = c(48, 49)) +
theme_bw()
```
The `plot` function above automatically downsampled the data according to the screen's resolution.
### Work with large raster files
For raster files that are too large to load in memory, you can use the `proxy = TRUE` argument in `read_stars`. In that case, R loads a `stars_proxy` object containing the metadata, but not the pixel values. All raster operations can be applied to `stars_proxy` objects as well, but the calculations are not actually performed until the result is plotted (in which case only a fraction of pixels are processed due to downsampling) or the object is saved to disk (with `write_stars`).
### Raster operations
To crop a rectangular section of the raster, we can use the `filter` function from ***dplyr*** along one or multiple dimensions. For example, here is the portion of the raster east of 67 degrees West.
```{r}
cdem_part <- filter(cdem, x > -67)
plot(cdem_part)
```
To crop a raster's extent along the boundaries of a `sf` object, we use the `crop` function. The following code shows the elevation of points in the MRC of La Matapedia. Note that we converted the polygon to the CRS of the raster prior to cropping.
```{r st_crop}
matap <- filter(mrc_01_11, mrc_name == "La Matapedia")
matap <- st_transform(matap, st_crs(cdem))
cdem_matap <- st_crop(cdem, matap)
plot(cdem_matap)
```
Since `stars` objects are fundamentally arrays of values, we can apply mathematical operations to each pixel just like we would for a regular array in R.
```{r raster_math}
# Convert elevation values to km
cdem_km <- cdem / 1000
plot(cdem_km)
# Display points above 500 m in elevation
cdem_500 <- cdem > 500
plot(cdem_500)
```
### Exercise 4 {#retour4}
a) Show a map of the points in the MRC of La Mitis with an elevation between 5 and 100 m.
b) What is the highest elevation in that MRC?
[Solution](#sol4)
### Extract values from raster at points
A common use of raster data is to extract values at points of interest. For example, we might want elevation values for each of the forest inventory plots in `plots_01_11`.
Since the relatively new ***stars*** package does not currently have a fast option for point extraction, we will use the ***raster*** package and its `extract` function.
```{r extract_pt}
library(raster)
cdem_r <- as(cdem, "Raster") # convert from stars to raster format
plots_elev <- extract(cdem_r, plots_01_11)
plots_01_11$elev <- plots_elev # save in a new column of the sf object
```
*Note*: The ***velox*** package provides even faster extraction of values from rasters, if the full raster file fits in memory.
### Review
* A raster dataset associates a value to each pixel in a regular grid. The ***stars*** package allows us to process this type of data in R.
* The `read_stars` function loads a raster file in R. For large files, specify `proxy = TRUE` to avoid loading the full raster in memory.
* The `stars` object can be plotted by itself with `plot`, or added to a `ggplot` with `geom_stars`.
* `filter` crops a `stars` object along the specified dimensions, whereas `st_crop` crops it within the boundaries of an `sf` object.
* Arithmetic (`+`, `-`, etc.) and comparison operators (`<`, `==`, etc.) are applied to each pixel of the `stars` object.
* The `extract` function from the ***raster*** package extracts raster values at specific points specified by an `sf` object.
---
## Interactive maps with mapview {#mapview}
The ***mapview*** package provides visualizations of `sf` and `raster` layers on an interactive map (similar to Google Maps) with different basemap options (e.g. OpenStreetMap, OpenTopoMap, ESRI World Imagery). We simply call the `mapview` function with the name of the spatial dataset. Multiple layers can be overlaid with the `+` operator.
There are a few optional arguments to control the plotting of each type of object. In the example below, we use the `zcol` argument to select the variable by which to color the points in `plots_01_11`.
```{r mapview, eval = FALSE}
library(mapview)
mapview(cdem_r) +
mapview(plots_01_11, zcol = "cover_type")
```
![Exemple mapview](images/mapview_example.png)
---
## Additional references {#ref}
* [Spatial Data Science](https://keen-swartz-3146c4.netlify.com/), a free online textbook by Edzer Pebesma and Roger Bivand that covers the ***sf*** and ***stars*** packages in more detail.
* The [R-spatial blog](https://www.r-spatial.org) presents news and tutorials on spatial packages in R. In particular, see this series ([1](https://www.r-spatial.org/r/2018/10/25/ggplot2-sf.html), [2](https://www.r-spatial.org/r/2018/10/25/ggplot2-sf-2.html), [3](https://www.r-spatial.org/r/2018/10/25/ggplot2-sf-3.html)) of posts by Mel Moreno and Mathieu Basille on mapping with *ggplot2*.
---
## Data sources {#data}
This workshop uses public data made available by Québec and Canada government agencies. The script `data_prep.R` shows the detailed steps to produce the workshop data files from the original sources.
* County regional municipality (CRM) [boundaries](https://mern.gouv.qc.ca/territoire/portrait/portrait-donnees-mille.jsp) were downloaded from the Québec Department of Energy and Natural Resources (MERN) and merged with [population data](https://www.stat.gouv.qc.ca/statistiques/population-demographie/structure/mrc-total.xlsx) from the Institut de la statistique du Québec. Column names were translated to English and diacritic marks (accents) were removed from all place names.
* Data on [forest inventory plots](https://www.donneesquebec.ca/recherche/fr/dataset/placettes-echantillons-permanentes-1970-a-aujourd-hui) and [spruce budworm defoliation maps](https://www.donneesquebec.ca/recherche/fr/dataset/donnees-sur-les-perturbations-naturelles-insecte-tordeuse-des-bourgeons-de-lepinette) from the Québec Department of Forests, Wildlife and Parks (MFFP) were downloaded from the Québec Open Data Portal. Some data fields were recoded and translated to English.
* [Canadian Digital Elevation Model](https://open.canada.ca/data/en/dataset/7f245e4d-76c2-4caa-951a-45d1d2051333) rasters from Natural Resources Canada were downloaded from the Canada Open Data Portal. Two 2x1 degree sections were merged to use in this workshop.
---
## Solutions {#sol}
### Exercise 1 {#sol1}
Select the MRCs in the Bas-St-Laurent (*reg_id*: 01) and Gaspesie (*reg_id*: 11) regions, then display their 2016 population on a map.
```{r sol1}
mrc_01_11 <- mrc[mrc$reg_id %in% c("01", "11"), ]
plot(mrc_01_11["pop2016"], axes = TRUE)
```
[Return to text](#retour1)
### Exercise 2 {#sol2}
Create a map of the MRCs with different fill colors for each region.
```{r sol2}
ggplot(data = mrc_proj, aes(fill = reg_name)) +
geom_sf()
```
[Return to text](#retour2)
### Exercise 3 {#sol3}
a) How many of the forest inventory plots in these regions are affected at each defoliation level?
```{r sol3a}
defo <- st_read("data/tbe2016_gaspe.shp")
plots_defo <- st_join(plots_01_11, defo)
table(plots_defo$level)
```
b) Plot the defoliated areas located in the MRC of Kamouraska by severity level, also displaying the MRC border.
```{r sol3b}
mrc_kam <- filter(mrc_01_11, mrc_name == "Kamouraska")
defo_kam <- st_join(defo, mrc_kam, left = FALSE)
ggplot() +
geom_sf(data = mrc_kam) +
geom_sf(data = defo_kam, color = NA, aes(fill = level))
```
[Return to text](#retour3)
### Exercise 4 {#sol4}
a) Show a map of the points in the MRC of La Mitis with an elevation between 5 and 100 m.
```{r sol4a}
mitis <- filter(mrc_01_11, mrc_name == "La Mitis")
mitis <- st_transform(mitis, st_crs(cdem))
cdem_mitis <- st_crop(cdem, mitis)
plot(cdem_mitis >= 5 & cdem_mitis <= 100)
```
b) What is the highest elevation in that MRC?
```{r sol4b}
cdem_mitis_vals <- cdem_mitis[[1]]
max(cdem_mitis_vals, na.rm = TRUE)
```
[Return to text](#retour4)