From f72e44e07f786d402237d73bc52b183e1bf6b058 Mon Sep 17 00:00:00 2001 From: Robin Lovelace Date: Sat, 28 Sep 2024 20:10:51 +0100 Subject: [PATCH] 1115 proofreading changes 3 (#1129) * Be more specific and accurate: raster and vector datasets *used* in many fields * as it -> , as it * 3-d -> three-d* * -0.1 -> $-0.1$ * (the Prime M...) -> (Prime M...) * peninsular -> peninsula * low level libraries -> low-level libraries * Fix-up wording about importance of GDAL and friends * Tweaks * 2-d -> two-d * Add missing comma * in Sec -> Sec * Remove hyperlink from the * Time of writing, * Update at the time of writing * re-iterate -> reiterate * CRSs abbreviation * A spatial -> Spatial * Shorten description of sticky * CRSs * this reason -> this reason, * Neither has * Fix grammar, mention border arg of plot() * Basic maps * A plot -> Plot * Base plot limits * Shift index entries * 7 -> seven * two-, three-, or four-dimensional space * From scratch, * Tweaks to 2.2.5 * Update hyperlink * Examples, the * Assorted typo fixes * Assorted changes * multilayered -> multi-layered and more * the Zion -> Zion * Mention fig:basic-new-raster-plot in the text * multilayer -> multi-layer * Tweaks to 2.4 * Spell-out UTM, finish c2 * Avoid superscript * , but * Italics * Boolean * 3 -> three * Continent * Complete fixes for c3 * Start on c4 * Fixes for c4 * Update 02-spatial-data.Rmd Co-authored-by: Jakub Nowosad * Assorted changes * Assorted changes to c4 * Tweak NDVI bit * Clarify na.rm argument in c4 * Fixes for c4 --------- Co-authored-by: Jakub Nowosad --- 01-introduction.Rmd | 6 +- 02-spatial-data.Rmd | 169 ++++++++++++------------ 03-attribute-operations.Rmd | 58 ++++---- 04-spatial-operations.Rmd | 105 +++++++-------- 05-geometry-operations.Rmd | 4 +- 06-raster-vector.Rmd | 6 +- 07-reproj.Rmd | 4 +- 08-read-write-plot.Rmd | 8 +- 09-mapping.Rmd | 4 +- 11-algorithms.Rmd | 6 +- 13-transport.Rmd | 8 +- 15-eco.Rmd | 6 +- 16-synthesis.Rmd | 2 +- README.Rmd | 4 +- README.md | 2 +- 
_03-ex.Rmd | 8 +- _04-ex.Rmd | 2 +- _15-ex.Rmd | 2 +- code/chapters/02-spatial-data.R | 12 +- code/chapters/03-attribute-operations.R | 8 +- code/chapters/04-spatial-operations.R | 14 +- code/chapters/05-geometry-operations.R | 2 +- code/chapters/06-raster-vector.R | 4 +- code/chapters/09-mapping.R | 8 +- code/chapters/11-algorithms.R | 2 +- code/chapters/15-eco.R | 2 +- code/chapters/_15-ex.R | 2 +- index.Rmd | 2 +- 28 files changed, 231 insertions(+), 229 deletions(-) diff --git a/01-introduction.Rmd b/01-introduction.Rmd index 201d6eb10..b992cd6f3 100644 --- a/01-introduction.Rmd +++ b/01-introduction.Rmd @@ -249,7 +249,7 @@ There are many ways to handle geographic data in R, with dozens of packages\inde An overview of R's spatial ecosystem can be found in the CRAN\index{CRAN} Task View on the Analysis of Spatial Data (see https://cran.r-project.org/view=Spatial). ] -In this book we endeavor to teach the state-of-the-art in the field whilst ensuring that the methods are future-proof. +In this book, we endeavor to teach the state-of-the-art in the field whilst ensuring that the methods are future-proof. Like many areas of software development, R's spatial ecosystem is rapidly evolving (Figure \@ref(fig:cranlogs)). Because R is open source, these developments can easily build on previous work, by 'standing on the shoulders of giants', as Isaac Newton put it in [1675](https://digitallibrary.hsp.org/index.php/Detail/objects/9792). This approach is advantageous because it encourages collaboration and avoids 'reinventing the wheel'. @@ -312,7 +312,7 @@ R's spatial capabilities have evolved substantially since then, but they still b Interfaces to GDAL\index{GDAL} and PROJ\index{PROJ}, for example, still power R's high-performance geographic data I/O and CRS\index{CRS} transformation capabilities, as outlined in Chapters \@ref(reproj-geo-data) and \@ref(read-write), respectively. 
**rgdal**, released in 2003, provided GDAL\index{GDAL} bindings for R which greatly enhanced its ability to import data from previously unavailable geographic data formats. -The initial release supported only raster drivers, but subsequent enhancements provided support for coordinate reference systems (via the PROJ library), reprojections and import of vector file formats. +The initial release supported only raster drivers, but subsequent enhancements provided support for CRSs (via the PROJ library), reprojections and import of vector file formats. Many of these additional capabilities were developed by Barry Rowlingson and released in the **rgdal** codebase in 2006, as described in @rowlingson_rasp:_2003 and the [R-help](https://stat.ethz.ch/pipermail/r-help/2003-January/028413.html) email list. The **sp** package, released in 2005, was a significant advancement in R's spatial capabilities. @@ -367,7 +367,7 @@ Additional ways of representing and working with geographic data in R since 2018 \index{lidR (package)} Such developments have been motivated by the emergence of new technologies, standards and software outside of the R environment [@bivand_progress_2021]. -Major updates to the PROJ library\index{PROJ} beginning in 2018 forced the replacement of 'proj-string' representations of coordinate reference systems with 'Well Known Text', as described in Section \@ref(crs-intro) and Chapter \@ref(reproj-geo-data). +Major updates to the PROJ library\index{PROJ} beginning in 2018 forced the replacement of 'proj-string' representations of CRSs with 'Well Known Text', as described in Section \@ref(crs-intro) and Chapter \@ref(reproj-geo-data). \index{rayshader (package)} Since the publication of the first version of Geocomputation with R in 2018, several packages for spatial data visualization have been developed and improved. 
diff --git a/02-spatial-data.Rmd b/02-spatial-data.Rmd index d3246dfb1..9918827f2 100644 --- a/02-spatial-data.Rmd +++ b/02-spatial-data.Rmd @@ -86,7 +86,7 @@ The answer likely depends on your domain of application: - Vector data tends to dominate the social sciences because human settlements tend to have discrete borders - Raster dominates many environmental sciences partially because of the reliance on remote sensing data -There is much overlap in some fields and raster and vector datasets can be used together: +Both raster and vector datasets are used in many fields, and the two can be used together: ecologists and demographers, for example, commonly use both vector and raster data. Furthermore, it is possible to convert between the two forms (see Chapter \@ref(raster-vector)). Whether your work involves more use of vector or raster datasets, it is worth understanding the underlying data model before using them, as discussed in subsequent chapters. @@ -95,7 +95,7 @@ This book uses **sf** and **terra** packages to work with vector data and raster ## Vector data ```{block2 02-spatial-data-5, type="rmdnote"} -Take care when using the word 'vector' as it can have two meanings in this book: +Take care when using the word 'vector', as it can have two meanings in this book: geographic vector data and the `vector` class (note the `monospace` font) in R. The former is a data model, the latter is an R class just like `data.frame` and `matrix`. Still, there is a link between the two: the spatial coordinates which are at the heart of the geographic vector data model can be represented in R using `vector` objects. @@ -103,30 +103,31 @@ Still, there is a link between the two: the spatial coordinates which are at the The geographic vector data model\index{vector data model} is based on points located within a coordinate reference system\index{coordinate reference system|see {CRS}} (CRS\index{CRS}). 
Points can represent self-standing features (e.g., the location of a bus stop) or they can be linked together to form more complex geometries such as lines and polygons. -Most point geometries contain only two dimensions (much less prominent 3-dimensional geometries contain an additional $z$ value, typically representing height above sea level). +Most point geometries contain only two dimensions (much less prominent three-dimensional geometries contain an additional $z$ value, typically representing height above sea level). In this system, for example, London can be represented by the coordinates `c(-0.1, 51.5)`. -This means that its location is -0.1 degrees east and 51.5 degrees north of the origin. -The origin in this case is at 0 degrees longitude (the Prime Meridian) and 0 degrees latitude (the Equator) in a geographic ('lon/lat') CRS (Figure \@ref(fig:vectorplots), left panel). +This means that its location is $-0.1$ degrees east and $51.5$ degrees north of the origin. +The origin in this case is at 0 degrees longitude (Prime Meridian) and 0 degrees latitude (Equator) in a geographic ('lon/lat') CRS (Figure \@ref(fig:vectorplots), left panel). The same point could also be approximated in a projected CRS with 'Easting/Northing' values of `c(530000, 180000)` in the [British National Grid](https://en.wikipedia.org/wiki/Ordnance_Survey_National_Grid), meaning that London is located 530 km *East* and 180 km *North* of the $origin$ of the CRS. This can be verified visually: slightly more than 5 'boxes' --- square areas bounded by the gray grid lines 100 km in width --- separate the point representing London from the origin (Figure \@ref(fig:vectorplots), right panel). 
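The correspondence between the two coordinate pairs for London described above can be cross-checked with a minimal **sf** sketch (an illustration, assuming the **sf** package is installed; `EPSG:4326` and `EPSG:27700` are the standard identifiers for the lon/lat and British National Grid CRSs):

```r
library(sf)
# London in a geographic ('lon/lat') CRS, identified by EPSG:4326
london_lonlat = st_sfc(st_point(c(-0.1, 51.5)), crs = "EPSG:4326")
# Reproject to the British National Grid (EPSG:27700)
london_bng = st_transform(london_lonlat, "EPSG:27700")
# Easting/Northing values, close to the approximation c(530000, 180000)
st_coordinates(london_bng)
```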
-The location of National Grid's\index{National Grid} origin, in the sea beyond South West Peninsular, ensures that all locations in the UK have positive Easting and Northing values.^[ +The location of National Grid's\index{National Grid} origin, in the sea beyond the South West peninsula, ensures that all locations in the UK have positive Easting and Northing values.^[ The origin we are referring to, depicted in blue in Figure \@ref(fig:vectorplots), is in fact the 'false' origin. The 'true' origin, the location at which distortions are at a minimum, is located at 2° W and 49° N. This was selected by the Ordnance Survey to be roughly in the center of the British landmass longitudinally. ] -There is more to CRSs, as described in Section \@ref(crs-intro) and Chapter \@ref(reproj-geo-data) but, for the purposes of this section, it is sufficient to know that coordinates consist of two numbers representing distance from an origin, usually in $x$ then $y$ dimensions. +There is more to CRSs, as described in Section \@ref(crs-intro) and Chapter \@ref(reproj-geo-data). +For this section, it is sufficient to know that coordinates consist of two numbers representing distance from an origin, usually in $x$ then $y$ dimensions. ```{r vectorplots-source, include=FALSE, eval=FALSE} source("https://github.com/geocompx/geocompr/raw/main/code/02-vectorplots.R") # generate subsequent figure ``` -```{r vectorplots, fig.cap="Illustration of vector (point) data in which the location of London (the red X) is represented with reference to an origin (the blue circle). The left plot represents a geographic CRS with an origin at 0° longitude and latitude. 
The right plot represents a projected CRS with an origin located in the sea west of the South West Peninsula.", out.width="49%", fig.show='hold', echo=FALSE, fig.scap="Illustration of vector (point) data."} +```{r vectorplots, fig.cap="Vector (point) data in which the location of London (red X) is represented with reference to an origin (blue circle). The left plot represents a geographic CRS with an origin at 0° longitude and latitude. The right plot represents a projected CRS with an origin located in the sea west of the South West Peninsula.", out.width="49%", fig.show='hold', echo=FALSE, fig.scap="Vector (point) data."} knitr::include_graphics(c("images/vector_lonlat.png", "images/vector_projected.png")) ``` -The **sf** package provides classes for geographic vector data and a consistent command line interface to important low level libraries for geocomputation: +The **sf** package provides classes for geographic vector data and a consistent command line interface to important low-level libraries for geocomputation: - [GDAL](https://gdal.org/)\index{GDAL}, for reading, writing and manipulating a wide range of geographic data formats, covered in Chapter \@ref(read-write) - [PROJ](https://proj.org/), a powerful library for coordinate system transformations, which underlies the content covered in Chapter \@ref(reproj-geo-data) @@ -134,24 +135,24 @@ The **sf** package provides classes for geographic vector data and a consistent - [S2](https://s2geometry.io/)\index{S2}, a spherical geometry engine written in C++ developed by Google, via the [**s2**](https://r-spatial.github.io/s2/) package, covered in Section \@ref(s2) below and in Chapter \@ref(reproj-geo-data) Information about these interfaces is printed by **sf** the first time the package is loaded: the message `r print(capture.output(sf:::.onAttach(), type = "message"))` that appears below the `library(sf)` command at the beginning of this chapter tells us the versions of linked GEOS, GDAL and PROJ libraries 
(these vary between computers and over time) and whether or not the S2\index{S2} interface is turned on. -Nowadays, we take it for granted, however, only the tight integration with different geographic libraries makes reproducible geocomputation possible in the first place. +We may take these low-level libraries for granted, but without their tight integration with languages such as R, much reproducible geocomputation would be impossible. A neat feature of **sf** is that you can change the default geometry engine used on unprojected data: 'switching off' S2\index{S2} can be done with the command `sf::sf_use_s2(FALSE)`, meaning that the planar geometry engine GEOS\index{GEOS} will be used by default for all geometry operations, including geometry operations on unprojected data. -As we will see in Section \@ref(s2), planar geometry is based on 2 dimensional space. -Planar geometry engines such as GEOS assume 'flat' (projected) coordinates while spherical geometry engines such as S2 assume unprojected (lon/lat) coordinates. +As we will see in Section \@ref(s2), planar geometry is based on two-dimensional space. +Planar geometry engines such as GEOS assume 'flat' (projected) coordinates, while spherical geometry engines such as S2 assume unprojected (lon/lat) coordinates. This section introduces **sf** classes in preparation for subsequent chapters (Chapters \@ref(geometry-operations) and \@ref(read-write) cover the GEOS and GDAL interface, respectively). -### An introduction to simple features {#intro-sf} +### Introduction to simple features {#intro-sf} -Simple features is an [open standard](http://portal.opengeospatial.org/files/?artifact_id=25355) developed and endorsed by the Open Geospatial Consortium (OGC), a not-for-profit organization whose activities we will revisit in a later chapter (in Section \@ref(file-formats)). 
+Simple features is an [open standard](http://portal.opengeospatial.org/files/?artifact_id=25355) developed and endorsed by the Open Geospatial Consortium (OGC), a not-for-profit organization whose activities we will revisit in a later chapter (Section \@ref(file-formats)). \index{simple features|see {sf}} Simple features is a hierarchical data model that represents a wide range of geometry types. -Of 18 geometry types supported by the specification, only 7 are used in the vast majority of geographic research (see Figure \@ref(fig:sf-ogc)); +Of 18 geometry types supported by the specification, only seven are used in the vast majority of geographic research (see Figure \@ref(fig:sf-ogc)); these core geometry types are fully supported by the R package **sf** [@pebesma_simple_2018].^[ The full OGC standard includes rather exotic geometry types including 'surface' and 'curve' geometry types, which currently have limited application in real-world applications. -You can find the whole list of possible feature types in [the PostGIS manual ](http://postgis.net/docs/using_postgis_dbmanagement.html). +You can find the whole list of possible feature types in the [PostGIS manual](http://postgis.net/docs/using_postgis_dbmanagement.html). -All 18 types can be represented with the **sf** package, although at the time of writing (2024) plotting only works for the 'core 7'. +All 18 types can be represented with the **sf** package, although at the time of writing (2024), plotting only works for the 'core 7'. ] ```{r sf-ogc, fig.cap="Simple feature types fully supported by sf.", out.width="60%", echo=FALSE} @@ -164,13 +165,13 @@ knitr::include_graphics("images/sf-classes.png") ``` **sf** also supports geometry collections, which can contain multiple geometry types in a single object. 
**sf** provides the same functionality (and more) previously provided in three packages --- **sp**\index{sp (package)} for data classes [@R-sp], **rgdal** for data read/write via an interface to GDAL and PROJ [@R-rgdal] and **rgeos** for spatial operations via an interface to GEOS [@R-rgeos]. -To re-iterate the message from Chapter 1, geographic R packages have a long history of interfacing with lower level libraries, and **sf** continues this tradition with a unified interface to recent versions of GEOS for geometry operations, the GDAL library for reading and writing geographic data files, and the PROJ library for representing and transforming projected coordinate reference systems. +To reiterate the message from Chapter 1, geographic R packages have a long history of interfacing with lower level libraries, and **sf** continues this tradition with a unified interface to recent versions of GEOS for geometry operations, the GDAL library for reading and writing geographic data files, and the PROJ library for representing and transforming projected CRSs. Through **s2**\index{S2}, an R interface to Google's spherical geometry library, [`s2`](https://s2geometry.io/), **sf** also has access to fast and accurate "measurements and operations on non-planar geometries" [@bivand_progress_2021]. Since **sf** version 1.0.0, launched in [June 2021](https://cran.r-project.org/src/contrib/Archive/sf/), **s2** functionality is now used by [default](https://r-spatial.org/r/2020/06/17/s2.html) on geometries with geographic (longitude/latitude) coordinate systems, a unique feature of **sf** that differs from spatial libraries that only support GEOS for geometry operations such as the Python package [GeoPandas](geopandas/geopandas/issues/2098). We will discuss **s2** in subsequent chapters. 
**sf**'s ability to integrate multiple powerful libraries for geocomputation into a single framework is a notable achievement that reduces 'barriers to entry' into the world of reproducible geographic data analysis with high-performance libraries. -**sf**'s functionality is well documented on its website at [r-spatial.github.io/sf/](https://r-spatial.github.io/sf/index.html) which contains 7 vignettes. +**sf**'s functionality is well documented on its website at [r-spatial.github.io/sf/](https://r-spatial.github.io/sf/index.html) which contains seven vignettes. These can be viewed offline as follows: ```{r 02-spatial-data-6, eval=FALSE} @@ -204,7 +205,7 @@ The contents of this `geom` column give `sf` objects their spatial powers: `worl Although part of R's default installation (base R), `plot()` is a [*generic*](https://adv-r.hadley.nz/s3.html#s3-methods) that is extended by other packages. **sf** contains the non-exported (hidden from users most of the time) `plot.sf()` function which is what is called behind the scenes in the following command, which creates Figure \@ref(fig:world-all). -```{r world-all, fig.cap="A spatial plot of the world using the sf package, with a facet for each attribute.", warning=FALSE, fig.scap="A spatial plot of the world using the sf package."} +```{r world-all, fig.cap="Map of the world using the sf package, with a facet for each attribute.", warning=FALSE, fig.scap="Map of the world using the sf package."} plot(world) ``` @@ -219,7 +220,7 @@ summary(world["lifeExp"]) ``` Although we have only selected one variable for the `summary()` command, it also outputs a report on the geometry. -This demonstrates the 'sticky' behavior of the geometry columns of **sf** objects, meaning the geometry is kept unless the user deliberately removes them, as we'll see in Section \@ref(vector-attribute-manipulation). 
+This demonstrates the 'sticky' behavior of the geometry columns of **sf** objects: they are kept unless the user deliberately removes them, as we'll see in Section \@ref(vector-attribute-manipulation). The result provides a quick summary of both the non-spatial and spatial data contained in `world`: the mean average life expectancy is 71 years (ranging from less than 51 to more than 83 years with a median of 73 years) across all countries. ```{block2 02-spatial-data-10, type='rmdnote'} @@ -250,7 +251,7 @@ In turn, `sfc` objects are composed of one or more objects of class `sfg`: simpl \index{simple feature columns|see {sf!sfc}} To understand how the spatial components of simple features work, it is vital to understand simple feature geometries. -For this reason we cover each currently supported simple features geometry type in Section \@ref(geometry) before moving on to describe how these can be represented in R using `sf` objects, which are based on `sfg` and `sfc` objects. +For this reason, we cover each currently supported simple features geometry type in Section \@ref(geometry) before moving on to describe how these can be represented in R using `sf` objects, which are based on `sfg` and `sfc` objects. ```{block2 assignment, type='rmdnote'} The preceding code chunk uses `=` to create a new object called `world_mini` in the command `world_mini = world[1:2, 1:3]`. @@ -266,7 +267,7 @@ Simple features is a widely supported data model that underlies data structures A major advantage of this is that using the data model ensures your work is cross-transferable to other setups, for example importing from and exporting to spatial databases. \index{sf!why simple features} -A more specific question from an R perspective is "why use the **sf** package"? +A more specific question from an R perspective is "why use the **sf** package?" 
There are many reasons (linked to the advantages of the simple features model): - Fast reading and writing of data @@ -287,15 +288,15 @@ class(world_tbl) ``` As described in Chapter \@ref(attr), which shows how to manipulate `sf` objects with **tidyverse** functions, **sf** is now the go-to package for analysis of spatial vector data in R. -**spatstat**, a package ecosystem which provides numerous functions for spatial statistics, and **terra** both have vector geographic data classes, but neither have the same level of uptake as **sf** does for working with vector data. +**spatstat**, a package ecosystem which provides numerous functions for spatial statistics, and **terra** both have vector geographic data classes, but neither has the same level of uptake as **sf** does for working with vector data. Many popular packages build on **sf**, as shown by the rise in its popularity in terms of number of downloads per day, as shown in Section \@ref(r-ecosystem) in the previous chapter. -### Basic map-making {#basic-map} +### Basic maps {#basic-map} -Basic maps are created in **sf** with `plot()`. -By default this creates a multi-panel plot, one sub-plot for each variable of the object, as illustrated in the left-hand panel in Figure \@ref(fig:sfplot). +Basic geographic visualizations (maps) are created in **sf** with base R's `plot()` function. +By default, this creates a multi-panel plot, one sub-plot for each variable of the object, as illustrated in the left-hand panel in Figure \@ref(fig:sfplot). A legend or 'key' with a continuous color is produced if the object to be plotted has a single variable (see the right-hand panel). -Colors can also be set with `col = `, although this will not create a continuous palette or a legend. +You can also set fixed colors in `plot()` commands with `col` and `border` arguments. 
\index{map-making!basic} ```{r sfplot, fig.cap="Plotting with sf, with multiple variables (left) and a single variable (right).", out.width="49%", fig.show='hold', warning=FALSE, fig.scap="Plotting with sf."} @@ -318,14 +319,14 @@ We can now plot the Asian continent over a map of the world. Note that the first plot must only have one facet for `add = TRUE` to work. If the first plot has a key, `reset = FALSE` must be used: -```{r asia, out.width='50%', fig.cap="A plot of Asia added as a layer on top of countries worldwide."} +```{r asia, out.width='50%', fig.cap="Plot of Asia added as a layer on top of countries worldwide."} plot(world["pop"], reset = FALSE) plot(asia, add = TRUE, col = "red") ``` ```{block2 plottingpacks, type='rmdnote'} Adding layers in this way can be used to verify the geographic correspondence between layers: -the `plot()` function is fast to execute and requires few lines of code, but does not create interactive maps with a wide range of options. +the `plot()` function is fast and requires few lines of code, but its functionality is limited. For more advanced map-making we recommend using dedicated visualization packages such as **tmap** [@tmap2018] (see Chapter \@ref(adv-map)). ``` @@ -386,22 +387,21 @@ See Section \@ref(static-maps) for other visualization techniques for representi Geometries are the basic building blocks of simple features. Simple features in R can take on one of the 18 geometry types supported by the **sf** package. -\index{geometry types|see {sf!geometry types}} -\index{sf!geometry types} In this chapter we will focus on the seven most commonly used types: `POINT`, `LINESTRING`, `POLYGON`, `MULTIPOINT`, `MULTILINESTRING`, `MULTIPOLYGON` and `GEOMETRYCOLLECTION`. +\index{geometry types|see {sf!geometry types}} \index{sf!geometry types} Generally, well-known binary (WKB) or well-known text (WKT) are the standard encoding for simple feature geometries. 
-\index{well-known text} -\index{WKT|see {well-known text}} -\index{well-known binary} -\index{WKB|see {well-known binary}} WKB representations are usually hexadecimal strings easily readable for computers. This is why GIS and spatial databases use WKB to transfer and store geometry objects. WKT, on the other hand, is a human-readable text markup description of simple features. Both formats are exchangeable, and if we present one, we will naturally choose the WKT representation. +\index{well-known text} +\index{WKT|see {well-known text}} +\index{well-known binary} +\index{WKB|see {well-known binary}} The basis for each geometry type is the point. -A point is simply a coordinate in 2D, 3D or 4D space (see `vignette("sf1")` for more information) such as (Figure \@ref(fig:sfcs), left panel): +A point is simply a coordinate in two-, three-, or four-dimensional space (see `vignette("sf1")` for more information) such as (Figure \@ref(fig:sfcs), left panel): \index{sf!point} - `POINT (5 2)` @@ -420,7 +420,7 @@ A polygon with a hole would be, for example, `POLYGON ((1 5, 2 2, 4 1, 4 4, 1 5) - Polygon without a hole: `POLYGON ((1 5, 2 2, 4 1, 4 4, 1 5))` -```{r sfcs, echo=FALSE, fig.cap="Illustration of point, linestring and polygon geometries.", fig.asp=0.4} +```{r sfcs, echo=FALSE, fig.cap="Point, linestring and polygon geometries.", fig.asp=0.4} old_par = par(mfrow = c(1, 3), pty = "s", mar = c(0, 3, 1, 0)) plot(st_as_sfc(c("POINT(5 2)")), axes = TRUE, main = "POINT") plot(st_as_sfc("LINESTRING(1 5, 4 4, 4 1, 2 2, 3 2)"), axes = TRUE, main = "LINESTRING") @@ -466,7 +466,7 @@ par(old_par) ### The sf class {#sf} Simple features consist of two main parts: geometries and non-geographic attributes. -Figure \@ref(fig:02-sfdiagram) shows how an sf object is created -- geometries come from an `sfc` object, while attributes are taken from a `data.frame` or `tibble`.^[To learn more about building sf geometries from scratch read the following Sections \@ref(sfg) and \@ref(sfc).] 
+Figure \@ref(fig:02-sfdiagram) shows how an sf object is created -- geometries come from an `sfc` object, while attributes are taken from a `data.frame` or `tibble`.^[To learn more about building sf geometries from scratch, see the following Sections \@ref(sfg) and \@ref(sfc).] ```{r 02-sfdiagram, fig.cap="Building blocks of sf objects.", echo=FALSE} # source("code/02-sfdiagram.R") @@ -475,8 +475,8 @@ knitr::include_graphics("images/02-sfdiagram.png") Non-geographic attributes represent the name of the feature or other attributes such as measured values, groups, and other things. \index{sf!class} -To illustrate attributes, we will represent a temperature of 25°C in London on June 21^st^, 2023. -This example contains a geometry (the coordinates), and three attributes with three different classes (place name, temperature and date).^[ +To illustrate attributes, we will represent a temperature of 25°C in London on June 21, 2023. +This example contains a geometry (coordinates), and three attributes with three different classes (place name, temperature and date).^[ Other attributes might include an urbanity category (city or village), or a remark if the measurement was made using an automatic station. ] Objects of class `sf` represent such data by combining the attributes (`data.frame`) with the simple feature geometry column (`sfc`). @@ -537,7 +537,7 @@ The `sfg` class represents the different simple feature geometry types in R: poi \index{simple feature geometries|see {sf!sfg}} Usually you are spared the tedious task of creating geometries on your own since you can simply import an already existing spatial file. -However, there are a set of functions to create simple feature geometry objects (`sfg`) from scratch if needed. +However, there are a set of functions to create simple feature geometry objects (`sfg`) from scratch, if needed. 
The names of these functions are simple and consistent, as they all start with the `st_` prefix and end with the name of the geometry type in lowercase letters: - A point: `st_point()` @@ -563,7 +563,7 @@ st_point(c(5, 2, 1), dim = "XYM") # XYM point st_point(c(5, 2, 3, 1)) # XYZM point ``` -The results show that XY (2D coordinates), XYZ (3D coordinates) and XYZM (3D with an additional variable, typically measurement accuracy) point types are created from vectors of length 2, 3, and 4, respectively. +The results show that XY (2D coordinates), XYZ (3D coordinates) and XYZM (3D with an additional variable, typically measurement accuracy) point types are created from vectors of lengths 2, 3, and 4, respectively. The XYM type must be specified using the `dim` argument (which is short for dimension). By contrast, use matrices in the case of multipoint (`st_multipoint()`) and linestring (`st_linestring()`) objects: @@ -620,7 +620,7 @@ st_geometrycollection(geometrycollection_list) ### Simple feature columns (sfc) {#sfc} One `sfg` object contains only a single simple feature geometry. -A simple feature geometry column (`sfc`) is a list of `sfg` objects, which is additionally able to contain information about the coordinate reference system in use. +A simple feature geometry column (`sfc`) is a list of `sfg` objects, which is additionally able to contain information about the CRS in use. For instance, to combine two simple features into one object with two features, we can use the `st_sfc()` function. \index{sf!simple feature columns (sfc)} This is important since `sfc` represents the geometry column in **sf** data frames: @@ -667,7 +667,7 @@ point_multilinestring_sfc = st_sfc(point1, multilinestring1) st_geometry_type(point_multilinestring_sfc) ``` -As mentioned before, `sfc` objects can additionally store information on the coordinate reference systems (CRS). +As mentioned before, `sfc` objects can additionally store information on the CRS. 
The default value is `NA` (*Not Available*), as can be verified with `st_crs()`: ```{r 02-spatial-data-29} @@ -703,7 +703,7 @@ st_crs(points_sfc_wgs) # print CRS (only first 4 lines of output shown) **sfheaders** is an R package that speeds-up the construction, conversion and manipulation of `sf` objects [@cooley_sfheaders_2020]. It focuses on building `sf` objects from vectors, matrices and data frames, rapidly, and without depending on the **sf** library; and exposing its underlying C++ code through header files (hence the name, **sfheaders**). This approach enables others to extend it using compiled and fast-running code. -Every core **sfheaders** function has a corresponding C++ implementation, as described in [the `Cpp` vignette](https://dcooley.github.io/sfheaders/articles/Cpp.html). +Every core **sfheaders** function has a corresponding C++ implementation, as described in the [`Cpp` vignette](https://dcooley.github.io/sfheaders/articles/Cpp.html). For most people, the R functions will be more than sufficient to benefit from the computational speed of the package. **sfheaders** was developed separately from **sf**, but aims to be fully compatible, creating valid `sf` objects of the type described in preceding sections. @@ -779,7 +779,7 @@ sfheaders::sf_linestring(obj = m) sfheaders::sf_polygon(obj = df) ``` -In each of these examples the CRS (coordinate reference system) is not defined. +In each of these examples, the CRS is not defined. 
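The `sfg` → `sfc` → `sf` workflow that these hunks document can be condensed into a minimal sketch, reusing the London temperature example from the same chapter (the variable names are illustrative):

```r
library(sf)
lnd_point = st_point(c(0.1, 51.5))               # sfg: a single point geometry
lnd_geom = st_sfc(lnd_point, crs = "EPSG:4326")  # sfc: geometry column with a CRS
lnd_attrib = data.frame(                         # non-geographic attributes
  name = "London", temperature = 25, date = as.Date("2023-06-21")
)
lnd_sf = st_sf(lnd_attrib, geometry = lnd_geom)  # sf: attributes + geometry
class(lnd_sf)
```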
If you plan on doing any calculations or geometric operations using **sf** functions, we encourage you to set the CRS (see Chapter \@ref(reproj-geo-data) for details): ```{r sfheaders-crs} @@ -793,13 +793,13 @@ Benchmarks, in the package's [documentation](https://dcooley.github.io/sfheaders ### Spherical geometry operations with S2 {#s2} -Spherical geometry engines are based on the fact that world is round while simple mathematical procedures for geocomputation, such as calculating a straight line between two points or the area enclosed by a polygon, assume planar (projected) geometries. +Spherical geometry engines are based on the fact that the world is round, while simple mathematical procedures for geocomputation, such as calculating a straight line between two points or the area enclosed by a polygon, assume planar (projected) geometries. Since **sf** version 1.0.0, R supports spherical geometry operations 'out of the box' (and by default), thanks to its interface to Google's S2 spherical geometry engine via the **s2** interface package \index{S2}. S2 is perhaps best known as an example of a Discrete Global Grid System (DGGS). Another example is the [H3](https://h3geo.org/) global hexagonal hierarchical spatial index [@bondaruk_assessing_2020]. -Although potentially useful for describing locations anywhere on Earth using character strings, the main benefit of **sf**'s interface to S2 is its provision of drop-in functions for calculations such as distance, buffer, and area calculations, as described in **sf**'s built in documentation which can be opened with the command [`vignette("sf7")`](https://r-spatial.github.io/sf/articles/sf7.html). 
+Although potentially useful for describing locations anywhere on Earth using character strings, the main benefit of **sf**'s interface to S2 is its provision of drop-in functions for distance, buffer, and area calculations, as described in **sf**'s built-in documentation, which can be opened with the command [`vignette("sf7")`](https://r-spatial.github.io/sf/articles/sf7.html).

**sf** can run in two modes with respect to S2: on and off.
By default, the S2 geometry engine is turned on, as can be verified with the following command:
@@ -835,9 +835,9 @@ tm2 = tm_shape(india_buffer_without_s2) +
tmap_arrange(tm1, tm2, ncol = 2)
```

-The right panel of Figure \@ref(fig:s2example) is incorrect as the buffer of 1 degree does not return the equal distance around the `india` polygon (for more explanation of this issue, read Section \@ref(geom-proj)).
+The right panel of Figure \@ref(fig:s2example) is incorrect, as the buffer of 1 degree does not return an equal distance around the `india` polygon (for more explanation of this issue, see Section \@ref(geom-proj)).

-Throughout this book we will assume that S2 is turned on, unless explicitly stated.
+Throughout this book, we will assume that S2 is turned on, unless explicitly stated.
Turn it on again with the following command.

```{r}
@@ -848,21 +848,21 @@ sf_use_s2(TRUE)
```

Although **sf**'s use of S2 makes sense in many cases, there are sometimes good reasons for turning S2 off for the duration of an R session or even for an entire project.
As documented in issue [1771](https://github.com/r-spatial/sf/issues/1771) in **sf**'s GitHub repo, the default behavior can make code that would work with S2 turned off (and with older versions of **sf**) fail.
These edge cases include operations on polygons that are not valid according to S2's stricter definition.
-If you see error message such as `#> Error in s2_geography_from_wkb ...` it may be worth trying the command that generated the error message again, after turning off S2.
-To turn off S2 for the entirety of a project you can create a file called .Rprofile in the root directory (the main folder) of your project containing the command `sf::sf_use_s2(FALSE)`.
+If you see error messages such as `#> Error in s2_geography_from_wkb ...`, it may be worth trying the command that generated the error message again, after turning off S2.
+To turn off S2 for the entirety of a project, you can create a file called .Rprofile in the root directory (the main folder) of your project containing the command `sf::sf_use_s2(FALSE)`.
```

## Raster data

The spatial raster data model represents the world with a continuous grid of cells (often also called pixels; Figure \@ref(fig:raster-intro-plot):A)\index{raster data model}.
This data model often refers to so-called regular grids, in which each cell has the same, constant size -- and we will focus only on regular grids in this book.
-However, several other types of grids exist, including rotated, sheared, rectilinear, and curvilinear grids (see Chapter 1 of @pebesma_spatial_2022 or Chapter 2 of @tennekes_elegant_2022).
+However, several other types of grids exist, including rotated, sheared, rectilinear, and curvilinear grids (see chapter 1 of @pebesma_spatial_2022 or chapter 2 of @tennekes_elegant_2022).

The raster data model usually consists of a raster header\index{raster!header} and a matrix (with rows and columns) representing equally spaced cells (often also called pixels; Figure \@ref(fig:raster-intro-plot):A).^[
-Depending on the file format the header is part of the actual image data file, e.g., GeoTIFF, or stored in an extra header or world file, e.g., ASCII grid formats.
+Depending on the file format, the header is part of the actual image data file, e.g., GeoTIFF, or stored in an extra header or world file, e.g., ASCII grid formats.
There is also the headerless (flat) binary raster format, which should facilitate import into various software programs.]
-The raster header\index{raster!header} defines the coordinate reference system, the extent and the origin.
+The raster header\index{raster!header} defines the CRS, the extent and the origin.
\index{raster}
\index{raster data model}
The origin (or starting point) is frequently the coordinate of the lower left corner of the matrix (the **terra** package, however, uses the upper left corner, by default (Figure \@ref(fig:raster-intro-plot):B)).
@@ -875,13 +875,13 @@ $$
$$
Starting from the origin, we can easily access and modify each single cell by either using the ID of a cell (Figure \@ref(fig:raster-intro-plot):B) or by explicitly specifying the rows and columns.
-This matrix representation avoids storing explicitly the coordinates for the four corner points (in fact it only stores one coordinate, namely the origin) of each cell corner as would be the case for rectangular vector polygons.
-This and map algebra (Section \@ref(map-algebra)) makes raster processing much more efficient and faster than vector data processing.
+This matrix representation avoids explicitly storing the coordinates of the four corner points of each cell (in fact, it only stores one coordinate, namely the origin), as would be the case for rectangular vector polygons.
+This and map algebra (Section \@ref(map-algebra)) make raster processing much more efficient and faster than vector data processing.

-In contrast to vector data, the cell of one raster layer can only hold a single value.^[Thus to store many values for a single location we need to have many raster layers.]
-The value might be continuous or categorical (Figure \@ref(fig:raster-intro-plot):C).
+In contrast to vector data, the cell of one raster layer can only hold a single value.^[Thus, to store many values for a single location we need to have many raster layers.] +The value might be continuous or categorical (Figure \@ref(fig:raster-intro-plot)C). -```{r raster-intro-plot, echo = FALSE, fig.cap = "Raster data types: (A) cell IDs, (B) cell values, (C) a colored raster map.", fig.scap="Raster data types.", fig.asp=0.5, message=FALSE} +```{r raster-intro-plot, echo = FALSE, fig.cap = "Raster data types.", fig.scap="Raster data types.", fig.asp=0.5, message=FALSE} source("https://github.com/geocompx/geocompr/raw/main/code/02-raster-intro-plot.R", print.eval = TRUE) ``` @@ -890,7 +890,7 @@ Discrete features such as soil or land-cover classes can also be represented in Both uses of raster datasets are illustrated in Figure \@ref(fig:raster-intro-plot2), which shows how the borders of discrete features may become blurred in raster datasets. Depending on the nature of the application, vector representations of discrete features may be more suitable. -```{r raster-intro-plot2, echo=FALSE, fig.cap="Examples of continuous and categorical rasters.", warning=FALSE, message=FALSE} +```{r raster-intro-plot2, echo=FALSE, fig.cap="Examples of (A) continuous and (B) categorical rasters.", warning=FALSE, message=FALSE} source("code/02-raster-intro-plot2.R", print.eval = TRUE) # knitr::include_graphics("https://user-images.githubusercontent.com/1825120/146617327-45919232-a6a3-4d9d-a158-afa87f47381b.png") ``` @@ -900,12 +900,12 @@ source("code/02-raster-intro-plot2.R", print.eval = TRUE) Over the last two decades, several packages for reading and processing raster datasets have been developed. 
\index{raster (package)}\index{terra (package)}\index{stars (package)}
As outlined in Section \@ref(history-of-r-spatial), chief among them was **raster**, which led to a step change in R's raster capabilities when it was launched in 2010 and was the premier package in the space until the development of **terra** and **stars**.
-Both more recently developed packages provide powerful and performant functions for working with raster datasets and there is substantial overlap between their possible use cases.
-In this book we focus on **terra**, which replaces the older and (in most cases) slower **raster**.
-Before learning about the how **terra**'s class system works, this section describes similarities and differences between **terra** and **stars**; this knowledge will help decide which is most appropriate in different situations.
+Both more recently developed packages provide powerful and performant functions for working with raster datasets, and there is substantial overlap between their possible use cases.
+In this book, we focus on **terra**, which replaces the older and (in most cases) slower **raster**.
+Before learning about how **terra**'s class system works, this section describes similarities and differences between **terra** and **stars**; this knowledge will help you decide which is most appropriate in different situations.
First, **terra** focuses on the most common raster data model (regular grids), while **stars** also allows storing less popular models (including regular, rotated, sheared, rectilinear, and curvilinear grids).
-While **terra** usually handles one or multilayered rasters^[It also has an additional class `SpatRasterDataset` for storing many collections of datasets.], the **stars** package provides ways to store raster data cubes -- a raster object with many layers (e.g., bands), for many moments in time (e.g., months), and many attributes (e.g., sensor type A and sensor type B).
+While **terra** usually handles single-layered or multi-layered rasters^[It also has an additional class `SpatRasterDataset` for storing many collections of datasets.], the **stars** package provides ways to store raster data cubes -- a raster object with many layers (e.g., bands), for many moments in time (e.g., months), and many attributes (e.g., sensor type A and sensor type B).
Importantly, in both packages, all layers or elements of a data cube must have the same spatial dimensions and extent.
Second, both packages allow either reading all of the raster data into memory or just reading its metadata -- this is usually done automatically based on the input file size.
However, they store raster values very differently.
@@ -919,14 +919,14 @@ On the other hand, **stars** uses some built-in functions (usually with names st
Importantly, it is straightforward to convert objects from **terra** to **stars** (using `st_as_stars()`) and the other way round (using `rast()`).
We also encourage you to read @pebesma_spatial_2022 for the most comprehensive introduction to the **stars** package.

-### An introduction to terra
+### Introduction to terra

\index{terra (package)}
The **terra** package supports raster objects in R.
It provides an extensive set of functions to create, read, export, manipulate and process raster datasets.
**terra**'s functionality is largely the same as the more mature **raster** package, but there are some differences: **terra** functions are usually more computationally efficient than **raster** equivalents.
On the other hand, the **raster** class system is popular and used by many other packages.
-You can seamlessly translate between the two types of object to ensure backwards compatibility with older scripts and packages, for example, with the functions [`raster()`](https://rspatial.github.io/raster/reference/raster.html), [`stack()`](https://rspatial.github.io/raster/reference/stack.html), and `brick()` in the **raster** package (see the previous chapter for more on the evolution of R packages for working with geographic data).
+You can seamlessly translate between the two types of object to ensure backward compatibility with older scripts and packages, for example, with the functions [`raster()`](https://rspatial.github.io/raster/reference/raster.html), [`stack()`](https://rspatial.github.io/raster/reference/stack.html), and `brick()` in the **raster** package (see the previous chapter for more on the evolution of R packages for working with geographic data).
In addition to functions for raster data manipulation, **terra** provides many low-level functions that can form a foundation for developing new tools for working with raster datasets.
\index{terra (package)}
@@ -934,7 +934,7 @@ In addition to functions for raster data manipulation, **terra** provides many l
In this case, **terra** provides the possibility of dividing the raster into smaller chunks and processing these iteratively instead of loading the whole raster file into RAM.
For the illustration of **terra** concepts, we will use datasets from the **spDataLarge** package [@R-spDataLarge].
-It consists of a few raster objects and one vector object covering an area of the Zion National Park (Utah, USA).
+It consists of a few raster objects and one vector object covering an area of Zion National Park (Utah, USA).
For example, `srtm.tif` is a digital elevation model of this area (for more details, see its documentation `?srtm`).
First, let's create a `SpatRaster` object named `my_rast`:

@@ -950,7 +950,7 @@ Typing the name of the raster into the console, will print out the raster header
my_rast
```

-Dedicated functions report each component: `dim()` returns the number of rows, columns and layers; `ncell()` the number of cells (pixels); `res()` the spatial resolution; `ext()` its spatial extent; and `crs()` its coordinate reference system (raster reprojection is covered in Section \@ref(reproj-ras)).
+Dedicated functions report each component: `dim()` returns the number of rows, columns and layers; `ncell()` the number of cells (pixels); `res()` the spatial resolution; `ext()` its spatial extent; and `crs()` its CRS (raster reprojection is covered in Section \@ref(reproj-ras)).
`inMemory()` reports whether the raster data is stored in memory or on disk, and `sources` specifies the file location.

```{block2 terrahelp, type='rmdnote'}
@@ -960,6 +960,7 @@ Dedicated functions report each component: `dim()` returns the number of rows, c
### Basic map-making {#basic-map-raster}

Similar to the **sf** package, **terra** also provides `plot()` methods for its own classes.
+As shown in the following command, the `plot()` function creates a basic raster plot, resulting in Figure \@ref(fig:basic-new-raster-plot).
\index{map-making!basic raster}

```{r basic-new-raster-plot, fig.cap="Basic raster plot."}
@@ -987,7 +988,7 @@ single_rast = rast(raster_filepath)

The **terra** package supports numerous drivers with the help of the GDAL library.
Rasters from files are usually not read entirely into RAM, with the exception of their header and a pointer to the file itself.

-Rasters can also be created from scratch using the same `rast()` function.
+Rasters can also be created from scratch, using the same `rast()` function.
This is illustrated in the subsequent code chunk, which results in a new `SpatRaster` object.
The resulting raster consists of 36 cells (6 rows and 6 columns, specified by `nrows` and `ncols`) centered around the Prime Meridian and the Equator (see `xmin`, `xmax`, `ymin` and `ymax` parameters).
Values (`vals`) are assigned to each cell: 1 to cell 1, 2 to cell 2, and so on.
@@ -1005,7 +1006,7 @@ The unit of the resolution is that of the underlying CRS.
Here, it is degrees, because the default CRS of raster objects is WGS84.
However, one can specify any other CRS with the `crs` argument.

-The `SpatRaster` class also handles multiple layers, which typically correspond to a single multispectral satellite file or a time-series of rasters.
+The `SpatRaster` class also handles multiple layers, which typically correspond to a single multi-spectral satellite file or a time-series of rasters.

```{r 02-spatial-data-45}
multi_raster_file = system.file("raster/landsat.tif", package = "spDataLarge")
@@ -1019,7 +1020,7 @@ multi_rast
nlyr(multi_rast)
```

-For multilayer raster objects, layers can be selected with the `[[` and `$` operators, for example with commands `multi_rast[["landsat_1"]]` and `multi_rast$landsat_1`.
+For multi-layer raster objects, layers can be selected with the `[[` and `$` operators, for example with commands `multi_rast[["landsat_1"]]` and `multi_rast$landsat_1`.
The `terra::subset()` function can also be used to select layers.
It accepts a layer number or its name as the second argument:

@@ -1048,20 +1049,20 @@ In these cases, there are two main possible solutions: (1) use of the `wrap()` f
\index{CRS!introduction}
Vector and raster spatial data types share concepts intrinsic to spatial data.
-Perhaps the most fundamental of these is the Coordinate Reference System (CRS), which defines how the spatial elements of the data relate to the surface of the Earth (or other bodies).
+Perhaps the most fundamental of these are coordinate reference systems (CRSs), which define how the spatial elements of the data relate to the surface of the Earth (or other bodies).
CRSs are either geographic or projected, as introduced at the beginning of this chapter (see Figure \@ref(fig:vectorplots)).
This section explains each type, laying the foundations for Chapter \@ref(reproj-geo-data), which provides a deep dive into setting, transforming and querying CRSs.

### Geographic coordinate reference systems

\index{CRS!geographic}
-Geographic coordinate reference systems identify any location on the Earth's surface using two values --- longitude and latitude (Figure \@ref(fig:vector-crs), left panel).
+Geographic CRSs identify any location on the Earth's surface using two values --- longitude and latitude (Figure \@ref(fig:vector-crs), left panel).
*Longitude* is location in the East-West direction in angular distance from the Prime Meridian plane.
*Latitude* is angular distance North or South of the equatorial plane.
Distances in geographic CRSs are therefore not measured in meters.
This has important consequences, as demonstrated in Section \@ref(reproj-geo-data).

-The surface of the Earth in geographic coordinate reference systems is represented by a spherical or ellipsoidal surface.
+The surface of the Earth in geographic CRSs is represented by a spherical or ellipsoidal surface.
Spherical models assume that the Earth is a perfect sphere of a given radius -- they have the advantage of simplicity but, at the same time, they are inaccurate as the Earth is not exactly a sphere.
Ellipsoidal models are slightly more accurate, and are defined by two parameters: the equatorial radius and the polar radius.
These are suitable because the Earth is compressed: the equatorial radius is around 11.5 km longer than the polar radius [@maling_coordinate_1992].^[
@@ -1078,7 +1079,7 @@ Black lines represent a *geocentric datum*, whose center is located in the Earth
In a *local datum*, shown as a purple dashed line, the ellipsoidal surface is shifted to align with the surface at a particular location.
These allow local variations in Earth's surface, for example due to large mountain ranges, to be accounted for in a local CRS.
This can be seen in Figure \@ref(fig:datum-fig), where the local datum is fitted to the area of the Philippines, but is misaligned with most of the rest of the planet's surface.
-Both datums in Figure \@ref(fig:datum-fig) are put on top of a geoid - a model of global mean sea level.^[Please note that the geoid on the Figure exaggerates the bumpy surface of the geoid by a factor of 10,000 to highlight the irregular shape of the planet.]
+Both datums in Figure \@ref(fig:datum-fig) are put on top of a geoid --- a model of global mean sea level.^[Note that the geoid in Figure \@ref(fig:datum-fig) is exaggerated by a factor of 10,000 to highlight the irregular shape of the planet.]

(ref:datum-fig) Geocentric and local geodetic datums shown on top of a geoid (in false color and the vertical exaggeration by 10,000 scale factor). Image of the geoid is adapted from the work of @essd-11-647-2019.

@@ -1098,7 +1099,7 @@ Therefore, some properties of the Earth's surface are distorted in this process,
A projected coordinate reference system can preserve only one or two of those properties.
Projections are often named based on a property they preserve: equal-area preserves area, azimuthal preserves direction, equidistant preserves distance, and conformal preserves local shape.

-There are three main groups of projection types - conic, cylindrical, and planar (azimuthal).
+There are three main groups of projection types: conic, cylindrical, and planar (azimuthal).
In a conic projection, the Earth's surface is projected onto a cone along a single line of tangency or two lines of tangency.
In this projection, distortions are minimized along the tangency lines and rise with the distance from those lines.
Therefore, it is best suited for maps of mid-latitude areas.
@@ -1111,11 +1112,11 @@ It is typically used in mapping polar regions.
A quick summary of different projections, their types, properties, and suitability can be found at [www.geo-projections.com](https://www.geo-projections.com/).
We will expand on CRSs and explain how to project from one CRS to another in Chapter \@ref(reproj-geo-data).

-For now, it is sufficient to know:
+For now, it is sufficient to know that:

-- That coordinate systems are a key component of geographic objects
+- Coordinate systems are a key component of geographic objects
- Knowing which CRS your data is in, and whether it is in geographic (lon/lat) or projected (typically meters), is important and has consequences for how R handles spatial and geometry operations
-- CRSs of `sf` objects can be queried with the function `st_crs()`, CRSs of `terra` objects can be queried with the function `crs()`
+- CRSs of `sf` objects can be queried with the function `st_crs()` and CRSs of `terra` objects can be queried with the function `crs()`

```{r vector-crs, echo=FALSE, fig.cap="Examples of geographic (WGS 84; left) and projected (NAD83 / UTM zone 12N; right) coordinate systems for a vector data type.", message=FALSE, fig.asp=0.56, fig.scap="Examples of geographic and projected CRSs (vector data)."}
# source("https://github.com/geocompx/geocompr/raw/main/code/02-vector-crs.R")
@@ -1127,7 +1128,7 @@ knitr::include_graphics("images/02_vector_crs.png")

An important feature of CRSs is that they contain information about spatial units.
Clearly, it is vital to know whether a house's measurements are in feet or meters, and the same applies to maps.
It is good cartographic practice to add a *scale bar* or some other distance indicator onto maps to demonstrate the relationship between distances on the page or screen and distances on the ground. -Likewise, it is important to formally specify the units in which the geometry data or cells are measured to provide context, and ensure that subsequent calculations are done in context. +Likewise, it is important to formally specify the units in which the geometry data or cells are measured to provide context, and to ensure that subsequent calculations are done in context. A novel feature of geometry data in `sf` objects is that they have *native support* for units. This means that distance, area and other geometric calculations in **sf** return values that come with a `units` attribute, defined by the **units** package [@pebesma_measurement_2016]. @@ -1164,13 +1165,13 @@ units::set_units(st_area(luxembourg), km^2) Units are of equal importance in the case of raster data. However, so far **sf** is the only spatial package that supports units, meaning that people working on raster data should approach changes in the units of analysis (for example, converting pixel widths from imperial to decimal units) with care. The `my_rast` object (see above) uses a WGS84 projection with decimal degrees as units. -Consequently, its resolution is also given in decimal degrees but you have to know it, since the `res()` function simply returns a numeric vector. +Consequently, its resolution is also given in decimal degrees, but you have to know it, since the `res()` function simply returns a numeric vector. ```{r 02-spatial-data-61} res(my_rast) ``` -If we used the UTM projection, the units would change. +If we used the Universal Transverse Mercator (UTM) projection, the units would change. 
```{r 02-spatial-data-62, warning=FALSE, message=FALSE}
repr = project(my_rast, "EPSG:26912")
diff --git a/03-attribute-operations.Rmd b/03-attribute-operations.Rmd
index f88eb587a..788504e41 100644
--- a/03-attribute-operations.Rmd
+++ b/03-attribute-operations.Rmd
@@ -29,7 +29,7 @@ library(spData) # spatial data package introduced in Chapter 2
\index{attribute}
Attribute data is non-spatial information associated with geographic (geometry) data.
A bus stop provides a simple example: its position would typically be represented by latitude and longitude coordinates (geometry data), in addition to its name.
-The [Elephant & Castle / New Kent Road](https://www.openstreetmap.org/relation/6610626) stop in London, for example has coordinates of -0.098 degrees longitude and 51.495 degrees latitude which can be represented as `POINT (-0.098 51.495)` in the `sfc` representation described in Chapter \@ref(spatial-class).
+The [Elephant & Castle / New Kent Road](https://www.openstreetmap.org/relation/6610626) stop in London, for example, has coordinates of $-0.098$ degrees longitude and 51.495 degrees latitude, which can be represented as `POINT (-0.098 51.495)` in the `sfc` representation described in Chapter \@ref(spatial-class).
Attributes, such as *name*\index{attribute}, of the POINT feature (to use simple features terminology) are the topic of this chapter.

```{r, eval=FALSE, echo=FALSE}
@@ -51,7 +51,7 @@ point_sf = sf::st_as_sf(point_df, coords = c("X", "Y"))
\index{attribute}
Another example is the elevation value (attribute) for a specific grid cell in raster data.
Unlike the vector data model, the raster data model stores the coordinate of the grid cell indirectly, meaning the distinction between attribute and spatial information is less clear.
-To illustrate the point, think of a pixel in the 3^rd^ row and the 4^th^ column of a raster matrix.
+To illustrate the point, think of a pixel in row 3 and column 4 of a raster matrix.
Its spatial location is defined by its index in the matrix: move from the origin four cells in the x direction (typically east and right on maps) and three cells in the y direction (typically south and down).
The raster's *resolution* defines the distance for each x- and y-step, which is specified in a *header*.
The header is a vital component of raster datasets, which specifies how pixels relate to spatial coordinates (see also Chapter \@ref(spatial-operations)).
@@ -99,7 +99,7 @@ Many of these (`aggregate()`, `cbind()`, `merge()`, `rbind()` and `[`) are for m
A key feature of `sf` objects is that they store spatial and non-spatial data in the same way, as columns in a `data.frame`.

```{block2 03-attribute-operations-6, type = 'rmdnote'}
-The geometry column of `sf` objects is typically called `geometry` or `geom` but any name can be used.
+The geometry column of `sf` objects is typically called `geometry` or `geom`, but any name can be used.
The following command, for example, creates a geometry column named g:

`st_sf(data.frame(n = world$name_long), g = world$geom)`
@@ -111,12 +111,12 @@ This enables geometries imported from spatial databases to have a variety of nam
Thus **sf** enables the full power of R's data analysis capabilities to be unleashed on geographic data, whether you use base R or tidyverse functions for data analysis.
\index{tibble}
**sf** objects can also be used with the high-performance data processing package **data.table**, although, as documented in issue [`Rdatatable/data.table#2273`](https://github.com/Rdatatable/data.table/issues/2273), it is not fully [compatible](https://github.com/Rdatatable/data.table/issues/5352) with `sf` objects.
-Before using these capabilities it is worth re-capping how to discover the basic properties of vector data objects.
+Before using these capabilities, it is worth recapping how to discover the basic properties of vector data objects.
Let's start by using base R functions to learn about the `world` dataset from the **spData** package: ```{r 03-attribute-operations-7} class(world) # it's an sf object and a (tidy) data frame -dim(world) # it is a 2 dimensional object, with 177 rows and 11 columns +dim(world) # it is a two-dimensional object, with 177 rows and 11 columns ``` \index{attribute!dropping geometries} @@ -143,7 +143,7 @@ Base R subsetting methods include the operator `[` and the function `subset()`. The key **dplyr** subsetting functions are `filter()` and `slice()` for subsetting rows, and `select()` for subsetting columns. Both approaches preserve the spatial components of attribute data in `sf` objects, while using the operator `$` or the **dplyr** function `pull()` to return a single attribute column as a vector will lose the geometry data, as we will see. \index{attribute!subsetting} -This section focuses on subsetting `sf` data frames; for further details on subsetting vectors and non-geographic data frames we recommend reading section section [2.7](https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Index-vectors) of An Introduction to R [@rcoreteam_introduction_2021] and Chapter [4](https://adv-r.hadley.nz/subsetting.html) of Advanced R Programming [@wickham_advanced_2019], respectively. +This section focuses on subsetting `sf` data frames; for further details on subsetting vectors and non-geographic data frames we recommend reading Section [2.7](https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Index-vectors) of *An Introduction to R* [@rcoreteam_introduction_2021] and chapter [4](https://adv-r.hadley.nz/subsetting.html) of *Advanced R Programming* [@wickham_advanced_2019], respectively. \index{attribute!subsetting} The `[` operator can subset both rows and columns. 
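To make this concrete, a minimal sketch of `[` subsetting applied to the `world` dataset (assuming **sf** and **spData** are attached; the column names used here are those of `world`):

```r
library(sf)
library(spData)

world[1:3, ]                     # subset rows 1 to 3, keeping all columns
world[, c("name_long", "pop")]   # subset columns by name; the geometry column is kept
world[world$area_km2 < 10000, ]  # logical row subsetting: countries with a small area
```

In each case, the result is still an `sf` object, because the 'sticky' geometry column is retained unless it is deliberately dropped (e.g., with `st_drop_geometry()`).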
@@ -292,7 +292,7 @@ world7 = filter(world, area_km2 < 10000) # countries with a small area world7 = filter(world, lifeExp > 82) # with high life expectancy ``` -The standard set of comparison operators can be used in the `filter()` function, as illustrated in Table \@ref(tab:operators): +The standard set of comparison operators can be used in the `filter()` function, as illustrated in Table \@ref(tab:operators). ```{r operators0, echo=FALSE} if (knitr::is_html_output()){ @@ -307,9 +307,8 @@ operators_exp = c("Equal to", "Not equal to", "Greater/Less than", "Greater/Less than or equal", "Logical operators: And, Or, Not") knitr::kable(tibble(Symbol = operators, Name = operators_exp), - caption = paste("Comparison operators that return Booleans", - "(TRUE/FALSE)."), - caption.short = "Comparison operators that return Booleans.", + caption = "Comparison operators that return Boolean (true/false) values.", + caption.short = "Comparison operators.", booktabs = TRUE) ``` @@ -366,7 +365,7 @@ world_agg1 = aggregate(pop ~ continent, FUN = sum, data = world, class(world_agg1) ``` -The result is a non-spatial data frame with six rows, one per continent, and two columns reporting the name and population of each continent (see Table \@ref(tab:continents) with results for the top 3 most populous continents). +The result is a non-spatial data frame with six rows, one per continent, and two columns reporting the name and population of each continent (see Table \@ref(tab:continents) with results for the top three most populous continents). `aggregate()` is a [generic function](https://adv-r.hadley.nz/s3.html#s3-methods) which means that it behaves differently depending on its inputs. 
**sf** provides the method `aggregate.sf()` which is activated automatically when `x` is an `sf` object and a `by` argument is provided:
@@ -378,10 +377,11 @@ class(world_agg2)
nrow(world_agg2)
```

-The resulting `world_agg2` object is a spatial object containing 8 features representing the continents of the world (and the open ocean).
+The resulting `world_agg2` object is a spatial object containing eight features representing the continents of the world (and the open ocean).
\index{attribute!aggregation}
-`group_by() |> summarize()` is the **dplyr** equivalent of `aggregate()`, with the variable name provided in the `group_by()` function specifying the grouping variable and information on what is to be summarized passed to the `summarize()` function, as shown below:
+`group_by() |> summarize()` is the **dplyr** equivalent of `aggregate()`.
+Grouping variables are defined in the `group_by()` function and the aggregation formula is defined in the `summarize()` function, as shown below:

```{r 03-attribute-operations-28}
world_agg3 = world |>
@@ -389,7 +389,7 @@ world_agg3 = world |>
summarize(pop = sum(pop, na.rm = TRUE))
```

-The approach may seem more complex but it has benefits: flexibility, readability, and control over the new column names.
+The approach may seem more complex, but it has benefits: flexibility, readability, and control over the new column names.
This flexibility is illustrated in the command below, which calculates not only the population but also the area and number of countries in each continent:

```{r 03-attribute-operations-29}
@@ -405,13 +405,13 @@ These aggregating functions return `sf` objects with rows representing continent
\index{attribute!subsetting}
\index{attribute!aggregation}
Let's combine what we have learned so far about **dplyr** functions, by chaining multiple commands to summarize attribute data about countries worldwide by continent.
-The following command calculates population density (with `mutate()`), arranges continents by the number countries they contain (with `arrange()`), and keeps only the 3 most populous continents (with `slice_max()`), the result of which is presented in Table \@ref(tab:continents)):
+The following command calculates population density (with `mutate()`), arranges continents by the number of countries they contain (with `arrange()`), and keeps only the three most populous continents (with `slice_max()`), the result of which is presented in Table \@ref(tab:continents):

```{r 03-attribute-operations-30}
world_agg5 = world |>
st_drop_geometry() |> # drop the geometry for speed
select(pop, continent, area_km2) |> # subset the columns of interest
- group_by(continent) |> # group by continent and summarize:
+ group_by(Continent = continent) |> # group by continent and summarize:
summarize(Pop = sum(pop, na.rm = TRUE), Area = sum(area_km2), N = n()) |>
mutate(Density = round(Pop / Area)) |> # calculate population density
slice_max(Pop, n = 3) |> # keep only the top 3
@@ -422,8 +422,8 @@ world_agg5 = world |>
options(scipen = 999)
knitr::kable(
world_agg5,
- caption = "The top 3 most populous continents ordered by number of countries.",
- caption.short = "Top 3 most populous continents.",
+ caption = "The top three most populous continents ordered by number of countries.",
+ caption.short = "Top three most populous continents.",
booktabs = TRUE
)
```
@@ -491,7 +491,7 @@ Although there are only 47 rows of data in `coffee_data`, all 177 country record
rows in the original dataset with no match are assigned `NA` values for the new coffee production variables.
What if we only want to keep countries that have a match in the key variable?
\index{attribute!join}
-In that case an inner join can be used.
+In that case, an inner join can be used.
```{r 03-attribute-operations-35, warning=FALSE}
world_coffee_inner = inner_join(world, coffee_data)
@@ -568,7 +568,7 @@ world_new$pop_dens = world_new$pop / world_new$area_km2
```

\index{attribute!create}
-Alternatively, we can use one of **dplyr** functions - `mutate()` or `transmute()`.
+Alternatively, we can use one of the **dplyr** functions: `mutate()` or `transmute()`.
`mutate()` adds new columns at the penultimate position in the `sf` object (the last one is reserved for the geometry):

```{r 03-attribute-operations-43, eval=FALSE}
@@ -581,7 +581,7 @@ The difference between `mutate()` and `transmute()` is that the latter drops all
\index{attribute!create}
`unite()` from the **tidyr** package (which provides many useful functions for reshaping datasets, including `pivot_longer()`) pastes together existing columns.
For example, we want to combine the `continent` and `region_un` columns into a new column named `con_reg`.
-Additionally, we can define a separator (here: a colon `:`) which defines how the values of the input columns should be joined, and if the original columns should be removed (here: `TRUE`).
+Additionally, we can define a separator (here, a colon `:`) which defines how the values of the input columns should be joined, and if the original columns should be removed (here, `TRUE`).

```{r 03-attribute-operations-45, eval=FALSE}
world_unite = world |>
@@ -626,8 +626,8 @@ world_new_names = world |>
```

\index{attribute!create}
-Each of these attribute data operations preserve the geometry of the simple features.
-Sometimes it makes sense to remove the geometry, for example to speed-up aggregation.
+Each of these attribute data operations preserves the geometry of the simple features.
+Sometimes, it makes sense to remove the geometry, for example to speed up aggregation.
Do this with `st_drop_geometry()`, **not** manually with commands such as `select(world, -geom)`, as shown below.^[ `st_geometry(world_st) = NULL` also works to remove the geometry from `world`, but overwrites the original object. ] @@ -724,9 +724,9 @@ elev[1, 1] elev[1] ``` -Subsetting of multilayered raster objects will return the cell value(s) for each layer. +Subsetting of multi-layered raster objects will return the cell value(s) for each layer. For example, `two_layers = c(grain, elev); two_layers[1]` returns a data frame with one row and two columns --- one for each layer. -To extract all values you can also use `values()`. +To extract all values, you can also use `values()`. Cell values can be modified by overwriting existing values in conjunction with a subsetting operation. The following code chunk, for example, sets the upper left cell of `elev` to 0 (results not shown): @@ -743,7 +743,7 @@ Multiple cells can also be modified in this way: elev[1, c(1, 2)] = 0 ``` -Replacing values of multilayered rasters can be done with a matrix with as many columns as layers and rows as replaceable cells (results not shown): +Replacing values of multi-layered rasters can be done with a matrix with as many columns as layers and rows as replaceable cells (results not shown): ```{r 03-attribute-operations-61b, eval=FALSE} two_layers = c(grain, elev) @@ -765,7 +765,7 @@ global(elev, sd) ``` ```{block2 03-attribute-operations-63, type='rmdnote'} -If you provide the `summary()` and `global()` functions with a multilayered raster object, they will summarize each layer separately, as can be illustrated by running: `summary(c(elev, grain))`. +If you provide the `summary()` and `global()` functions with a multi-layered raster object, they will summarize each layer separately, as can be illustrated by running: `summary(c(elev, grain))`. ``` \index{raster!summarizing} @@ -789,12 +789,12 @@ Descriptive raster statistics belong to the so-called global raster operations. 
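To illustrate, a minimal sketch (assuming the `elev` and `grain` rasters that ship with **spData**) computes descriptive statistics layer-by-layer:

```r
library(terra)
elev = rast(system.file("raster/elev.tif", package = "spData"))
grain = rast(system.file("raster/grain.tif", package = "spData"))
summary(c(elev, grain))    # each layer is summarized separately
global(elev, fun = "mean") # a global descriptive statistic for a single layer
```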
These and other typical raster processing operations are part of the map algebra scheme, which are covered in the next chapter (Section \@ref(map-algebra)). ```{block 03-attribute-operations-65, type='rmdnote'} -Some function names clash between packages (e.g., a function with the name `extract()` exist in both **terra** and **tidyr** packages). +Some function names clash between packages (e.g., functions with the name `extract()` exist in both **terra** and **tidyr** packages). This may lead to unexpected results when loading packages in a different order. In addition to calling functions verbosely with their full namespace (e.g., `tidyr::extract()`) to avoid attaching packages with `library()`, another way to prevent function name clashes is by unloading the offending package with `detach()`. The following command, for example, unloads the **terra** package (this can also be done in the *package* tab which resides by default in the right-bottom pane in RStudio): `detach("package:terra", unload = TRUE, force = TRUE)`. The `force` argument makes sure that the package will be detached even if other packages depend on it. -This, however, may lead to a restricted usability of packages depending on the detached package, and is therefore not recommended. +This, however, may lead to a restricted usability of packages depending on the detached package, and it is therefore not recommended. ``` ## Exercises diff --git a/04-spatial-operations.Rmd b/04-spatial-operations.Rmd index ae6c51496..734e8da63 100644 --- a/04-spatial-operations.Rmd +++ b/04-spatial-operations.Rmd @@ -33,8 +33,8 @@ Spatial operations on raster objects include subsetting --- covered in Section \ *Map algebra* covers a range of operations that modify raster cell values, with or without reference to surrounding cell values. 
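For example, a *local* operation applies a function to each cell value independently, as in this minimal sketch (assuming **terra** and the `elev` raster from **spData**):

```r
library(terra)
elev = rast(system.file("raster/elev.tif", package = "spData"))
elev_squared = elev^2 # local operation: each cell value is squared
elev_log = log(elev)  # another per-cell (local) operation
```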
The concept of map algebra, vital for many applications, is introduced in Section \@ref(map-algebra); local, focal and zonal map algebra operations are covered in sections \@ref(local-operations), \@ref(focal-operations), and \@ref(zonal-operations), respectively. Global map algebra operations, which generate summary statistics representing an entire raster dataset, and distance calculations on rasters, are discussed in Section \@ref(global-operations-and-distances).
-Next, relation between map algebra and vector operations are discussed in Section \@ref(map-algebra-counterparts-in-vector-processing).
-In the final section before the exercises (\@ref(merging-rasters)) the process of merging two raster datasets is discussed and demonstrated with reference to a reproducible example.
+Next, the relationships between map algebra and vector operations are discussed in Section \@ref(map-algebra-counterparts-in-vector-processing).
+In Section \@ref(merging-rasters), the process of merging two raster datasets is discussed and demonstrated with reference to a reproducible example.

```{block2 04-spatial-operations-2, type='rmdnote'}
It is important to note that spatial operations that use two spatial objects rely on both objects having the same coordinate reference system, a topic that was introduced in Section \@ref(crs-intro) and which will be covered in more depth in Chapter \@ref(reproj-geo-data).
@@ -50,7 +50,7 @@ Section \@ref(spatial-ras) presents spatial operations on raster datasets using

Spatial subsetting is the process of taking a spatial object and returning a new object containing only features that *relate* in space to another object.
Analogous to *attribute subsetting* (covered in Section \@ref(vector-attribute-subsetting)), subsets of `sf` data frames can be created with square bracket (`[`) operator using the syntax `x[y, , op = st_intersects]`, where `x` is an `sf` object from which a subset of rows will be returned, `y` is the 'subsetting object' and `, op = st_intersects` is an optional argument that specifies the topological relation (also known as the binary predicate) used to do the subsetting.
The default topological relation used when an `op` argument is not provided is `st_intersects()`: the command `x[y, ]` is identical to `x[y, , op = st_intersects]` shown above but not `x[y, , op = st_disjoint]` (the meaning of these and other topological relations is described in the next section).
-The `filter()` function from the **tidyverse**\index{tidyverse (package)} can also be used but this approach is more verbose, as we will see in the examples below.
+The `filter()` function from the **tidyverse**\index{tidyverse (package)} can also be used, but this approach is more verbose, as we will see in the examples below.
\index{vector!subsetting}
\index{spatial!subsetting}
@@ -62,7 +62,7 @@ canterbury = nz |> filter(Name == "Canterbury")
canterbury_height = nz_height[canterbury, ]
```

-```{r nz-subset, echo=FALSE, warning=FALSE, fig.cap="Illustration of spatial subsetting with red triangles representing 101 high points in New Zealand, clustered near the central Canterbuy region (left). The points in Canterbury were created with the `[` subsetting operator (highlighted in gray, right).", fig.scap="Illustration of spatial subsetting.", message=FALSE}
+```{r nz-subset, echo=FALSE, warning=FALSE, fig.cap="Spatial subsetting, with red triangles representing 101 high points in New Zealand, clustered near the central Canterbury region (left).
The points in Canterbury were created with the `[` subsetting operator (highlighted in gray, right).", fig.scap="Spatial subsetting.", message=FALSE} library(tmap) p_hpnz1 = tm_shape(nz) + tm_polygons(fill = "white") + @@ -118,11 +118,11 @@ canterbury_height2 = nz_height[sel_logical, ] The above code chunk creates an object of class `sgbp` (a sparse geometry binary predicate, a list of length `x` in the spatial operation) and then converts it into a logical vector `sel_logical` (containing only `TRUE` and `FALSE` values, something that can also be used by **dplyr**'s filter function). \index{binary predicate|see {topological relations}} The function `lengths()` identifies which features in `nz_height` intersect with *any* objects in `y`. -In this case 1 is the greatest possible value but for more complex operations one could use the method to subset only features that intersect with, for example, 2 or more features from the source object. +In this case, 1 is the greatest possible value, but for more complex operations one could use the method to subset only features that intersect with, for example, 2 or more features from the source object. ```{block2 04-spatial-operations-7, type='rmdnote'} Note: another way to return a logical output is by setting `sparse = FALSE` (meaning 'return a dense matrix not a sparse one') in operators such as `st_intersects()`. The command `st_intersects(x = nz_height, y = canterbury, sparse = FALSE)[, 1]`, for example, would return an output identical to `sel_logical`. -Note: the solution involving `sgbp` objects is more generalisable though, as it works for many-to-many operations and has lower memory requirements. +Note: the solution involving `sgbp` objects is more generalizable though, as it works for many-to-many operations and has lower memory requirements. 
``` The same result can be also achieved with the **sf** function `st_filter()` which was [created](https://github.com/r-spatial/sf/issues/1148) to increase compatibility between `sf` objects and **dplyr** data manipulation code: @@ -154,7 +154,7 @@ The next section explores different types of spatial relation, also known as bin Topological relations\index{topological relations} describe the spatial relationships between objects. "Binary topological relationships", to give them their full name, are logical statements (in that the answer can only be `TRUE` or `FALSE`) about the spatial relationships between two objects defined by ordered sets of points (typically forming points, lines and polygons) in two or more dimensions [@egenhofer_mathematical_1990]. -That may sound rather abstract and, indeed, the definition and classification of topological relations is based on mathematical foundations first published in book form in 1966 [@spanier_algebraic_1995], with the field of algebraic topology continuing into the 21^st^ century [@dieck_algebraic_2008]. +That may sound rather abstract and, indeed, the definition and classification of topological relations is based on mathematical foundations first published in book form in 1966 [@spanier_algebraic_1995], with the field of algebraic topology continuing beyond the year 2000 [@dieck_algebraic_2008]. Despite their mathematical origins, topological relations can be understood intuitively with reference to visualizations of commonly used functions that test for common types of spatial relationships. Figure \@ref(fig:relations) shows a variety of geometry pairs and their associated relations. @@ -163,7 +163,7 @@ While the relations *equals*, *intersects*, *crosses*, *touches* and *overlaps* Notice that each geometry pair has a "DE-9IM" string such as FF2F11212, described in the next section. 
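A minimal sketch, using two hypothetical unit squares that share an edge, shows how such predicate functions behave:

```r
library(sf)
a = st_sfc(st_polygon(list(rbind(c(0, 0), c(2, 0), c(2, 2), c(0, 2), c(0, 0)))))
b = st_sfc(st_polygon(list(rbind(c(2, 0), c(4, 0), c(4, 2), c(2, 2), c(2, 0)))))
st_touches(a, b, sparse = FALSE)  # TRUE: only the boundaries intersect
st_overlaps(a, b, sparse = FALSE) # FALSE: the interiors do not intersect
```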
\index{topological relations} -```{r relations, echo=FALSE, fig.cap="Topological relations between vector geometries, inspired by Figures 1 and 2 in Egenhofer and Herring (1990). The relations for which the function(x, y) is true are printed for each geometry pair, with x represented in pink and y represented in blue. The nature of the spatial relationship for each pair is described by the Dimensionally Extended 9-Intersection Model string.", fig.show='hold', message=FALSE, fig.asp=0.66, warning=FALSE} +```{r relations, echo=FALSE, fig.cap="Topological relations between vector geometries, inspired by figures 1 and 2 in Egenhofer and Herring (1990). The relations for which the function(x, y) is true are printed for each geometry pair, with x represented in pink and y represented in blue. The nature of the spatial relationship for each pair is described by the Dimensionally Extended 9-Intersection Model string.", fig.show='hold', message=FALSE, fig.asp=0.66, warning=FALSE} # source("https://github.com/geocompx/geocompr/raw/main/code/de_9im.R") source("code/de_9im.R") library(sf) @@ -233,7 +233,7 @@ As we saw in the previous section, a *dense matrix* consisting of `TRUE` or `FAL st_intersects(point_sf, polygon_sfc, sparse = FALSE) ``` -In the above output each row represents a feature in the target (argument `x`) object and each column represents a feature in the selecting object (`y`). +In the above output each row represents a feature in the target (argument `x`) object, and each column represents a feature in the selecting object (`y`). In this case, there is only one feature in the `y` object `polygon_sfc` so the result, which can be used for subsetting as we saw in Section \@ref(spatial-subsetting), has only one column. `st_intersects()` returns `TRUE` even in cases where the features just touch: *intersects*\index{intersects} is a 'catch-all' topological operation which identifies many types of spatial relation, as illustrated in Figure \@ref(fig:relations). 
@@ -340,7 +340,7 @@ st_distance(nz_height[1:3, ], co) ``` Note that the distance between the second and third features in `nz_height` and the second feature in `co` is zero. -This demonstrates the fact that distances between points and polygons refer to the distance to *any part of the polygon*: +This demonstrates the fact that distances between points and polygons refer to the distance to *any part of the polygon*. The second and third points in `nz_height` are *in* Otago, which can be verified by plotting them (result not shown): ```{r 04-spatial-operations-33, eval=FALSE} @@ -352,8 +352,8 @@ plot(st_geometry(nz_height)[2:3], add = TRUE) Underlying the binary predicates demonstrated in the previous section is the Dimensionally Extended 9-Intersection Model (DE-9IM)\index{topological relations!DE-9IM}. As the cryptic name suggests, this is not an easy topic to understand, but it is worth knowing about because it underlies many spatial operations and enables the creation of custom spatial predicates. -The model was originally labelled "DE + 9IM" by its inventors, referring to the "dimension of the intersections of boundaries, interiors, and exteriors of two features" [@clementini_comparison_1995], but is now referred to as DE-9IM [@shen_classification_2018]. -DE-9IM is applicable to 2-dimensional objects (points, lines and polygons) in Euclidean space, meaning that the model (and software implementing it such as GEOS) assumes you are working with data in a projected coordinate reference system, described in Chapter \@ref(reproj-geo-data). +The model was originally labelled "DE + 9IM" by its inventors, referring to the "dimension of the intersections of boundaries, interiors, and exteriors of two features" [@clementini_comparison_1995], but it is now referred to as DE-9IM [@shen_classification_2018]. 
+DE-9IM is applicable to two-dimensional objects (points, lines and polygons) in Euclidean space, meaning that the model (and software implementing it such as GEOS) assumes you are working with data in a projected coordinate reference system, described in Chapter \@ref(reproj-geo-data). ```{r de-9im, echo=FALSE, eval=FALSE} # Todo one day: revive this @@ -389,9 +389,9 @@ text(x = c(-0.5, 1.5), y = 1, labels = c("x", "y")) # add text ``` To demonstrate how DE-9IM strings work, let's take a look at the various ways that the first geometry pair in Figure \@ref(fig:relations) relate. -Figure \@ref(fig:de9imgg) illustrates the 9 intersection model (9IM) which shows the intersections between every combination of each object's interior, boundary and exterior: when each component of the first object `x` is arranged as columns and each component of `y` is arranged as rows, a facetted graphic is created with the intersections between each element highlighted. +Figure \@ref(fig:de9imgg) illustrates the 9-intersection model (9IM) which shows the intersections between every combination of each object's interior, boundary and exterior: when each component of the first object `x` is arranged as columns, and each component of `y` is arranged as rows, a facetted graphic is created with the intersections between each element highlighted. -```{r de9imgg, echo=FALSE, warning=FALSE, fig.cap="Illustration of how the Dimensionally Extended 9 Intersection Model (DE-9IM) works. Colors not in the legend represent the overlap between different components. The thick lines highlight 2 dimensional intesections, e.g., between the boundary of object x and the interior of object y, shown in the middle top facet.", message=FALSE} +```{r de9imgg, echo=FALSE, warning=FALSE, fig.cap="Illustration of how the Dimensionally Extended 9 Intersection Model (DE-9IM) works. Colors not in the legend represent the overlap between different components. 
The thick lines highlight two-dimensional intersections, e.g., between the boundary of object x and the interior of object y, shown in the middle top facet.", message=FALSE} p1_2 = st_as_sf(c(p1, p3)) ii = st_as_sf(st_intersection(p1, p3)) ii$Object = "Intersection" @@ -488,7 +488,7 @@ ggplot(b9sf) + ``` DE-9IM strings are derived from the dimension of each type of relation. -In this case the red intersections in Figure \@ref(fig:de9imgg) have dimensions of 0 (points), 1 (lines), and 2 (polygons), as shown in Table \@ref(tab:de9emtable). +In this case, the red intersections in Figure \@ref(fig:de9imgg) have dimensions of 0 (points), 1 (lines), and 2 (polygons), as shown in Table \@ref(tab:de9emtable). ```{r de9emtable, echo=FALSE} # See https://github.com/geocompx/geocompr/issues/699 @@ -504,16 +504,16 @@ matrix_de_9im = function(pattern) { m = matrix_de_9im(pattern) colnames(m) = c("Interior (x)", "Boundary (x)", "Exterior (x)") rownames(m) = c("Interior (y)", "Boundary (y)", "Exterior (y)") -knitr::kable(m, caption = "Table showing relations between interiors, boundaries and exteriors of geometries x and y.") +knitr::kable(m, caption = "Relations between interiors, boundaries and exteriors of geometries x and y.") ``` Flattening this matrix 'row-wise' (meaning concatenating the first row, then the second, then the third) results in the string `212111212`. 
Another example will serve to demonstrate the system: the relation shown in Figure \@ref(fig:relations) (the third polygon pair in the third column and 1st row) can be defined in the DE-9IM system as follows: -- The intersections between the *interior* of the larger object `x` and the interior, boundary and exterior of `y` have dimensions of 2, 1 and 2 respectively -- The intersections between the *boundary* of the larger object `x` and the interior, boundary and exterior of `y` have dimensions of F, F and 1 respectively, where 'F' means 'false', the objects are disjoint -- The intersections between the *exterior* of `x` and the interior, boundary and exterior of `y` have dimensions of F, F and 2 respectively: the exterior of the larger object does not touch the interior or boundary of `y`, but the exterior of the smaller and larger objects cover the same area +- The intersections between the *interior* of the larger object `x` and the interior, boundary and exterior of `y` have dimensions of 2, 1 and 2, respectively +- The intersections between the *boundary* of the larger object `x` and the interior, boundary and exterior of `y` have dimensions of F, F and 1, respectively, where 'F' means 'false', the objects are disjoint +- The intersections between the *exterior* of `x` and the interior, boundary and exterior of `y` have dimensions of F, F and 2, respectively: the exterior of the larger object does not touch the interior or boundary of `y`, but the exterior of the smaller and larger objects cover the same area These three components, when concatenated, create the string `212`, `FF1`, and `FF2`. This is the same as the result obtained from the function `st_relate()` (see the source code of this chapter to see how other geometries in Figure \@ref(fig:relations) were created): @@ -527,7 +527,7 @@ st_relate(x, y) Understanding DE-9IM strings allows new binary spatial predicates to be developed. 
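To make this concrete, a minimal sketch with a hypothetical pair of squares, a unit square `x` strictly containing a smaller square `y` that does not touch its border, reproduces the pattern built up above:

```r
library(sf)
x = st_sfc(st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))))
y = st_sfc(st_polygon(list(rbind(c(0.2, 0.2), c(0.8, 0.2), c(0.8, 0.8), c(0.2, 0.8), c(0.2, 0.2)))))
st_relate(x, y) # "212FF1FF2": x properly contains y
```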
The help page `?st_relate` contains function definitions for 'queen' and 'rook' relations in which polygons share a border or only a point, respectively.
-'Queen' relations mean that 'boundary-boundary' relations (the cell in the second column and the second row in Table \@ref(tab:de9emtable), or the 5th element of the DE-9IM string) must not be empty, corresponding to the pattern `F***T****`, while for 'rook' relations the same element must be 1 (meaning a linear intersection).
+'Queen' relations mean that 'boundary-boundary' relations (the cell in the second column and the second row in Table \@ref(tab:de9emtable), or the 5th element of the DE-9IM string) must not be empty, corresponding to the pattern `F***T****`, while for 'rook' relations, the same element must be 1 (meaning a linear intersection) (see Figure \@ref(fig:queens)).
These are implemented as follows:

```{r}
@@ -546,7 +546,7 @@ grid_sf$rooks = lengths(st_rook(grid, grid[5])) > 0
plot(grid, col = grid_sf$rooks)
```

-```{r queens, fig.cap="Demonstration of custom binary spatial predicates for finding 'queen' (left) and 'rook' (right) relations to the central square in a grid with 9 geometries.", echo=FALSE, warning=FALSE}
+```{r queens, fig.cap="Demonstration of custom binary spatial predicates for finding queen (left) and rook (right) relations to the central square in a grid with 9 geometries.", echo=FALSE, warning=FALSE}
st_crs(grid_sf) = "EPSG:2180"
tm_shape(grid_sf) +
tm_fill(fill = c("queens", "rooks"),
@@ -620,7 +620,7 @@ As with attribute data, joining adds new columns to the target object (the argum
\index{spatial!join}
The process is illustrated by the following example: imagine you have ten points randomly distributed across the Earth's surface and you ask, for the points that are on land, which countries are they in?
-Implementing this idea in a [reproducible example](https://github.com/geocompx/geocompr/blob/main/code/04-spatial-join.R) will build your geographic data handling skills and show how spatial joins work.
+Implementing this idea in a [reproducible example](https://github.com/geocompx/geocompr/blob/main/code/04-spatial-join.R) will build your geographic data-handling skills and will show how spatial joins work.
The starting point is to create points that are randomly scattered over the Earth's surface.

```{r 04-spatial-operations-19}
@@ -637,7 +637,7 @@ random_points = random_df |>
The scenario illustrated in Figure \@ref(fig:spatial-join) shows that the `random_points` object (top left) lacks attribute data, while the `world` (top right) has attributes, including country names shown for a sample of countries in the legend.
Spatial joins are implemented with `st_join()`, as illustrated in the code chunk below.
The output is the `random_joined` object which is illustrated in Figure \@ref(fig:spatial-join) (bottom left).
-Before creating the joined dataset, we use spatial subsetting to create `world_random`, which contains only countries that contain random points, to verify the number of country names returned in the joined dataset should be four (Figure \@ref(fig:spatial-join), top right panel).
+Before creating the joined dataset, we use spatial subsetting to create `world_random`, which contains only countries that contain random points, to verify that the number of country names returned in the joined dataset should be four (Figure \@ref(fig:spatial-join), top right panel).
```{r 04-spatial-operations-20, message=FALSE} world_random = world[random_points, ] @@ -652,13 +652,13 @@ source("code/04-spatial-join.R", print.eval = TRUE) By default, `st_join()` performs a left join, meaning that the result is an object containing all rows from `x` including rows with no match in `y` (see Section \@ref(vector-attribute-joining)), but it can also do inner joins by setting the argument `left = FALSE`. Like spatial subsetting, the default topological operator used by `st_join()` is `st_intersects()`, which can be changed by setting the `join` argument (see `?st_join` for details). The example above demonstrates the addition of a column from a polygon layer to a point layer, but the approach works regardless of geometry types. -In such cases, for example when `x` contains polygons, each of which match multiple objects in `y`, spatial joins will result in duplicate features by creating a new row for each match in `y`. +In such cases, for example when `x` contains polygons, each of which matches multiple objects in `y`, spatial joins will result in duplicate features by creating a new row for each match in `y`. ### Distance-based joins {#non-overlapping-joins} Sometimes two geographic datasets do not intersect but still have a strong geographic relationship due to their proximity. The datasets `cycle_hire` and `cycle_hire_osm`, already attached in the **spData** package, provide a good example. 
-Plotting them shows that they are often closely related but they do not touch, as shown in Figure \@ref(fig:cycle-hire), a base version of which is created with the following code below:
+Plotting them shows that they are often closely related, but they do not touch, as shown in Figure \@ref(fig:cycle-hire), a base version of which is created with the following code:
\index{join!non-overlapping}

```{r 04-spatial-operations-21, eval=FALSE}
@@ -743,7 +743,7 @@ The result of this join has used a spatial operation to change the attribute dat

### Spatial aggregation {#spatial-aggr}

As with attribute data aggregation, spatial data aggregation *condenses* data: aggregated outputs have fewer rows than non-aggregated inputs.
-Statistical *aggregating functions*, such as mean average or sum, summarise multiple values \index{statistics} of a variable, and return a single value per *grouping variable*.
+Statistical *aggregating functions*, such as mean average or sum, summarize multiple values \index{statistics} of a variable, and return a single value per *grouping variable*.
Section \@ref(vector-attribute-aggregation) demonstrated how `aggregate()` and `group_by() |> summarize()` condense data based on attribute variables, this section shows how the same functions work with spatial objects.
\index{aggregation!spatial}
@@ -781,7 +781,7 @@ plot(nz_agg2)
The resulting `nz_agg` objects have the same geometry as the aggregating object `nz` but with a new column summarizing the values of `x` in each region using the function `mean()`.
Other functions could be used instead of `mean()` here, including `median()`, `sd()` and other functions that return a single value per group.
-Note: one difference between the `aggregate()` and `group_by() |> summarize()` approaches is that the former results in `NA` values for unmatching region names while the latter preserves region names.
+Note: one difference between the `aggregate()` and `group_by() |> summarize()` approaches is that the former results in `NA` values for unmatching region names, while the latter preserves region names. The 'tidy' approach is thus more flexible in terms of aggregating functions and the column names of the results. Aggregating operations that also create new geometries are covered in Section \@ref(geometry-unions). @@ -795,7 +795,7 @@ Often this is the case for administrative boundary data, whereby larger units -- This is problematic for spatial aggregation (and other spatial operations) illustrated in Figure \@ref(fig:areal-example): aggregating the centroid of each sub-zone will not return accurate results. Areal interpolation overcomes this issue by transferring values from one set of areal units to another, using a range of algorithms including simple area weighted approaches and more sophisticated approaches such as 'pycnophylactic' methods [@tobler_smooth_1979]. -```{r areal-example, echo=FALSE, fig.cap="Illustration of congruent (left) and incongruent (right) areal units with respect to larger aggregating zones (translucent red borders).", fig.asp=0.33, fig.scap="Illustration of congruent and incongruent areal units."} +```{r areal-example, echo=FALSE, fig.cap="Congruent (left) and incongruent (right) areal units with respect to larger aggregating zones (translucent red borders).", fig.asp=0.33, fig.scap="Congruent and incongruent areal units."} source("code/04-areal-example.R", print.eval = TRUE) ``` @@ -818,7 +818,7 @@ This would be different for spatially [intensive](https://geodacenter.github.io/ ## Spatial operations on raster data {#spatial-ras} -This section builds on Section \@ref(manipulating-raster-objects), which highlights various basic methods for manipulating raster datasets, to demonstrate more advanced and explicitly spatial raster operations, and uses the objects `elev` and `grain` manually created in Section 
\@ref(manipulating-raster-objects). +This section builds on Section \@ref(manipulating-raster-objects), which highlights various basic methods for manipulating raster datasets, to demonstrate more advanced and explicitly spatial raster operations, and it uses the objects `elev` and `grain` manually created in Section \@ref(manipulating-raster-objects). For the reader's convenience, these datasets can be also found in the **spData** package. ```{r} @@ -855,7 +855,7 @@ elev[clip] This amounts to retrieving the values of the first raster object (in this case `elev`) that fall within the extent of a second raster (here: `clip`), as illustrated in Figure \@ref(fig:raster-subset). -```{r raster-subset, echo = FALSE, fig.cap = "Original raster (left). Raster mask (middle). Output of masking a raster (right).", fig.scap="Subsetting raster values."} +```{r raster-subset, echo = FALSE, fig.cap = "Original raster (left), raster mask (middle), and output of masking a raster (right).", fig.scap="Subsetting raster values."} knitr::include_graphics("images/04_raster_subset.png") ``` @@ -921,7 +921,7 @@ First, the headers of the raster datasets are queried and (in cases where map al Second, map algebra retains the so-called one-to-one locational correspondence, meaning that cells cannot move. This differs from matrix algebra, in which values change position, for example when multiplying or dividing matrices. -Map algebra (or cartographic modeling with raster data) divides raster operations into four subclasses [@tomlin_geographic_1990], with each working on one or several grids simultaneously: +Map algebra (or cartographic modeling with raster data) divides raster operations into four sub-classes [@tomlin_geographic_1990], with each working on one or several grids simultaneously: 1. *Local* or per-cell operations 2. *Focal* or neighborhood operations. 
@@ -955,7 +955,7 @@ knitr::include_graphics("images/04-local-operations.png") Another good example of local operations is the classification of intervals of numeric values into groups such as grouping a digital elevation model into low (class 1), middle (class 2) and high elevations (class 3). Using the `classify()` command, we need first to construct a reclassification matrix, where the first column corresponds to the lower and the second column to the upper end of the class. -The third column represents the new value for the specified ranges in column one and two. +The third column represents the new value for the specified ranges in columns one and two. ```{r 04-spatial-operations-40} rcl = matrix(c(0, 12, 1, 12, 24, 2, 24, 36, 3), ncol = 3, byrow = TRUE) @@ -981,7 +981,7 @@ Finally, the `lapp()` function allows us to apply a function to each cell using The calculation of the normalized difference vegetation index (NDVI) is a well-known local (pixel-by-pixel) raster operation. It returns a raster with values between -1 and 1; positive values indicate the presence of living plants (mostly > 0.2). NDVI is calculated from red and near-infrared (NIR) bands of remotely sensed imagery, typically from satellite systems such as Landsat or Sentinel. -Vegetation absorbs light heavily in the visible light spectrum, and especially in the red channel, while reflecting NIR light. Here's the NVDI formula: +Vegetation absorbs light heavily in the visible light spectrum, and especially in the red channel, while reflecting NIR light. Here's the NDVI formula: $$ \begin{split} @@ -989,14 +989,14 @@ NDVI&= \frac{\text{NIR} - \text{Red}}{\text{NIR} + \text{Red}}\\ \end{split} $$ -Let's calculate NDVI for the multispectral satellite file of the Zion National Park. +Let's calculate NDVI for the multi-spectral satellite file of Zion National Park. 
```{r} multi_raster_file = system.file("raster/landsat.tif", package = "spDataLarge") multi_rast = rast(multi_raster_file) ``` -Our raster object has four satellite bands from the Landsat 8 satellite — blue, green, red, and near-infrared (NIR). +Our raster object has four satellite bands from the Landsat 8 satellite: blue, green, red, and NIR. Importantly, Landsat level-2 products are stored as integers to save disk space, and thus we need to convert them to floating-point numbers before doing any calculations. For that purpose, we need to apply a scaling factor (0.0000275) and add an offset (-0.2) to the original values.^[You can read more about it at https://www.usgs.gov/faqs/how-do-i-use-a-scale-factor-landsat-level-2-science-products.] @@ -1005,7 +1005,8 @@ multi_rast = (multi_rast * 0.0000275) - 0.2 ``` The proper values now should be in a range between 0 and 1. -This is not the case here, probably due to the presence of clouds and other atmospheric effects, thus we need to replace below 0 to 0. +This is not the case here, probably due to the presence of clouds and other atmospheric effects, which are stored as negative values. +We will replace these negative values with 0 as follows. ```{r} multi_rast[multi_rast < 0] = 0 @@ -1028,10 +1029,10 @@ That is why we subset the input raster with `multi_rast[[c(4, 3)]]` before doing ndvi_rast = lapp(multi_rast[[c(4, 3)]], fun = ndvi_fun) ``` -The result, shown on the right panel in Figure \@ref(fig:04-ndvi), can be compared to the RGB image of the same area (left panel of the same Figure). +The result, shown on the right panel in Figure \@ref(fig:04-ndvi), can be compared to the RGB image of the same area (left panel of the same figure). It allows us to see that the largest NDVI values are connected to northern areas of dense forest, while the lowest values are related to the lake in the north and snowy mountain ridges. 
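The scaling and NDVI arithmetic described above can be checked on plain numbers (the digital numbers below are made up; the scale factor, offset, and formula are the ones given in the text):

```r
# Hypothetical Landsat level-2 integer values (digital numbers)
nir_dn = 30000
red_dn = 12000
# Apply the scale factor (0.0000275) and offset (-0.2) from the text
nir = nir_dn * 0.0000275 - 0.2
red = red_dn * 0.0000275 - 0.2
ndvi_fun = function(nir, red) {
  (nir - red) / (nir + red)
}
ndvi_fun(nir, red)  # about 0.66, i.e., likely vegetation
```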
-```{r 04-ndvi, echo=FALSE, fig.cap="RGB image (left) and NDVI values (right) calculated for the example satellite file of the Zion National Park"} +```{r 04-ndvi, echo=FALSE, fig.cap="RGB image (left) and NDVI values (right) calculated for the example satellite file of Zion National Park"} knitr::include_graphics("images/04-ndvi.png") ``` @@ -1045,7 +1046,7 @@ Spatial predictions on raster objects can therefore be made by applying estimate \index{map algebra!focal operations} While local functions operate on one cell, though possibly from multiple layers, **focal** operations take into account a central (focal) cell and its neighbors. -The neighborhood (also named kernel, filter or moving window) under consideration is typically of size 3-by-3 cells (that is the central cell and its eight surrounding neighbors), but can take on any other size or (not necessarily rectangular) shape as defined by the user. +The neighborhood (also named kernel, filter or moving window) under consideration is typically of size 3-by-3 cells (that is the central cell and its eight surrounding neighbors), but it can take on any other size or (not necessarily rectangular) shape as defined by the user. A focal operation applies an aggregation function to all cells within the specified neighborhood, uses the corresponding output as the new value for the central cell, and moves on to the next central cell (Figure \@ref(fig:focal-example)). Other names for this operation are spatial filtering and convolution [@burrough_principles_2015]. @@ -1058,14 +1059,14 @@ Here, we choose the minimum, but any other summary function, including `sum()`, r_focal = focal(elev, w = matrix(1, nrow = 3, ncol = 3), fun = min) ``` -This function also accepts additional arguments, for example, should it remove NAs in the process (`na.rm = TRUE`) or not (`na.rm = FALSE`). 
+The `min()` function has an additional argument to determine whether to remove NAs in the process (`na.rm = TRUE`) or not (`na.rm = FALSE`, the default).

-```{r focal-example, echo = FALSE, fig.cap = "Input raster (left) and resulting output raster (right) due to a focal operation - finding the minimum value in 3-by-3 moving windows.", fig.scap="Illustration of a focal operation."}
+```{r focal-example, echo = FALSE, fig.cap = "Input raster (left) and resulting output raster (right) due to a focal operation, finding the minimum value in 3-by-3 moving windows.", fig.scap="Illustration of a focal operation."}
knitr::include_graphics("images/04_focal_example.png")
```

We can quickly check if the output meets our expectations.
-In our example, the minimum value has to be always the upper left corner of the moving window (remember we have created the input raster by row-wise incrementing the cell values by one starting at the upper left corner).
+In our example, the minimum value always has to be in the upper left corner of the moving window (remember, we have created the input raster by row-wise incrementing the cell values by one starting at the upper left corner).
In this example, the weighting matrix consists only of 1s, meaning each cell has the same weight on the output, but this can be changed.

Focal functions or filters play a dominant role in image processing.
@@ -1086,18 +1087,18 @@ Chapter \@ref(gis) shows how to access such GIS functionality from within R.
Just like focal operations, *zonal* operations apply an aggregation function to multiple raster cells.
However, a second raster, usually with categorical values, defines the *zonal filters* (or 'zones') in the case of zonal operations, as opposed to a neighborhood window in the case of focal operations presented in the previous section.
Consequently, raster cells defining the zonal filter do not necessarily have to be neighbors.
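The zone-wise aggregation just described can be sketched in base R with `tapply()` (the values below are made up, standing in for `elev` cell values and `grain` zone labels):

```r
# Hypothetical cell values and zone labels
elev_vals  = c(10, 20, 30, 40, 50, 60)
grain_vals = c("clay", "silt", "clay", "sand", "silt", "sand")
# Group one layer's values by the zones of another and summarize,
# which is the logic zonal statistics are built on
tapply(elev_vals, grain_vals, mean)  # mean 'elevation' per zone
```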
-Our grain size raster is a good example, as illustrated in the right panel of Figure \@ref(fig:cont-raster): different grain sizes are spread irregularly throughout the raster. +Our grain-size raster is a good example, as illustrated in the right panel of Figure \@ref(fig:cont-raster): different grain sizes are spread irregularly throughout the raster. Finally, the result of a zonal operation is a summary table grouped by zone which is why this operation is also known as *zonal statistics* in the GIS world\index{GIS}. This is in contrast to focal operations which return a raster object by default. -The following code chunk uses the `zonal()` function to calculate the mean elevation associated with each grain size class. +The following code chunk uses the `zonal()` function to calculate the mean elevation associated with each grain-size class. ```{r 04-spatial-operations-43} z = zonal(elev, grain, fun = "mean") z ``` -This returns the statistics\index{statistics} for each category, here the mean altitude for each grain size class. +This returns the statistics\index{statistics} for each category, here the mean altitude for each grain-size class. Note that it is also possible to get a raster with calculated statistics for each zone by setting the `as.raster` argument to `TRUE`. ### Global operations and distances @@ -1108,9 +1109,9 @@ The most common global operations are descriptive statistics\index{statistics} f Aside from that, global operations are also useful for the computation of distance and weight rasters. In the first case, one can calculate the distance from each cell to a specific target cell. For example, one might want to compute the distance to the nearest coast (see also `terra::distance()`). -We might also want to consider topography, that means, we are not only interested in the pure distance but would like also to avoid the crossing of mountain ranges when going to the coast. 
-To do so, we can weight the distance with elevation so that each additional altitudinal meter 'prolongs' the Euclidean distance (in Exercises 8 and 9 at the end of this chapter you will do exactly that).
-Visibility and viewshed computations also belong to the family of global operations (in the exercises of Chapter \@ref(gis), you will compute a viewshed raster).
+We might also want to consider topography, for example to avoid crossing mountain ranges on the way to the coast.
+This can be done by weighting distance by elevation so that each additional altitudinal meter 'prolongs' the Euclidean distance (in Exercises E8 and E9 at the end of this chapter you will do exactly that).
+Visibility and viewshed computations also belong to the family of global operations (in the Exercises of Chapter \@ref(gis), you will compute a viewshed raster).

### Map algebra counterparts in vector processing

@@ -1128,13 +1129,13 @@ Zonal operations dissolve the cells of one raster in accordance with the zones (
### Merging rasters

\index{raster!merge}
-Suppose we would like to compute the NDVI (see Section \@ref(local-operations)), and additionally want to compute terrain attributes from elevation data for observations within a study area.
+Suppose we would like to compute the NDVI (see Section \@ref(local-operations)), and additionally we want to compute terrain attributes from elevation data for observations within a study area.
Such computations rely on remotely sensed information.
The corresponding imagery is often divided into scenes covering a specific spatial extent, and frequently, a study area covers more than one scene.
Then, we would need to merge the scenes covered by our study area.
In the easiest case, we can just merge these scenes, that is put them side by side.
This is possible, for example, with digital elevation data.
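The 'side by side' idea, and the difference overlap handling makes, can be sketched with plain matrices (hypothetical values; the terra functions `merge()` and `mosaic()` discussed here operate on rasters, not matrices):

```r
# Two overlapping 'scenes' with made-up constant values; columns 3-4 overlap
a = matrix(1, nrow = 2, ncol = 4)   # scene 1 covers columns 1-4
b = matrix(3, nrow = 2, ncol = 4)   # scene 2 covers columns 3-6
# Side by side: the first scene's values win in the overlap, as with merge()
merged = cbind(a, b[, 3:4])
# Averaging the overlap instead, as mosaic() can do with fun = "mean"
overlap = (a[, 3:4] + b[, 1:2]) / 2
mosaicked = cbind(a[, 1:2], overlap, b[, 3:4])
```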
-In the following code chunk we first download the SRTM elevation data for Austria and Switzerland (for the country codes, see the **geodata** function `country_codes()`). +In the following code chunk, we first download the Shuttle Radar Topography Mission (SRTM) elevation data for Austria and Switzerland (for the country codes, see the **geodata** function `country_codes()`). In a second step, we merge the two rasters into one. ```{r 04-spatial-operations-44, eval = FALSE} @@ -1147,9 +1148,9 @@ aut_ch = merge(aut, ch) The merging approach is of little use when the overlapping values do not correspond to each other. This is frequently the case when you want to combine spectral imagery from scenes that were taken on different dates. -The `merge()` command will still work but you will see a clear border in the resulting image. +The `merge()` command will still work, but you will see a clear border in the resulting image. On the other hand, the `mosaic()` command lets you define a function for the overlapping area. -For instance, we could compute the mean value -- this might smooth the clear border in the merged result but it will most likely not make it disappear. +For instance, we could compute the mean value --- this might smooth the clear border in the merged result, but it will most likely not make it disappear. For a more detailed introduction to remote sensing with R, see @wegmann_remote_2016. ## Exercises diff --git a/05-geometry-operations.Rmd b/05-geometry-operations.Rmd index eea736844..37047ab6c 100644 --- a/05-geometry-operations.Rmd +++ b/05-geometry-operations.Rmd @@ -40,7 +40,7 @@ Importantly it also shows how to 'polygonize' rasters and 'rasterize' vector dat ## Geometric operations on vector data {#geo-vec} This section is about operations that in some way change the geometry of vector (`sf`) objects. 
-It is more advanced than the spatial data operations presented in the previous chapter (in Section \@ref(spatial-vec)), because here we drill down into the geometry: +It is more advanced than the spatial data operations presented in the previous chapter (Section \@ref(spatial-vec)), because here we drill down into the geometry: the functions discussed in this section work on objects of class `sfc` in addition to objects of class `sf`. ### Simplification @@ -876,7 +876,7 @@ The main resampling methods include: - Nearest neighbor: assigns the value of the nearest cell of the original raster to the cell of the target one. This is a fast simple technique that is usually suitable for resampling categorical rasters. - Bilinear interpolation: assigns a weighted average of the four nearest cells from the original raster to the cell of the target one (Figure \@ref(fig:bilinear)). This is the fastest method that is appropriate for continuous rasters. - Cubic interpolation: uses values of the 16 nearest cells of the original raster to determine the output cell value, applying third-order polynomial functions. Used for continuous rasters and results in a smoother surface compared to bilinear interpolation but is computationally more demanding. -- Cubic spline interpolation: also uses values of the 16 nearest cells of the original raster to determine the output cell value but applies cubic splines (piece-wise third-order polynomial functions). Used for continuous rasters. +- Cubic spline interpolation: also uses values of the 16 nearest cells of the original raster to determine the output cell value, but applies cubic splines (piece-wise third-order polynomial functions). Used for continuous rasters. - Lanczos windowed sinc resampling: uses values of the 36 nearest cells of the original raster to determine the output cell value. Used for continuous rasters.^[ More detailed explanation of this method can be found at https://gis.stackexchange.com/a/14361/20955. 
]
diff --git a/06-raster-vector.Rmd b/06-raster-vector.Rmd
index 093931896..8192f5085 100644
--- a/06-raster-vector.Rmd
+++ b/06-raster-vector.Rmd
@@ -56,7 +56,7 @@ srtm_cropped = crop(srtm, zion)

\index{raster!masking}
Related to `crop()` is the **terra** function `mask()`, which sets values outside of the bounds of the object passed to its second argument to `NA`.
-The following command therefore masks every cell outside of the Zion National Park boundaries (Figure \@ref(fig:cropmask)(C)).
+The following command therefore masks every cell outside of Zion National Park boundaries (Figure \@ref(fig:cropmask)(C)).

```{r 06-raster-vector-4 }
srtm_masked = mask(srtm, zion)
@@ -119,7 +119,7 @@ The reverse of raster extraction --- assigning raster cell values based on vecto

\index{raster!extraction points}
The basic example is of extracting the value of a raster cell at specific **points**.
-For this purpose, we will use `zion_points`, which contain a sample of 30 locations within the Zion National Park (Figure \@ref(fig:pointextr)).
+For this purpose, we will use `zion_points`, which contains a sample of 30 locations within Zion National Park (Figure \@ref(fig:pointextr)).
The following command extracts elevation values from `srtm` and creates a data frame with points' IDs (one value per vector's row) and related `srtm` values for each point.
Now, we can add the resulting object to our `zion_points` dataset with the `cbind()` function: @@ -483,7 +483,7 @@ grain_poly = as.polygons(grain) |> st_as_sf() ``` -```{r 06-raster-vector-40, echo=FALSE, fig.cap="Vectorization of (A) raster into (B) polygons (dissolve = FALSE) and aggregated polygons (dissolve = TRUE).", warning=FALSE, message=FALSE, fig.asp=0.4, fig.scap="Illustration of vectorization."} +```{r 06-raster-vector-40, echo=FALSE, fig.cap="Vectorization of (A) raster into (B) polygons (dissolve = FALSE) and aggregated polygons (dissolve = TRUE).", warning=FALSE, message=FALSE, fig.asp=0.4, fig.scap="Vectorization."} source("code/06-raster-vectorization2.R", print.eval = TRUE) ``` diff --git a/07-reproj.Rmd b/07-reproj.Rmd index 441687bcd..a46c114c7 100644 --- a/07-reproj.Rmd +++ b/07-reproj.Rmd @@ -25,7 +25,7 @@ It demonstrates how to set and *transform* geographic data from one CRS to anoth \index{CRS!projected} In many projects there is no need to worry about, let alone convert between, different CRSs. -Nonetheless, it is important to know if your data is in a projected or geographic coordinate reference system, and the consequences of this for geometry operations. +Nonetheless, it is important to know if your data is in a projected or geographic CRS, and the consequences of this for geometry operations. If you know this information, CRSs should *just work* behind the scenes: people often suddenly need to learn about CRSs when things go wrong. Having a clearly defined CRS that all project data is in, and understanding how and why to use different CRSs, can ensure that things don't go wrong. Furthermore, learning about coordinate systems will deepen your knowledge of geographic datasets and how to use them effectively. 
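One concrete consequence of the projected/geographic distinction can be shown with base R: in a geographic CRS the units are degrees, and the ground length of one degree of longitude shrinks with latitude (the equatorial value of roughly 111.32 km per degree is an approximation added for illustration, not a figure from the text):

```r
# Approximate ground length (km) of one degree of longitude at a given latitude
deg_lon_km = function(lat_deg) cos(lat_deg * pi / 180) * 111.32
deg_lon_km(0)     # about 111.3 km at the equator
deg_lon_km(51.5)  # about 69.3 km at London's latitude
```

This is why distances and areas computed naively on lon/lat coordinates can be badly wrong, and why knowing whether your data is projected matters before doing geometry operations.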
@@ -518,7 +518,7 @@ london2 = st_transform(london_geo, "EPSG:27700")

Now that a transformed version of `london` has been created, using the **sf** function `st_transform()`, the distance between the two representations of London can be found.^[
An alternative to `st_transform()` is `st_transform_proj()` from the **lwgeom**, which enables transformations and which bypasses GDAL and can support projections not supported by GDAL.
-However, at the time of writing (2022) we could not find any projections supported by `st_transform_proj()` but not supported by `st_transform()`.
+However, at the time of writing (2024) we could not find any projections supported by `st_transform_proj()` but not supported by `st_transform()`.
]
It may come as a surprise that `london` and `london2` are over 2 km apart!^[
The difference in location between the two points is not due to imperfections in the transforming operation (which is in fact very accurate) but the low precision of the manually-created coordinates that created `london` and `london_proj`.
diff --git a/08-read-write-plot.Rmd b/08-read-write-plot.Rmd
index b939fbf9a..bfdddc8d6 100644
--- a/08-read-write-plot.Rmd
+++ b/08-read-write-plot.Rmd
@@ -271,7 +271,7 @@ It is fast and flexible, but it may be worth looking at other packages such as *
### Raster data {#raster-data-read}

\index{raster!data input}
-Similar to vector data, raster data comes in many file formats with some supporting multilayer files.
+Similar to vector data, raster data comes in many file formats with some supporting multi-layer files.
**terra**'s `rast()` command reads in a single layer when a file with just one layer is provided.

```{r 07-read-write-plot-24, message=FALSE}
raster_filepath = system.file("raster/srtm.tif", package = "spDataLarge")
single_layer = rast(raster_filepath)
```

-It also works in case you want to read a multilayer file.
+It also works in case you want to read a multi-layer file.
```{r 07-read-write-plot-25}
multilayer_filepath = system.file("raster/landsat.tif", package = "spDataLarge")
@@ -519,7 +519,7 @@ usa_sf = ne_countries(country = "United States of America", returnclass = "sf")

Country borders can be also accessed with other packages, such as **geodata**, **giscoR**, or **rgeoboundaries**.
A second example downloads a series of rasters containing global monthly precipitation sums with spatial resolution of 10 minutes (~18.5 km at the equator) using the **geodata** package [@R-geodata].
-The result is a multilayer object of class `SpatRaster`.
+The result is a multi-layer object of class `SpatRaster`.

```{r 07-read-write-plot-5, eval=FALSE}
library(geodata)
@@ -632,7 +632,7 @@ Map image representations, such as tiles, can be accessed with the Web Map Servi
Metadata is also covered by means of the Catalogue Service for the Web (CSW)\index{geographic web services!CSW}.
Finally, standard processing is handled through the Web Processing Service (WPS)\index{geographic web services!WPS} or the Web Coverage Processing Service (WCPS)\index{geographic web services!WCPS}.

-Various open-source projects have adopted these protocols, such as [GeoServer](https://geoserver.org/) and [MapServer](https://mapserver.org/) for data handling, or [GeoNetwork](https://geonetwork-opensource.org/) and [PyCSW](https://pycsw.org/) for metadata handling, leading to standardization of queries.
+Various open-source projects have adopted these protocols, such as [GeoServer](https://geoserver.org/) and [MapServer](https://mapserver.org/) for data-handling, or [GeoNetwork](https://geonetwork-opensource.org/) and [PyCSW](https://pycsw.org/) for metadata-handling, leading to standardization of queries.
Integrated tools for Spatial Data Infrastructures (SDI), such as [GeoNode](https://geonode.org/), [GeOrchestra](https://www.georchestra.org/) or [Examind](https://www.examind.com/) have also adopted these standard webservices, either directly or by using previously mentioned open-source tools. Like other web APIs, OWS APIs use a 'base URL', an 'endpoint' and 'URL query arguments' following a `?` to request data (see the [`best-practices-api-packages`](https://httr.r-lib.org/articles/api-packages.html) vignette in the **httr** package). diff --git a/09-mapping.Rmd b/09-mapping.Rmd index 01e27d16d..99f33191a 100644 --- a/09-mapping.Rmd +++ b/09-mapping.Rmd @@ -285,7 +285,7 @@ tm_shape(nz) + tm_polygons(fill = "Median_income", fill.scale = tm_scale(values = "BuGn")) ``` -```{r tmpal, message=FALSE, fig.cap="Illustration of color settings. The results show (from left to right): default settings, manual breaks, n breaks, and the impact of changing the palette.", fig.scap="Illustration of color settings.", echo=FALSE, fig.asp=0.56, warning=FALSE} +```{r tmpal, message=FALSE, fig.cap="Color settings. The results show (from left to right): default settings, manual breaks, n breaks, and the impact of changing the palette.", fig.scap="Color settings.", echo=FALSE, fig.asp=0.56, warning=FALSE} source("https://github.com/geocompx/geocompr/raw/main/code/09-tmpal.R", print.eval = TRUE) ``` @@ -328,7 +328,7 @@ Other options are listed below and presented in Figure \@ref(fig:break-styles). Although `style` is an argument of **tmap** functions, in fact it originates as an argument in `classInt::classIntervals()` --- see the help page of this function for details. 
``` -```{r break-styles, message=FALSE, fig.cap="Different interval scale methods set using the style argument in tmap.", fig.scap="Illustration of different binning methods using tmap.", echo=FALSE, warning=FALSE, fig.width=8} +```{r break-styles, message=FALSE, fig.cap="Different interval scale methods set using the style argument in tmap.", fig.scap="Different binning methods using tmap.", echo=FALSE, warning=FALSE, fig.width=8} source("code/09-break-styles.R", print.eval = TRUE) ``` diff --git a/11-algorithms.Rmd b/11-algorithms.Rmd index 76d64c90b..4df9f9cb3 100644 --- a/11-algorithms.Rmd +++ b/11-algorithms.Rmd @@ -6,7 +6,7 @@ source("code/before_script.R") ## Prerequisites {-} -This chapter has minimal software prerequisites as it primarily uses base R. +This chapter has minimal software prerequisites, as it primarily uses base R. Only, the **sf**\index{sf} package is used to check the results of an algorithm we will develop to calculate the area of polygons. In terms of prior knowledge, this chapter assumes you have an understanding of the geographic classes introduced in Chapter \@ref(spatial-class) and how they can be used to represent a wide range of input file formats (see Chapter \@ref(read-write)). @@ -199,7 +199,7 @@ C1 = (T1[1,] + T1[2,] + T1[3,]) / 3 C1_alternative = (T1[1, , drop = FALSE] + T1[2, , drop = FALSE] + T1[3, , drop = FALSE]) / 3 ``` -```{r polymat, echo=FALSE, fig.cap="Illustration of polygon centroid calculation problem.", fig.height="100", warning=FALSE} +```{r polymat, echo=FALSE, fig.cap="Polygon centroid calculation problem.", fig.height="100", warning=FALSE} # initial plot: can probably delete this: old_par = par(pty = "s") plot(poly_mat, cex = 3) @@ -282,7 +282,7 @@ Algorithm\index{algorithm} development is hard. 
This should be apparent from the amount of work that has gone into developing a centroid\index{centroid} algorithm\index{algorithm} in base R\index{R} that is just one, rather inefficient, approach to the problem with limited real-world applications (convex polygons are uncommon in practice). The experience should lead to an appreciation of low-level geographic libraries such as GEOS\index{GEOS} and CGAL\index{CGAL} (the Computational Geometry Algorithms Library) which not only run fast but work on a wide range of input geometry types. A great advantage of the open source nature of such libraries is that their source code\index{source code} is readily available for study, comprehension and (for those with the skills and confidence) modification.^[ -The CGAL\index{CGAL} function `CGAL::centroid()` is in fact composed of 7 sub-functions as described at https://doc.cgal.org/latest/Kernel_23/group__centroid__grp.html allowing it to work on a wide range of input data types, whereas the solution we created works only on a very specific input data type. +The CGAL\index{CGAL} function `CGAL::centroid()` is in fact composed of seven sub-functions as described at https://doc.cgal.org/latest/Kernel_23/group__centroid__grp.html allowing it to work on a wide range of input data types, whereas the solution we created works only on a very specific input data type. The source code underlying GEOS\index{GEOS} function `Centroid::getCentroid()` can be found at https://github.com/libgeos/geos/search?q=getCentroid. ] diff --git a/13-transport.Rmd b/13-transport.Rmd index 21d1412ba..cc70aea64 100644 --- a/13-transport.Rmd +++ b/13-transport.Rmd @@ -119,7 +119,7 @@ View(cw0103) Like many cities, Bristol has major congestion, air quality and physical inactivity problems. 
Cycling can tackle all of these issues efficiently: it has a greater potential to replace car trips than walking, with typical [speeds](https://en.wikipedia.org/wiki/Bicycle_performance) of 15-20 km/h vs 4-6 km/h for walking. -For this reason Bristol's [Transport Strategy](https://www.bristol.gov.uk/council-and-mayor/policies-plans-and-strategies/bristol-transport-strategy) has ambitious plans for cycling. +For this reason, Bristol's [Transport Strategy](https://www.bristol.gov.uk/council-and-mayor/policies-plans-and-strategies/bristol-transport-strategy) has ambitious plans for cycling. To highlight the importance of policy considerations in transportation research, this chapter is guided by the need to provide evidence for people (transport planners, politicians and other stakeholders) tasked with getting people out of cars and onto more sustainable modes --- walking and cycling in particular. The broader aim is to demonstrate how geocomputation can support evidence-based transport planning. @@ -505,8 +505,8 @@ While R users can access CycleStreets routes via the package [**cyclestreets**]( ### Contraction hierarchies and traffic assigment Contraction hierarchies and traffic assignment are advanced but important topics in transport modeling worth being aware of, especially if you want your code to scale to large networks. -Calculating many routes is computationally resource intensive and can take hours, leading to the development of several algorithms to speed-up routing calculations. -**Contraction hierarchies** is a well-known algorithm that can lead to a substantial (1000x+ in some cases) speed-up in routing tasks, depending on network size. +Calculating many routes is computationally resource intensive and can take hours, leading to the development of several algorithms to speed up routing calculations. 
+**Contraction hierarchies** is a well-known algorithm that can lead to a substantial (1000x+ in some cases) speed-up in routing tasks, depending on network size. Contraction hierarchies are used behind the scenes in the routing engines mentioned in the previous sections. Traffic assignment is a problem that is closely related to routing: in practice, the shortest path between two points is not always the fastest, especially if there is congestion. @@ -691,7 +691,7 @@ ways_centrality = ways_sfn |> mutate(betweenness = tidygraph::centrality_edge_betweenness(lengths)) ``` -```{r wayssln, fig.cap="Illustration of route network datasets. The grey lines represent a simplified road network, with segment thickness proportional to betweenness. The green lines represent potential cycling flows (one way) calculated with the code above.", fig.scap="Illustration of a small route network.", echo=FALSE} +```{r wayssln, fig.cap="Illustration of route network datasets. The grey lines represent a simplified road network, with segment thickness proportional to betweenness. The green lines represent potential cycling flows (one way) calculated with the code above.", fig.scap="A small route network.", echo=FALSE} bb_wayssln = tmaptools::bb(route_network_scenario, xlim = c(0.1, 0.9), ylim = c(0.1, 0.6), relative = TRUE) tm_shape(zones_od) + tm_fill(fill_alpha = 0.2, lwd = 0.1) + diff --git a/15-eco.Rmd b/15-eco.Rmd index af07ad5e7..4e285c078 100644 --- a/15-eco.Rmd +++ b/15-eco.Rmd @@ -181,7 +181,7 @@ ep = qgisprocess::qgis_run_algorithm( ``` This returns a list named `ep` containing the paths to the computed output rasters. -Let's read in catchment area as well as catchment slope into a multilayer `SpatRaster` object (see Section \@ref(raster-classes)). +Let's read in catchment area as well as catchment slope into a multi-layer `SpatRaster` object (see Section \@ref(raster-classes)). Additionally, we will add two more raster objects to it, namely `dem` and `ndvi`.
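The routing and betweenness ideas in the hunks above can be illustrated on a toy network. This sketch uses the igraph package directly, an assumption made for illustration only; the chapter itself relies on dedicated routing engines and `tidygraph::centrality_edge_betweenness()`:

```r
# Toy weighted network: the kind of calculation routing engines speed up
library(igraph)
edges = data.frame(
  from    = c("A", "A", "B", "C"),
  to      = c("B", "C", "D", "D"),
  lengths = c(1, 4, 2, 1)  # segment lengths used as edge weights
)
g = graph_from_data_frame(edges, directed = FALSE)
# shortest network distance from A to D (via B: 1 + 2 = 3)
distances(g, v = "A", to = "D", weights = E(g)$lengths)
# edge betweenness: how many shortest paths traverse each segment
edge_betweenness(g, weights = E(g)$lengths)
```

On real road networks with millions of edges, repeating such calculations for every origin-destination pair is what makes contraction hierarchies worthwhile.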
```{r 15-eco-7, eval=FALSE} @@ -191,7 +191,7 @@ ep = ep[c("AREA", "SLOPE")] |> rast() names(ep) = c("carea", "cslope") # assign better names origin(ep) = origin(dem) # make sure rasters have the same origin -ep = c(dem, ndvi, ep) # add dem and ndvi to the multilayer SpatRaster object +ep = c(dem, ndvi, ep) # add dem and ndvi to the multi-layer SpatRaster object ``` Additionally, the catchment area\index{catchment area} values are highly skewed to the right (`hist(ep$carea)`). @@ -534,7 +534,7 @@ autotuner_rf$predict(task) ``` The `predict` method will apply the model to all observations used in the modeling. -Given a multilayer `SpatRaster` containing rasters named as the predictors used in the modeling, `terra::predict()` will also make spatial distribution maps, i.e., predict to new data. +Given a multi-layer `SpatRaster` containing rasters named as the predictors used in the modeling, `terra::predict()` will also make spatial distribution maps, i.e., predict to new data. ```{r 15-eco-28, cache=TRUE, cache.lazy=FALSE, eval=FALSE} pred = terra::predict(ep, model = autotuner_rf, fun = predict) diff --git a/16-synthesis.Rmd b/16-synthesis.Rmd index b8846e995..51d4d4abd 100644 --- a/16-synthesis.Rmd +++ b/16-synthesis.Rmd @@ -84,7 +84,7 @@ length(pkgs_char) The rate of evolution in R's spatial ecosystem may be fast, but there are strategies to deal with the wide range of options. Our advice is to start by learning one approach *in depth* but to have a general understanding of the *breadth* of available options. -This advice applies equally to solving geographic problems with R as it does to other fields of knowledge and application. +This advice applies equally to solving geographic problems with R, as it does to other fields of knowledge and application. Section \@ref(next) covers developments in other languages. Of course, some packages perform better than others for the *same* task, in which case it's important to know which to use.
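The `terra::predict()` workflow referenced in the 15-eco.Rmd hunks above can be sketched on synthetic data. The layer names and the simple `lm()` model here are assumptions for illustration; the chapter itself predicts from a tuned random forest:

```r
# Sketch: predicting from a fitted model to a multi-layer SpatRaster
library(terra)
r1 = rast(nrows = 10, ncols = 10, vals = runif(100)); names(r1) = "x1"
r2 = rast(nrows = 10, ncols = 10, vals = runif(100)); names(r2) = "x2"
ep_demo = c(r1, r2)  # layer names must match the model's predictor names
d = data.frame(y = runif(100), x1 = values(r1)[, 1], x2 = values(r2)[, 1])
m = lm(y ~ x1 + x2, data = d)
pred_demo = predict(ep_demo, m)  # dispatches to terra::predict()
```

The key requirement, as the text notes, is that the raster layer names match the predictor names used when fitting the model.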
diff --git a/README.Rmd b/README.Rmd index 07c43eb37..28e2b5b01 100644 --- a/README.Rmd +++ b/README.Rmd @@ -48,7 +48,7 @@ Since commencing work on the Second Edition in September 2021 much has changed, - Improve the experience of using the book in Binder (ideal for trying out the code before installing or updating the necessary R packages), as documented in issue [#691](https://github.com/geocompx/geocompr/issues/691) (thanks to [yuvipanda](https://github.com/yuvipanda)) - Improved communication of binary spatial predicates in Chapter 4 (see [#675](https://github.com/geocompx/geocompr/pull/675)) - New section on the links between subsetting and clipping (see [#698](https://github.com/geocompx/geocompr/pull/698)) in Chapter 5 -- New [section](https://r.geocompx.org/spatial-operations.html#de-9im-strings) on the dimensionally extended 9 intersection model (DE-9IM) +- New [section](https://r.geocompx.org/spatial-operations.html#de-9im-strings) on the dimensionally extended 9-intersection model (DE-9IM) - New [chapter](https://r.geocompx.org/raster-vector.html) on raster-vector interactions split out from Chapter 5 - New [section](https://r.geocompx.org/spatial-class.html#the-sfheaders-package) on the **sfheaders** package - New [section](https://r.geocompx.org/spatial-class.html#s2) in Chapter 2 on spherical geometry engines and the **s2** package @@ -64,7 +64,7 @@ Since commencing work on the Second Edition in September 2021 much has changed, See [https://github.com/geocompx/geocompr/compare/1.9...main](https://github.com/geocompx/geocompr/compare/1.9...main#files_bucket) for a continuously updated summary of the changes to date. -At the time of writing (March 2022) there have been about 40k lines of code/prose added, lots of refactoring! +This shows 20k+ lines of code/prose added, lots of refactoring! Contributions are very welcome. 
diff --git a/README.md b/README.md index 2b1abaf5f..0f3e7c903 100644 --- a/README.md +++ b/README.md @@ -63,7 +63,7 @@ changed, including: [\#698](https://github.com/geocompx/geocompr/pull/698)) in Chapter 5 - New [section](https://r.geocompx.org/spatial-operations.html#de-9im-strings) - on the dimensionally extended 9 intersection model (DE-9IM) + on the dimensionally extended 9-intersection model (DE-9IM) - New [chapter](https://r.geocompx.org/raster-vector.html) on raster-vector interactions split out from Chapter 5 - New diff --git a/_03-ex.Rmd b/_03-ex.Rmd index 0d644401a..1854d6f32 100644 --- a/_03-ex.Rmd +++ b/_03-ex.Rmd @@ -48,7 +48,7 @@ us_states |> select(contains("total_pop")) us_states |> select(matches("tal_p")) ``` -E3. Find all states with the following characteristics (bonus find *and* plot them): +E3. Find all states with the following characteristics (bonus: find *and* plot them): - Belong to the Midwest region. - Belong to the West region, have an area below 250,000 km^2^ *and* in 2015 a population greater than 5,000,000 residents (hint: you may need to use the function `units::set_units()` or `as.numeric()`). @@ -107,7 +107,7 @@ class(us_states_stats) ``` E8. `us_states_df` has two more rows than `us_states`. -How can you find them? (hint: try to use the `dplyr::anti_join()` function) +How can you find them? (Hint: try to use the `dplyr::anti_join()` function.) ```{r 03-ex-e8} us_states_df |> @@ -141,7 +141,7 @@ us_states %>% ``` E12. Using `us_states` and `us_states_df` create a new object called `us_states_sel`. -The new object should have only two variables - `median_income_15` and `geometry`. +The new object should have only two variables: `median_income_15` and `geometry`. Change the name of the `median_income_15` column to `Income`. ```{r 03-ex-e12} @@ -185,7 +185,7 @@ us_pov_change |> as.character() ``` -E15. Create a raster from scratch with nine rows and columns and a resolution of 0.5 decimal degrees (WGS84). +E15. 
Create a raster from scratch, with nine rows and columns and a resolution of 0.5 decimal degrees (WGS84). Fill it with random numbers. Extract the values of the four corner cells. diff --git a/_04-ex.Rmd b/_04-ex.Rmd index bcb6f1076..97a748267 100644 --- a/_04-ex.Rmd +++ b/_04-ex.Rmd @@ -65,7 +65,7 @@ nz_height_combined |> E4. Test your knowledge of spatial predicates by finding out and plotting how US states relate to each other and other spatial objects. The starting point of this exercise is to create an object representing Colorado state in the USA. Do this with the command -`colorado = us_states[us_states$NAME == "Colorado",]` (base R) or with with the `filter()` function (tidyverse) and plot the resulting object in the context of US states. +`colorado = us_states[us_states$NAME == "Colorado",]` (base R) or with the `filter()` function (tidyverse) and plot the resulting object in the context of US states. - Create a new object representing all the states that geographically intersect with Colorado and plot the result (hint: the most concise way to do this is with the subsetting method `[`). - Create another object representing all the objects that touch (have a shared boundary with) Colorado and plot the result (hint: remember you can use the argument `op = st_intersects` and other spatial relations during spatial subsetting operations in base R). 
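One way the raster-from-scratch exercise (E15) above might be approached, as a sketch rather than the book's solution (the extent is chosen so that nine 0.5-degree cells span each axis):

```r
library(terra)
# 9 x 9 raster at 0.5 degree resolution in WGS84, filled with random numbers
my_raster = rast(nrows = 9, ncols = 9,
                 xmin = 0, xmax = 4.5, ymin = 0, ymax = 4.5,
                 crs = "EPSG:4326", vals = runif(81))
res(my_raster)  # 0.5 0.5
# corner cells: cell IDs run row-wise from the top-left cell
my_raster[c(1, 9, 73, 81)]
```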
diff --git a/_15-ex.Rmd b/_15-ex.Rmd index 396e186d7..e7f708ebc 100644 --- a/_15-ex.Rmd +++ b/_15-ex.Rmd @@ -90,7 +90,7 @@ ep = ep[c("AREA", "SLOPE")] |> names(ep) = c("carea", "cslope") # make sure all rasters share the same origin origin(ep) = origin(dem) -# add dem and ndvi to the multilayer SpatRaster object +# add dem and ndvi to the multi-layer SpatRaster object ep = c(dem, ndvi, ep) ep$carea = log10(ep$carea) diff --git a/code/chapters/02-spatial-data.R b/code/chapters/02-spatial-data.R index be762bbc7..12638fc8c 100644 --- a/code/chapters/02-spatial-data.R +++ b/code/chapters/02-spatial-data.R @@ -40,7 +40,7 @@ library(spDataLarge) # load larger geographic data ## source("https://github.com/Robinlovelace/geocompr/raw/main/code/02-vectorplots.R") # generate subsequent figure -## ----vectorplots, fig.cap="Illustration of vector (point) data in which location of London (the red X) is represented with reference to an origin (the blue circle). The left plot represents a geographic CRS with an origin at 0° longitude and latitude. The right plot represents a projected CRS with an origin located in the sea west of the South West Peninsula.", out.width="49%", fig.show='hold', echo=FALSE, fig.scap="Illustration of vector (point) data."---- +## ----vectorplots, fig.cap="Vector (point) data in which location of London (red X) is represented with reference to an origin (blue circle). The left plot represents a geographic CRS with an origin at 0° longitude and latitude.
The right plot represents a projected CRS with an origin located in the sea west of the South West Peninsula.", out.width="49%", fig.show='hold', echo=FALSE, fig.scap="Vector (point) data."---- knitr::include_graphics(c("images/vector_lonlat.png", "images/vector_projected.png")) @@ -68,7 +68,7 @@ class(world) names(world) -## ----world-all, fig.cap="A spatial plot of the world using the sf package, with a facet for each attribute.", warning=FALSE, fig.scap="A spatial plot of the world using the sf package."---- +## ----world-all, fig.cap="Map of the world using the sf package, with a facet for each attribute.", warning=FALSE, fig.scap="Map of the world using the sf package."---- plot(world) @@ -123,7 +123,7 @@ world_asia = world[world$continent == "Asia", ] asia = st_union(world_asia) -## ----asia, out.width='50%', fig.cap="A plot of Asia added as a layer on top of countries worldwide.", eval=FALSE---- +## ----asia, out.width='50%', fig.cap="Plot of Asia added as a layer on top of countries worldwide.", eval=FALSE---- ## plot(world["pop"], reset = FALSE) ## plot(asia, add = TRUE, col = "red") @@ -161,7 +161,7 @@ par(old_par) ## waldo::compare(st_geometry(world), world[0]) -## ----sfcs, echo=FALSE, fig.cap="Illustration of point, linestring and polygon geometries.", fig.asp=0.4---- +## ----sfcs, echo=FALSE, fig.cap="Point, linestring and polygon geometries.", fig.asp=0.4---- old_par = par(mfrow = c(1, 3), pty = "s", mar = c(0, 3, 1, 0)) plot(st_as_sfc(c("POINT(5 2)")), axes = TRUE, main = "POINT") plot(st_as_sfc("LINESTRING(1 5, 4 4, 4 1, 2 2, 3 2)"), axes = TRUE, main = "LINESTRING") @@ -443,9 +443,9 @@ sf_use_s2(TRUE) ## These edge cases include operations on polygons that are not valid according to S2's stricter definition. -## If you see error message such as `#> Error in s2_geography_from_wkb ...` it may be worth trying the command that generated the error message again, after turning off S2. 
+## If you see error messages such as `#> Error in s2_geography_from_wkb ...` it may be worth trying the command that generated the error message again, after turning off S2. -## To turn off S2 for the entirety of a project you can create a file called .Rprofile in the root directory (the main folder) of your project containing the command `sf::sf_use_s2(FALSE)`. +## To turn off S2 for the entirety of a project, you can create a file called .Rprofile in the root directory (the main folder) of your project containing the command `sf::sf_use_s2(FALSE)`. ## ----raster-intro-plot, echo = FALSE, fig.cap = "Raster data types: (A) cell IDs, (B) cell values, (C) a colored raster map.", fig.scap="Raster data types.", fig.asp=0.5, message=FALSE---- diff --git a/code/chapters/03-attribute-operations.R b/code/chapters/03-attribute-operations.R index c9ac96311..bfe01b73e 100644 --- a/code/chapters/03-attribute-operations.R +++ b/code/chapters/03-attribute-operations.R @@ -56,7 +56,7 @@ library(spData) # spatial data package introduced in Chapter 2 ## ----03-attribute-operations-7---------------------------------------------------------------------- class(world) # it's an sf object and a (tidy) data frame -dim(world) # it is a 2 dimensional object, with 177 rows and 11 columns +dim(world) # it is a two-dimensional object, with 177 rows and 11 columns ## ----03-attribute-operations-8---------------------------------------------------------------------- @@ -247,8 +247,8 @@ world_agg5 = world |> options(scipen = 999) knitr::kable( world_agg5, - caption = "The top 3 most populous continents ordered by number of countries.", - caption.short = "Top 3 most populous continents.", + caption = "The top three most populous continents ordered by number of countries.", + caption.short = "Top three most populous continents.", booktabs = TRUE ) @@ -447,7 +447,7 @@ elev[1, c(1, 2)] = 0 ## The `force` argument makes sure that the package will be detached even if other packages depend on it. 
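The remark above about the `force` argument can be made concrete with a short base R sketch (illustrative only; whether `force = TRUE` is needed depends on what else is loaded in the session):

```r
# Sketch: detaching an attached package programmatically
library(sf)  # attach the package
# detach it even if other loaded packages depend on it (not recommended,
# for the reasons given in the text)
detach("package:sf", unload = TRUE, force = TRUE)
```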
-## This, however, may lead to a restricted usability of packages depending on the detached package, and is therefore not recommended. +## This, however, may lead to a restricted usability of packages depending on the detached package, and it is therefore not recommended. ## ---- echo=FALSE, results='asis'-------------------------------------------------------------------- diff --git a/code/chapters/04-spatial-operations.R b/code/chapters/04-spatial-operations.R index 19d02f965..f8b18f382 100644 --- a/code/chapters/04-spatial-operations.R +++ b/code/chapters/04-spatial-operations.R @@ -18,7 +18,7 @@ canterbury = nz |> filter(Name == "Canterbury") canterbury_height = nz_height[canterbury, ] -## ----nz-subset, echo=FALSE, warning=FALSE, fig.cap="Illustration of spatial subsetting with red triangles representing 101 high points in New Zealand, clustered near the central Canterbuy region (left). The points in Canterbury were created with the `[` subsetting operator (highlighted in gray, right).", fig.scap="Illustration of spatial subsetting.", message=FALSE---- +## ----nz-subset, echo=FALSE, warning=FALSE, fig.cap="Spatial subsetting with red triangles representing 101 high points in New Zealand, clustered near the central Canterbury region (left). The points in Canterbury were created with the `[` subsetting operator (highlighted in gray, right).", fig.scap="Spatial subsetting.", message=FALSE---- library(tmap) p_hpnz1 = tm_shape(nz) + tm_polygons(col = "white") + tm_shape(nz_height) + tm_symbols(shape = 2, col = "red", size = 0.25) + @@ -53,7 +53,7 @@ canterbury_height2 = nz_height[sel_logical, ] ## Note: another way to return a logical output is by setting `sparse = FALSE` (meaning 'return a dense matrix not a sparse one') in operators such as `st_intersects()`. The command `st_intersects(x = nz_height, y = canterbury, sparse = FALSE)[, 1]`, for example, would return an output identical to `sel_logical`.
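The spatial subsetting and `st_intersects()` patterns in the hunk above can be reproduced on toy geometries (a sketch with made-up coordinates, not the book's New Zealand data):

```r
# Sketch: spatial subsetting with `[` and a logical vector from st_intersects()
library(sf)
pts = st_as_sf(data.frame(x = c(0.5, 2.5), y = c(0.5, 0.5)),
               coords = c("x", "y"))
poly_sf = st_sf(geometry = st_as_sfc("POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))"))
pts[poly_sf, ]  # spatial subsetting: keeps only the point inside the polygon
st_intersects(pts, poly_sf, sparse = FALSE)[, 1]  # TRUE FALSE
```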
-## Note: the solution involving `sgbp` objects is more generalisable though, as it works for many-to-many operations and has lower memory requirements. +## Note: the solution involving `sgbp` objects is more generalizable though, as it works for many-to-many operations and has lower memory requirements. ## --------------------------------------------------------------------------------------------------- @@ -74,7 +74,7 @@ canterbury_height3 = nz_height |> ## waldo::compare(canterbury_height2, canterbury_height4) -## ----relations, echo=FALSE, fig.cap="Topological relations between vector geometries, inspired by Figures 1 and 2 in Egenhofer and Herring (1990). The relations for which the function(x, y) is true are printed for each geometry pair, with x represented in pink and y represented in blue. The nature of the spatial relationship for each pair is described by the Dimensionally Extended 9-Intersection Model string.", fig.show='hold', message=FALSE, fig.asp=0.66, warning=FALSE---- +## ----relations, echo=FALSE, fig.cap="Topological relations between vector geometries, inspired by figures 1 and 2 in Egenhofer and Herring (1990). The relations for which the function(x, y) is true are printed for each geometry pair, with x represented in pink and y represented in blue. The nature of the spatial relationship for each pair is described by the Dimensionally Extended 9-Intersection Model string.", fig.show='hold', message=FALSE, fig.asp=0.66, warning=FALSE---- # source("https://github.com/Robinlovelace/geocompr/raw/c4-v2-updates-rl/code/de_9im.R") source("code/de_9im.R") library(sf) @@ -230,7 +230,7 @@ st_is_within_distance(point_sf, polygon_sfc, dist = 0.2, sparse = FALSE)[, 1] ## text(x = c(-0.5, 1.5), y = 1, labels = c("x", "y")) # add text -## ----de9imgg, echo=FALSE, warning=FALSE, fig.cap="Illustration of how the Dimensionally Extended 9 Intersection Model (DE-9IM) works. Colors not in the legend represent the overlap between different components. 
The thick lines highlight 2 dimensional intesections, e.g. between the boundary of object x and the interior of object y, shown in the middle top facet.", message=FALSE---- +## ----de9imgg, echo=FALSE, warning=FALSE, fig.cap="Illustration of how the Dimensionally Extended 9 Intersection Model (DE-9IM) works. Colors not in the legend represent the overlap between different components. The thick lines highlight two-dimensional intersections, e.g. between the boundary of object x and the interior of object y, shown in the middle top facet.", message=FALSE---- p1_2 = st_as_sf(c(p1, p3)) ii = st_as_sf(st_intersection(p1, p3)) ii$Object = "Intersection" @@ -340,7 +340,7 @@ matrix_de_9im = function(pattern) { m = matrix_de_9im(pattern) colnames(m) = c("Interior (x)", "Boundary (x)", "Exterior (x)") rownames(m) = c("Interior (y)", "Boundary (y)", "Exterior (y)") -knitr::kable(m, caption = "Table showing relations between interiors, boundaries and exteriors of geometries x and y.") +knitr::kable(m, caption = "Relations between interiors, boundaries and exteriors of geometries x and y.") ## --------------------------------------------------------------------------------------------------- @@ -547,7 +547,7 @@ nz_agg2 = st_join(x = nz, y = nz_height) |> ## # aggregate looses the name of aggregating objects -## ----areal-example, echo=FALSE, fig.cap="Illustration of congruent (left) and incongruent (right) areal units with respect to larger aggregating zones (translucent blue borders).", fig.asp=0.2, fig.scap="Illustration of congruent and incongruent areal units."---- +## ----areal-example, echo=FALSE, fig.cap="Congruent (left) and incongruent (right) areal units with respect to larger aggregating zones (translucent blue borders).", fig.asp=0.2, fig.scap="Congruent and incongruent areal units."---- source("https://github.com/Robinlovelace/geocompr/raw/main/code/04-areal-example.R", print.eval = TRUE) @@ -656,7 +656,7 @@ ndvi_fun = function(nir, red){ ndvi_rast = 
lapp(multi_rast[[c(4, 3)]], fun = ndvi_fun) -## ----04-ndvi, echo=FALSE, fig.cap="RGB image (left) and NDVI values (right) calculated for the example satellite file of the Zion National Park"---- +## ----04-ndvi, echo=FALSE, fig.cap="RGB image (left) and NDVI values (right) calculated for the example satellite file of Zion National Park"---- knitr::include_graphics("images/04-ndvi.png") diff --git a/code/chapters/05-geometry-operations.R b/code/chapters/05-geometry-operations.R index eca2f4b30..7e538950c 100644 --- a/code/chapters/05-geometry-operations.R +++ b/code/chapters/05-geometry-operations.R @@ -134,7 +134,7 @@ rotation = function(a){ nz_rotate = (nz_sfc - nz_centroid_sfc) * rotation(30) + nz_centroid_sfc -## ----affine-trans, echo=FALSE, fig.cap="Illustrations of affine transformations: shift, scale and rotate.", warning=FALSE, eval=TRUE, fig.scap="Illustrations of affine transformations."---- +## ----affine-trans, echo=FALSE, fig.cap="Affine transformations: shift, scale and rotate.", warning=FALSE, eval=TRUE, fig.scap="Affine transformations."---- st_crs(nz_shift) = st_crs(nz_sfc) st_crs(nz_scale) = st_crs(nz_sfc) st_crs(nz_rotate) = st_crs(nz_sfc) diff --git a/code/chapters/06-raster-vector.R b/code/chapters/06-raster-vector.R index b71da3c53..ad8c2f348 100644 --- a/code/chapters/06-raster-vector.R +++ b/code/chapters/06-raster-vector.R @@ -27,7 +27,7 @@ srtm_final = mask(srtm_cropped, zion) srtm_inv_masked = mask(srtm, zion, inverse = TRUE) -## ----cropmask, echo = FALSE, fig.cap="Illustration of raster cropping and raster masking.", fig.asp=0.36, fig.width = 10, warning=FALSE---- +## ----cropmask, echo = FALSE, fig.cap="Raster cropping and raster masking.", fig.asp=0.36, fig.width = 10, warning=FALSE---- library(tmap) library(rcartocolor) terrain_colors = carto_pal(7, "Geyser") @@ -305,7 +305,7 @@ grain_poly = as.polygons(grain) |> st_as_sf() -## ----06-raster-vector-40, echo=FALSE, fig.cap="Illustration of vectorization of raster (left) into polygons 
(dissolve = FALSE; center) and aggregated polygons (dissolve = TRUE; right).", warning=FALSE, fig.asp=0.4, fig.scap="Illustration of vectorization."---- +## ----06-raster-vector-40, echo=FALSE, fig.cap="Vectorization of raster (left) into polygons (dissolve = FALSE; center) and aggregated polygons (dissolve = TRUE; right).", warning=FALSE, fig.asp=0.4, fig.scap="Vectorization."---- source("https://github.com/Robinlovelace/geocompr/raw/main/code/06-raster-vectorization2.R", print.eval = TRUE) diff --git a/code/chapters/09-mapping.R b/code/chapters/09-mapping.R index 05f844a52..234f3346c 100644 --- a/code/chapters/09-mapping.R +++ b/code/chapters/09-mapping.R @@ -114,11 +114,11 @@ map_nza = tm_shape(nz) + ## tm_shape(nz) + tm_polygons(col = "Median_income", palette = "BuGn") -## ----tmpal, message=FALSE, fig.cap="Illustration of settings that affect color settings. The results show (from left to right): default settings, manual breaks, n breaks, and the impact of changing the palette.", fig.scap="Illustration of settings that affect color settings.", echo=FALSE, fig.asp=0.56---- +## ----tmpal, message=FALSE, fig.cap="Settings that affect color settings. 
The results show (from left to right): default settings, manual breaks, n breaks, and the impact of changing the palette.", fig.scap="Settings that affect color settings.", echo=FALSE, fig.asp=0.56---- source("https://github.com/Robinlovelace/geocompr/raw/main/code/09-tmpal.R", print.eval = TRUE) -## ----break-styles, message=FALSE, fig.cap="Illustration of different binning methods set using the style argument in tmap.", , fig.scap="Illustration of different binning methods using tmap.", echo=FALSE---- +## ----break-styles, message=FALSE, fig.cap="Different binning methods set using the style argument in tmap.", , fig.scap="Different binning methods using tmap.", echo=FALSE---- source("https://github.com/Robinlovelace/geocompr/raw/main/code/09-break-styles.R", print.eval = TRUE) @@ -168,12 +168,12 @@ map_nz + source("https://github.com/Robinlovelace/geocompr/raw/main/code/09-layout1.R", print.eval = TRUE) -## ----layout2, message=FALSE, fig.cap="Illustration of selected layout options.", echo=FALSE, fig.asp=0.56---- +## ----layout2, message=FALSE, fig.cap="Selected layout options.", echo=FALSE, fig.asp=0.56---- # todo: add more useful settings to this plot source("https://github.com/Robinlovelace/geocompr/raw/main/code/09-layout2.R", print.eval = TRUE) -## ----layout3, message=FALSE, fig.cap="Illustration of selected color-related layout options.", echo=FALSE, fig.asp=0.56---- +## ----layout3, message=FALSE, fig.cap="Selected color-related layout options.", echo=FALSE, fig.asp=0.56---- source("https://github.com/Robinlovelace/geocompr/raw/main/code/09-layout3.R", print.eval = TRUE) diff --git a/code/chapters/11-algorithms.R b/code/chapters/11-algorithms.R index 96199a4a9..8492fda91 100644 --- a/code/chapters/11-algorithms.R +++ b/code/chapters/11-algorithms.R @@ -54,7 +54,7 @@ T1 = rbind(Origin, poly_mat[2:3, ], Origin) C1 = (T1[1, , drop = FALSE] + T1[2, , drop = FALSE] + T1[3, , drop = FALSE]) / 3 -## ----polymat, echo=FALSE, fig.cap="Illustration of polygon 
centroid calculation problem.", fig.height="100", warning=FALSE---- +## ----polymat, echo=FALSE, fig.cap="Polygon centroid calculation problem.", fig.height="100", warning=FALSE---- # initial plot: can probably delete this: old_par = par(pty = "s") plot(poly_mat) diff --git a/code/chapters/15-eco.R b/code/chapters/15-eco.R index 42035efad..376ef21f0 100644 --- a/code/chapters/15-eco.R +++ b/code/chapters/15-eco.R @@ -118,7 +118,7 @@ knitr::include_graphics("images/15_sa_mongon_sampling.png") ## terra::rast() ## names(ep) = c("carea", "cslope") # assign proper names ## terra::origin(ep) = terra::origin(dem) # make sure rasters have the same origin -## ep = c(dem, ndvi, ep) # add dem and ndvi to the multilayer SpatRaster object +## ep = c(dem, ndvi, ep) # add dem and ndvi to the multi-layer SpatRaster object ## ----15-eco-8, eval=FALSE--------------------------------------------------------------------------- diff --git a/code/chapters/_15-ex.R b/code/chapters/_15-ex.R index 860acedac..830382a46 100644 --- a/code/chapters/_15-ex.R +++ b/code/chapters/_15-ex.R @@ -84,7 +84,7 @@ ep = ep[c("AREA", "SLOPE")] |> names(ep) = c("carea", "cslope") # make sure all rasters share the same origin origin(ep) = origin(dem) -# add dem and ndvi to the multilayer SpatRaster object +# add dem and ndvi to the multi-layer SpatRaster object ep = c(dem, ndvi, ep) ep$carea = log10(ep$carea) diff --git a/index.Rmd b/index.Rmd index 84c8ba7c4..c51e338ff 100644 --- a/index.Rmd +++ b/index.Rmd @@ -276,7 +276,7 @@ As outlined in Section \@ref(why-use-r-for-geocomputation), there are many reaso R is well suited to the interactive use required in many geographic data analysis workflows compared with other languages. R excels in the rapidly growing fields of Data Science (which includes data carpentry, statistical learning techniques and data visualization) and Big Data (via efficient interfaces to databases and distributed computing systems).
Furthermore, R enables a reproducible workflow: sharing scripts underlying your analysis will allow others to build on your work. -To ensure reproducibility in this book we have made its source code available at [github.com/geocompx/geocompr](https://github.com/geocompx/geocompr#geocomputation-with-r). +To ensure reproducibility in this book, we have made its source code available at [github.com/geocompx/geocompr](https://github.com/geocompx/geocompr#geocomputation-with-r). There you will find script files in the `code/` folder that generate figures: when code generating a figure is not provided in the main text of the book, the name of the script file that generated it is provided in the caption (see for example the caption for Figure \@ref(fig:zones)).