---
output: github_document
#always_allow_html: true
---
```{r, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(fig.path = "man/figures/")
```
# gdalcubes <img src="man/figures/logo.svg" align="right" alt="" width="120" />
[![R-CMD-check](https://github.com/appelmar/gdalcubes_R/actions/workflows/R-CMD-check.yml/badge.svg)](https://github.com/appelmar/gdalcubes_R/actions/workflows/R-CMD-check.yml)
[![CRAN](https://www.r-pkg.org/badges/version/gdalcubes)](https://cran.r-project.org/package=gdalcubes)
[![Downloads](https://cranlogs.r-pkg.org/badges/grand-total/gdalcubes)](https://cran.r-project.org/package=gdalcubes)
The R package `gdalcubes` aims at making analyses of large satellite image collections easier, faster, more intuitive, and more interactive.
The package represents the data as _regular raster data cubes_ with dimensions `bands`, `time`, `y`, and `x`, and hides complexities in the data due to different spatial resolutions, map projections, data formats, and irregular temporal sampling.
# Features
- Read and process multitemporal, multispectral Earth observation image collections as _regular raster data cubes_ by applying on-the-fly reprojection, rescaling, cropping, and resampling.
- Work with existing Earth observation imagery on local disks or cloud storage without the need to maintain a second copy of the data.
- Apply user-defined R functions on data cubes.
- Execute data cube operation chains using parallel processing and lazy evaluation.
Among others, the package has been successfully used to process data from the Sentinel-2, Sentinel-5P, Landsat, PlanetScope, MODIS, and Global Precipitation Measurement Earth observation satellites / missions.
# Installation
Install from CRAN with:
```{r, eval=FALSE}
install.packages("gdalcubes")
```
## From sources
Installation from sources is easiest with
```{r, eval=FALSE}
remotes::install_git("https://github.com/appelmar/gdalcubes")
```
Please make sure that the [git command line client](https://git-scm.com/downloads) is available on your system. Otherwise, the above command might not clone the gdalcubes C++ library as a submodule under `src/gdalcubes`.
The package builds on the external libraries [GDAL](https://www.gdal.org), [NetCDF](https://www.unidata.ucar.edu/software/netcdf), [SQLite](https://www.sqlite.org), and [curl](https://curl.haxx.se/libcurl).
## Windows
On Windows, you will need [Rtools](https://cran.r-project.org/bin/windows/Rtools) to build the package from sources.
## Linux
Please install the required system libraries, e.g., with the package manager of your Linux distribution. Also make sure that you are using a recent version of GDAL (> 2.3.0). On Ubuntu, the following commands install all necessary libraries.
```
sudo add-apt-repository ppa:ubuntugis/ppa && sudo apt-get update
sudo apt-get install libgdal-dev libnetcdf-dev libcurl4-openssl-dev libsqlite3-dev libudunits2-dev
```
## macOS
Using [Homebrew](https://brew.sh), required system libraries can be installed with
```
brew install pkg-config
brew install gdal
brew install netcdf
brew install libgit2
brew install udunits
brew install curl
brew install sqlite
brew install libtiff
brew install hdf5
brew install protobuf
```
# Getting started
## Download example data
```{r download}
if (!dir.exists("L8_Amazon")) {
  download.file("https://hs-bochum.sciebo.de/s/8XcKAmPfPGp2CYh/download", 
                destfile = "L8_Amazon.zip", mode = "wb")
  unzip("L8_Amazon.zip", exdir = "L8_Amazon")
}
```
## Creating an image collection
At first, we must scan all available images once and extract some metadata, such as their spatial extent and acquisition time. The resulting _image collection_ is stored on disk and typically consumes a few kilobytes per image. Due to the diverse structure of satellite image products, the rules for deriving the required metadata are formalized as _collection formats_. The package comes with predefined formats for some Sentinel, Landsat, and MODIS products; `collection_formats()` prints a list of the available formats, as shown below.
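For example (not evaluated here to keep the output short):
```{r, eval=FALSE}
# Print the collection formats predefined by the package
collection_formats()
```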
```{r}
library(gdalcubes)
gdalcubes_options(parallel=8)
files = list.files("L8_Amazon", recursive = TRUE,
                   full.names = TRUE, pattern = "\\.tif$") 
length(files)
sum(file.size(files)) / 1024^2 # MiB
L8.col = create_image_collection(files, format = "L8_SR", out_file = "L8.db")
L8.col
```
## Creating data cubes
To create a regular raster data cube from the image collection, we define the geometry of our target cube as
a _data cube view_, using the `cube_view()` function. Below, we define a simple overview covering the full spatiotemporal extent
of the imagery at 1 km x 1 km pixel size, where each data cube cell represents a duration of one year. The provided resampling
and aggregation methods are used to spatially reproject, crop, and rescale individual images, and to combine pixel values from multiple images within the same year, respectively. The `raster_cube()` function returns a _proxy_ object, i.e., it returns immediately without doing any expensive computations.
```{r}
v.overview = cube_view(extent=L8.col, dt="P1Y", dx=1000, dy=1000, srs="EPSG:3857",
                       aggregation = "median", resampling = "bilinear")
raster_cube(L8.col, v.overview)
```
## Processing data cubes
We can apply (and chain) operations on data cubes:
```{r cubes}
x = raster_cube(L8.col, v.overview) |>
  select_bands(c("B02","B03","B04")) |>
  reduce_time(c("median(B02)","median(B03)","median(B04)"))
x
plot(x, rgb=3:1, zlim=c(0,1200))
library(RColorBrewer)
raster_cube(L8.col, v.overview) |>
  select_bands(c("B04","B05")) |>
  apply_pixel(c("(B05-B04)/(B05+B04)"), names="NDVI") |>
  plot(zlim=c(0,1), nbreaks=10, col=brewer.pal(9, "YlGn"), key.pos=1)
```
Calling data cube operations always returns _proxy_ objects; computations start lazily only when users call, e.g., `plot()`.
## Animations
Multitemporal data cubes can be animated (thanks to the [gifski package](https://cran.r-project.org/package=gifski)):
```{r animation,results='hide'}
v.subarea.yearly = cube_view(extent=list(left=-6180000, right=-6080000, bottom=-550000, top=-450000,
                                         t0="2014-01-01", t1="2018-12-31"), dt="P1Y", dx=50, dy=50,
                             srs="EPSG:3857", aggregation = "median", resampling = "bilinear")
raster_cube(L8.col, v.subarea.yearly) |>
  select_bands(c("B02","B03","B04")) |>
  animate(rgb=3:1, fps = 2, zlim=c(100,1000), width = 400,
          height = 400, save_as = "man/figures/animation.gif")
```
![](man/figures/animation.gif)
## Data cube export
Data cubes can be exported as single netCDF files with `write_ncdf()`, or as a collection of (possibly cloud-optimized) GeoTIFF files with `write_tif()`, where each time slice of the cube yields one GeoTIFF file. Data cubes can also be converted to `terra` or `stars` objects:
```{r extpkgload, include=FALSE}
library(terra)
library(stars)
```
```{r, warning=FALSE}
raster_cube(L8.col, v.overview) |>
  select_bands(c("B04","B05")) |>
  apply_pixel(c("(B05-B04)/(B05+B04)"), names="NDVI") |>
  write_tif() |>
  terra::rast() -> x
x
raster_cube(L8.col, v.overview) |>
  select_bands(c("B04","B05")) |>
  apply_pixel(c("(B05-B04)/(B05+B04)"), names="NDVI") |>
  stars::st_as_stars() -> y
y
```
To reduce the size of exported data cubes, compression and packing (conversion of doubles to smaller integer types) are supported.
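For example, NDVI values are known to lie in [-1, 1] and can hence be packed into 16-bit integers on export (a minimal sketch using `pack_minmax()`, not evaluated here; see `?write_ncdf` for details):
```{r, eval=FALSE}
# Pack NDVI values into 16-bit integers; pack_minmax() derives the
# scale and offset from the expected value range [-1, 1]
raster_cube(L8.col, v.overview) |>
  select_bands(c("B04","B05")) |>
  apply_pixel("(B05-B04)/(B05+B04)", names="NDVI") |>
  write_ncdf("NDVI.nc", pack = pack_minmax(type="int16", min=-1, max=1))
```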
If only specific time slices of a data cube are needed, `select_time()` can be called before plotting / exporting.
```{r select_time}
raster_cube(L8.col, v.overview) |>
  select_bands(c("B04","B05")) |>
  apply_pixel(c("(B05-B04)/(B05+B04)"), names="NDVI") |>
  select_time(c("2015", "2018")) |>
  plot(zlim=c(0,1), nbreaks=10, col=brewer.pal(9, "YlGn"), key.pos=1)
```
## User-defined functions
Users can pass custom R functions to `reduce_time()` and `apply_pixel()`. Below, we derive a _greenest pixel composite_ by returning RGB values from pixels with maximum NDVI for all pixel time-series.
```{r greenest_pixel_composite}
v.subarea.monthly = cube_view(view = v.subarea.yearly, dt="P1M", dx = 100, dy = 100,
                              extent = list(t0="2015-01", t1="2018-12"))
raster_cube(L8.col, v.subarea.monthly) |>
  select_bands(c("B02","B03","B04","B05")) |>
  apply_pixel(c("(B05-B04)/(B05+B04)"), names="NDVI", keep_bands=TRUE) |>
  reduce_time(names=c("B02","B03","B04"), FUN=function(x) {
    if (all(is.na(x["NDVI",]))) return(rep(NA,3))
    return(x[c("B02","B03","B04"), which.max(x["NDVI",])])
  }) |>
  plot(rgb=3:1, zlim=c(100,1000))
```
## Extraction of pixels, time series, and summary statistics over polygons
In many cases, one is interested in extracting sets of points, time series, or summary statistics over polygons, e.g., to generate training data for machine learning models. Package version 0.6 therefore introduces the `extract_geom()` function, which replaces the previous functions `query_points()`, `query_timeseries()`, and `zonal_statistics()`.
Below, we randomly select 100 locations and query values of single data cube cells and complete time series.
```{r extract}
x = runif(100, v.overview$space$left, v.overview$space$right)
y = runif(100, v.overview$space$bottom, v.overview$space$top)
t = sample(as.character(2013:2019), 100, replace = TRUE)
df = sf::st_as_sf(data.frame(x = x, y = y), coords = c("x", "y"), crs = v.overview$space$srs)
# spatiotemporal points
raster_cube(L8.col, v.overview) |>
  select_bands(c("B04","B05")) |>
  extract_geom(df, datetime = t) |>
  dplyr::sample_n(15) # print 15 random rows

# time series at spatial points
raster_cube(L8.col, v.overview) |>
  select_bands(c("B04","B05")) |>
  extract_geom(df) |>
  dplyr::sample_n(15) # print 15 random rows
```
In the following, we use the example Landsat dataset (reduced resolution) from the package and compute median NDVI values within some administrative regions in New York City. The result is a data.frame containing data cube bands, feature IDs, and time as columns.
```{r zonal_statistics}
L8_files <- list.files(system.file("L8NY18", package = "gdalcubes"),
                       "\\.TIF$", recursive = TRUE, full.names = TRUE)
v = cube_view(srs="EPSG:32618", dy=300, dx=300, dt="P1M",
              aggregation = "median", resampling = "bilinear",
              extent=list(left=388941.2, right=766552.4,
                          bottom=4345299, top=4744931,
                          t0="2018-01", t1="2018-12"))
sf = sf::st_read(system.file("nycd.gpkg", package = "gdalcubes"), quiet = TRUE)
raster_cube(create_image_collection(L8_files, "L8_L1TP"), v) |>
  select_bands(c("B04", "B05")) |>
  apply_pixel("(B05-B04)/(B05+B04)", "NDVI") |>
  extract_geom(sf, FUN = median) -> zstats
dplyr::sample_n(zstats, 15) # print 15 random rows
```
We can combine the result with the original features by a table join on the FID column using `merge()`:
```{r}
sf$FID = rownames(sf)
x = merge(sf, zstats, by = "FID")
plot(x[x$time == "2018-07-01", "NDVI"])
```
When the input features carry additional attributes or labels, `extract_geom()` thus makes it easy to create training data for machine learning models.
# More Features
**Cloud support with STAC**: `gdalcubes` can be used directly on cloud computing platforms including Amazon Web Services, Google Cloud Platform, and Microsoft Azure. Imagery can be read from their open data catalogs and discovered by connecting to STAC API endpoints using the [`rstac` package](https://cran.r-project.org/package=rstac) (see links at the end of this page).
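A minimal sketch (not evaluated); the STAC endpoint, collection name, bounding box, and asset names below are illustrative examples and depend on the platform:
```{r, eval=FALSE}
library(rstac)
# Search a public STAC API for Sentinel-2 items; endpoint, collection,
# bbox, and time range are example values
items = stac("https://earth-search.aws.element84.com/v0") |>
  stac_search(collections = "sentinel-s2-l2a-cogs",
              bbox = c(7.1, 51.8, 7.2, 52.0),
              datetime = "2021-06-01/2021-06-30") |>
  post_request()
# Create an image collection directly from the returned STAC items
col = stac_image_collection(items$features, asset_names = c("B04", "B08"))
```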
**Machine learning**: The built-in functions `extract_geom()` and `predict()` help to create training data and to apply predictions from machine learning models (e.g., fitted with the `caret` or `tidymodels` packages) on data cubes, as sketched below.
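A rough sketch of this workflow (not evaluated); `labels` is a hypothetical `sf` object with point geometries and class labels, and the model fitting step is omitted:
```{r, eval=FALSE}
# Build a training table by extracting band values at labeled points;
# 'labels' is a hypothetical sf object with a class attribute
train = raster_cube(L8.col, v.overview) |>
  select_bands(c("B04","B05")) |>
  extract_geom(labels)
# Join 'train' with the labels by FID (as shown above), fit a model
# with caret or tidymodels, and then apply it to the full cube:
# raster_cube(L8.col, v.overview) |>
#   select_bands(c("B04","B05")) |>
#   predict(model)
```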
**Further operations**: The previous examples cover only a subset of the built-in functions. Further data cube operations include, for example, spatial and/or temporal slicing (`slice_time`, `slice_space`), cropping (`crop`), moving window / focal operations (`window_time`, `window_space`), filtering by arithmetic expressions on pixel values and by spatial geometries (`filter_pixel`, `filter_geom`), and combining two or more data cubes with identical shape (`join_bands`). A small sketch follows below.
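For example, pixels of a single time slice can be filtered by an arithmetic predicate (a minimal sketch, not evaluated):
```{r, eval=FALSE}
# Keep only pixels with NDVI > 0.5 in a single time slice
raster_cube(L8.col, v.overview) |>
  select_bands(c("B04","B05")) |>
  apply_pixel("(B05-B04)/(B05+B04)", names="NDVI") |>
  slice_time("2017-01-01") |>
  filter_pixel("NDVI > 0.5") |>
  plot(zlim=c(0,1), key.pos=1)
```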
# Further reading
* [Official R package website](https://gdalcubes.github.io)
* [Tutorial on YouTube](https://youtu.be/Xlg__2PeTXM?t=3693) on how to use gdalcubes in the cloud, streamed at the OpenGeoHub Summer School 2021
* [1st blog post on r-spatial.org](https://www.r-spatial.org/r/2019/07/18/gdalcubes1.html)
* [2nd blog post on r-spatial.org](https://r-spatial.org/r/2021/04/23/cloud-based-cubes.html) describing how to use gdalcubes in cloud-computing environments
* [Open access paper](https://www.mdpi.com/2306-5729/4/3/92) in the special issue on Earth observation data cubes of the journal _Data_