Merge pull request #262 from ropensci/query-split
add query-split vignette for #254 - Thanks @Mashin6 for this really useful contribution. I just tweaked the text a bit, mostly to more explicitly state what each bit of code actually does.
mpadge authored Jan 23, 2022
2 parents ac0704c + ed4b9f2 commit 8714e83
Showing 5 changed files with 192 additions and 4 deletions.
3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: osmdata
Title: Import 'OpenStreetMap' Data as Simple Features or Spatial Objects
Version: 0.1.8.018
Version: 0.1.8.020
Authors@R: c(
person("Mark", "Padgham", , "[email protected]", role = c("aut", "cre")),
person("Bob", "Rudis", role = "aut"),
@@ -11,6 +11,7 @@ Authors@R: c(
person("Andrea", "Gilardi", role = "ctb"),
person("Enrico", "Spinielli", role = "ctb"),
person("Anthony", "North", role = "ctb"),
person("Martin", "Machyna", role = "ctb"),
person("Marcin", "Kalicinski", role = c("ctb", "cph"),
comment = "Author of included RapidXML code"),
person("Finkelstein", "Noam", role = c("ctb", "cph"),
1 change: 1 addition & 0 deletions NEWS.md
@@ -6,6 +6,7 @@ Major changes:
- New function `opq_around` to query features within a specified radius
*around* a defined location; thanks to @barryrowlingson via #199 and
@maellecoursonnais via #238
- New vignette on splitting large queries thanks to @Mashin6 (via #262)

Minor changes:

9 changes: 7 additions & 2 deletions codemeta.json
@@ -11,7 +11,7 @@
"codeRepository": "https://github.com/ropensci/osmdata/",
"issueTracker": "https://github.com/ropensci/osmdata/issues",
"license": "https://spdx.org/licenses/GPL-3.0",
"version": "0.1.8.018",
"version": "0.1.8.020",
"programmingLanguage": {
"@type": "ComputerLanguage",
"name": "R",
@@ -73,6 +73,11 @@
"givenName": "Anthony",
"familyName": "North"
},
{
"@type": "Person",
"givenName": "Martin",
"familyName": "Machyna"
},
{
"@type": "Person",
"givenName": "Marcin",
@@ -356,7 +361,7 @@
"r-package",
"peer-reviewed"
],
"fileSize": "23876.288KB",
"fileSize": "23887.5KB",
"citation": [
{
"@type": "ScholarlyArticle",
2 changes: 1 addition & 1 deletion vignettes/makefile
@@ -1,4 +1,4 @@
LFILE = osmdata
LFILE = query-split

all: knith open

181 changes: 181 additions & 0 deletions vignettes/query-split.Rmd
@@ -0,0 +1,181 @@
---
title: "4. Splitting large queries"
author:
- "Mark Padgham"
- "Martin Machyna"
date: "`r Sys.Date()`"
bibliography: osmdata-refs.bib
output:
    html_document:
        toc: true
        toc_float: true
        number_sections: false
        theme: flatly
vignette: >
    %\VignetteIndexEntry{4. query-split}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
---

## 1. Introduction

The `osmdata` package retrieves data from the [`overpass`
server](https://overpass-api.de), which is primarily designed to deliver small
subsets of the full OpenStreetMap (OSM) data set, determined both by specific
bounding coordinates and by specific OSM key-value pairs. The server has
internal routines to limit delivery rates on queries for excessively large
data sets, and may ultimately fail for large queries. This vignette describes
one approach for breaking an overly large query into a set of smaller queries,
and for re-combining the resulting data sets into a single `osmdata` object
representing the original, larger query.


## 2. Query splitting

Complex or data-heavy queries may exhaust the time or memory limits of the
`overpass` server. One way to get around this problem is to split the bounding
box (bbox) of a query into several smaller fragments, and then to re-combine
the data and remove duplicate objects. This section demonstrates how that may
be done, starting with a large bounding box.

```{r get-bbox, eval = FALSE}
library(osmdata)
bb <- getbb("Southeastern Connecticut COG", featuretype = "boundary")
bb
```
```{r out1, eval = FALSE}
        min       max
x -72.46677 -71.79315
y  41.27591  41.75617
```

The following lines then divide that bounding box into two smaller areas:

```{r bbox-split, eval = FALSE}
dx <- (bb["x", "max"] - bb["x", "min"]) / 2
bbs <- list(bb, bb)
bbs[[1]]["x", "max"] <- bb["x", "max"] - dx
bbs[[2]]["x", "min"] <- bb["x", "min"] + dx
bbs
```
```{r out2, eval = FALSE}
[[1]]
        min       max
x -72.46677 -72.12996
y  41.27591  41.75617

[[2]]
        min       max
x -72.12996 -71.79315
y  41.27591  41.75617
```

These two bounding boxes can then be used to submit two separate overpass
queries:

```{r opq-2x, eval = FALSE}
res <- list()
res[[1]] <- opq(bbox = bbs[[1]]) |>
    add_osm_feature(key = "admin_level", value = "8") |>
    osmdata_sf()
res[[2]] <- opq(bbox = bbs[[2]]) |>
    add_osm_feature(key = "admin_level", value = "8") |>
    osmdata_sf()
```
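
For larger numbers of sub-regions, the same pattern can be written more
concisely by looping over the list of bounding boxes, for example with
`lapply()` (a sketch equivalent to the two explicit calls above):

```{r opq-lapply, eval = FALSE}
res <- lapply(bbs, function(b) {
    opq(bbox = b) |>
        add_osm_feature(key = "admin_level", value = "8") |>
        osmdata_sf()
})
```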

The retrieved `osmdata` objects can then be merged using the `c(...)` function,
which automatically removes duplicate objects.

```{r opq-merge, eval = FALSE}
res <- c(res[[1]], res[[2]])
```


## 3. Automatic bbox splitting

The previous code demonstrated how to divide a bounding box into two smaller
regions. It will generally not be possible to know in advance how small a
bounding box must be for a query to work, and so we need a more general
version of that functionality to divide a bounding box into an arbitrary
number of sub-regions.

We can automate this process by monitoring the exit status of `opq() |>
osmdata_sf()` and, whenever a query fails, recursively splitting the current
bounding box into increasingly smaller fragments until the overpass server
returns a result. The following function splits a bounding box into a list of
equal-sized bounding boxes arranged in a `grid`-by-`grid` grid; with the
default of `grid = 2`, that gives four boxes in a 2-by-2 grid.

```{r bbox-auto-split, eval = FALSE}
split_bbox <- function(bbox, grid = 2) {
    xmin <- bbox["x", "min"]
    ymin <- bbox["y", "min"]
    # width and height of each sub-region:
    dx <- (bbox["x", "max"] - bbox["x", "min"]) / grid
    dy <- (bbox["y", "max"] - bbox["y", "min"]) / grid
    bboxl <- list()
    for (i in 1:grid) {
        for (j in 1:grid) {
            # bounding box of sub-region (i, j), with the same dimension
            # names ("x"/"y" rows, "min"/"max" columns) as the input:
            b <- matrix(c(xmin + ((i - 1) * dx),
                          ymin + ((j - 1) * dy),
                          xmin + (i * dx),
                          ymin + (j * dy)),
                        nrow = 2,
                        dimnames = dimnames(bbox))
            bboxl <- append(bboxl, list(b))
        }
    }
    bboxl
}
```
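
As a quick check, applying this function to the bounding box from the previous
section (assuming `bb` still holds the result of the `getbb()` call above)
should return a list of four bounding boxes arranged in a 2-by-2 grid:

```{r split-bbox-check, eval = FALSE}
length(split_bbox(bb)) # 4 sub-regions with the default grid = 2
```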

We pre-split our area and create a queue of bounding boxes that we will use for
submitting queries.

```{r bbox-pre-split, eval = FALSE}
bb <- getbb("Connecticut", featuretype = NULL)
queue <- split_bbox(bb)
result <- list()
```

Now we can create a loop that monitors the exit status of each query and, on
success, removes the corresponding bounding box from the queue. If a query
fails for any reason, the failed bounding box is split into four smaller
fragments which are added to the queue, repeating until all results have been
successfully delivered.

```{r auto-query, eval = FALSE}
while (length(queue) > 0) {
    print(queue[[1]])
    opres <- NULL
    # wrap the query in try() so that a server failure does not stop the loop:
    opres <- try({
        opq(bbox = queue[[1]], timeout = 25) |>
            add_osm_feature(key = "natural", value = "tree") |>
            osmdata_sf()
    })
    if (class(opres)[1] != "try-error") {
        # success: store the result and remove this bounding box from the queue
        result <- append(result, list(opres))
        queue <- queue[-1]
    } else {
        # failure: split this bounding box further and re-queue the fragments
        bboxnew <- split_bbox(queue[[1]])
        queue <- append(bboxnew, queue[-1])
    }
}
```

All retrieved `osmdata` objects stored in the `result` list can then be
combined with the `c(...)` function. Note that for large data sets this
process can be quite time-consuming.

```{r merge-result-list, eval = FALSE}
final <- do.call(c, result)
```
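
As a rough sanity check, the combined object can be inspected in the same way
as any other `osmdata` object, for example by counting the retrieved tree
nodes (the exact count will depend on the current state of the OSM database):

```{r inspect-final, eval = FALSE}
final                  # summary of the combined osmdata object
nrow(final$osm_points) # number of tree nodes retrieved
```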
