Skip to content

Commit

Permalink
work on spatial, touch up for duplicated special term labels
Browse files Browse the repository at this point in the history
  • Loading branch information
freezenik committed Sep 30, 2024
1 parent 1807271 commit 7adbf05
Show file tree
Hide file tree
Showing 2 changed files with 126 additions and 53 deletions.
10 changes: 10 additions & 0 deletions R/specials.R
Original file line number Diff line number Diff line change
Expand Up @@ -98,6 +98,16 @@ special_terms <- function(x, data, binning = FALSE, digits = Inf, ...)
}
}

if(any(dups <- duplicated(names(sterms)))) {
dups <- names(sterms)[dups]
for(j in dups) {
dn <- grep(j, names(sterms), fixed = TRUE, value = TRUE)
dn <- paste0(dn, ".", 1:length(dn))
i <- grep(j, names(sterms), fixed = TRUE)
names(sterms)[i] <- dn
}
}

return(sterms)
}

Expand Down
169 changes: 116 additions & 53 deletions vignettes/spatial.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -18,84 +18,147 @@ vignette: >

```{r preliminaries, echo=FALSE, message=FALSE, results="hide"}
library("gamlss2")
library("sf")
```

Describe how to incorporate spatial effects in _gamlss2_

```{r, eval=FALSE, echo=FALSE}
download_data <- function(data = "WeatherGermany10.rds") {
file <- paste0("https://nikum.org/abm/Data/", data)
tdir <- tempfile()
dir.create(tdir)
download.file(file, file.path(tdir, data))
return(readRDS(file.path(tdir, data)))
Spatial data analysis is important in many fields such as environmental science, epidemiology,
and climatology, where observations are collected across different geographical locations.
A key challenge in spatial modeling is accounting for the dependence structure among nearby
regions, which often display correlated patterns in outcomes.

In generalized additive models for location, scale, and shape (GAMLSS), spatial effects can be
incorporated in various ways, such as through Markov random fields (MRFs). MRFs handle spatial
correlation by applying penalties that reflect the neighborhood structure of the spatial data.
This vignette is divided into two parts: the first part demonstrates how to estimate discrete
spatial effects using MRFs with the gamlss2 package, while the second part provides examples
of modeling spatial effects with smooth functions like thin-plate splines or tensor product splines.

### Example: Modeling Severe Storm Counts in Germany

In this example, we analyze severe storm counts recorded at various weather stations across
Germany over multiple years. Our goal is to model these storm counts while accounting for the
spatial dependence between stations. To achieve this, we associate each weather station with
its respective county in Germany, enabling us to incorporate the geographical structure into
the model.

```{r}
## load the Germany severe storm data
data("storms", package = "gamlss2")
## plot storm counts per station and year
plot(range(storms$year), range(storms$counts), type = "n",
xlab = "Year", ylab = "Counts")
for(j in levels(storms$id)) {
dj <- subset(storms, id == j)
dj <- dj[order(dj$year), ]
with(dj, lines(counts ~ year, type = "b", pch = 16,
col = rgb(0.1, 0.1, 0.1, alpha = 0.4)))
}
```

WeatherGermany10 <- download_data("WeatherGermany10.rds")
The data contains storm counts per year for each station. A preliminary visualization
of these counts allows us to inspect patterns of storm frequency over time and across stations.

WeatherGermany10 <- na.omit(WeatherGermany10)
### Visualizing the Spatial Structure

owd <- getwd()
tdir <- tempdir()
dir.create(tdir)
setwd(tdir)
We begin by plotting the locations of weather stations on a map of Germany. The _sf_ package is
used to manage and plot spatial data.

download.file("https://geodata.ucdavis.edu/gadm/gadm4.1/shp/gadm41_DEU_shp.zip",
file.path(tdir, "DE.zip"))
```{r}
## load map of Germany
## needs sf package for plotting
library("sf")
data("Germany", package = "gamlss2")
unzip("DE.zip")
## plot station locations
plot(st_geometry(Germany))
co <- unique(storms[, c("lat", "lon")])
points(co, col = 2, pch = 4, lwd = 2)
```

setwd(owd)
This map shows the geographical distribution of weather stations. The spatial structure will
be incorporated into our model to account for the proximity of stations when estimating storm counts.

library("sf")
### Defining the Neighborhood Structure

m <- st_read(file.path(tdir, "gadm41_DEU_2.shp"))
m <- m[c("NAME_1", "NAME_2")]
names(m) <- c("state", "county", "geometry")
m$id <- as.factor(1:nrow(m))
m <- m[c("id", "county", "state", "geometry")]
Next, we define the neighborhood structure among the weather stations using a distance-based
criterion. This is crucial for the Markov random field, as it specifies how spatial
correlation should be penalized.

co <- unique(WeatherGermany10[, c("lon", "lat")])
co_sf <- st_as_sf(co, coords = c("lat", "lon"), crs = st_crs(m))
```{r}
## estimate spatial count model using
## a Markov random field, first a neighbor matrix
## needs to be computed, here we use distance based
## neighbors
library("spdep")
nb <- poly2nb(Germany)
plot(st_geometry(Germany), border = "lightgray")
plot.nb(nb, st_geometry(Germany), add = TRUE)
```

jd <- st_join(co_sf, m[, c("id", "county", "state"), drop = FALSE])
The neighbor matrix is constructed using the `poly2nb()` function from the _spdep_ package,
which calculates the adjacency structure of the geographical regions. We then visualize
the spatial network of neighbors on the map.

co <- cbind(co, jd[, c("id", "county", "state")])
### Constructing the Penalty Matrix

d <- WeatherGermany10[, c("Date", "Wmax", "Tmax", "Tmin", "Pre", "alt", "lat", "lon")]
The penalty matrix defines the spatial penalties imposed by the Markov random field.
The matrix is constructed based on the neighbor relationships defined earlier.

d$ID <- d$County <- d$State <- NA
```{r}
## compute final neighbor penalty matrix
K <- nb2mat(nb, style = "B", zero.policy = TRUE)
for(i in 1:nrow(co)) {
j <- which((d$lon == co$lon[i]) & (d$lat == co$lat[i]))
d$ID[j] <- co$id[i]
d$lat[j] <- co$lat[i]
d$lon[j] <- co$lon[i]
d$County[j] <- co$county[i]
d$State[j] <- co$state[i]
}
## assign region names
rownames(K) <- colnames(K) <- levels(Germany$id)
d <- d[, c("ID", "State", "County", "Date", "Wmax", "Tmax", "Tmin", "Pre", "alt", "lat", "lon")]
## set up final penalty matrix
K <- -1 * K
diag(K) <- -1 * rowSums(K)
saveRDS(d, file = "~/data/WeatherGermany10.rds")
saveRDS(m, file = "~/data/Germany.rds")
## remove regions not in data
i <- which(rownames(K) %in% levels(storms$id))
K <- K[i, i]
```

The penalty matrix `K` is set up such that it reflects the neighborhood relationships between the
regions. Each element of the matrix represents how strongly each region is connected to its
neighbors. The diagonal entries represent the total number of neighbors for each region.

d$Year <- as.POSIXlt(d$Date)$year + 1900
### Estimating the Model

df <- aggregate(Wmax ~ ID + Year + State + County + alt + lon + lat,
FUN = function(x) sum(x >= 24.5), data = d)
We now estimate the spatial count model using the Negative Binomial distribution (`NBI`).
The model includes smooth functions of altitude, year, and an interaction between altitude and year.
Spatial effects are incorporated using the `bs = "mrf"` option.

df <- df[, c("ID", "County", "State", "Year", "Wmax", "alt", "lon", "lat")]
```{r, results="hide"}
## estimate count model using the NBI family,
## model formula is
f <- ~ s(alt) + s(year) + s(id, bs = "mrf", xt = list("penalty" = K)) +
te(alt, year)
f <- list(update(f, counts ~ .), f)
names(m) <- c("ID", "County", "State", "geometry")
## estimate model using BIC for shrinkage parameter selection
b <- gamlss2(f, data = storms, family = NBI, criterion = "BIC")
```

storms <- df
Germany <- st_simplify(m, dTolerance = 2000)
```{r}
## model summary
summary(b)
```

names(storms) <- gsub("Wmax", "Counts", names(storms))
Here, the spatial effect is modeled as an MRF smooth (`s(id, bs = "mrf")`), where the penalty
matrix `K` enforces spatial structure based on neighboring stations. We use the
Bayesian Information Criterion (BIC) to select the optimal smoothing parameters.

storms <- storms[order(storms$ID), ]
### Visualizing the Estimated Effects

save(storms, file = "../data/storms.rda", compress = "xz")
save(Germany, file = "../data/Germany.rda", compress = "xz")
Finally, we visualize the estimated effects from the fitted model.

```{r}
plot(b)
```

The plot shows the estimated smooth functions for altitude, year, and the spatial effect.
These visualizations help us interpret how storm counts vary across space and time.

0 comments on commit 7adbf05

Please sign in to comment.