---
title: "Implementing the Social Relations and AMEN model in Stan"
author: "Adam M. Lauretig"
date: "3/31/2020"
output: pdf_document
mainfont: Palatino
header-includes:
- \usepackage{palatino}
---
# Introduction
In a [recent review article](https://www.e-publications.org/ims/submission/STS/user/submissionFile/36407?confirm=150a239a), Peter Hoff discussed the Additive and Multiplicative Effects Network (AMEN) model, a Bayesian hierarchical model developed by Minhas, Hoff, and Ward for modeling network data. The model can be understood as a hierarchical model with a more structured covariance, which accounts for the network dependence between observations. While there is currently an [implementation in R](https://cran.r-project.org/web/packages/amen/index.html), I wanted to explore building this model up in [Stan](https://mc-stan.org/), a probabilistic programming language, and to take advantage of variational inference to fit models more quickly.
All of the models in this document were fit in Stan using the `vb()` function, which makes fitting fast. Stan also provides a variety of diagnostic tools, though final work should probably use dynamic HMC/NUTS sampling rather than variational inference.
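As a minimal sketch of that workflow on a toy program (purely illustrative; this is not one of the models fit below):
```{r, vb_toy_sketch, echo = TRUE, eval = FALSE, size = "small"}
library(rstan)
# Compile a trivial Stan program and fit it with variational inference via
# vb(); the program and data here are made up for illustration only.
toy_code <- "
data { int<lower=1> N ; vector[N] y ; }
parameters { real mu ; }
model { mu ~ normal(0, 5) ; y ~ normal(mu, 1) ; }
"
toy_model <- stan_model(model_code = toy_code)
toy_fit <- vb(toy_model, data = list(N = 20, y = rnorm(20)), seed = 1)
print(toy_fit, pars = "mu")
```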
In the remaining sections, I gather some example data and fit three models to trade data from the 1990s: a fixed effects model, a social relations model, and an AMEN model.
# Dataset
I use a dataset included in the `amen` package to demonstrate model building: exports from one country to another (in billions of dollars), averaged over the 1990s. I convert the sociomatrix to a real-valued edgelist, with indicators for the sending and receiving countries, the dyad, and the direction within the dyad. Plotting the data, we see the long-tailed distribution often characteristic of network data.
\clearpage
```{r, eda_dataviz, cache = TRUE, echo = TRUE, eval = TRUE, warning=FALSE, message=FALSE, size = "small"}
library(ggplot2)
library(amen)
library(rstan)
# A function to convert a matrix of pairs to an edgelist
# Assumes that rows == columns, but will check
matrix_to_edgelist <- function(sociomatrix_to_convert){
  if(nrow(sociomatrix_to_convert) != ncol(sociomatrix_to_convert)){
    stop("nrows != ncols")
  }
  all_nodes <- expand.grid(
    list(rownames(sociomatrix_to_convert), colnames(sociomatrix_to_convert)))
  all_nodes[, 1] <- as.numeric(all_nodes[, 1])
  all_nodes[, 2] <- as.numeric(all_nodes[, 2])
  all_nodes <- all_nodes[all_nodes[, 1] != all_nodes[, 2], ]
  lookup_table <- data.frame(
    node_names = rownames(sociomatrix_to_convert),
    idx = as.numeric(factor(rownames(sociomatrix_to_convert))))
  obs_values <- which(sociomatrix_to_convert != 0, arr.ind = TRUE)
  obs_values <- cbind(obs_values, sociomatrix_to_convert[obs_values])
  edgelist <- merge(
    all_nodes, obs_values, by.x = c("Var1", "Var2"),
    by.y = c("row", "col"), all.x = TRUE)
  edgelist$V3 <- ifelse(is.na(edgelist$V3), 0, edgelist$V3)
  # number each dyad, and add a (1, 2) indicator for direction:
  # if sender < receiver -> 1, else 2
  edgelist <- cbind(
    edgelist,
    rep(seq.int(from = 1, to = sum(edgelist[, 1] < edgelist[, 2])), 2))
  edgelist <- cbind(edgelist, ifelse(edgelist[, 1] < edgelist[, 2], 1, 2))
  colnames(edgelist) <- c("sender", "receiver", "y", "dyad_id", "sr_indicator")
  return(list(edgelist = edgelist, lookup_table = lookup_table,
    n_nodes = max(edgelist[, 1]), n_dyads = max(edgelist[, 4]),
    N = nrow(edgelist)))
}
data_for_stan <- matrix_to_edgelist(IR90s$dyadvars[, , 2]) # trade data
y_df <- data.frame(y = data_for_stan$edgelist$y)
ggplot(data = y_df, aes(x = y)) +
  geom_histogram() +
  theme_minimal() +
  labs(x = "Export Volume", y = "Frequency",
    title = "Distribution of Export Volume")
```
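A couple of quick sanity checks on the conversion (my own checks, not part of the original pipeline): each directed pair appears once, and each dyad id appears exactly twice, once per direction.
```{r, edgelist_checks, echo = TRUE, eval = TRUE, cache = TRUE, message = FALSE, warning = FALSE, size = "small"}
head(data_for_stan$edgelist)
# every dyad contributes two directed observations
stopifnot(all(table(data_for_stan$edgelist$dyad_id) == 2))
```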
\clearpage
# Fixed Effects Model
I begin with a simple fixed effects model, where the outcome, $y_{i,j}$, is the exports from $\text{sender}_{i}$ to $\text{receiver}_{j}$.
$$y_{i,j} \sim \mathcal{N}(\alpha + \beta_{\text{sender}_{i}} + \beta_{\text{receiver}_{j}}, 1) $$
$$\alpha \sim \mathcal{N}(0, 5) $$
$$\beta_{\text{sender}_{i}} \sim \mathcal{N}(0, 5)$$
$$\beta_{\text{receiver}_{j}} \sim \mathcal{N}(0, 5)$$
\clearpage
```{stan, output.var = "m0_stan", echo = TRUE, eval = FALSE, size = "small"}
data{
  int n_nodes ;
  int n_dyads ;
  int N ;                        // total obs; should be n_dyads * 2
  int sender_id[n_dyads * 2] ;   // indexing for senders
  int receiver_id[n_dyads * 2] ; // indexing for receivers
  real Y[n_dyads * 2] ;          // outcome variable
}
parameters{
  real intercept ;
  vector[n_nodes] sender_beta ;   // sender fixed effects
  vector[n_nodes] receiver_beta ; // receiver fixed effects
}
model{
  intercept ~ normal(0, 5) ;
  sender_beta ~ normal(0, 5) ;
  receiver_beta ~ normal(0, 5) ;
  for(n in 1:N){
    Y[n] ~ normal(intercept +
      sender_beta[sender_id[n]] + receiver_beta[receiver_id[n]], 1) ;
  }
}
generated quantities{
  real Y_sim[N] ;
  for(n in 1:N){
    Y_sim[n] = normal_rng(intercept +
      sender_beta[sender_id[n]] + receiver_beta[receiver_id[n]], 1) ;
  }
}
```
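The fitted object `m0` used below is loaded from a saved file; as a sketch of how such a fit could be produced (the file name `m0_fixed_effects.stan` and the `vb()` settings are hypothetical, not taken from the original pipeline):
```{r, m0_fit_sketch, echo = TRUE, eval = FALSE, size = "small"}
# Map the edgelist onto the Stan data block; list names must match the
# declarations in the data block exactly.
stan_data <- list(
  n_nodes = data_for_stan$n_nodes,
  n_dyads = data_for_stan$n_dyads,
  N = data_for_stan$N,
  sender_id = data_for_stan$edgelist$sender,
  receiver_id = data_for_stan$edgelist$receiver,
  Y = data_for_stan$edgelist$y)
m0_model <- stan_model(file = "m0_fixed_effects.stan") # hypothetical file name
m0 <- vb(m0_model, data = stan_data, seed = 123, output_samples = 1000)
```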
```{r, load_saved_objects, echo = FALSE, message = FALSE, warning = FALSE, cache = TRUE, size = "small"}
library(rstan)
load("srm_amen_stan.rdata")
```
\clearpage
Examining the posterior predictive check from the fixed effects model, we see some clear fit problems: in the scatterplot below, for example, the axes differ by an order of magnitude. Here, the point predictions are the posterior means of outcomes simulated from the fitted model.
```{r, m0_ppc, echo = TRUE, cache = TRUE, message=FALSE, warning=FALSE, size = "small"}
library(bayesplot)
color_scheme_set("red")
m0_params <- extract(m0)
to_visualize <- data.frame(y = y_df$y, y_sim = colMeans(m0_params$Y_sim))
mse_0 <- mean((to_visualize$y - to_visualize$y_sim)^2)
#ppc_dens_overlay(y = y_df$y, yrep = m0_params$Y_sim)
ggplot(data = to_visualize, aes(x = y, y = y_sim)) +
  geom_point(alpha = .1) +
  theme_minimal() +
  labs(x = "Observed Y", y = "Predicted Fit",
    title = paste0("Mean Squared Error is ", round(mse_0, 4)))
```
# Social Relations Model
To improve model fit, we can incorporate covariance between sender and receiver effects. In concrete terms, this means assuming that, for example, US exports covary with US imports. To model this, we turn to the *social relations model*: the outcome, $y_{i,j}$, is still the exports from $\text{sender}_{i}$ to $\text{receiver}_{j}$, but we add additional structure to the covariance of the node effects.
$$y_{i,j} \sim \mathcal{N}(\alpha + \beta_{\text{sender}_{i}} + \beta_{\text{receiver}_{j}}, 1) $$
$$\alpha \sim \mathcal{N}(0, 5) $$
$$ \pmb{\beta}_{i} = (\beta_{\text{sender}_{i}}, \beta_{\text{receiver}_{i}}) \sim \mathcal{MVN}(\pmb{0}, \pmb{\Sigma})$$
$$ \pmb{\Sigma} = \text{diag}(\pmb{\sigma}) \, \pmb{\Omega}_{\text{cholesky}} \pmb{\Omega}_{\text{cholesky}}^{\top} \, \text{diag}(\pmb{\sigma})$$
$$ \pmb{\sigma} \sim \text{Gam}(1, 1)$$
$$\pmb{\Omega}_{\text{cholesky}} \sim \mathcal{LKJ}(5)$$
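Before turning to the Stan program, a small R sketch (with made-up values) shows what this construction does: scaling the Cholesky factor of a correlation matrix by the standard deviations and multiplying by standard-normal draws yields effects with the desired covariance, which is exactly what Stan's `diag_pre_multiply()` computes below.
```{r, ncp_sketch, echo = TRUE, eval = FALSE, size = "small"}
set.seed(1)
sigma <- c(0.8, 1.2)                      # illustrative sds
Omega <- matrix(c(1, 0.5, 0.5, 1), 2, 2)  # illustrative correlation matrix
L <- t(chol(Omega))                       # lower-triangular Cholesky factor
z <- matrix(rnorm(2 * 1e5), nrow = 2)     # standard-normal draws
beta <- t(diag(sigma) %*% L %*% z)        # = (diag_pre_multiply(sigma, L) * z)'
round(cov(beta), 2)  # approx. diag(sigma) %*% Omega %*% diag(sigma)
```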
\clearpage
In Stan, this is written as follows, using the non-centered parameterization for the multivariate normal:
```{stan output.var="m1_srm", echo = TRUE, eval = FALSE, size = "small"}
data{
int n_nodes ;
int n_dyads ;
int N; //total obs. should be n_dyads * 2
int sender_id[n_dyads * 2] ;
int receiver_id[n_dyads * 2] ;
real Y[n_dyads * 2] ;
}
parameters{
real intercept ;
cholesky_factor_corr[2] corr_nodes; // correlation matrix
vector<lower=0>[2] sigma_nodes; // sd
matrix[2, n_nodes] z_nodes ; // for node non-centered parameterization
}
transformed parameters{
matrix[n_nodes, 2] mean_nodes;
mean_nodes = (diag_pre_multiply(
sigma_nodes, corr_nodes) * z_nodes)'; // non-centered parameterization
}
model{
intercept ~ normal(0, 5) ;
to_vector(z_nodes) ~ normal(0, 1) ;
corr_nodes ~ lkj_corr_cholesky(5) ;
sigma_nodes ~ gamma(1, 1) ;
for(n in 1:N){
Y[n] ~ normal(intercept +
mean_nodes[sender_id[n], 1] + mean_nodes[receiver_id[n], 2], 1 );
}
}
generated quantities{
real Y_sim[N] ;
for(n in 1:N){
Y_sim[n] = normal_rng(intercept +
mean_nodes[sender_id[n], 1] + mean_nodes[receiver_id[n], 2], 1) ;
}
}
```
\clearpage
Examining fit from this model, we see that it is largely unchanged: despite allowing sender and receiver effects to covary, we still impose a strongly additive relationship.
```{r, m1_ppc, echo = TRUE, cache = TRUE, message=FALSE, warning=FALSE, size = "small"}
library(bayesplot)
color_scheme_set("red")
m1_params <- extract(m1)
to_visualize1 <- data.frame(y = y_df$y, y_sim = colMeans(m1_params$Y_sim))
mse_1 <- mean((to_visualize1$y - to_visualize1$y_sim)^2)
#ppc_dens_overlay(y = y_df$y, yrep = m1_params$Y_sim)
ggplot(data = to_visualize1, aes(x = y, y = y_sim)) +
  geom_point(alpha = .1) +
  theme_minimal() +
  labs(x = "Observed Y", y = "Predicted Fit",
    title = paste0("Mean Squared Error is ", round(mse_1, 4)))
```
# Additive and Multiplicative Effects Network Model
To relax the assumption of additivity in the parameters, [Peter Hoff](https://www.e-publications.org/ims/submission/STS/user/submissionFile/36407?confirm=150a239a) suggests the Additive and Multiplicative Effects Network (AMEN) model. This adds a latent multiplicative effect to capture higher-order network structure. For a given observation $y_{i,j}$, this effect is the inner product $\pmb{u}_{\text{sender}_{i}} \cdot \pmb{v}_{\text{receiver}_{j}}^\top$, where $\pmb{u}_{i}$ and $\pmb{v}_{i}$ are both $1 \times K$ vectors of latent factors. For each node, the two vectors are concatenated, and their covariance is modeled with a $2K \times 2K$ covariance matrix.
$$y_{i,j} \sim \mathcal{N}(\alpha + \beta_{\text{sender}_{i}} + \beta_{\text{receiver}_{j}} + (\pmb{u}_{\text{sender}_{i}} \cdot \pmb{v}_{\text{receiver}_{j}}^\top), 1) $$
$$\alpha \sim \mathcal{N}(0, 5) $$
$$ \pmb{\beta}_{i} = (\beta_{\text{sender}_{i}}, \beta_{\text{receiver}_{i}}) \sim \mathcal{MVN}(\pmb{0}, \pmb{\Sigma})$$
$$ \pmb{\Sigma} = \text{diag}(\pmb{\sigma}) \, \pmb{\Omega}_{\text{cholesky}} \pmb{\Omega}_{\text{cholesky}}^{\top} \, \text{diag}(\pmb{\sigma})$$
$$ \pmb{\sigma} \sim \text{Gam}(1, 1)$$
$$\pmb{\Omega}_{\text{cholesky}} \sim \mathcal{LKJ}(5)$$
$$ (\pmb{u}_{i}, \pmb{v}_{i}) \sim \mathcal{MVN}(\pmb{0}, \pmb{\Sigma}_{\pmb{u}, \pmb{v}})$$
$$ \pmb{\Sigma}_{\pmb{u}, \pmb{v}} = \text{diag}(\pmb{\sigma}_{\pmb{u}, \pmb{v}}) \, \pmb{\Omega}_{\pmb{u}, \pmb{v}, \text{cholesky}} \pmb{\Omega}_{\pmb{u}, \pmb{v}, \text{cholesky}}^{\top} \, \text{diag}(\pmb{\sigma}_{\pmb{u}, \pmb{v}})$$
$$ \pmb{\sigma}_{\pmb{u}, \pmb{v}} \sim \text{Gam}(1, 1)$$
$$\pmb{\Omega}_{\pmb{u}, \pmb{v}, \text{cholesky}} \sim \mathcal{LKJ}(5)$$
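To fix intuition for the multiplicative term, stack the sender factors into a matrix $\pmb{U}$ and the receiver factors into $\pmb{V}$ (one row per node): the term for dyad $(i, j)$ is then the $(i, j)$ entry of $\pmb{U}\pmb{V}^{\top}$. A small R sketch with made-up values:
```{r, multi_effect_sketch, echo = TRUE, eval = FALSE, size = "small"}
set.seed(2)
K <- 2 ; n <- 4                     # illustrative sizes
U <- matrix(rnorm(n * K), n, K)     # sender factors, one row per node
V <- matrix(rnorm(n * K), n, K)     # receiver factors, one row per node
M <- U %*% t(V)                     # n x n matrix of multiplicative effects
all.equal(M[1, 3], sum(U[1, ] * V[3, ])) # the (1, 3) term is an inner product
```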
\clearpage
In Stan, this is written as follows, with non-centered parameterizations for all multivariate normals:
```{stan output.var="m3_amen", echo = TRUE, eval = FALSE, size = "small"}
data{
  int n_nodes ;
  int n_dyads ;
  int N ; // total obs; should be n_dyads * 2
  int sender_id[n_dyads * 2] ;
  int receiver_id[n_dyads * 2] ;
  int K ; // number of latent dimensions
  real Y[n_dyads * 2] ;
}
parameters{
  real intercept ;
  cholesky_factor_corr[2] corr_nodes ; // correlation matrix for additive node effects
  vector<lower=0>[2] sigma_nodes ;     // sd w/in nodes
  matrix[2, n_nodes] z_nodes ;         // for node non-centered parameterization
  cholesky_factor_corr[K * 2] corr_multi_effects ; // correlation matrix for multiplicative effects
  vector<lower=0>[K * 2] sigma_multi_effects ;     // sd
  matrix[K * 2, n_nodes] z_multi_effects ;         // multi-effect non-centered term
}
transformed parameters{
  matrix[n_nodes, 2] mean_nodes ;             // node parameter mean
  matrix[n_nodes, K * 2] mean_multi_effects ; // multi-effect mean
  mean_nodes = (diag_pre_multiply(
    sigma_nodes, corr_nodes) * z_nodes)' ; // sd * correlation
  mean_multi_effects = (diag_pre_multiply(
    sigma_multi_effects, corr_multi_effects) * z_multi_effects)' ; // sd * correlation
}
model{
  intercept ~ normal(0, 5) ;
  // node terms
  to_vector(z_nodes) ~ normal(0, 1) ;
  corr_nodes ~ lkj_corr_cholesky(5) ;
  sigma_nodes ~ gamma(1, 1) ;
  // multi-effect terms
  to_vector(z_multi_effects) ~ normal(0, 1) ;
  corr_multi_effects ~ lkj_corr_cholesky(5) ;
  sigma_multi_effects ~ gamma(1, 1) ;
  for(n in 1:N){
    Y[n] ~ normal(intercept +
      mean_nodes[sender_id[n], 1] + mean_nodes[receiver_id[n], 2] +
      mean_multi_effects[sender_id[n], 1:K] *
        (mean_multi_effects[receiver_id[n], (K+1):(K*2)])',
      1) ;
  }
}
generated quantities{
  real Y_sim[N] ;
  for(n in 1:N){
    Y_sim[n] = normal_rng(intercept +
      mean_nodes[sender_id[n], 1] + mean_nodes[receiver_id[n], 2] +
      mean_multi_effects[sender_id[n], 1:K] *
        (mean_multi_effects[receiver_id[n], (K+1):(K*2)])', 1) ;
  }
}
```
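As with the earlier models, the fitted object `m3` is loaded from a saved file; a hypothetical fitting call would reuse `stan_data` from above and add the number of latent dimensions (the file name and `K = 2` here are illustrative):
```{r, m3_fit_sketch, echo = TRUE, eval = FALSE, size = "small"}
stan_data_amen <- c(stan_data, list(K = 2))   # add latent dimension count
m3_model <- stan_model(file = "m3_amen.stan") # hypothetical file name
m3 <- vb(m3_model, data = stan_data_amen, seed = 123, output_samples = 1000)
```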
\clearpage
```{r, m3_ppc, echo = TRUE, cache = TRUE, message=FALSE, warning=FALSE, size = "small"}
library(bayesplot)
color_scheme_set("red")
m3_params <- extract(m3)
to_visualize3 <- data.frame(y = y_df$y, y_sim = colMeans(m3_params$Y_sim))
mse_3 <- mean((to_visualize3$y - to_visualize3$y_sim)^2)
#ppc_dens_overlay(y = y_df$y, yrep = m3_params$Y_sim)
ggplot(data = to_visualize3, aes(x = y, y = y_sim)) +
  geom_point(alpha = .1) +
  theme_minimal() +
  labs(x = "Observed Y", y = "Predicted Fit",
    title = paste0("Mean Squared Error is ", round(mse_3, 4)))
```
We see that the AMEN model captures the outcome variable much more effectively: the mean squared error is much lower, and the scale of the predicted outcomes is much closer to that of the observed outcomes.
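Collecting the three mean squared errors computed above makes the comparison explicit:
```{r, mse_comparison, echo = TRUE, cache = TRUE, size = "small"}
data.frame(
  model = c("fixed effects", "social relations", "AMEN"),
  mse = c(mse_0, mse_1, mse_3))
```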
# Discussion
We have seen that the multiplicative effects model does a far better job of modeling the distribution of $\pmb{y}$ than either the fixed effects model or the social relations model. That the model fit changes so dramatically suggests the importance of taking network structure seriously in modeling. One further extension would be to incorporate dyadic or nodal covariates into the model, which could further improve estimation and predictive fit.