-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathanalysis.qmd
460 lines (404 loc) · 16.7 KB
/
analysis.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
---
title: "Data Analysis"
html:
code-fold: true
code-summary: "Show the code"
---
## Overview
The goal of this analysis is to understand how dependency ratios in Washington State are evolving temporally and spatially. How are the statewide ratios evolving? Which counties have the highest ratios? Are there regional differences in dependency ratios? If so, are there any patterns to the distribution of high dependency ratios?
```{r setup}
#| message: false
#| echo: false
library(tidyverse)
library(sf)
library(tigris)
library(gt)
library(broom)
library(ggrepel)
page_theme <- theme_minimal() +
theme(
plot.caption.position = "plot",
plot.title.position = "plot",
plot.subtitle = element_text(size=14),
plot.title = element_text(size=18)
)
map_theme <- theme_void() +
theme(
plot.subtitle = element_text(size=14),
plot.title = element_text(size=18),
strip.text = element_text(size = 12),
legend.position = "bottom",
plot.caption.position = "plot",
plot.title.position = "plot"
)
theme_set(page_theme)
```
```{r read in county dataframe}
#| echo: false
df_counties <- read_rds("intermediate/df_countries_proced.rds")
```
```{r store year range in variables}
#| echo: false
years <- levels(df_counties$year)
start_year <- first(years)
last_year <- last(years)
start_year_num <- as.numeric(start_year)
last_year_num <- as.numeric(last_year)
```
## Statewide dependency ratios
As the chart and table below show, the total dependency ratio for the State is increasing. Child and aged dependency ratios are moving in opposite directions. Child dependency is decreasing slowly while the aged dependency ratio is increasing. The aged dependency ratio is driving the trend in total dependency over the `r last_year_num - start_year_num` years from `r start_year` to `r last_year`.
```{r statewide dependency ratios}
#| code-fold: true
#| code-summary: "Show the code"
df_counties |>
filter(geography == "Washington State") |>
select(-(age_65:age_1)) |>
pivot_longer(total_dep_ratio:aged_dep_ratio, names_to = "dep_ratio", values_to = "ratio_val") |>
mutate(
dep_ratio = fct_rev(fct_recode(as_factor(dep_ratio),
"Total" = "total_dep_ratio",
"Child" = "child_dep_ratio",
"Aged" = "aged_dep_ratio"))
) |>
ggplot(aes(x=year, y=ratio_val, fill=dep_ratio)) +
geom_col(position = "dodge", colour="white") +
labs(
x = element_blank(),
y = "Value",
fill = "Dependency ratio",
caption="Data source: Washington State County Demography Dashboard",
title = paste("Total dependency increases from", start_year, "to", last_year),
subtitle = "Child and aged dependencies move in opposite directions"
) +
theme(
legend.position = "bottom"
)
```
```{r table of DR values at state level}
#| code-fold: true
#| code-summary: "Show the code"
df_counties |>
filter(geography == "Washington State") |>
select( "Year"=year, "Child"=child_dep_ratio, "Aged"=aged_dep_ratio, "Total"=total_dep_ratio) |>
mutate(
pct_change_total = round((Total - lag(Total, n=1))/lag(Total, n=1)*100, 2)
) |>
gt() |>
tab_header(
title = paste("Changes in dependency ratios in Washington State,", start_year, "-", last_year)
) |>
cols_label(
pct_change_total = "Percent Change Total"
) |>
tab_options(
table.width = pct(100)
)
```
### Understanding the total dependency ratio trend
In order to quantify how total dependency is changing in Washington a simple linear model was employed. The model indicates that in each subsequent year the total dependency ratio increases by about 0.67 and the intercept can be interpreted as the value of the total dependency ratio in 2010.
```{r TDR trends}
#| code-fold: true
#| code-summary: "Show the code"
lm_fit <- df_counties |>
filter(geography == "Washington State") |>
mutate(
year = as.numeric(year)
) |>
lm(total_dep_ratio ~ year, data = _)
model_estimates <- broom::tidy(lm_fit)
model_estimates |>
gt() |>
tab_header(
title = "Simple linear model of total depency ratio"
) |>
fmt_number(
decimals = 2
)
```
The plot below shows the model fit and confidence interval.
```{r plot total dependency trend}
#| message: false
#| code-fold: true
#| code-summary: "Show the code"
df_counties |>
filter(geography == "Washington State") |>
ggplot(aes(x=as.numeric(year), y=total_dep_ratio)) +
geom_point() +
geom_smooth(method="lm", se=TRUE) +
scale_x_continuous(
breaks = 1:11,
minor_breaks = NULL,
labels = as.character(2011:2021)
) +
labs(
title = paste("Washington State total dependecy ratio trend",start_year,"-",last_year),
x = "Year",
y = "Total dependency ratio",
caption="Data source: Washington State County Demography Dashboard"
)
```
The confidence interval is very tight around the trend line.
### Prediciting future total dependency ratios
We can project this model forward a few years to see what total dependency ratios will be in subsequent years. When the 2022-23 demographic data become available, they can be compared with model predictions and the model can be updated.
```{r TDR predictions}
#| code-fold: true
#| code-summary: "Show the code"
#| message: false
# 2022 and 2023 are the 12th and 13th years in the series.
TDR_preds <- predict(lm_fit, newdata = data.frame(year = c(12,13)), type="response")
```
In this case, the model predicts total dependency ratios of `r round(TDR_preds[1],2)` and `r round(TDR_preds[2],2)` for 2022 and 2023, respectively.
## Counties with highest total dependency ratios
The table below shows the counties with the highest total dependency ratios in 2021. The counties in this list all have much higher aged dependency ratios than the statewide value (25.71) while generally having typical child dependency ratios. Jefferson county being the notable exception with a child dependency ratio of 18.58.
```{r identify high ratio counties}
#| code-fold: true
#| code-summary: "Show the code"
df_high_ratio <- df_counties |>
filter(geography != "Washington State" & year == 2021) |>
arrange(desc(total_dep_ratio)) |>
select("County"= geography, "Child"=child_dep_ratio, "Aged"=aged_dep_ratio, "Total"=total_dep_ratio) |>
head(10)
df_high_ratio |>
gt() |>
tab_header(
title = paste("Counties with highest dependency ratios in",last_year)
) |>
tab_options(
table.width = pct(100)
)
```
### Where these counties are located
In order to understand how dependency ratios vary across the state, shape files for the counties are imported and the dependency ratios are overlaid on a map of the state.
```{r create WA geometries and map high ratio counties}
#| warning: false
#| code-fold: true
#| code-summary: "Show the code"
#| fig-width: 8
#| fig-height: 6
wa_county <- counties(state = "WA", cb = TRUE, class = "sf", progress_bar = FALSE)
label_size <- 3.5
wa_county |>
left_join(df_high_ratio, by=c("NAME"="County")) |>
mutate(
NAME = if_else(is.na(Total), "", NAME)
) |>
ggplot() +
geom_sf(aes(fill=Total), colour="white", ) +
geom_label_repel(aes(label = NAME, geometry = geometry),
stat = "sf_coordinates", size = label_size) +
# for the same colour scale for all dependency maps with limits
scale_fill_continuous(type = "viridis",limits=c(0, 100)) +
map_theme +
labs(
title="Counties with highest total dependency ratios in 2021",
caption="Data source: Washington State County Demography Dashboard",
fill="Total dependency ratio"
)
```
From this map, it is clear that the distribution of the high dependency counties is not random. They are located on the east/west extremes of the state. Furthermore, 5 of the top 6 are located on the west coast, while the other five are located in the eastern quarter of the state.
#### Child dependency comparisons
These counties have roughly similar rates of child dependency, with Jefferson and San Juan Counties being notable exceptions.
```{r map high ratio counties child dependency rates}
#| warning: false
#| code-fold: true
#| code-summary: "Show the code"
#| fig-width: 8
#| fig-height: 6
wa_county |>
left_join(df_high_ratio, by=c("NAME"="County")) |>
mutate(
NAME = if_else(is.na(Child), "", NAME)
) |>
ggplot() +
geom_sf(aes(fill=Child), colour="white", ) +
geom_label_repel(aes(label = NAME, geometry = geometry),
stat = "sf_coordinates", size = label_size) +
# for the same colour scale for all dependency maps with limits
scale_fill_continuous(type = "viridis",limits=c(0, 100)) +
map_theme +
labs(
title="Child dependency ratios for counties with highest\ntotal dependency ratios in 2021",
caption="Data source: Washington State County Demography Dashboard",
fill="Child dependency ratio"
)
```
That said, there is a statistically significant difference, with $\alpha=0.05$, between the mean child dependency ratios of the west coast and eastern counties.
```{r comparing west coast and eastern counties child dependency ratios}
#| code-fold: true
#| code-summary: "Show the code"
west_coast_CDR <- c(18.85, 24.40, 24.47, 24.55, 19.69)
eastern_CDR <- c(31.6, 30.61, 27.34, 25.46, 27.13)
state_CDR_mean <- 27.73
t.test(
west_coast_CDR, # west coast counties
eastern_CDR # eastern counties
)
```
If we compare the west coast counties to the statewide mean, there is a statistically significant difference at the $\alpha=0.05$ significance level.
```{r testing west coast against statewide CDR}
#| code-fold: true
#| code-summary: "Show the code"
t.test(
x=west_coast_CDR,
mu = state_CDR_mean
)
```
If we compare the eastern counties to the statewide mean, we can see that there is *not* a statistically significant difference at the $\alpha=0.05$ significance level.
```{r testing eastern counties against statewide CDR}
#| code-fold: true
#| code-summary: "Show the code"
t.test(
x=eastern_CDR,
mu = state_CDR_mean
)
```
#### Aged dependency comparisons
The map shows aged dependency ratios for these counties. It is clear that the west coast counties have higher ratios than the eastern counties.
```{r map high ratio counties aged dependency rates}
#| warning: false
#| code-fold: true
#| code-summary: "Show the code"
#| fig-width: 8
#| fig-height: 6
wa_county |>
left_join(df_high_ratio, by=c("NAME"="County")) |>
mutate(
NAME = if_else(is.na(Aged), "", NAME)
) |>
ggplot() +
geom_sf(aes(fill=Aged), colour="white", ) +
geom_label_repel(aes(label = NAME, geometry = geometry),
stat = "sf_coordinates", size = label_size) +
# for the same colour scale for all dependency maps with limits
scale_fill_continuous(type = "viridis",limits=c(0, 100)) +
map_theme +
labs(
title="Aged dependency ratios for counties with highest\ntotal dependency ratios in 2021",
caption="Data source: Washington State County Demography Dashboard",
fill="Aged dependency ratio"
)
```
With $\alpha = 0.05$, there is a statistically significant difference between the means of the two regions' counties.
```{r comparing west coast and eastern counties aged dependency ratios}
#| code-fold: true
#| code-summary: "Show the code"
t.test(
c(79.39, 66.51, 64.54, 62.54, 67.15), # west coast counties
c(59.65, 53.93, 51.14, 52.23, 50.42) # eastern counties
)
```
The aged dependency ratios for both county groups is so much higher than the statewide mean that performing a statistical test is unnecessary.
::: {.callout-tip}
## Key takeaway
Among the counties with the highest total dependencies there are significant differences between these counties and the statewide value. Furthermore, there are differences between the regional subgroups in these 10 counties.
:::
### Modeling changes in total dependency
Predicting future changes to the total dependency ratio for counties can help county planners understand what budgetary pressures they might expect to face in the coming years. While attempting to forecast far into the future comes with risk, we can use simple linear models to project into the near future for the counties with the highest total dependency ratios.
The trend lines for all 10 counties show a clear upward movement.
```{r build linear models}
#| message: false
#| code-fold: true
#| code-summary: "Show the code"
#| fig-width: 10
#| fig-height: 10
high_ratio_counties <- df_high_ratio |>
select(County) |>
pull()
df_county_models <- df_counties |>
filter(geography %in% high_ratio_counties) |>
select(-c(age_65:age_1, child_dep_ratio, aged_dep_ratio)) |>
mutate(
year = as.numeric(year)
) |>
group_by(geography) |>
nest() |>
mutate(
model = map(.x=data, ~lm(total_dep_ratio ~ year, data=.x) |> tidy())
)
df_county_models |>
unnest(data) |>
ggplot(aes(x=year, y=total_dep_ratio)) +
geom_point() +
geom_smooth(method = "lm", se=TRUE) +
facet_wrap(~geography, ncol=2) +
scale_x_continuous(
breaks = c(1:11),
labels = as.character(2011:2021),
minor_breaks = NULL,
) +
labs(
x=element_blank(),
y="Total dependency ratio",
title="Simple regression models of total dependency ratios",
caption="Data source: Washington State County Demography Dashboard"
) +
theme(
axis.text.x = element_text(
angle = 45
)
)
```
The following table shows the model slopes for each of the ten counties. The interpretation of these slopes is that for each year the total dependency ratio will change by the amount of the slope. For Jefferson County, the state with the highest total dependency ratio, we can expect the total dependency ratio to change by 3.31 points each year. This almost 5 times higher than Washington State's model slope of 0.67. Garfield County has the steepest slope, but as discussed below, the integrity of the Garfield County data is in question. The standard error for the Garfield County slope reflects this.
```{r table of model results}
#| code-fold: true
#| code-summary: "Show the code"
df_county_models |>
select(-data) |>
unnest(model) |>
select(-statistic) |>
pivot_wider(names_from = term, values_from = estimate) |>
rename(Intercept = "(Intercept)", "Slope"=year) |>
filter(is.na(Intercept)) |>
select(-Intercept, County = "geography", Slope, "Std Error" = "std.error", "P Value"="p.value") |>
relocate(Slope, .before = "Std Error") |>
arrange(desc(Slope)) |>
ungroup() |>
gt() |>
tab_header(
title = "Rates of change for total dependency ratio by county"
) |>
fmt_number(
decimals = 2
) |>
cols_align(
align = "left",
columns = County
)
```
#### Potential problems with Garfield County data
The Garfield County data has an interesting bend in it that occurred in 2016. This behaviour isn't seen in the other counties. The Garfield County data for 2015-2016 shows the following:
```{r Garfield County 2015-2016}
#| code-fold: true
#| code-summary: "Show the code"
df_counties |>
filter(geography == "Garfield" & year %in% c(2015,2016)) |>
select(-(age_65:age_1)) |>
select("Child"=child_dep_ratio, "Aged"=aged_dep_ratio, "Total"=total_dep_ratio) |>
gt() |>
tab_header(
title = paste("Garfield County dependency data, 2015-2016")
) |>
tab_options(
table.width = pct(100)
)
```
There is a noticeable jump in both the child and aged dependency rates from 2015 to 2016. There is not a significant change in the total population of the county over the decade from 2011 to 2021 as shown in the following table:
```{r}
#| code-fold: true
#| code-summary: "Show the code"
df_counties |>
filter(geography=="Garfield") |>
select("Year" = year, age_65:age_1) |>
pivot_longer(cols=age_65:age_1, names_to = "brackets", values_to = "value") |>
group_by(Year) |>
summarise(`Total Population` = ceiling(sum(value))) |>
gt()
```
There is a small decline from 2015 to 2016. In looking for an explanation for this jump I consulted the [Washington Regional Economic Analysis Project](https://washington.reaproject.org/analysis/comparative-trends-analysis/population/tools/530023/530000/) as well as [Garfield County, WA](https://datausa.io/profile/geo/garfield-county-wa/) looking for a possible explanation for this change.
::: {.callout-warning}
## More information needed
Additional consultation with Washington State Department of Health is required to understand what caused this pattern in the Garfield County data.
:::
<section class="button-container">
<a href="conclusions.html" role="button" class="btn btn-primary">Continue to Conclusions</a>
</section>