-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathblog2.Rmd
340 lines (262 loc) · 15.6 KB
/
blog2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
---
title: "Health care in Mexico: a boon or a bane?"
description: "The blog focuses on finding out if individuals have recovered at the same pace as testing positive for COVID-19 and if the deaths and cases have decreased since the beginning of vaccination doses."
author:
- name: Nishtha Arora
date: "2021-09-17"
output:
distill::distill_article:
toc: true
---
```{r loadinglibraries, echo=FALSE}
library(readr)
library(tidyverse)
library(lubridate)
library(kableExtra)
library(ggplot2)
library(countrycode)
library(plotly)
library(Hmisc)
library(corrplot)
library(gt)
library(ggpubr)
```
# Introduction
```{r reading_data, echo=FALSE}
covid_data <- read_csv(here::here(("data/owid-covid-data.csv")))
```
Mexico is a country in North America and northern most country of Latin America, which mainly consists of Spanish and Portuguese speaking population. It has a population of 128,932,754 people high rates of population growth due to improvement in health care sector. This blog will highlight the effect of improved health care during the COVID-19 pandemic (2020 and 2021) in Mexico (Grayson & Ball, 2021).
```{r image, echo = F, out.width = '70%', fig.align='center'}
knitr::include_graphics("images/mex.jpg")
```
## Background
Mexico has reported one of the highest mortality rates in the world due to its late implementation of social distancing restrictions. The lethality in Mexico due to COVID-19 is found out to be 9.2% and has the lowest no. of tests per capita (Tariq et al., 2021).
According to the [Scientific reports](https://www.nature.com/articles/s41598-021-90329-w) that after the restrictions, Mexico has been successful in controlling its COVID-19 cases, by implementing better public health measures (Arellanos-Soto et al., 2021).
The data stories throw light on the numbers of recoveries, COVID-19 testing, cases, deaths and vaccinations to justify the above background.
# Data Description
## Data source
The data has been extracted from an online publication, [Our World in Data](https://ourworldindata.org/coronavirus/country/mexico). The data is collected and uploaded by researchers *Hannah Ritchie, Diana Beltekian, Edouard Mathieu, Cameron Appel, Lucas Rodés-Guirao, Esteban Ortiz-Ospina, Charlie Giattino, Bobbie MacDonald, Joe Hasell and Max Roser* (Ritchie et al., 2021).
The collection of data by the researchers is done via various sources like specialized institutions, research articles, statistical agencies or official data from government sources ("Data FAQs", 2021).
## Variable description
```{r data_wrangle_check_correct, include=FALSE}
var_select <- covid_data %>%
filter(location == "Mexico") %>%
select(date, new_cases, new_deaths, new_tests, positive_rate, new_vaccinations) %>%
mutate(year= year(date)) %>%
#mutate(year= as.factor(year)) %>%
mutate(new_recoveries = new_cases-new_deaths) %>%
mutate(new_test_result_positive = (positive_rate*new_tests)) %>%
mutate(new_test_result_positive=round(new_test_result_positive)) %>%
rename(new_positive_rate = "positive_rate") %>%
select(-c(date))
clean <- unique(var_select [ , 1:8 ] )
data_types <- str(clean)
```
The original data set is wrangled, duplicate data is removed, variables are selected and renamed and new ones are mutated to form a clean data set for the analysis.
The data set formed contains the statistics for everyday case, deaths, recoveries, tests and vaccination doses for the year 2020 and 2021. The number of rows in the data set are `r nrow(clean)` and the number of columns are `r ncol(clean)`.
Table 1 shows the variables that are used for the analysis:
*All the variables (except year) have prefix as 'new'. This means the variable is calculated by subtracting the 'previous total value before the date' from 'current total value till date'.*
```{r, echo=FALSE}
Desc <- data.frame(Variables = names(clean),
Description = c("New COVID-19 cases",
"New COVID-19 deaths",
"New COVID-19 tests",
"This values shows the news positive tests out of total new tests",
"New COVID-19 vaccination doses",
"2020 or 2020 (pandemic years)",
"The no. of people recovered from COVID-19 after testing positive for it. It is calculated by subtracting 'new_deaths' from 'new_cases'",
"The no. of poeple testing positive for COVID-19. It is calculated by multiplying 'positive rate' by new_positive_rate"
),
Type = c("numeric",
"numeric",
"numeric",
"numeric",
"numeric",
"factor",
"numeric",
"numeric"))
knitr::kable(Desc,
caption = "Description of Variables") %>%
kable_styling(bootstrap_options = c("striped", "hover")) %>%
row_spec(1:8, color = "black", background = "#DAC8AB")
```
## Summary Statistics
Table 2 displays the minimum (min), maximum (max), mean and standard deviation (sd) value for each numeric variable in the data set.
```{r, echo=FALSE}
cases <- clean %>% select(new_cases) %>% na.omit()
deaths <- clean %>% select(new_deaths) %>% na.omit()
tests <- clean %>% select(new_tests) %>% na.omit()
rate <- clean %>% select(new_positive_rate) %>% na.omit()
vac <- clean %>% select(new_vaccinations) %>% na.omit()
rec <- clean %>% select(new_recoveries) %>% na.omit()
pos <- clean %>% select(new_test_result_positive) %>% na.omit()
variables <- c("new Cases", "new deaths", "new_tests", "new_positive_rate", "new_vaccinations", "new_recoveries", "new_test_result_positive")
min <- c(min(cases$new_cases),
min(deaths$new_deaths),
min(tests$new_tests),
min(rate$new_positive_rate),
min(vac$new_vaccinations),
min(rec$new_recoveries),
min(pos$new_test_result_positive))
max <- c(max(cases$new_cases),
max(deaths$new_deaths),
max(tests$new_tests),
max(rate$new_positive_rate),
max(vac$new_vaccinations),
max(rec$new_recoveries),
max(pos$new_test_result_positive))
mean <- c(round(mean(cases$new_cases),2),
round(mean(deaths$new_deaths),2),
round(mean(tests$new_tests),2),
round(mean(rate$new_positive_rate),2),
round(mean(vac$new_vaccinations),2),
round(mean(rec$new_recoveries),2),
round(mean(pos$new_test_result_positive),2))
sd <-c(round(sd(cases$new_cases),2),
round(sd(deaths$new_deaths),2),
round(sd(tests$new_tests),2),
round(sd(rate$new_positive_rate),2),
round(sd(vac$new_vaccinations),2),
round(sd(rec$new_recoveries),2),
round(sd(pos$new_test_result_positive),2))
description <- data.frame(variables, min, max, mean, sd)
gt(description) %>%
tab_header(title = "Table 2: Summary statistics",
subtitle = "2020-2021") %>%
tab_options(
heading.subtitle.font.size = 12,
heading.align = "center",
table.border.top.color = "black",
column_labels.border.bottom.color = "black",
column_labels.border.bottom.width= px(3),
)%>%
data_color(
columns = c(min, max, mean, sd),
colors = scales::col_numeric(
c("#DAC8AB", "#DAC8AB", "#DAC8AB"),
domain = NULL
)
)
```
# Analysis
## Story 1 Visualizations
**Map showing that Mexico is one of the countries with moderate recovery rate**
*(Please use plotly i.e. click on the country regions to see more information.)*
```{r map, echo=FALSE, fig.cap="Countries vs Recovery count"}
for_map <- covid_data %>% mutate(new_recoveries = new_cases-new_deaths)
map <- for_map %>%
select(location, new_recoveries) %>% na.omit() %>%
group_by(location) %>%
summarise(recoveries=sum(new_recoveries))
map $iso3 <- countrycode(map$location, 'country.name', 'iso3c')
map[map$location=="Kosovo","iso3"] <- "XKX"
map <- map %>% filter(location != "World")
fig <- plot_ly(map, type='choropleth', locations= map$iso3 , z= map$recoveries, text= ~paste(
"<br><b>Country:</b> ", location,
"<br><b>Recovery_count:</b>", recoveries), colorscale="distiller")
fig <- fig %>% colorbar(title = "Total recoveries" )
fig
```
Plot showing relation between **the no. of people tested positive for COVID-19** and **their recoveries**. *(Please press on PLAY for transitioning from 2020 to 2021)*
```{r, echo=FALSE, fig.cap= "Positive tests vs Recovery in 2020 and 2021"}
fig <- clean %>%
plot_ly(
x = ~new_test_result_positive,
y = ~new_recoveries,
frame = ~year,
type = 'scatter',
color = "red",
mode = 'markers',
showlegend = F
)
fig <- fig %>%
animation_opts(
1000, easing = "elastic", redraw = FALSE
)
fig <- fig %>%
animation_slider(
currentvalue = list(prefix = "YEAR ", font = list(color="red"))
)
fig
```
Story 1 Observations:
- From Figure 1, it is seen that Mexico falls under the countries that have had moderate recovery rate since 2020 till present (2021).
- From Figure 2, we see that in 2020 less people tested positive for COVID-19 as compared to 2021. Although people are testing positive at almost the same rate, people are recovering too at the same pace.
Both the graphs show that Mexico has a recovery rate which is moderate and at a suitable pace. This shows that the health sector in Mexico is gradually improving and **has managed to control the situation and mortality**.
## Story 2: Visualizations
Plot's showing correlation of **No. of vaccination doses** with **No. of cases** and **No. of deaths** using correlation coefficient(R value).
```{r, fig.show='hold', echo=FALSE, fig.cap="Cases vs Vaccination doses"}
cleann <- clean %>% na.omit()
ggscatter(cleann, x = "new_vaccinations", y = "new_cases",
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson",
xlab = "No. of vaccinations",
ylab = "No. of cases")+
theme_pubr()+
scale_x_continuous(labels = scales::comma) +
geom_smooth(method = "loess", color = "red", se = FALSE) +
geom_point(alpha = 0.4)
```
```{r, fig.cap="Deaths vs Vaccination doses", echo=FALSE}
ggscatter(cleann, x = "new_vaccinations", y = "new_deaths",
add = "reg.line", conf.int = TRUE,
cor.coef = TRUE, cor.method = "pearson",
xlab = "No. of vaccinations",
ylab = "No. of deaths")+
theme_pubr()+
scale_x_continuous(labels = scales::comma) +
geom_smooth(method = "loess", color = "red", se = FALSE) +
geom_point(alpha = 0.4)
```
Story 2 Observations:
- Figure 3: The R value here is R=0.2 which signifies that it is a VERY weak positive linear relationship. This means that as no. of vaccinations are increasing, the no. of cases are also increasing but at a very slow pace. We can say that the vaccinations little impact on the no. of cases as the cases are not increasing rapidly.
- Figure 4: The R value here is R=-0.25 which signifies that it is a VERY weak negative linear relationship. This means that as the vaccination doses increase, the deaths decrease, but at a very slow pace. We can say that the vaccinations have some impact on controlling deaths as well as the no. of deaths are decreasing to some extent.
# Conclusion
The conclusion is explained via plotting a *correlation matrix plot* i.e. Correlogram. Here, POSITIVE correlations are displayed in BLUE and NEGATIVE are displayed in RED.
```{r conclusion, echo=FALSE, fig.cap="Conclusion"}
plot <- clean %>% select(-c(year)) %>%
na.omit()
plot.cor <- cor(plot)
plot.cor = cor(plot, method = c("spearman"))
corrplot(plot.cor)
```
We conclude that:
- Though positive tested people keep increasing throughout the year of 2020 and 2021, but recoveries take place at the same pace too, showing that the** health care sector is improving and effecting the health in a positive way**.
- There is almost VERY weak positive relation between vaccinations and cases i.e. as vaccination doses increase, the cases increase but at a VERY slow pace. Also, the increase in vaccination doses has lead to decrease of deaths too, again, at a slow pace but a decrease is seen (VERY weak negative relationship).This shows that the **rate at which vaccinations are provided by the health care, is helping in reducing the surge of cases and bringing down deaths slowly but steadily**.
Hence, the health sector at Mexico is improved and is improving **gradually** to become a **big boon to the country**.
# References
[1] Alboukadel Kassambara (2020). ggpubr: 'ggplot2' Based Publication Ready
Plots. R package version 0.4.0. https://CRAN.R-project.org/package=ggpubr
[2] Arel-Bundock et al., (2018). countrycode: An R package to convert country
names and country codes. Journal of Open Source Software, 3(28), 848,
https://doi.org/10.21105/joss.00848
[3] Arellanos-Soto, D., Padilla-Rivas, G., Ramos-Jimenez, J., Galan-Huerta, K., Lozano-Sepulveda, S., & Martinez-Acuña, N. et al. (2021). Decline in influenza cases in Mexico after the implementation of public health measures for COVID-19. Scientific Reports, 11(1). doi: 10.1038/s41598-021-90329-w
[4] C. Sievert. Interactive Web-Based Data Visualization with R, plotly, and
shiny. Chapman and Hall/CRC Florida, 2020.
[5] Data FAQs. (2021). Retrieved 16 September 2021, from https://ourworldindata.org/faqs#how-is-our-work-copyrighted
[6] Frank E Harrell Jr, with contributions from Charles Dupont and many others.
(2021). Hmisc: Harrell Miscellaneous. R package version 4.5-0.
https://CRAN.R-project.org/package=Hmisc
[7] Garrett Grolemund, Hadley Wickham (2011). Dates and Times Made Easy
with lubridate. Journal of Statistical Software, 40(3), 1-25. URL
https://www.jstatsoft.org/v40/i03/.
[8] GitHub: Where the world builds software. (2021). Retrieved 2 June 2021, from https://github.com/
[9] Grayson, G., & Ball, J. (2021). About Mexico | Scholastic. Retrieved 16 September 2021, from https://www.scholastic.com/teachers/articles/teaching-content/mexico/
[10] H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New
York, 2016.
[11] Hadley Wickham and Jim Hester (2021). readr: Read Rectangular Text
Data. R package version 2.0.0. https://CRAN.R-project.org/package=readr
[12] Hao Zhu (2021). kableExtra: Construct Complex Table with 'kable' and Pipe
Syntax. R package version 1.3.4. https://CRAN.R-project.org/package=kableExtra
[13] Richard Iannone, Joe Cheng and Barret Schloerke (2021). gt: Easily Create
Presentation-Ready Display Tables. R package version 0.3.1.
https://CRAN.R-project.org/package=gt
[14] Ritchie, H., Mathieu, E., Rodés-Guirao, L., Appel, C., Giattino, C., & Ortiz-Ospina, E. et al. (2021). Coronavirus Pandemic (COVID-19). Retrieved 16 September 2021, from https://ourworldindata.org/coronavirus/country/mexico
[15] RStudio Team (2020). RStudio: Integrated Development for R. RStudio, PBC, Boston, MA URL http://www.rstudio.com/.
[16] Tanaka, E., 2021. ETC5523: Communicating with Data, Monash University.
[17] Taiyun Wei and Viliam Simko (2021). R package 'corrplot': Visualization of a
Correlation Matrix (Version 0.90). Available from
https://github.com/taiyun/corrplot
[18] Tariq, A., Banda, J., Skums, P., Dahal, S., Castillo-Garsow, C., & Espinoza, B. et al. (2021). Transmission dynamics and forecasts of the COVID-19 pandemic in Mexico, March-December 2020. PLOS ONE, 16(7), e0254826. doi: 10.1371/journal.pone.0254826
[19] Wickham et al., (2019). Welcome to the tidyverse. Journal of Open
Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686