Merge pull request #13 from UCL-ARC/stats_episode_updates

Stats episode updates closes Issue #12
UCL-ARC · Aug 7, 2024 · 93e83e4
2 parents 24747f9 + 64d21f9
commit 93e83e4
Showing 1 changed file with 14 additions and 15 deletions.
diff --git a/episodes/23-statistics.Rmd b/episodes/23-statistics.Rmd
@@ -37,10 +37,9 @@ library(tidyverse)
 lon_dims_imd_2019 <- read.csv("data/English_IMD_2019_Domains_rebased_London_by_CDRC.csv")
 # Commenting out as not used in this version
 # library(lubridate)
-library(gapminder)
-# create a binary membership variable for europe (for later examples)
-gapminder <- gapminder %>%
-  mutate(european = continent == "Europe")
+#library(gapminder)
+# create a binary membership variable for City of London (for later examples)
+lon_dims_imd_2019 <- lon_dims_imd_2019 %>% mutate(city = la19nm == "City of London")
 ```
 
 We are going to use the data from the gapminder package.  We have added a variable *European* indicating if a country is in Europe.
@@ -257,7 +256,7 @@ lon_dims_imd_2019 %>%
 ```
 
 
-Is the difference between heights statistically significant?
+Is the difference between the income ranks statistically significant?
 
 ## t-test
 
@@ -274,9 +273,9 @@ Is the difference between heights statistically significant?
 ## Doing a t-test
 
 ```{r}
-# Example to be changed
-# t.test(pop ~ european, data = gapminder)$statistic
-# t.test(pop ~ european, data = gapminder)$parameter
+
+ t.test(health_london_rank ~ city, data = lon_dims_imd_2019)$statistic
+ t.test(health_london_rank ~ city, data = lon_dims_imd_2019)$parameter
 ```
 
 Notice that the **summary()** of the test contains more data than is output by default.
@@ -286,8 +285,8 @@ Write a paragraph in markdown format reporting this test result including the t-
 
 ### t-test result
 
-Testing supported the rejection of the null hypothesis that there is no difference between mean populations of European and non-European participants (**t**=`r round(t.test(pop ~european, data = gapminder)$statistic,4)`, **df**= `r round(t.test(pop ~european, data = gapminder)$parameter,4)`,
-**p**= `r round( t.test(pop ~european, data = gapminder)$p.value,4)`).
+Testing supported the rejection of the null hypothesis that there is no difference between mean health rank of City of London and non-City of London areas (**t**=`r round(t.test(health_london_rank ~ city, data = lon_dims_imd_2019)$statistic,4)`, **df**= `r round(t.test(health_london_rank ~ city, data = lon_dims_imd_2019)$parameter,4)`,
+**p**= `r round( t.test(health_london_rank ~ city, data = lon_dims_imd_2019)$p.value,4)`).
 
 (Can you get p to display to four places?  Cf *format()*.)
 
@@ -336,18 +335,18 @@ summary(modelone)
 Run the following code chunk and compare the results to the t test conducted earlier.
 
 ```{r}
-gapminder %>%
-  mutate(european = factor(european))
+lon_dims_imd_2019 %>%
+  mutate(city = factor(city))
 
-modelttest <- lm(gapminder$pop ~ gapminder$european)
+modelttest <- lm(lon_dims_imd_2019$health_london_rank ~ lon_dims_imd_2019$city)
 
 summary(modelttest)
 ```
 
 ## Regression with a categorical IV (ANOVA)
 
-Use the `lm()` function to model the relationship between `gapminder$gdpGroup`
-and `gapminder$pop`. Compare the results with the ANOVA carried out earlier.
+Use the `lm()` function to model the relationship between `lon_dims_imd_2019$la19nm`
+and `lon_dims_imd_2019$health_london_rank`. Compare the results with the ANOVA carried out earlier.
 
 ## Break
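
A note on the `modelttest` chunk in the last hunk. The piped `mutate(city = factor(city))` is never assigned, so it prints a data frame and leaves `lon_dims_imd_2019` unchanged. Also, `t.test()` runs Welch's unequal-variance test by default, so its t and df will not exactly match the `lm()` output unless `var.equal = TRUE` is set. The sketch below illustrates both points on simulated data. The `toy` data frame and its values are hypothetical stand-ins for `lon_dims_imd_2019`, not real IMD figures, and this is one possible way to write the chunk rather than the lesson's intended solution.

```r
# Simulated stand-in for lon_dims_imd_2019 (hypothetical values, not real IMD data)
set.seed(42)
toy <- data.frame(
  la19nm = rep(c("City of London", "Camden", "Hackney"), times = c(6, 20, 20))
)
toy$health_london_rank <- rnorm(nrow(toy), mean = 2500, sd = 800)

# Assign the result: a bare `toy %>% mutate(...)` would print it and discard it
toy$city <- factor(toy$la19nm == "City of London")

# Run the test once and reuse the stored object.
# var.equal = TRUE requests the pooled-variance test; the default is Welch's test.
tt <- t.test(health_london_rank ~ city, data = toy, var.equal = TRUE)

# The same comparison as a regression; `data =` avoids `$` inside the formula
modelttest <- lm(health_london_rank ~ city, data = toy)
t_lm <- summary(modelttest)$coefficients["cityTRUE", "t value"]

# Same magnitude; the sign depends on which group is subtracted from which
all.equal(abs(unname(tt$statistic)), abs(t_lm))
unname(tt$parameter) == df.residual(modelttest)

# p-value displayed to four decimal places
format(round(tt$p.value, 4), nsmall = 4, scientific = FALSE)
```

Storing the test in `tt` also means the inline `r` expressions in the "t-test result" paragraph could read `tt$statistic`, `tt$parameter` and `tt$p.value` instead of re-running `t.test()` three times.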