
Commit

Merge pull request #13 from UCL-ARC/stats_episode_updates
Stats episode updates closes Issue #12
quirksahern authored Aug 7, 2024
2 parents 24747f9 + 64d21f9 commit 93e83e4
Showing 1 changed file with 14 additions and 15 deletions.
29 changes: 14 additions & 15 deletions episodes/23-statistics.Rmd
Original file line number Diff line number Diff line change
@@ -37,10 +37,9 @@ library(tidyverse)
 lon_dims_imd_2019 <- read.csv("data/English_IMD_2019_Domains_rebased_London_by_CDRC.csv")
 # Commenting out as not used in this version
 # library(lubridate)
-library(gapminder)
-# create a binary membership variable for europe (for later examples)
-gapminder <- gapminder %>%
-mutate(european = continent == "Europe")
+#library(gapminder)
+# create a binary membership variable for City of London (for later examples)
+lon_dims_imd_2019 <- lon_dims_imd_2019 %>% mutate(city = la19nm == "City of London")
 ```

 We are going to use the data from the gapminder package. We have added a variable *European* indicating if a country is in Europe.
@@ -257,7 +256,7 @@ lon_dims_imd_2019 %>%
 ```


-Is the difference between heights statistically significant?
+Is the difference between the income ranks statistically significant?

 ## t-test

@@ -274,9 +273,9 @@ Is the difference between heights statistically significant?
 ## Doing a t-test

 ```{r}
-# Example to be changed
-# t.test(pop ~ european, data = gapminder)$statistic
-# t.test(pop ~ european, data = gapminder)$parameter
+t.test(health_london_rank ~ city, data = lon_dims_imd_2019)$statistic
+t.test(health_london_rank ~ city, data = lon_dims_imd_2019)$parameter
 ```

 Notice that the summary()** of the test contains more data than is output by default.
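The added chunk runs `t.test()` twice, and the inline code in the result paragraph runs it three more times. A minimal sketch of the alternative, storing the test once and reading its components, is below. It assumes R's built-in `mtcars` data as a stand-in for `lon_dims_imd_2019` (with `mpg` in place of `health_london_rank` and the 0/1 column `am` in place of `city`); `res`, `t_stat`, `dof` and `p_val` are illustrative names, not part of the episode. `names()` lists every component of the returned `htest` object, which is the extra data beyond the default printout.

```r
# Stand-in data: mtcars ships with base R, so this runs without the CDRC file.
# mpg ~ am mirrors health_london_rank ~ city (a numeric outcome, two groups).
res <- t.test(mpg ~ am, data = mtcars)  # Welch two-sample t-test (the default)

# An "htest" object is a list; names() shows all of its components,
# not only the ones printed by default
names(res)

# Extract the values needed for the write-up from the single stored result
t_stat <- unname(res$statistic)  # t value
dof    <- unname(res$parameter)  # Welch-adjusted degrees of freedom
p_val  <- res$p.value            # two-sided p-value
c(t = t_stat, df = dof, p = p_val)
```

Inline code in the result paragraph can then refer to `t_stat`, `dof` and `p_val` instead of re-running the test each time.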
@@ -286,8 +285,8 @@ Write a paragraph in markdown format reporting this test result including the t-

 ### t-test result

-Testing supported the rejection of the null hypothesis that there is no difference between mean populations of European and non-European participants (**t**=`r round(t.test(pop ~european, data = gapminder)$statistic,4)`, **df**= `r round(t.test(pop ~european, data = gapminder)$parameter,4)`,
-**p**= `r round( t.test(pop ~european, data = gapminder)$p.value,4)`).
+Testing supported the rejection of the null hypothesis that there is no difference between mean health rank of City of London and non-City of London areas (**t**=`r round(t.test(health_london_rank ~ city, data = lon_dims_imd_2019)$statistic,4)`, **df**= `r round(t.test(health_london_rank ~ city, data = lon_dims_imd_2019)$parameter,4)`,
+**p**= `r round( t.test(health_london_rank ~ city, data = lon_dims_imd_2019)$p.value,4)`).

 (Can you get p to display to four places? Cf *format()*.)

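The exercise asks how to show p to four decimal places. `round(x, 4)` alone drops trailing zeros when printed, so a value that rounds to 0.0010 displays as 0.001. The sketch below shows two base-R options, again assuming `mtcars` as a stand-in for the CDRC data: `format()` with `nsmall = 4` pads to at least four decimals, and `sprintf("%.4f", x)` gives fixed notation with exactly four.

```r
# Stand-in test on built-in data (mpg ~ am mirrors health_london_rank ~ city)
res <- t.test(mpg ~ am, data = mtcars)

# format(): round first, then pad to at least four decimal places
p_format <- format(round(res$p.value, 4), nsmall = 4)

# sprintf(): fixed notation with exactly four decimals in one step
p_sprintf <- sprintf("%.4f", res$p.value)

p_format
p_sprintf
```

For very small p-values `format()` may switch to scientific notation, where `nsmall` has no effect, whereas `sprintf()` stays in fixed notation.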
@@ -336,18 +335,18 @@ summary(modelone)
 Run the following code chunk and compare the results to the t test conducted earlier.

 ```{r}
-gapminder %>%
-mutate(european = factor(european))
+lon_dims_imd_2019 %>%
+mutate(city = factor(city))
-modelttest <- lm(gapminder$pop ~ gapminder$european)
+modelttest <- lm(lon_dims_imd_2019$health_london_rank ~ lon_dims_imd_2019$city)
 summary(modelttest)
 ```

 ## Regression with a categorical IV (ANOVA)

-Use the `lm()` function to model the relationship between `gapminder$gdpGroup`
-and `gapminder$pop`. Compare the results with the ANOVA carried out earlier.
+Use the `lm()` function to model the relationship between `lon_dims_imd_2019$la19nm`
+and `lon_dims_imd_2019$health_london_rank`. Compare the results with the ANOVA carried out earlier.

 ## Break

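The chunk in the last hunk calls `mutate(city = factor(city))` without assigning the result, so `lon_dims_imd_2019` itself is left unchanged. When comparing with the earlier t-test, note that `lm()` with a two-level predictor reproduces the equal-variance (Student) t-test, whereas `t.test()` defaults to Welch's test, so t and df will generally not match unless `var.equal = TRUE` is set. The sketch below illustrates both comparisons on `mtcars`, an assumed stand-in (`am` for `city`, `cyl` for the multi-level `la19nm`, `mpg` for `health_london_rank`); for the categorical IV, `anova()` on the fitted model gives the one-way ANOVA table.

```r
# Stand-in data: am (0/1) plays city, cyl (4/6/8) plays la19nm,
# and mpg plays health_london_rank
cars <- mtcars
cars$am  <- factor(cars$am)    # assign the conversion back to the data frame
cars$cyl <- factor(cars$cyl)

# Two-group comparison as a regression; data = avoids the cars$ prefixes
model_ttest <- lm(mpg ~ am, data = cars)
student <- t.test(mpg ~ am, data = cars, var.equal = TRUE)

# The t value for the am1 coefficient equals the Student t statistic
# with the sign flipped (lm reports group 1 minus group 0)
t_lm <- summary(model_ttest)$coefficients["am1", "t value"]
t_tt <- unname(student$statistic)
all.equal(t_lm, -t_tt)

# Categorical predictor with more than two levels: anova() on the lm fit
model_anova <- lm(mpg ~ cyl, data = cars)
anova(model_anova)
```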