---
title: "Introduction to statistical Analysis"
author: "Mark Dunning. Original materials by Materials by Deepak Parashar, Sarah Dawson and Sarah Vowler"
date: '`r format(Sys.time(), "Last modified: %d %b %Y")`'
output:
slidy_presentation:
footer: 'This work is licensed under the Creative Commons Attribution-ShareAlike
2.0. COURSE HOME: tiny.cc/crukStats'
ioslides_presentation: default
beamer_presentation: default
css: mystyle.css
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(comment = NA,echo=FALSE,message=FALSE,warning = FALSE)
```
# Introduction
## Timetable
- 10.30 - 11.15 – Lecture: Introduction to Statistical analysis
- 11.15 - 11.30 – Quiz: Variables/Dependencies/Tests/Generalisability
- 11.30 - 12.00 – Lecture: Parametric Tests for Continuous Variables; t-tests
- 12.00 - 12.30 – Examples/Practicals (computer based)
- 12.30 - 13.30 – Lunch (not provided)
- 13.30 - 14.00 – Lecture: Non-parametric tests for continuous variables
- 14.00 - 14.30 – Examples/Practicals (computer based)
- (14:30 COFFEE)
- 14.30 - 14.45 – Lecture: Tests for Categorical Variables
- 14.45 - 15.30 – Examples/Practicals/Solutions (computer based)
- 15.30 - 16.25 – Group based exercise: Choosing appropriate tests
- 16.25 - 16.30 – Summary
## The point of statistics
- Rarely feasible to study the whole population that we are interested in, so we take a sample instead
- Assume that data collected represents a larger population
- Use sample data to make conclusions about the overall population
![](images/aSample.png)
## Beginning a study
- Which samples to include?
+ Randomly selected?
+ Generalisability
- Always think about the statistical analysis
+ Randomised comparisons, or biased?
+ Any dependency between measurements?
+ Data type?
+ Distribution of data? (Normally distributed? Skewed? Bimodal?...)
## Generalisability
- How samples are selected affects interpretation
+ What is the population that the results apply to?
+ How widely applicable will the study be?
- Statistical methods assume random samples
- Do not extrapolate beyond range of the data
+ i.e. don’t assume results apply to anything not represented in the data
- Examples:
+ Males only, no idea about females
+ Adults only, no idea about children
+ 1 litter of mice, no idea about other litters
## Data types
- Several different categorisations
- Simplest:
+ Categorical (nominal)
+ Categorical with ordering (ordinal)
+ Discrete
+ Continuous
## Nominal
![](images/nominal-things.png)
<!----http://www.boredpanda.com/donald-trump-hair-look-alikes-->
- Most basic type of data
- Three requirements:
+ Same value assigned to all members of a level
+ Same value not assigned to different levels
+ Each observation assigned to only one level
- Boils down to a yes/no answer
- e.g. Surgery type, smoker / non-smoker, eye colour, dead/alive, ethnicity.
## Ordinal
![](images/ordinal.png)
- Mutually exclusive fixed categories
- Implicit order
- Can say one category higher than another
+ But not how much higher
- Example: stress level 1 = low … 7 = high
- Others: Grade, stage, treatment response, education level, pain level.
## Discrete
![](images/discrete.png)
- Fixed categories, can only take certain values
- Like ordinal but with well-defined distances
+ Can be treated as continuous if range is large
- Anything counted (cardinal) is discrete
+ *how many*?
- Examples: number of tumours, shoe size, hospital admissions, number of side effects, medication dose, CD4 count, viral load, reads.
## Continuous
![](images/continuous.png)
- Final type of data
- Anything that is measured, can take any value
- May have finite or infinite range
- Zero may be meaningful: ratios, differences
+ Care required with interpretation
- Given any two observations, one fits between
- Examples: Height, weight, blood pressure, temperature, operation time, blood loss, age.
## Data types
- Several different categorisations
- Simplest:
+ Categorical (nominal) – yes/no
+ Categorical with ordering (ordinal) – implicit order
+ Discrete – only takes certain values; counts (cardinal)
+ Continuous – measurements; finite/infinite range
## Measurements: Dependent / Independent?
- Measurements of gene expression taken from each of 20 individuals
- Are any measurements more closely related than others?
+ Siblings/littermates?
+ Same individual measured twice?
+ Batch effects?
- If no reason, assume independent observations
## Continuous Data - Distribution
```{r fig.width=12}
library(ggplot2)
library(gridExtra)
p1 <- ggplot(data.frame(x=c(-4,4)),aes(x)) + stat_function(fun=dt, args=list(df=10))
p2 <- ggplot(data.frame(x=rnorm(1000)),aes(x)) + geom_histogram()
p3 <- ggplot(data.frame(x="x",y=rnorm(1000)),aes(x,y)) + geom_boxplot()
grid.arrange(p1,p2,p3,ncol=3)
```
## Continuous Data - Distribution?
```{r message=FALSE}
df <- data.frame(group = c(rep("WT", 50), rep("KO",50)), count = c(rnorm(50,5,3), rnorm(50,4,2)))
p1 <- ggplot(df, aes(x=count)) + geom_histogram(fill="steelblue",col="gray") + facet_wrap(~group)
df <- data.frame(group = c(rep("WT", 50), rep("KO",50)), count = c(rnorm(50,400,20), rnorm(50,300,30)))
p2 <- ggplot(df, aes(x=group,y=count)) + geom_boxplot(fill="steelblue")
df <- data.frame(x=rbeta(1000,2,5))
p3 <- ggplot(df, aes(x)) + geom_histogram(fill="steelblue",col="gray")
df <- data.frame(x=rnorm(10,2,5))
p4 <- ggplot(df, aes(x)) + geom_histogram(fill="steelblue",col="gray")
grid.arrange(p1,p3,p2,p4,ncol=2)
```
## Continuous Data - Descriptive Statistics
- Measures of location and spread
```{r fig.height=5,fig.width=5}
ggplot(data.frame(x=c(-4,4)),aes(x)) + stat_function(fun=dt, args=list(df=10))
```
- Mean and standard deviation
$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$
$s.d = \sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i - \overline{x})^2}$
## Continuous Data - Descriptive Statistics
```{r fig.height=5,fig.width=5}
df <- data.frame(x=rbeta(10000,2,5))
ggplot(df, aes(x)) + geom_histogram(fill="steelblue",col="gray")
```
- Median: middle value
- Lower quartile: median of the bottom half of the data
- Upper quartile: median of the top half of the data
## Continuous Data - Descriptive Statistics (Example)
- e.g. No of Facebook friends for 7 colleagues
+ 311, 345, 270, 310, 243, 5300, 11
- Mean and standard deviation
$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = 970$
$s.d = \sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i - \overline{x})^2}=1912.57$
- Median and Interquartile range
+ 11, **243**, 270, ***310***, 311, **345**, 5300
## Continuous Data - Descriptive Statistics (Example)
- e.g. No of Facebook friends for 7 colleagues
+ 311, 345, 270, 310, 243, ***530***, 11
- Mean and standard deviation
$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = 289$
$s.d = \sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i - \overline{x})^2}=153.79$
- Median and Interquartile range
+ 11, **243**, 270, ***310***, 311, **345**, 530
## Continuous Data - Descriptive Statistics (Example)
- e.g. No of Facebook friends for 7 colleagues
+ 311, 345, 270, 310, 243, ***530***, 11
- Mean and standard deviation: ***low breakdown point***
$\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = 289$
$s.d = \sqrt{\frac{1}{N-1} \sum_{i=1}^N (x_i - \overline{x})^2}=153.79$
- Median and Interquartile range: ***robust to outliers***
+ 11, **243**, 270, ***310***, 311, **345**, 530
## Categorical data
- Summarised by counts and percentages
- Examples
+ 19/82 (23%) subjects had Grade IV tumour
+ 48/82 (58%) subjects had Diarrhoea as an Adverse Event
```{r}
ggplot(data.frame(Grade=c("I","II","III","IV"), Frequency=c(25,2,15,30)),aes(x=Grade,y=Frequency)) + geom_bar(stat="identity",fill="steelblue")
```
## Standard Deviation and Standard Error
- Commonly confused
- Standard deviation:
+ Measure of spread of the data
- Standard error:
+ Variability of the mean from repeated sampling
+ Precision of mean
+ Used to calculate confidence interval
- SD: How widely scattered measurements are
- SE: Uncertainty in estimate of sample mean
## Confidence intervals for the mean
- Confidence interval (CI) is a random interval
- In repeated experiments…
+ 95% of time CI covers the mean
- i.e. the true population mean is inside the CI 95% of the time
95% CI: $(\bar{X}-1.96 \times SE,\bar{X} + 1.96 \times SE)$
SE: $$\frac{sd}{\sqrt{n}} $$
## Confidence intervals for the mean
- Confidence interval (CI) is a random interval
- In repeated experiments…
+ 95% of time CI covers the mean
- i.e. the true population mean is inside the CI 95% of the time
95% CI: $(\bar{X}-1.96 \times SE,\bar{X} + 1.96 \times SE)$
SE: $$ \frac{sd}{\sqrt{n}} = \frac{154}{\sqrt{7}} = 58 $$
Facebook data: 11, 243, 270, ***310***, 311, 345, 530
Mean: 289, 95% CI (175, 402)
## Confidence Intervals
![](images/confidence-intervals.png)
## Hypothesis tests - basic setup
- Formulate a null hypothesis, $H_0$
+ Example: the difference in gene expression before and after treatment = 0
- Calculate a test statistic from the data under the null hypothesis
- Compare the test statistic to the theoretical values
+ is it more extreme than expected under $H_0$?
+ the "***p-value***"
- Either reject or do not reject the null hypothesis
+ "***Absence of evidence is not evidence of absence***" (Bland and Altman, 1995)
- (Correction for multiple testing)
## Hypothesis tests - Example
![](images/320px-Nice_Cup_of_Tea.jpg)
- [The Lady Tasting Tea](https://en.wikipedia.org/wiki/Lady_tasting_tea) - Randomised Experiment by Fisher
- Randomly ordered 8 cups of tea
+ 4 were prepared by first adding milk
+ 4 were prepared by first adding tea
- Task: Lady had to select the 4 cups of one particular method
- $H_0$: Lady had no such ability
- *Test Statistic*: number of successes in selecting the 4 cups
- *Result*: Lady got all 4 cups correct
- *Conclusion*: Reject the null hypothesis
## Hypothesis tests - Errors
![](images/error-table.png)
- Many factors that may affect our results
+ significance level, sample size, difference of interest, variability of the observations
- Be aware of issues of multiple testing
## When to use which test
![](images/cheat-sheet.png)
## Quiz
- Anonymous!
- tinyurl.com/further-stats-prac
# Tests for continuous variables: T-tests
## Statistical tests - continuous variables
- t-test:
+ ***One-sample t-test***
+ (e.g. $H_0$: mean=5)
+ ***Independent two-sample t-test***
+ (e.g. $H_0$: mean of sample 1 = mean of sample 2)
+ ***Paired two-sample t-test***
+ (e.g. $H_0$: mean difference between pairs = 0)
## T-distributions
```{r}
ggplot(data.frame(x=c(-4,4)),aes(x)) + stat_function(fun=dt, args=list(df=1),aes(colour="df=1")) +
stat_function(fun=dt, args=list(df=3),aes(colour="df=3")) +
stat_function(fun=dt, args=list(df=8),aes(colour="df=8")) +
stat_function(fun=dt, args=list(df=30),aes(colour="df=30")) +
stat_function(fun=dnorm, aes(colour="normal")) +
scale_color_manual(name="Distribution", values=c("red","blue","green","yellow","black")) +
ggtitle("Comparison of t Distributions")
```
## One-sample t-test: does mean = X?
- e.g. Question: Published data suggests that the microarray failure rate for a particular supplier is 2.1%
- **The Genomics Core want to know whether this holds true in their own lab**
## One-sample t-test: does mean = X?
- Null hypothesis, $H_0$:
+ Mean monthly failure rate = 2.1%
- Alternative hypothesis: $H_1$:
+ Mean monthly failure rate $\ne$ 2.1%
- Tails: *two-tailed*
- Either *reject* or *do not reject* the null hypothesis - ***Never accept the alternative hypothesis***
## One sample t-test; the data
```{r results='asis'}
library(knitr)
failure <- data.frame(Month = month.name, "Monthly failure rate" = c(2.9,2.99,2.48,1.48,2.71,4.17,3.74,3.04,1.23,2.72,3.23,3.4))
kable(failure)
me <- round(mean(failure$Monthly.failure.rate),3)
sd <- round(sd(failure$Monthly.failure.rate),3)
```
mean = $(2.9 + \dots + 3.40) / 12$ = `r me`
Standard deviation = `r sd`
Hypothesised Mean = 2.1
## One-sample t-test; key assumptions
- Observations are independent
- Observations are normally distributed
```{r}
hist(failure$Monthly.failure.rate,col="steelblue",xlab="Monthly Failure Rate",main="")
```
## One-sample t-test; results
```{r}
test <- t.test(failure$Monthly.failure.rate,mu=2.1)
stat <- round(test$statistic,3)
pval <- round(test$p.value,3)
degfree <- test$parameter
critvals <- qt(c(0.025, 0.975), degfree)  # two-tailed 5% rejection region
rect1 <- data.frame(xmin = -4,xmax = critvals[1], ymin=-Inf,ymax=Inf)
rect2 <- data.frame(xmin = critvals[2],xmax = 4, ymin=-Inf,ymax=Inf)
```
Test statistic: $$t_{n-1} = t_{11} = \frac{\bar{x} - \mu_0} {s.d. / \sqrt{n}} = \frac{2.84 - 2.10}{s.e.(\bar{x})} = $$`r stat`
```{r}
ggplot(data.frame(x=c(-4,4)),aes(x)) + stat_function(fun=dt, args=list(df=11)) +
geom_rect(data=rect1,aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax),fill="yellow", alpha=0.5, inherit.aes = FALSE) + geom_rect(data=rect2,aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax),fill="yellow", alpha=0.5, inherit.aes = FALSE)
```
## One-sample t-test; results
```{r}
ggplot(data.frame(x=c(-4,4)),aes(x)) + stat_function(fun=dt, args=list(df=11)) +
geom_rect(data=rect1,aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax),fill="yellow", alpha=0.5, inherit.aes = FALSE) + geom_rect(data=rect2,aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax),fill="yellow", alpha=0.5, inherit.aes = FALSE) + geom_vline(xintercept = stat,lty=2,col="red")
```
## One-sample t-test; results
Test statistic: $$t_{n-1} = t_{11} = \frac{\bar{x} - \mu_0} {s.d. / \sqrt{n}} = \frac{2.84 - 2.10}{s.e.(\bar{x})} = $$`r stat`
df = 11
P = 0.01
***Reject*** $H_0$
- Evidence that mean monthly failure rate $\ne$ 2.1%
## One-sample t-test; results
- The mean monthly failure rate of microarrays in the Genomics core is 2.84 (95% CI: 2.30, 3.37).
- It is not equal to the hypothesised mean of 2.1 proposed by the supplier.
- t=3.07, df=11, p=0.01
## Two-sample t-test
- Two types of two-sample t-test:
+ Independent:
+ e.g. the weight of two different breeds of mice
- Paired
+ e.g. a measurement of disease at two different parts of the body in the same patient / animal
+ e.g. measurements before and after treatment for the same individual
## Independent two-sample t-test: Does the mean of group A = mean of group B?
![](images/mice-breeds.png)
- e.g. research question: 40 male mice (20 of breed A and 20 of breed B) were weighed at 4 weeks old
- Does the weight of 4-week old male mice depend on breed?
## Independent two-sample t-test: Does the mean of group A = mean of group B?
- Null hypothesis, $H_0$
+ mean weight of breed A = mean weight of breed B
- Alternative hypothesis, $H_1$
+ mean weight of breed A $\ne$ mean weight of breed B
- Tails: two-tailed
+ the tails on the distribution - not on the mice!!!
- Either ***reject*** or ***do not reject*** the null hypothesis - ***never accept the alternative hypothesis***
## Independent two-sample t-test: the data
![](images/mice-data.png)
## Independent two-sample t-test: key assumptions
- Observations are independent
- Observations are normally-distributed
```{r}
mice <- read.csv("Manual/Independent two-sample t-test.csv")
ggplot(mice, aes(x = Weight)) + geom_histogram(fill="steelblue",col="grey") + facet_wrap(~Breed)
```
## Independent two-sample t-test: *More* key assumptions
- Equal variance in the two comparison groups
+ Use "Welch's correction" if variances are different
+ alters the t-statistic and degrees of freedom
```{r}
ggplot(mice, aes(x=Breed,y=Weight)) + geom_boxplot(fill="steelblue") + coord_flip()
test <- t.test(Weight~Breed,data=mice,var.equal = FALSE)
tsat <-round(test$statistic,2)
degfree <- round(test$parameter,2)
pval <- round(test$p.value,2)
critvals <- qt(c(0.025, 0.975), degfree)  # two-tailed 5% rejection region
rect1 <- data.frame(xmin = -4,xmax = critvals[1], ymin=-Inf,ymax=Inf)
rect2 <- data.frame(xmin = critvals[2],xmax = 4, ymin=-Inf,ymax=Inf)
```
## Independent two-sample t-test: result
$t_{df} = \frac{\bar{X_A} - \bar{X_B}}{s.e.(\bar{X_A} - \bar{X_B})}$ = `r tsat`
df = `r degfree` (with Welch's correction)
```{r}
ggplot(data.frame(x=c(-4,4)),aes(x)) + stat_function(fun=dt, args=list(df=degfree)) +
geom_rect(data=rect1,aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax),fill="yellow", alpha=0.5, inherit.aes = FALSE) + geom_rect(data=rect2,aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax),fill="yellow", alpha=0.5, inherit.aes = FALSE) + geom_vline(xintercept = tsat,lty=2,col="red")
```
P-value: `r round(test$p.value,2)`
***Do not reject*** $H_0$
(No evidence that mean weight of breed A $\ne$ mean weight of breed B)
## Independent two-sample t-test: result
- The difference in mean weight between the two breeds is -1.30 (95% CI: -3.48, 0.89)
+ [NB as this is negative, breed B mice tend to be bigger than breed A].
- There is no evidence of a difference in weights between breed A and breed B.
- t=`r tsat`, df= `r degfree` (Welch’s correction), p=`r pval`
## Paired two-sample t-test: Does the mean difference = 0?
- e.g. Research question: 20 patients with ovarian cancer were studied using MRI imaging. Cellularity was measured for each patient at two sites of disease.
- Does the cellularity differ between two different sites of disease?
## Paired two-sample t-test: Does the mean difference = 0?
- Null hypothesis, $H_0$:
+ Cellularity at site A = Cellularity at site B
- Alternative hypothesis, $H_1$
+ Cellularity at site A $\ne$ Cellularity at site B
- Tails: two-tailed
- Either ***reject*** or ***do not reject*** the null hypothesis - ***never accept the alternative hypothesis***
## Paired two-sample t-test; null hypothesis
- $H_0$; Cellularity at site A = Cellularity at site B
+ ***or***
- $H_0$: Cellularity at site A - Cellularity at site B = 0
## Paired two-sample t-test; the data
![](images/cellularity-table.png)
## Paired two-sample t-test; key assumptions
- Observations are independent
- The ***paired differences*** are normally-distributed
```{r}
data <- read.csv("Manual/Paired two-sample t-test.csv")
data$diff <- data$A - data$B
ggplot(data, aes(x=diff)) + geom_histogram(fill="steelblue")
test <- t.test(data$diff)
stat <- round(test$statistic,2)
degfree <- round(test$parameter,2)
pval <- round(test$p.value,3)
critvals <- qt(c(0.025, 0.975), degfree)  # two-tailed 5% rejection region
rect1 <- data.frame(xmin = -4,xmax = critvals[1], ymin=-Inf,ymax=Inf)
rect2 <- data.frame(xmin = critvals[2],xmax = 4, ymin=-Inf,ymax=Inf)
```
## Paired two-sample t-test; results
$t_{n-1} = t_{19} = \frac{\bar{X_{A-B}}}{s.e.(\bar{X_{A-B}})} =$ `r stat`
df = `r degfree`
```{r}
ggplot(data.frame(x=c(-4,4)),aes(x)) + stat_function(fun=dt, args=list(df=degfree)) +
geom_rect(data=rect1,aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax),fill="yellow", alpha=0.5, inherit.aes = FALSE) + geom_rect(data=rect2,aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax),fill="yellow", alpha=0.5, inherit.aes = FALSE) + geom_vline(xintercept = stat,lty=2,col="red")
```
P-value: `r pval`
***Reject*** $H_0$
(evidence that cellularity at Site A $\ne$ site B)
## Paired two-sample t-test; results
- The difference in cellularity between the two sites is 19.14 (95% CI: 8.20, 30.08).
- There is evidence of a difference in cellularity between the two sites.
- t=`r stat`, df=`r degfree`, p=`r pval`.
## What if normality is not reasonable?
- Transform your data, e.g. log transformation
- Non-parametric tests....
## Summary - continuous variables
- One-sample t-test
+ Use when we have one group.
- Independent two-sample t-test
+ Use when we have two independent groups. A Welch correction may be needed if the two groups have different spread.
- Paired two-sample t-test
+ Use when we have two non-independent groups.
- Non-parametric tests or transformations
+ Use when we cannot assume normality.
## Summary - t-test
- Turn scientific question to null and alternative hypothesis
- Think about test assumptions
- Calculate summary statistics
- Carry out t-test if appropriate
## Practical
- Complete the t-test practical
+ Parametric tests only!
- We will start the next lecture after lunch
+ 13:30
# Tests for continuous variables: non-parametric methods
## Mann-Whitney U test
- Also called the Wilcoxon Rank Sum test
- Assumptions:
+ Two independent groups
+ At least ordinal dependent variable
+ Randomly selected observations
+ Population distributions same shape
- Hypotheses:
+ $H_0$: the two populations have the same median
+ $H_1$: the two populations have different medians
## Misunderstood test
## Method
- Construct hypotheses and decide on $\alpha$
- Rank whole sample from smallest to largest
- Assign average rank to ties
- Calculate sum of ranks for each group
- Calculate:
$$ U_1 = n_1n_2 + \frac{n_1(n_1+1)}{2} - R_1 $$
$$ U_2 = n_1n_2 + \frac{n_2(n_2+1)}{2} - R_2 $$
$U = \min(U_1, U_2)$
- Compare U to critical value in the tables
## Example
- Fisher’s book, coronary artery surgery study
- Exercise times in seconds, control and 3-vessels groups
- Is there a difference in exercise times between the two groups? (two-sided test)
![](images/fisher-table1.png)
## Example
- Fisher’s book, coronary artery surgery study
- Exercise times in seconds, control and 3-vessels groups
- Is there a difference in exercise times between the two groups? (two-sided test)
![](images/fisher-table2.png)
- Sum of ranks: control group = 101, 3 vessels = 70
## Example
$$ U_1 = 8 \times 10 + \frac{8 (8+1)}{2} - 101 = 15 $$
$$ U_2 = 8 \times 10 + \frac{10 (10+1)}{2} - 70 = 65 $$
$$ U = \min(U_1,U_2) = \min(15,65) = 15$$
- Look up $n_1$ = 8, $n_2$ = 10 in the tables (two-sided test)
- U = 15 < 17, the critical value from the tables, so reject $H_0$
- Presentation of the results
+ "The Mann-Whitney U test showed that the individuals in the control group exercised for significantly longer than the individuals in the 3 vessels group (U=15, p=0.0025)"
## Advantages and limitations
- Almost as powerful as t-test
+ almost as likely as the t-test to reject $H_0$ when it is false
- Sensitive to central tendencies of scores
- Often misinterpreted:
+ Difference in medians if same shape distributions
+ Otherwise tests for a combination of differences between the distributions, including spread and shape
## The sign-test
- A very simple non-parametric test
+ based on the Binomial distribution
- Uses directions of differences
- One-sample case: compares to fixed value
- Paired two-sample case: compares medians
## Sign-test
- Assumptions:
+ Order in coding system (minimally requires ordinal data)
+ Randomly selected observations (independent)
+ For the two-sample case, requires paired data
- Hypotheses:
+ $H_0$: medians in the two groups are equal
+ $H_A$: medians in the two groups differ
## Method: One-sample sign Test:
- Compare values to a specific value
+ "+" if bigger
+ "-" if smaller
+ "=" if equal
- Count the number of "+"'s and "-"'s and calculate
+ x = the smaller of the number of positives and the number of negatives
+ n = number of non-ties
- Compare to binomial tables
+ with p=0.5 (binomial success probability, not p-value!)
## One-Sample example
- General health section of SF-36 collected in a breast cancer study
- Expected value in general population 72
![](images/sign-test1.png)
- Number of non-ties: n = 14
- Number of negatives: 9
- Number of positives: 5
- Look up n=14, p=0.5, x=5 in the binomial tables
## One-Sample example
- General health section of SF-36 collected in a breast cancer study
- Expected value in general population 72
![](images/sign-test1.png)
- P = 0.42
- Therefore insufficient evidence to reject $H_0$
- Conclusion: insufficient evidence to suggest that the median value is different from 72
## Method: two-sample sign test (paired data)
- Compare paired values between the two samples:
+ "+" : if the value in sample 1 is bigger
+ "–" : if the value in sample 1 is smaller
+ "=" : if the value in the two samples is equal
- Count the number of +’s and -’s, and calculate:
+ x = the smaller of the number of positives and the number of negatives
+ n = number of non-ties
- Compare to binomial tables
+ With p = 0.5 (binomial success probability, not p-value)
## Two-sample Example
- General health values collected in same study at a 2nd time point
- Is there a difference between the time points?
![](images/sign-test2.png)
- Number of non-ties: n = 15
- Number of negatives: 12
- Number of positives: 3
- Look up n=15, p=0.5, x=3 in the binomial tables
## Two-sample Example
- General health values collected in same study at a 2nd time point
- Is there a difference between the time points?
![](images/sign-test2.png)
- P = 0.035, sufficient evidence to reject $H_0$
- Conclusion: there is a difference in general health between the two time points
## Presentation of the results
- One-sample case:
+ "There is insufficient evidence to suggest a significant difference between the median general health value (60) observed in this sample and the value (72) observed in the general population (p=0.42, sign test)."
- Two-sample case:
+ "The median general health value observed at the second time point (70) was found to be significantly higher than the median (60) observed at the first time point (p=0.035, sign test)."
## Advantages and limitations
- Simple - few assumptions thus widely applicable
- Significance threshold can be adjusted
- Less powerful than other tests
+ Does not consider magnitude of differences
+ May fail to reject null hypothesis when other tests would achieve significance.
- Can be used for quick assessment of direction
## Wilcoxon signed rank test
- Alternative to sign test
- Assumptions:
+ Paired data (e.g. matched samples, repeated measurements)
+ Each pair is independent
+ Continuous or ordinal data (Normality not assumed)
+ Symmetry of difference scores about true median difference (test by looking at histogram/boxplot)
- Hypothesis:
+ $H_0$: sum of positive ranks equals sum of negative ranks
+ $H_A$: sum of positive ranks not equal to sum of negative ranks
## Method
- Calculate differences for each pair
- Rank the paired differences by magnitude
- Split the ranks into two groups:
+ positive and negative signed differences
- Calculate sum of positive ranks: $W^+$
- Calculate sum of negative ranks: $W^-$
- Compare the smaller of $W^+$ and $W^-$ to the critical value from the tables
## Example
- Taken from Glanz's book; the data are urine production before/after a diuretic
- Is there a difference?
![](images/wilcoxon-signed.png)
## Example
![](images/wilcoxon-signed2.png)
- $W^+$ = 4, $W^-$=17
- Look up W=4, n=6 in the tables
## Results
- Result: p=0.22
+ insufficient evidence to reject null hypothesis
- Conclusion: there is no evidence of a change in urine production before and after the drug
- Presentation of the results:
+ "The Wilcoxon signed rank test was unable to detect a significant change in urine production before and after treatment (W=4, p=0.22)".
## Advantages and Limitations
- Easy to apply
- Powerful
+ Utilises more information than the Sign test (but less than the paired t-test)
- Sometimes misinterpreted
+ Assumes symmetry of difference scores about the true median difference
## Summary: Two independent Samples
- ***t-test:***
+ Compares means when the data can be assumed to be Normally distributed.
- ***Mann-Whitney U test (Wilcoxon Rank Sum test):***
+ Compares medians in two independent groups, without assuming Normality, but it does assume the two distributions have a similar shape. Otherwise, it tests a combination of differences between the distributions, including spread and shape, which can lead to misinterpretation of results.
## Summary: Paired Groups
- ***One-sample Sign test:***
+ Compares the median to a proposed value.
- ***Two-sample Sign test:***
+ Compares the medians between matched pairs.
- ***Wilcoxon Signed Rank test:***
+ Compares means, provided the distribution of differences is symmetric.
- ***Paired t-test:***
+ Compares means, provided the paired differences can be assumed to be Normally distributed.
# Tests for categorical variables
## Associations between categorical variables
- All about frequencies!
- Row x Column table (2 x 2 simplest)
- Categorical data
- Look for association (relationship) between row variable and column variable
```{r}
df <- data.frame(Tumour.Did.Not.Shrink = c(44,24), Tumour.Did.Shrink = c(40,16))
rownames(df) <- c("Treatment","Placebo")
kable(df)
```
## Chi-square test
- E.g. Research question: A trial to assess the effectiveness of a new treatment versus a placebo in reducing tumour size in patients with ovarian cancer.
```{r}
kable(df)
```
- Is there an association between treatment group and tumour shrinkage?
- Null hypothesis, $H_0$: No association
- Alternative hypothesis, $H_1$: Some association
## Chi-square test: calculating expected frequencies
```{r}
df2 <- cbind(df, rowSums(df))
df2 <- rbind(df2, colSums(df))
colnames(df2)[3] <- rownames(df2)[3] <- "Total"
kable(df2)
```
$$E = \frac{\text{row total} \times \text{column total}}{\text{overall total}} $$
- e.g. for row 1, column 1
$$\frac{84}{124} \times \frac{68}{124} \times 124 = \frac{84\times68}{124} = 46.1$$
## Chi-square test: calculating the chi-square statistic
***Observed frequencies:***
```{r}
csq <- chisq.test(df,correct = FALSE)
kable(df)
```
***Expected frequencies:***
```{r}
kable(round(csq$expected,1))
```
$$\chi^2_1 = \frac{(44-46.06)^2}{46.06} + \frac{(40-37.94)^2}{37.94} + \frac{(24-21.94)^2}{21.94} + \frac{(16-18.06)^2}{18.06} = 0.63$$
## Chi-square test
```{r}
degf <- csq$parameter
xmax <- max(4,csq$statistic)
xs <- seq(0, xmax, length.out = 10000)
df <- data.frame(X = xs, Y = dchisq(xs,degf))
title <- substitute(paste(chi^2, " with ", degf, " degrees of freedom"),list(degf=degf))
p <- ggplot(df, aes(x=X,y=Y)) + geom_line() + geom_vline(xintercept=csq$statistic,col="red") + xlim(0,xmax+1) + ggtitle(title)
p
stat <- round(csq$statistic,2)
pval <- round(csq$p.value,2)
```
Test statistic: ${\chi_1}^2$ = `r stat`
df = `r degf`
P-value = `r pval`
***Do not reject $H_0$ (No evidence of an association between treatment group and tumour shrinkage)***
## Limitations of the chi-square test