-
Notifications
You must be signed in to change notification settings - Fork 96
/
statinference_project_part1_full.Rmd
81 lines (66 loc) · 3.03 KB
/
statinference_project_part1_full.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
# Statistical Inference Course Project, Part 1: Simulation Exercises
The exponential distribution can be simulated in R with `rexp(n, lambda)` where
`lambda` $\lambda$ is the rate parameter. The mean of exponential distribution is
$1/\lambda$ and the standard deviation is also $1/\lambda$. For this simulation,
we set $\lambda=0.2$. In this simulation, we investigate the distribution of
averages of 40 numbers sampled from exponential distribution with $\lambda=0.2$.
Let's do a thousand simulated averages of 40 exponentials.
```{r}
set.seed(3)
lambda <- 0.2
num_sim <- 1000
sample_size <- 40
sim <- matrix(rexp(num_sim*sample_size, rate=lambda), num_sim, sample_size)
row_means <- rowMeans(sim)
```
The distribution of sample means is as follows.
```{r}
# plot the histogram of averages
hist(row_means, breaks=50, prob=TRUE,
main="Distribution of averages of samples,
drawn from exponential distribution with lambda=0.2",
xlab="")
# density of the averages of samples
lines(density(row_means))
# theoretical center of distribution
abline(v=1/lambda, col="red")
# theoretical density of the averages of samples
xfit <- seq(min(row_means), max(row_means), length=100)
yfit <- dnorm(xfit, mean=1/lambda, sd=(1/lambda/sqrt(sample_size)))
lines(xfit, yfit, pch=22, col="red", lty=2)
# add legend
legend('topright', c("simulation", "theoretical"), lty=c(1,2), col=c("black", "red"))
```
The distribution of sample means is centered at `r mean(row_means)`
and the theoretical center of the distribution is $\lambda^{-1}$ = `r 1/lambda`.
The variance of sample means is `r var(row_means)` where the theoretical variance
of the distribution is $\sigma^2 / n = 1/(\lambda^2 n) = 1/(0.04 \times 40)$ =
`r 1/(0.04 * 40)`.
Due to the central limit theorem, the averages of samples follow normal
distribution. The figure above also shows the density computed using the histogram and the
normal density plotted with theoretical mean and variance values. Also, the
q-q plot below suggests the normality.
```{r}
qqnorm(row_means); qqline(row_means)
```
Finally, let's evaluate the coverage of the confidence interval for
$1/\lambda = \bar{X} \pm 1.96 \frac{S}{\sqrt{n}}$
```{r}
lambda_vals <- seq(4, 6, by=0.01)
coverage <- sapply(lambda_vals, function(lamb) {
mu_hats <- rowMeans(matrix(rexp(sample_size*num_sim, rate=0.2),
num_sim, sample_size))
ll <- mu_hats - qnorm(0.975) * sqrt(1/lambda**2/sample_size)
ul <- mu_hats + qnorm(0.975) * sqrt(1/lambda**2/sample_size)
mean(ll < lamb & ul > lamb)
})
library(ggplot2)
qplot(lambda_vals, coverage) + geom_hline(yintercept=0.95)
```
The 95% confidence intervals for the rate parameter ($\lambda$) to be estimated
($\hat{\lambda}$) are
$\hat{\lambda}_{low} = \hat{\lambda}(1 - \frac{1.96}{\sqrt{n}})$ agnd
$\hat{\lambda}_{upp} = \hat{\lambda}(1 + \frac{1.96}{\sqrt{n}})$.
As can be seen from the plot above, for selection of $\hat{\lambda}$ around 5,
the average of the sample mean falls within the confidence interval at least 95% of the time.
Note that the true rate, $\lambda$ is 5.