---
title: "Statistical Programming with R 2021"
output:
flexdashboard::flex_dashboard:
orientation: columns
vertical_layout: scroll
---
# Intro {.sidebar}
This dashboard covers the course materials for the course [**USS24: Statistical programming with R**](https://www.utrechtsummerschool.nl/courses/social-sciences/data-science-statistical-programming-with-r).
<br> <br>
We usually adapt the course as we go, so we suggest accessing the materials online whenever you consult them.
---
Course leader: [Gerko Vink](https://www.gerkovink.com) <br>
Instructors:
- [Erik-Jan van Kesteren](https://erikjanvankesteren.nl)
- [Maarten Cruyff](https://www.uu.nl/medewerkers/mcruyff)
- [Ayoub Bagheri](https://bagheria.github.io)
- [Thom Volker](https://thomvolker.github.io)
Study load: 1.5 ECTS <br>
Location of Start: [Koningsberger building - Pangea hall](https://www.uu.nl/en/victor-j-koningsberger-building)
---
# Quick Overview
## Column 1
### Outline
R is rapidly becoming the standard platform for data manipulation, visualization and analysis and has a number of advantages over other statistical software packages. A wide community of users contributes to R, resulting in an enormous coverage of statistical procedures, including many that are not available in any other statistical program. Furthermore, it is highly flexible for programming and scripting purposes, for example when manipulating data or creating professional plots. However, R lacks the standard GUI menus found in, for example, SPSS, from which to choose which statistical test to perform or which graph to create. As a consequence, R is more challenging to master. Therefore, this course offers an elaborate introduction to statistical programming in R. Students learn to operate R, make plots, fit, assess and interpret a variety of basic statistical models and do advanced statistical programming and data manipulation. The topics in this course include regression models for linear, dichotomous, ordinal and multivariate data, statistical inference, statistical learning, bootstrapping and Monte Carlo simulation techniques.
The course deals with the following topics:
1. An introduction to the R environment.
2. Basic to advanced programming skills: data generation, manipulation, pipelines, summaries and plotting.
3. Fitting statistical models: estimation, prediction and testing.
4. Drawing statistical inference from data.
5. Basic statistical learning techniques.
6. Bootstrapping and Monte Carlo simulation.
The course starts at a very basic level and builds up gradually. At the end of the week, participants will master advanced programming skills with R. No previous experience with R is required.
Prerequisites:
Participants are requested to bring their own laptop for lab meetings.
### Certificate
Participants will receive a certificate at the end of the course.
## Column 2
### Daily schedule
| When? | | What? |
|:--------|:-------|:-------------|
| 09:00 | 09:30 | Lecture |
| 09:30 | 10:15 | Practical |
| 10:15 | 10:45 | Discussion |
| | **break** | |
| 11:00 | 11:45 | Lecture |
| 11:45 | 12:30 | Practical |
| | [Lunch](https://www.uu.nl/en/educatorium) | |
| 14:00 | 14:30 | Discussion |
| 14:30 | 15:30 | Lecture |
| | **break** | |
| 15:45 | 16:30 | Practical |
| 16:30 | 17:00 | Discussion |
### Lecture Hall locations
| When? | Where |
|:--------|:-------|
| Tuesday | [Ruppert Blauw](https://goo.gl/maps/jG6Vzv5ugfCXvR8x8) |
| Wednesday| [Koningsberger Cosmos](https://goo.gl/maps/ufn6aXVL9K48Gn7S6) |
| Thursday | [Ruppert 040](https://goo.gl/maps/jG6Vzv5ugfCXvR8x8) |
| Friday | [Koningsberger Cosmos](https://goo.gl/maps/ufn6aXVL9K48Gn7S6) |
# How to prepare
## Column 1
### Preparing your machine for the course
Dear all,
This summer you will participate in the [**USS24: Statistical programming with R**](https://www.utrechtsummerschool.nl/courses/social-sciences/data-science-statistical-programming-with-r) course in Utrecht, the Netherlands. To realize a steeper learning curve, we will use some functionality that is not part of the base installation for `R`. The steps below guide you through installing both `R` and the necessary additions.
**If you follow this course online, please have a look at [this instructional page on MS Teams](https://www.uu.nl/en/education/quality-and-innovation/remote-teaching/microsoft-teams).**
We look forward to seeing you all in Utrecht and online,
*The Statistical Programming with R Team*
### **System requirements**
Bring a laptop computer to the course and make sure that you have full write access and administrator rights to the machine. We will explore programming and compiling in this course. This means that you need full access to your machine. Some corporate laptops come with limited access for their users. We therefore advise you to bring a personal laptop computer, if you have one.
### **1. Install the latest version of `R`**
`R` can be obtained [here](https://cran.r-project.org). We won't use `R` directly in the course, but rather call `R` through `RStudio`, so `R` itself still needs to be installed.
### **2. Install the latest `RStudio` Desktop**
`RStudio` is an Integrated Development Environment (IDE). It can be obtained as stand-alone software [here](https://www.rstudio.com/products/rstudio/download/#download). The free and open source `RStudio Desktop` version is sufficient.
### **3. Start `RStudio` and install the following packages**
Execute the following lines of code in the console window:
```{r eval=FALSE, echo = TRUE}
install.packages(c("ggplot2", "tidyverse", "magrittr", "knitr", "rmarkdown",
                   "plotly", "shiny", "devtools", "boot", "class",
                   "car", "MASS", "ggplot2movies", "ISLR", "DAAG", "mice",
                   "purrr", "furrr", "future"), dependencies = TRUE)
```
If you are not sure where to execute code, use the following figure to identify the console - ignore the outdated version in the example:
<center>
<img src="console.png" alt="HTML5 Icon" width = 70%>
</center>
Just copy and paste the installation command and press the return key. When asked
```{r eval = FALSE, echo = TRUE}
Do you want to install from sources the package which needs
compilation? (Yes/no/cancel)
```
type `Yes` in the console and press the return key.
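To confirm that the installation succeeded, a quick check along the following lines may help. This is a minimal sketch rather than part of the official course instructions: the vector name `course_packages` and the helper `missing_packages()` are made up for illustration, and the package list simply repeats the one in the installation command above.

```{r eval = FALSE, echo = TRUE}
# Packages from the installation command above
course_packages <- c("ggplot2", "tidyverse", "magrittr", "knitr", "rmarkdown",
                     "plotly", "shiny", "devtools", "boot", "class",
                     "car", "MASS", "ggplot2movies", "ISLR", "DAAG", "mice",
                     "purrr", "furrr", "future")

# Hypothetical helper: return the packages that R cannot find
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

missing_packages(course_packages)
```

An empty result (`character(0)`) means that all packages were found. Any names that are returned can be passed to `install.packages()` again.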
## Column 2
### **What if the steps to the left do not work for me?**
If all else fails and you have insufficient rights to your machine, one of the following web-based services will offer a solution.
1. You will receive an account for Utrecht University's [MyWorkPlace](https://myworkplace.uu.nl/). You will have access to `R` and `RStudio` there. You may need to install packages for new sessions during the course.
2. Open a free account on [rstudio.cloud](https://rstudio.cloud). You can run your own cloud-based `RStudio` environment there.
Naturally, you will need internet access to use these services. Wireless internet access will be available at the course location.
# Monday
## Column 1
### Monday's materials
We adapt the course as we go. To ensure that you work with the latest iteration of the course materials, we advise all course participants to access the materials online.
- Part A: Introduction
- [Lecture A](Contents/Material/Part A - Introduction/Lecture_A.html)
- [Lecture A Handout](Contents/Material/Part A - Introduction/Lecture_A_handout.html)
- [Practical A](Contents/Material/Part A - Introduction/Practical_A_walkthrough.html)
- [`notebook.R`](Contents/Material/Part A - Introduction/notebook.R)
- [`markdown.Rmd`](Contents/Material/Part A - Introduction/markdown.Rmd)
- [Install R on UU laptop](https://gist.github.com/vankesteren/f2e198aa5ab4f6262b21a3d13bbea0b5)
- Part B: How is `R` organized?
- [Lecture B](Contents/Material/Part B - How is R organized/Lecture_B.html)
- [Lecture B Handout](Contents/Material/Part B - How is R organized/Lecture_B_handout.html)
- [Practical B](Contents/Material/Part B - How is R organized/Practical_B_walkthrough.html)
- [Impractical B](Contents/Material/Part B - How is R organized/Practical_B.html)
- [`boys.RData`](Contents/Material/Part B - How is R organized/boys.RData)
- Part C: `R` Functionality
- [Lecture C](Contents/Material/Part C - R-functionality/Lecture_C.html)
- [Lecture C Handout](Contents/Material/Part C - R-functionality/Lecture_C_handout.html)
- [Practical C](Contents/Material/Part C - R-functionality/Practical_C_walkthrough.html)
All lectures are in `html` format. Practicals are walkthrough files that guide you through the exercises. `Impractical` files contain the exercises, without walkthrough, explanations and solutions.
## Column 2
### Useful references
- [The tidyverse style guide](https://style.tidyverse.org)
- [The Google R style guide](https://google.github.io/styleguide/Rguide.xml)
The above links are useful references that connect to today's materials.
### About `rmarkdown`
<center>
<iframe src="https://player.vimeo.com/video/178485416?color=428bca&title=0&byline=0&portrait=0" width="500" height="315" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>
<p><a href="https://vimeo.com/178485416">What is R Markdown?</a> from <a href="https://vimeo.com/rstudioinc">RStudio, Inc.</a> on <a href="https://vimeo.com">Vimeo</a>.</p>
</center>
See also [this `rmarkdown` cheat sheet](Contents/Material/Part A - Introduction/rmarkdown-reference_sheet.pdf).
# Tuesday
## Column 1
### Tuesday's materials
We adapt the course as we go. To ensure that you work with the latest iteration of the course materials, we advise all course participants to access the materials online.
- Part D: Generating Data
- [Lecture D](Contents/Material/Part D - Data generation/Lecture_D.html)
- [Practical D](Contents/Material/Part D - Data generation/Practical_D_walkthrough.html)
- [Impractical D](Contents/Material/Part D - Data generation/Practical_D.html)
- Part E: Custom functions
- [Lecture E](Contents/Material/Part E - Functions apply and looping/Lecture_E.html)
- [Practical E](Contents/Material/Part E - Functions apply and looping/Practical_E_walkthrough.html)
- [Impractical E](Contents/Material/Part E - Functions apply and looping/Practical_E.html)
- Part F: Pipes
- [Lecture F](Contents/Material/Part F - Pipes/Lecture_F.html)
- [Practical F](Contents/Material/Part F - Pipes/Practical_F_walkthrough.html)
- [Impractical F](Contents/Material/Part F - Pipes/Practical_F.html)
All lectures are in `html` format. Practicals are walkthrough files that guide you through the exercises. `Impractical` files contain the exercises, without walkthrough, explanations and solutions.
## Column 2
### Useful References
- [`magrittr`](https://magrittr.tidyverse.org)
- [`R` for Data Science](http://r4ds.had.co.nz) - [Chapter 18 on pipes](http://r4ds.had.co.nz/pipes.html)
The above links are useful references that connect to today's materials.
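As a small taste of what these references cover, the following sketch contrasts a nested call with the same computation written using the `magrittr` pipe. The vector `x` is made up for illustration.

```{r eval = FALSE, echo = TRUE}
library(magrittr)

x <- c(1, 4, 9, 16)

# Nested call: read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped call: read from left to right
piped <- x %>% sqrt() %>% mean() %>% round(1)

# Both forms perform the same computation
identical(nested, piped)
```

The piped version lists the steps in the order in which they are carried out, which tends to be easier to follow once more than two or three functions are chained.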
# Wednesday
## Column 1
### Wednesday's materials
We adapt the course as we go. To ensure that you work with the latest iteration of the course materials, we advise all course participants to access the materials online.
- Part G: Statistical models
- [Lecture G](Contents/Material/Part G - Statistical models/Lecture_G.html)
- [Practical G](Contents/Material/Part G - Statistical models/PracticalG_walkthrough.html)
- [Impractical G](Contents/Material/Part G - Statistical models/PracticalG.html)
- Part H: Statistical Inference
- [Lecture H](Contents/Material/Part H - Statistical inference/Lecture_H.html)
- [Practical H](Contents/Material/Part H - Statistical inference/PracticalH_walkthrough.html)
- [Impractical H](Contents/Material/Part H - Statistical inference/PracticalH.html)
- [A plot that demonstrates the (mis)use of Confidence Intervals](https://www.gerkovink.com/markup/Archive/2019/Wk4/Solution_to_Ex5.html)
- Part I: Data visualization
- [Lecture I](Contents/Material/Part I - Data visualization/Lecture_i.html)
- [Practical I](Contents/Material/Part I - Data visualization/Practical_i_walkthrough.html)
- [Impractical I](Contents/Material/Part I - Data visualization/Practical_i.html)
All lectures are in `html` format. Practicals are walkthrough files that guide you through the exercises. `Impractical` files contain the exercises, without walkthrough, explanations and solutions.
## Column 2
### Useful links
- [The `ggplot2` reference page](https://ggplot2.tidyverse.org/reference/)
- [A great reference on contrasts with linear modeling](https://rstudio-pubs-static.s3.amazonaws.com/65059_586f394d8eb84f84b1baaf56ffb6b47f.html)
- These papers are a nice reference for editors:
- [A comment in Nature from 800+ signatories](https://www.nature.com/articles/d41586-019-00857-9)
- [Naomi Altman and Martin Krzywinski in Nature Methods](https://www.nature.com/articles/nmeth.4120)
- [Statement by the American Statistical Association](https://www.tandfonline.com/doi/full/10.1080/00031305.2016.1154108)
### Releveling
```{r echo = TRUE, message=FALSE}
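library(mice)     # provides the boys data
library(magrittr) # provides the %$% and %>% pipes
# Coefficients with the default (first) level of reg as reference category
boys %$% lm(age ~ reg) %>% coef()
# Make "south" the reference category and refit
boys$reg <- relevel(boys$reg, ref = "south")
boys %$% lm(age ~ reg) %>% coef()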
```
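The chunk above changes the reference category of `reg`, and with it the meaning of the regression coefficients. The same idea can be sketched in base `R` with a small made-up data set. The data frame `toy` and its values are invented for illustration and are not part of the course data.

```{r eval = FALSE, echo = TRUE}
# Made-up data: one numeric outcome and a factor with three groups
toy <- data.frame(
  y     = c(1, 2, 3, 4, 5, 6),
  group = factor(c("a", "a", "b", "b", "c", "c"))
)

# Default: the first level ("a") is the reference category
levels(toy$group)
coef(lm(y ~ group, data = toy))

# After releveling, "c" is the reference category
toy$group <- relevel(toy$group, ref = "c")
levels(toy$group)
coef(lm(y ~ group, data = toy))
```

In both fits the intercept is the mean of the reference group and the other coefficients are differences from that mean, so only the parameterization changes, not the fitted values.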
# Thursday
## Column 1
### Thursday's materials
We adapt the course as we go. To ensure that you work with the latest iteration of the course materials, we advise all course participants to access the materials online.
- Part J: Model estimation
- [Lecture JK](Contents/Material/Part JK - Model estimation/Lecture_J.html)
- [Practical JK](Contents/Material/Part JK - Model estimation/Practical_J_walkthrough.html)
- [Impractical JK](Contents/Material/Part JK - Model estimation/Practical_J.html)
- Part K: Model estimation
- [Lecture JK](Contents/Material/Part JK - Model estimation/Lecture_J.html)
- [Practical JK](Contents/Material/Part JK - Model estimation/Practical_J_walkthrough.html)
- [Impractical JK](Contents/Material/Part JK - Model estimation/Practical_J.html)
- Part L: GLMs
- [Lecture L](Contents/Material/Part L - GLMs/Lecture_L.html)
- [The other, much longer, Lecture L](Contents/Material/Part L - GLMs/ch8_ioslides.html)
- [Practical L](Contents/Material/Part L - GLMs/Practical_L_walkthrough.html)
- [Impractical L](Contents/Material/Part L - GLMs/Practical_L.html)
All lectures are in `html` format. Practicals are walkthrough files that guide you through the exercises. `Impractical` files contain the exercises, without walkthrough, explanations and solutions.
# Friday
## Column 1
### Friday's materials
We adapt the course as we go. To ensure that you work with the latest iteration of the course materials, we advise all course participants to access the materials online.
- Part M: Classification & Clustering
- [Lecture M](Contents/Material/Part M - Unsupervised learning/Lecture_M.html)
- [Practical M](Contents/Material/Part M - Unsupervised learning/Practical_M_walkthrough.html)
- [Impractical M](Contents/Material/Part M - Unsupervised learning/Practical_M.html)
- Part N: Bootstrapping
- [Lecture N](Contents/Material/Part N - Bootstrapping/Lecture_N.html)
- [Practical N](Contents/Material/Part N - Bootstrapping/Practical_N_walkthrough.html)
- [Impractical N](Contents/Material/Part N - Bootstrapping/Practical_N.html)
- Part O: Monte Carlo simulation
- [Lecture O](Contents/Material/Part O - Monte carlo simulation/Lecture_O.html)
- [Practical O](Contents/Material/Part O - Monte carlo simulation/Practical_O_walkthrough.html)
- [Impractical O](Contents/Material/Part O - Monte carlo simulation/Practical_O.html)
All lectures are in `html` format. Practicals are walkthrough files that guide you through the exercises. `Impractical` files contain the exercises, without walkthrough, explanations and solutions.
# To continue
## Column 1
### What to do after the course
- [R for Data Science](https://r4ds.had.co.nz): a wonderful book that details a useful toolset for current and aspiring data scientists.
- [Introduction to Statistical Learning](https://www.statlearning.com): an introductory book on statistical learning, with applications in R.
- [Data Analysis and Graphics Using R](https://maths-people.anu.edu.au/~johnm/r-book/daagur3.html): a detailed book that covers a lot about categorical data analysis and fitting `glm`s in `R`.
The above references are (currently) available for free at these links. I find them very useful and highly recommend them.
## Column 2
### For fun
<blockquote class="twitter-tweet" data-lang="en"><p lang="en" dir="ltr">My wife asked me what machine learning is and I said: remember when we ordered the hot plate for the boat and amazon suggested buying all the equipment needed to make a full meth lab?</p>— (((Kane Baccigalupi))) (@rubyghetto) <a href="https://twitter.com/rubyghetto/status/1058220004301127680?ref_src=twsrc%5Etfw">November 2, 2018</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>