forked from grunwaldlab/Reproducible-science-in-R
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path05--RMarkdown.Rmd
475 lines (335 loc) · 17.9 KB
/
05--RMarkdown.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
---
title: RMarkdown
---
```{r init, echo=FALSE}
library(knitr)
library(xtable)
library(tools)
source("functions.R")
step <- make_step_counter_function()
rmd_example <- make_markdown_example_function()
gloss <- make_glossary_function()
opts_chunk$set(warning = FALSE,
message = FALSE,
echo = FALSE)
```
```{r java, results='asis', echo = FALSE}
# Add javascript to cause the iframes to be scrolled down all the way when the page loads.
pre_scroll_iframe()
```
While Markdown is useful for simple note taking and writing, `r gloss("RMarkdown", "An extension of the Markdown syntax that allows R code to be displayed and run when the document is rendered")` can be used to mix complex analyses and programming in the R language with all the features of Markdown.
**RMarkdown is an extension of the Markdown syntax that allows R code to be displayed and run when the document is rendered.**
Graphs, tables, file input/output, and bibliographies can all be incorporated using embedded R code.
Typical uses range from adding minor bits of code to time-stamp notes to running complex analyses and making a report of the results.
RMarkdown is thus ideal for lab notebooks and reproducible science as the scripts can be run over and over as they are modified.
Imagine you find one small mistake in a data point after finishing a complicated analysis.
If you used RMarkdown, you simply change the data, rerun the analysis with a single click, and the change will be applied to every graph and result.
If you use excel and/or manually-manipulated graphs, you have to everything again the exact way you did it last time.
In this section of the tutorial, we will be summarizing the most commonly used features of RMarkdown, but there are many aspects that we will not cover.
**Unlike pure Markdown, RMarkdown is very flexible and complex.
Therefore, as you go through the exercises, it is probably best to focus on remembering *what* RMarkdown can do rather than *how* to do it.**
At the end of the section there are resources that you can reference when you need to look up how to do a specific task.
# Mixing R and Markdown
There are two ways to add R code to Markdown: as inline or as code chunks.
The syntax for adding inline code is the same as specifying a monospace font (surrounding text with <code>\`</code>), except that an `r` is added after the first <code>\`</code>.
Inline code syntax is used for short, simple pieces of code that display their results mid-sentence.
Multiple lines of code can be added by putting the lines of code between <code>\`\`\`{r}</code> and <code>\`\`\`</code>, each on their own line.
This is refereed to as a `r gloss("chunk", "One or more consecuative lines of code in a RMarkdown file")`.
Unlike the inline method, chunks of code defined this way have many options controlling how they are processed when a document is rendered; we will cover some of these options soon.
Lets work through a set of demonstrations to learn the basic syntax of RMarkdown:
`r step('Create a new markdown document and delete all the default template content. Paste in the code below, save the file as "rmarkdown-tutorial.Rmd", and press "Knit HTML".')`
```{r intro, results='asis', echo = FALSE}
rmd_example(show_knit_button= TRUE, "
### **R**Markdown
Date: `r Sys.time()`
\`\`\`{r}
my_data <- c(1, 3, 2.4, 6, 7.2)
mean(my_data)
median(my_data)
\`\`\`
")
```
## Code evaluation
In the above example, the code in the chunk was both displayed and executed.
However, it is possible to show code without executing it.
`eval` is one of many chunk options that modify how the code is handled; these options are defined in the curly braces after the `r`.
Setting the chunk option `eval` to `FALSE` causes the code not to be executed.
`r step('Copy the text from the box below and paste it on to the end of your Markdown document and press "Knit HTML".')`
```{r code_eval, results='asis', echo = FALSE}
rmd_example("
### Code evaluation
#### Normal evaluation:
\`\`\`{r}
var( c(1, 3, 2.4, 6, 7.2) ) # how to calculate variance in R
\`\`\`
#### Unevaluated code:
\`\`\`{r, eval=FALSE}
var( c(1, 3, 2.4, 6, 7.2) )
\`\`\`
")
```
## Text results
What results of the code are included in the rendered document can also be controlled with chunk options.
There are at least 9 options that influence how code affects the output document, but we will only cover the most commonly used here.
### Displaying code
The `echo` option controls whether the code is displayed or not.
Setting `echo = FALSE` will cause the code to not be shown in the output, but will not affect its execution.
`r step('Copy the text from the box below and paste it on to the end of your Markdown document and press "Knit HTML".')`
```{r echo_example, results='asis', echo = FALSE}
rmd_example("
### Text Results
#### Invisible code:
\`\`\`{r, echo=FALSE}
var( c(1, 3, 2.4, 6, 7.2) )
\`\`\`
")
```
### Displaying results
You can hide the typical text output of chunks by setting `results = 'hide'`.
This only affects the printing of variables, not things like warnings, errors, and graphs.
`r step('Copy the text from the box below and paste it on to the end of your Markdown document and press "Knit HTML".')`
```{r results_example, results='asis', echo = FALSE}
rmd_example("
#### Invisible results:
\`\`\`{r, results = 'hide'}
x <- 'This code was executed'
print(x)
\`\`\`
\`\`\`{r}
print(x)
\`\`\`
")
```
### Hiding warnings and messages
As R is an interactive language, some functions will occasionally display text that conveys a message to you, but is not necessarily part of your data. These are known as `r gloss("messages", "text displayed to convey extra information to the user such as a function's progress.")`, `r gloss("warnings", "text displayed to convey non-normal behavior that the user might be concerned about.")`, and `r gloss("errors", "text displayed to the user to let them know what went wrong in their code.")`. By default, the options are set to show you all warnings and messages and to stop the execution of your document if an error comes up. You can control these by using the corresponding chunk options.
`r step('Copy the text from the box below and paste it on to the end of your Markdown document and press "Knit HTML".')`
```{r other_text_output, results='asis', echo = FALSE}
rmd_example("
#### Messages, warnings, and errors:
The `poppr.amova()` function from *poppr*, will normally display
progress messages of how it treats the data before the analysis. The following
code will only display messages.
\`\`\`{r, results = 'hide', message = FALSE}
library('poppr') # This normally prints a message
data(monpop)
splitStrata(monpop) <- ~Tree/Year/Symptom
poppr.amova(monpop, hier = ~Tree) # Here, we get warnings of Zero distance(s)
\`\`\`
This will display the output and messages, but no warnings.
\`\`\`{r, warning = FALSE}
poppr.amova(monpop, hier = ~Tree) # No warnings, but we get messages
\`\`\`
The code below will not display any warnings or messages. This is the cleanest way
to display things for publication.
\`\`\`{r, warning = FALSE, message = FALSE}
poppr.amova(monpop, hier = ~Tree) # No warnings or messages :)
\`\`\`
")
```
Unlike `message` and `warning`, `error` is `FALSE` by default and any errors that occur stop the rendering of the document.
If `error` is set to `TRUE`, errors will be displayed and not stop the rendering of the document.
`r step('Copy the text from the box below and paste it on to the end of your Markdown document and press "Knit HTML".')`
```{r error_example, results='asis', echo = FALSE}
rmd_example("
\`\`\`{r, error = TRUE}
poppr.amova(monpop, hier = Tree) # Forgot the tilde (~)
\`\`\`
")
```
## Tables
Like much in R, there are multiple ways to create tables.
Tables can be created "by hand" using an RMarkdown syntax, or made using a few R functions.
Data can be entered manually in a grid of `-` and `|`, with `:` being used to indicate the alignment of cell contents.
The same format is made automatically with the `kable` function from the `knitr` package using data stored in R variables.
`r step('Copy the text from the box below and paste it on to the end of your Markdown document and press "Knit HTML".')`
```{r tables_example, results='asis', echo = FALSE}
rmd_example(height = 400, "
### Tables
#### Manual tables
| x| squared| cubed|
|--:|-------:|-----:|
| 1| 1| 1|
| 2| 4| 8|
| 3| 9| 27|
#### Kable tables
\`\`\`{r}
library(knitr)
data = data.frame(x = 1:3)
data$squared = data$x ^ 2
data$cubed = data$x ^ 3
kable(data)
\`\`\`
")
```
A simple HTML table can also be made using a package called `r gloss("xtable", "An R package to print PDF/HTML tables from data encoded in R variables")`.
However, this package is focused mostly on making tables for PDF output and we will not cover it here.
## Figures
One of the major advantages of mixing R with markdown is access to R's well known graphical prowess.
Graphs can be inserted into RMarkdown documents the same way text results are.
There are two commonly used ways to make graphs in R: the `r gloss("R base graphics", "original graphing capabilities that are part of the core R language")` and the newer `r gloss("ggplot2", "An R package used for graphing as an alternative to the base graphics")` package.
Both are very flexible and each have their adherents.
The R base graphics are known for their flexibility whereas ggplot2 is known for consistent syntax and aesthetics.
Both can be used to create complex and attractive graphics.
We will demonstrate both graphing systems, but will not go into details here.
### Typical usage
By default, any plots made are displayed after the line of code that created them.
The example below uses the R base graphics to plot the distribution of 10 numbers drawn from a normal distribution.
`r step('Copy the text from the box below and paste it on to the end of your Markdown document and press "Knit HTML".')`
```{r figure_example, results='asis', echo = FALSE}
rmd_example(height = 500, "
### Adding figures
#### Typical usage
\`\`\`{r}
x = rnorm(10)
print(x)
hist(x)
\`\`\`
")
```
### Hiding figures
Figures can be excluded from the output document by setting the chunk option `fig.show` to `'hide'`
`r step('Copy the text from the box below and paste it on to the end of your Markdown document and press "Knit HTML".')`
```{r hide_figure_example, results='asis', echo = FALSE}
rmd_example("
#### Hiding figures
\`\`\`{r, fig.show='hide'}
x = rnorm(10)
print(x)
hist(x)
\`\`\`
")
```
### Figure sizes
The size of the figures made in a given chunk can be changed using the options `fig.width` and `fig.height`, specified in inches.
The example below uses `ggplot2` to make a graph similar to the previous ones.
`r step('Copy the text from the box below and paste it on to the end of your Markdown document and press "Knit HTML".')`
```{r scale_figure_example, results='asis', echo = FALSE}
rmd_example(height = 500, "
#### Figure sizes
\`\`\`{r, fig.width=2, fig.height=4}
library(ggplot2)
x = data.frame(var = rnorm(100))
ggplot(x) +
geom_histogram(aes(x = var), binwidth = 0.2)
\`\`\`
\`\`\`{r, fig.width=4, fig.height=2}
ggplot(x) +
geom_histogram(aes(x = var), binwidth = 0.2)
\`\`\`
")
```
## YAML headers
As you might have noticed, the top of the template document RStudio provides when creating a new RMarkdown document contains some settings in a distinct format.
This header information of RMarkdown file in a language called `r gloss("YAML", "A plain text computer language used to store heirarchical information")`, used to store information.
In this case, the YAML is used to change settings of the functions that render the Markdown to various formats when you press "Knit".
The YAML header can be used to do things like change the output format to PDF, add a table of contents, and specify themes.
For example, the YAML header below adds section numbers and a table of contents derived from headers (one or more `#` starting a line).
`r step('Copy the text from the box below and paste it _at the start_ of your Markdown document and press "Knit HTML".')`
<pre class='r'><code>---
title: "Example Rmarkdown document"
date: "2016-07-30"
output:
html_document:
toc: true
number_sections: true
---</code></pre>
On of the best uses of the YAML header is to specify the format you want the output document to be when it is rendered.
There are multiple output types, but the most used besides HTML is PDF.
The code below shows how to set the output type to PDF.
`r step('Replace the YAML header _at the start_ of your Markdown document with the one below and press "Knit HTML".')`
<pre class='r'><code>---
title: "Example Rmarkdown document"
date: "2016-07-30"
output:
pdf_document:
toc: true
number_sections: true
---</code></pre>
## Adding a bibliography
There are a few ways of adding citations/bibliographies to your RMarkdown document automatically.
We will be using an external file containing a list of references in a standard format called `r gloss("BibTeX", "A plain text format used to store citation/bibliography information")`.
Like RMarkdown, this is simply a plain text format viewable in any text editor.
Google scholar provides convenient links for references in BibTeX format that can easily be copied and pasted into file containing the references to be cited.
Text files of any format can be made in RStudio by clicking on the "New file" drop-down menu and choosing "Text file".
`r step('Click on the "New file" dropdown menu and choose "Text file". Paste the text below into the file and save it as "example_bibliography.bibtex".')`
```{r make_bib, include = FALSE}
current_file_name <- knitr:::knit_concord$get("infile")
bib_path <- "example_bibliography.bibtex"
bib_content <-
"@article{baumer2014r,
title={R Markdown: Integrating a reproducible analysis tool into introductory statistics},
author={Baumer, Ben and Cetinkaya-Rundel, Mine and Bray, Andrew and Loi, Linda and Horton, Nicholas J},
journal={arXiv preprint arXiv:1402.1894},
year={2014}
}
@article{racine2012rstudio,
title={RStudio: A Platform-Independent IDE for R and Sweave},
author={Racine, Jeffrey S},
journal={Journal of Applied Econometrics},
volume={27},
number={1},
pages={167--172},
year={2012},
publisher={Wiley Online Library}
}"
cat(bib_content, file = bib_path)
```
```{r print_bib, results='asis', echo = FALSE}
cat(paste0("<pre class='r'><code>",
bib_content,
"</code></pre>"))
```
Now the information for a few papers are stored in a file format both computers and humans can read.
Next, we will associate the bibliography file with the RMarkdown file and add a few citations.
This is done using a setting in the YAML header.
There is an ID for each article stored in the BibTeX file located in the lines that start with `@article{`.
Inline citations can be inserted by adding `@` followed by the ID of the paper.
Surrounding the `@ID` with `[` and `]` changes the format of the citation slightly.
Any papers cited will appear in a bibliography added to the bottom of the rendered document.
`r step('Create a new markdown document and delete all the default template content. Paste in the code below, save the file as "rmarkdown-bibliography.Rmd", and press "Knit HTML".')`
```{r bib_example, results='asis', echo = FALSE}
rmd_example(height = 200, "
---
output:
html_document:
toc: true
number_sections: true
bibliography: example_bibliography.bibtex
---
### Adding a bibliography
RMarkdown [@baumer2014r] is referenced in the @racine2012rstudio paper.
### References
")
```
```{r delete_bib, include=FALSE}
file.remove(bib_path)
```
# Additional resources
RMarkdown is much more complex than pure Markdown due to all the subtle ways code can be handled and the results displayed.
Most people will need to constantly reference documentation and "cheat sheets" to remind themselves of ways to do occasionally necessary tasks.
It is best to commit to memory only the most commonly used tools and know where to look up the rest.
Note that the rendering of RMarkdown is implemented using an R package called `r gloss("knitr", "An R package used to render RMarkdown, as well as other mixes of programming and markup languages")` so you might see references to things like "knitr options".
This is also the reason the button you press to render RMarkdown into an output document is labeled "Knit HTML".
Albert Einstein once said "Never memorize something that you can look up."
In that spirit, here are some great places to look things up:
- [The RMarkdown website](http://rmarkdown.rstudio.com/): This is the official website for RMarkdown. It contains examples of most of the features of RMarkdown.
- [The Knitr website](http://yihui.name/knitr/): Information on the primary tool used by RMarkdown to render documents. This is official source of information on chunk options.
- [An RMarkdown cheatsheet](https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf): This summarizes the main tools of RMarkdown in just two pages.
# Try it! {#exercise}
---------------
`r step('Replicate the rest of the AUDPC analysis.')`
In a new RMarkdown document called `audpc_report.Rmd`, try to recreate
`audpc_report_to_replicate.html` using the information in
`original_audpc_report.docx` and `original_audpc_code.R`.
These files are in the repository you downloaded earlier (https://github.com/grunwaldlab/audpc_example).
You can start by reusing the markdown from the `README.md` file you created in the exercise for the previous section ("Markdown").
`r step('Replicate genetic analysis')`
Much like the AUDPC example, we have placed an example of a reproducible
analysis of the population genetic structure of *Phytophthora infestans* at
https://github.com/grunwaldlab/pinfestans_example and wrote it up in a docx
document with the R script used to produce the figures and tables. Download this
repository and attempt to recreate the word document using RMarkdown.
# Glossary
```{r glossary, results = 'asis'}
gloss(display = TRUE)
```