This repository was archived by the owner on Mar 20, 2018. It is now read-only.
forked from swcarpentry/r-novice-gapminder
-
Notifications
You must be signed in to change notification settings - Fork 19
/
Copy path10-functions.Rmd
535 lines (453 loc) · 16.4 KB
/
10-functions.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
---
title: Functions Explained
teaching: 45
exercises: 15
questions:
- "How can I write a new function in R?"
objectives:
- "Define a function that takes arguments."
- "Return a value from a function."
- "Check argument conditions with `stopifnot()` in functions."
- "Test a function."
- "Set default values for function arguments."
- "Explain why we should divide programs into small, single-purpose functions."
keypoints:
- "Use `function` to define a new function in R."
- "Use parameters to pass values into functions."
- "Use `stopifnot()` to flexibly check function arguments in R."
- "Load functions into programs using `source()`."
source: Rmd
---
```{r, include=FALSE}
source("../bin/chunk-options.R")
knitr_fig_path("10-")
# Silently load in the data so the rest of the lesson works
gapminder <- read.csv("data/gapminder-FiveYearData.csv", header=TRUE)
```
If we only had one data set to analyze, it would probably be faster to load the
file into a spreadsheet and use that to plot simple statistics. However, the
gapminder data is updated periodically, and we may want to pull in that new
information later and re-run our analysis again. We may also obtain similar data
from a different source in the future.
In this lesson, we'll learn how to write a function so that we can repeat
several operations with a single command.
> ## What is a function?
>
> Functions gather a sequence of operations into a whole, preserving it for
> ongoing use. Functions provide:
>
> * a name we can remember and invoke it by
> * relief from the need to remember the individual operations
> * a defined set of inputs and expected outputs
> * rich connections to the larger programming environment
>
> As the basic building block of most programming languages, user-defined
> functions constitute "programming" as much as any single abstraction can. If
> you have written a function, you are a computer programmer.
{: .callout}
## Defining a function
Let's open a new R script file in the `functions/` directory and call it
functions-lesson.R.
```{r}
my_sum <- function(a, b) {
the_sum <- a + b
return(the_sum)
}
```
Let's define a function `fahr_to_kelvin()` that converts temperatures from
Fahrenheit to Kelvin:
```{r}
fahr_to_kelvin <- function(temp) {
kelvin <- ((temp - 32) * (5 / 9)) + 273.15
return(kelvin)
}
```
We define `fahr_to_kelvin()` by assigning it to the output of `function`. The
list of argument names are contained within parentheses. Next, the
[body]({{ page.root }}/reference/#function-body) of the function--the
statements that are executed when it runs--is contained within curly braces
(`{}`). The statements in the body are indented by two spaces. This makes the
code easier to read but does not affect how the code operates.
When we call the function, the values we pass to it as arguments are assigned to
those variables so that we can use them inside the function. Inside the
function, we use a [return
statement]({{ page.root }}/reference/#return-statement) to send a result back to
whoever asked for it.
> ## Tip
>
> One feature unique to R is that the return statement is not required.
> R automatically returns whichever variable is on the last line of the body
> of the function. But for clarity, we will explicitly define the
> return statement.
{: .callout}
Let's try running our function.
Calling our own function is no different from calling any other function:
```{r}
# freezing point of water
fahr_to_kelvin(32)
```
```{r}
# boiling point of water
fahr_to_kelvin(212)
```
> ## Challenge 1
>
> Write a function called `kelvin_to_celsius()` that takes a temperature in
> Kelvin and returns that temperature in Celsius.
>
> Hint: To convert from Kelvin to Celsius you subtract 273.15
>
> > ## Solution to challenge 1
> >
> > Write a function called `kelvin_to_celsius` that takes a temperature in Kelvin
> > and returns that temperature in Celsius
> >
> > ```{r}
> > kelvin_to_celsius <- function(temp) {
> > celsius <- temp - 273.15
> > return(celsius)
> > }
> > ```
> {: .solution}
{: .challenge}
## Combining functions
The real power of functions comes from mixing, matching and combining them
into ever-larger chunks to get the effect we want.
Let's define two functions that will convert temperature from Fahrenheit to
Kelvin, and Kelvin to Celsius:
```{r}
fahr_to_kelvin <- function(temp) {
kelvin <- ((temp - 32) * (5 / 9)) + 273.15
return(kelvin)
}
kelvin_to_celsius <- function(temp) {
celsius <- temp - 273.15
return(celsius)
}
```
> ## Challenge 2
>
> Define the function to convert directly from Fahrenheit to Celsius,
> by reusing the two functions above (or using your own functions if you
> prefer).
>
>
> > ## Solution to challenge 2
> >
> > Define the function to convert directly from Fahrenheit to Celsius,
> > by reusing these two functions above
> >
> >
> > ```{r}
> > fahr_to_celsius <- function(temp) {
> > temp_k <- fahr_to_kelvin(temp)
> > result <- kelvin_to_celsius(temp_k)
> > return(result)
> > }
> > ```
> {: .solution}
{: .challenge}
## Interlude: Defensive Programming
Now that we've begun to appreciate how writing functions provides an efficient
way to make R code re-usable and modular, we should note that it is important
to ensure that functions only work in their intended use-cases. Checking
function parameters is related to the concept of _defensive programming_.
Defensive programming encourages us to frequently check conditions and throw an
error if something is wrong. These checks are referred to as assertion
statements because we want to assert some condition is `TRUE` before proceeding.
They make it easier to debug because they give us a better idea of where the
errors originate.
### Checking conditions with `stopifnot()`
Let's start by re-examining `fahr_to_kelvin()`, our function for converting
temperatures from Fahrenheit to Kelvin. It was defined like so:
```{r}
fahr_to_kelvin <- function(temp) {
kelvin <- ((temp - 32) * (5 / 9)) + 273.15
return(kelvin)
}
```
For this function to work as intended, the argument `temp` must be a `numeric`
value; otherwise, the mathematical procedure for converting between the two
temperature scales will not work. To create an error, we can use the function
`stop()`. For example, since the argument `temp` must be a `numeric` vector, we
could check for this condition with an `if` statement and throw an error if the
condition was violated. We could augment our function above like so:
```{r}
fahr_to_kelvin <- function(temp) {
if (!is.numeric(temp)) {
stop("temp must be a numeric vector.")
}
kelvin <- ((temp - 32) * (5 / 9)) + 273.15
return(kelvin)
}
```
If we had multiple conditions or arguments to check, it would take many lines
of code to check all of them. Luckily R provides the convenience function
`stopifnot()`. We can list as many requirements that should evaluate to `TRUE`;
`stopifnot()` throws an error if it finds one that is `FALSE`. Listing these
conditions also serves a secondary purpose as extra documentation for the
function.
Let's try out defensive programming with `stopifnot()` by adding assertions to
check the input to our function `fahr_to_kelvin()`.
We want to assert the following: `temp` is a numeric vector. We may do that like
so:
```{r}
fahr_to_kelvin <- function(temp) {
stopifnot(is.numeric(temp))
kelvin <- ((temp - 32) * (5 / 9)) + 273.15
return(kelvin)
}
```
It still works when given proper input.
```{r}
# freezing point of water
fahr_to_kelvin(temp = 32)
```
But fails instantly if given improper input.
```{r}
# Metric is a factor instead of numeric
fahr_to_kelvin(temp = as.factor(32))
```
> ## Challenge 3
>
> Use defensive programming to ensure that our `fahr_to_celsius()` function
> throws an error immediately if the argument `temp` is specified
> inappropriately.
>
>
> > ## Solution to challenge 3
> >
> > Extend our previous definition of the function by adding in an explicit call
> > to `stopifnot()`. Since `fahr_to_celsius()` is a composition of two other
> > functions, checking inside here makes adding checks to the two component
> > functions redundant.
> >
> >
> > ```{r}
> > fahr_to_celsius <- function(temp) {
> > stopifnot(!is.numeric(temp))
> > temp_k <- fahr_to_kelvin(temp)
> > result <- kelvin_to_celsius(temp_k)
> > return(result)
> > }
> > ```
> {: .solution}
{: .challenge}
## More on combining functions
Now, we're going to define a function that calculates the Gross Domestic Product
of a nation from the data available in our dataset:
```{r}
# Takes a dataset and multiplies the population column
# with the GDP per capita column.
calcGDP <- function(dat) {
gdp <- dat$pop * dat$gdpPercap
return(gdp)
}
```
We define `calcGDP()` by assigning it to the output of `function`. The list of
argument names are contained within parentheses. Next, the body of the function
-- the statements executed when you call the function -- is contained within
curly braces (`{}`).
We've indented the statements in the body by two spaces. This makes the code
easier to read but does not affect how it operates.
When we call the function, the values we pass to it are assigned to the
arguments, which become variables inside the body of the function.
Inside the function, we use the `return()` function to send back the result.
This `return()` function is optional: R will automatically return the results of
whatever command is executed on the last line of the function.
```{r}
calcGDP(head(gapminder))
```
That's not very informative. Let's add some more arguments so we can extract
that per year and country.
```{r}
# Takes a dataset and multiplies the population column
# with the GDP per capita column.
calcGDP <- function(dat, year=NULL, country=NULL) {
if(!is.null(year)) {
dat <- dat[dat$year %in% year, ]
}
if (!is.null(country)) {
dat <- dat[dat$country %in% country,]
}
gdp <- dat$pop * dat$gdpPercap
new <- cbind(dat, gdp=gdp)
return(new)
}
```
If you've been writing these functions down into a separate R script
(a good idea!), you can load in the functions into our R session by using the
`source()` function:
```{r, eval=FALSE}
source("functions/functions-lesson.R")
```
Ok, so there's a lot going on in this function now. In plain English, the
function now subsets the provided data by year if the year argument isn't empty,
then subsets the result by country if the country argument isn't empty. Then it
calculates the GDP for whatever subset emerges from the previous two steps. The
function then adds the GDP as a new column to the subsetted data and returns
this as the final result. You can see that the output is much more informative
than a vector of numbers.
Let's take a look at what happens when we specify the year:
```{r}
head(calcGDP(gapminder, year=2007))
```
Or for a specific country:
```{r}
calcGDP(gapminder, country="Australia")
```
Or both:
```{r}
calcGDP(gapminder, year=2007, country="Australia")
```
Let's walk through the body of the function:
``` {r, eval=FALSE}
calcGDP <- function(dat, year=NULL, country=NULL) {
```
Here we've added two arguments, `year`, and `country`. We've set
*default arguments* for both as `NULL` using the `=` operator
in the function definition. This means that those arguments will
take on those values unless the user specifies otherwise.
```{r, eval=FALSE}
if(!is.null(year)) {
dat <- dat[dat$year %in% year, ]
}
if (!is.null(country)) {
dat <- dat[dat$country %in% country,]
}
```
Here, we check whether each additional argument is set to `null`, and whenever
they're not `null` overwrite the dataset stored in `dat` with a subset given by
the non-`null` argument.
I did this so that our function is more flexible for later. We can ask it to
calculate the GDP for:
* The whole dataset;
* A single year;
* A single country;
* A single combination of year and country.
By using `%in%` instead, we can also give multiple years or countries to those
arguments.
> ## Tip: Pass by value
>
> Functions in R almost always make copies of the data to operate on
> inside of a function body. When we modify `dat` inside the function
> we are modifying the copy of the gapminder dataset stored in `dat`,
> not the original variable we gave as the first argument.
>
> This is called "pass-by-value" and it makes writing code much safer:
> you can always be sure that whatever changes you make within the
> body of the function, stay inside the body of the function.
{: .callout}
> ## Tip: Function scope
>
> Another important concept is scoping: any variables (or functions!) you
> create or modify inside the body of a function only exist for the lifetime
> of the function's execution. When we call `calcGDP()`, the variables `dat`,
> `gdp` and `new` only exist inside the body of the function. Even if we
> have variables of the same name in our interactive R session, they are
> not modified in any way when executing a function.
{: .callout}
```{r, eval=FALSE}
gdp <- dat$pop * dat$gdpPercap
new <- cbind(dat, gdp=gdp)
return(new)
}
```
Finally, we calculated the GDP on our new subset, and created a new data frame
with that column added. This means when we call the function later we can see
the context for the returned GDP values, which is much better than in our first
attempt where we got a vector of numbers.
> ## Challenge 3
>
> Test out your GDP function by calculating the GDP for New Zealand in 1987. How
> does this differ from New Zealand's GDP in 1952?
>
> > ## Solution to challenge 3
> >
> > ```{r, eval=FALSE}
> > calcGDP(gapminder, year = c(1952, 1987), country = "New Zealand")
> > ```
> > GDP for New Zealand in 1987: 65050008703
> >
> > GDP for New Zealand in 1952: 21058193787
> {: .solution}
{: .challenge}
> ## Challenge 4
>
> The `paste()` function can be used to combine text together, e.g:
>
> ```{r}
> best_practice <- c("Write", "programs", "for", "people", "not", "computers")
> paste(best_practice, collapse=" ")
> ```
>
> Write a function called `fence()` that takes two vectors as arguments, called
> `text` and `wrapper`, and prints out the text wrapped with the `wrapper`:
>
> ```{r, eval=FALSE}
> fence(text=best_practice, wrapper="***")
> ```
>
> *Note:* the `paste()` function has an argument called `sep`, which specifies
> the separator between text. The default is a space: " ". The default for
> `paste0()` is no space "".
>
> > ## Solution to challenge 4
> >
> > Write a function called `fence()` that takes two vectors as arguments,
> > called `text` and `wrapper`, and prints out the text wrapped with the
> > `wrapper`:
> >
> > ```{r}
> > fence <- function(text, wrapper){
> > text <- c(wrapper, text, wrapper)
> > result <- paste(text, collapse = " ")
> > return(result)
> > }
> > best_practice <- c("Write", "programs", "for", "people", "not", "computers")
> > fence(text=best_practice, wrapper="***")
> > ```
> {: .solution}
{: .challenge}
> ## Tip
>
> R has some unique aspects that can be exploited when performing more
> complicated operations. We will not be writing anything that requires
> knowledge of these more advanced concepts. In the future when you are
> comfortable writing functions in R, you can learn more by reading the
> [R Language Manual][man] or this [chapter][] from
> [Advanced R Programming][adv-r] by Hadley Wickham.
{: .callout}
[man]: http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Environment-objects
[chapter]: http://adv-r.had.co.nz/Environments.html
[adv-r]: http://adv-r.had.co.nz/
> ## Tip: Testing and documenting
>
> It's important to both test functions and document them:
> Documentation helps you, and others, understand what the
> purpose of your function is, and how to use it, and its
> important to make sure that your function actually does
> what you think.
>
> When you first start out, your workflow will probably look a lot
> like this:
>
> 1. Write a function
> 2. Comment parts of the function to document its behaviour
> 3. Load in the source file
> 4. Experiment with it in the console to make sure it behaves
> as you expect
> 5. Make any necessary bug fixes
> 6. Rinse and repeat.
>
> Formal documentation for functions, written in separate `.Rd`
> files, gets turned into the documentation you see in help
> files. The [roxygen2][] package allows R coders to write documentation
> alongside the function code and then process it into the appropriate `.Rd`
> files. You will want to switch to this more formal method of writing
> documentation when you start writing more complicated R projects.
>
> Formal automated tests can be written using the [testthat][] package.
{: .callout}
[roxygen2]: http://cran.r-project.org/web/packages/roxygen2/vignettes/rd.html
[testthat]: http://r-pkgs.had.co.nz/tests.html