-
Notifications
You must be signed in to change notification settings - Fork 39
/
Copy pathREADME.Rmd
414 lines (305 loc) · 11.5 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
```{r knitsetup, echo=FALSE, results='hide', warning=FALSE, message=FALSE, cache=FALSE}
library(knitr)
opts_knit$set(base.dir='./', fig.path='', out.format='md')
opts_chunk$set(prompt=FALSE, comment='', results='markup')
library(pipeR)
```
# pipeR
[![Linux Build Status](https://travis-ci.org/renkun-ken/pipeR.png?branch=master)](https://travis-ci.org/renkun-ken/pipeR)
[![Windows Build status](https://ci.appveyor.com/api/projects/status/github/renkun-ken/pipeR?svg=true)](https://ci.appveyor.com/project/renkun-ken/pipeR)
[![codecov.io](http://codecov.io/github/renkun-ken/pipeR/coverage.svg?branch=master)](http://codecov.io/github/renkun-ken/pipeR?branch=master)
[![CRAN Version](http://www.r-pkg.org/badges/version/pipeR)](https://cran.r-project.org/package=pipeR)
pipeR provides various styles of function chaining methods:
* Pipe operator
* Pipe object
* pipeline function
Each of them represents a distinct pipeline model but they share almost a common set of features. A value can be piped to the next expression
* As the first unnamed argument of the function
* As dot symbol (`.`) in the expression
* As a named variable defined by a formula
* For side-effect that carries over the input to the next
* For assignment that saves an intermediate value
The syntax is designed to make the pipeline more readable and friendly to
a wide variety of operations.
**[pipeR Tutorial](http://renkun.me/pipeR-tutorial) is a highly recommended complete guide to pipeR.**
This document is also translated into [日本語](https://github.com/renkun-ken/pipeR/blob/master/README.ja.md) (by [@hoxo_m](https://github.com/hoxo-m/)).
## Installation
Install the latest development version from GitHub:
```r
devtools::install_github("renkun-ken/pipeR")
```
Install from [CRAN](https://cran.r-project.org/package=pipeR):
```r
install.packages("pipeR")
```
## Getting started
The following code is an example written in traditional approach:
It basically performs bootstrap on `mpg` values in built-in dataset `mtcars` and plots its density function estimated by Gaussian kernel.
```r
plot(density(sample(mtcars$mpg, size = 10000, replace = TRUE),
kernel = "gaussian"), col = "red", main="density of mpg (bootstrap)")
```
The code is deeply nested and can be hard to read and maintain. In the following examples, the traditional code is rewritten by Pipe operator, `Pipe()` function and `pipeline()` function, respectively.
* Operator-based pipeline
```r
mtcars$mpg %>>%
sample(size = 10000, replace = TRUE) %>>%
density(kernel = "gaussian") %>>%
plot(col = "red", main = "density of mpg (bootstrap)")
```
* Object-based pipeline (`Pipe()`)
```r
Pipe(mtcars$mpg)$
sample(size = 10000, replace = TRUE)$
density(kernel = "gaussian")$
plot(col = "red", main = "density of mpg (bootstrap)")
```
* Argument-based pipeline
```r
pipeline(mtcars$mpg,
sample(size = 10000, replace = TRUE),
density(kernel = "gaussian"),
plot(col = "red", main = "density of mpg (bootstrap)"))
```
* Expression-based pipeline
```r
pipeline({
mtcars$mpg
sample(size = 10000, replace = TRUE)
density(kernel = "gaussian")
plot(col = "red", main = "density of mpg (bootstrap)")
})
```
## Usage
### `%>>%`
Pipe operator `%>>%` basically pipes the left-hand side value forward to the right-hand side expression which is evaluated according to its syntax.
#### Pipe to first-argument of function
Many R functions are pipe-friendly: they take some data by the first argument and transform it in a certain way. This arrangement allows operations to be streamlined by pipes, that is, one data source can be put to the first argument of a function, get transformed, and put to the first argument of the next function. In this way, a chain of commands are connected, and it is called a pipeline.
On the right-hand side of `%>>%`, whenever a function name or call is supplied, the left-hand side value will always be put to the first unnamed argument to that function.
```r
rnorm(100) %>>%
plot
```
```r
rnorm(100) %>>%
plot(col="red")
```
Sometimes the value on the left is needed at multiple places. One can use `.` to represent it anywhere in the function call.
```r
rnorm(100) %>>%
plot(col="red", main=length(.))
```
There are situations where one calls a function in a namespace with `::`. In this case, the call must end up with `()`.
```r
rnorm(100) %>>%
stats::median()
rnorm(100) %>>%
graphics::plot(col = "red")
```
#### Pipe to `.` in an expression
Not all functions are pipe-friendly in every case: You may find some functions do not take your data produced by a pipeline as the first argument. In this case, you can enclose your expression by `{}` or `()` so that `%>>%` will use `.` to represent the value on the left.
```r
mtcars %>>%
{ lm(mpg ~ cyl + wt, data = .) }
```
```r
mtcars %>>%
( lm(mpg ~ cyl + wt, data = .) )
```
#### Pipe by formula as lambda expression
Sometimes, it may look confusing to use `.` to represent the value being piped. For example,
```r
mtcars %>>%
(lm(mpg ~ ., data = .))
```
Although it works perfectly, it may look ambiguous if `.` has several meanings in one line of code.
`%>>%` accepts lambda expression to direct its piping behavior. Lambda expression is characterized by a formula enclosed within `()`, for example, `(x ~ f(x))`. It contains a user-defined symbol to represent the value being piped and the expression to be evaluated.
```r
mtcars %>>%
(df ~ lm(mpg ~ ., data = df))
```
```r
mtcars %>>%
subset(select = c(mpg, wt, cyl)) %>>%
(x ~ plot(mpg ~ ., data = x))
```
#### Pipe for side effect
In a pipeline, one may be interested not only in the final outcome but sometimes also in intermediate results. To print, plot or save the intermediate results, it must be a side-effect to avoid breaking the mainstream pipeline. For example, calling `plot()` to draw scatter plot returns `NULL`, and if one directly calls `plot()` in the middle of a pipeline, it would break the pipeline by changing the subsequent input to `NULL`.
One-sided formula that starts with `~` indicates that the right-hand side expression will only be evaluated for its side-effect, its value will be ignored, and the input value will be returned instead.
```r
mtcars %>>%
subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>>%
(~ cat("rows:",nrow(.),"\n")) %>>% # cat() returns NULL
summary
```
```r
mtcars %>>%
subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>>%
(~ plot(mpg ~ wt, data = .)) %>>% # plot() returns NULL
(lm(mpg ~ wt, data = .)) %>>%
summary()
```
With `~`, side-effect operations can be easily distinguished from mainstream pipeline.
An easier way to print the intermediate value it to use `(? expr)` syntax like asking question.
```r
mtcars %>>%
(? ncol(.)) %>>%
summary
```
#### Pipe with assignment
In addition to printing and plotting, one may need to save an intermediate value to the environment by assigning the value to a variable (symbol).
If one needs to assign the value to a symbol, just insert a step like `(~ symbol)`, then the input value of that step will be assigned to `symbol` in the current environment.
```r
mtcars %>>%
(lm(formula = mpg ~ wt + cyl, data = .)) %>>%
(~ lm_mtcars) %>>%
summary
```
If the input value is not directly to be saved but after some transformation, then one can use `=`, `<-`, or more natural `->` to specify a lambda expression to tell what to be saved (thanks @yanlinlin82 for suggestion).
```r
mtcars %>>%
(~ summ = summary(.)) %>>% # side-effect assignment
(lm(formula = mpg ~ wt + cyl, data = .)) %>>%
(~ lm_mtcars) %>>%
summary
```
```r
mtcars %>>%
(~ summary(.) -> summ) %>>%
mtcars %>>%
(~ summ <- summary(.)) %>>%
```
An easier way to saving intermediate value that is to be further piped is to use `(symbol = expression)` syntax:
```r
mtcars %>>%
(~ summ = summary(.)) %>>% # side-effect assignment
(lm_mtcars = lm(formula = mpg ~ wt + cyl, data = .)) %>>% # continue piping
summary
```
or `(expression -> symbol)` syntax:
```r
mtcars %>>%
(~ summary(.) -> summ) %>>% # side-effect assignment
(lm(formula = mpg ~ wt + cyl, data = .) -> lm_mtcars) %>>% # continue piping
summary
```
#### Extract element from an object
`x %>>% (y)` means extracting the element named `y` from object `x` where `y` must be a valid symbol name and `x` can be a vector, list, environment or anything else for which `[[]]` is defined, or S4 object.
```r
mtcars %>>%
(lm(mpg ~ wt + cyl, data = .)) %>>%
(~ lm_mtcars) %>>%
summary %>>%
(r.squared)
```
#### Compatibility
* Working with [dplyr](https://github.com/hadley/dplyr/):
```r
library(dplyr)
mtcars %>>%
filter(mpg <= mean(mpg)) %>>%
select(mpg, wt, cyl) %>>%
(~ plot(.)) %>>%
(model = lm(mpg ~ wt + cyl, data = .)) %>>%
(summ = summary(.)) %>>%
(coefficients)
```
* Working with [ggvis](http://ggvis.rstudio.com/):
```r
library(ggvis)
mtcars %>>%
ggvis(~mpg, ~wt) %>>%
layer_points()
```
* Working with [rlist](http://renkun.me/rlist/):
```r
library(rlist)
1:100 %>>%
list.group(. %% 3) %>>%
list.mapv(g ~ mean(g))
```
### `Pipe()`
`Pipe()` creates a Pipe object that supports light-weight chaining without any external operator. Typically, start with `Pipe()` and end with `$value` or `[]` to extract the final value of the Pipe.
Pipe object provides an internal function `.(...)` that work exactly in the same way with `x %>>% (...)`, and it has more features than `%>>%`.
> NOTE: `.()` does not support assignment with `=` but supports `~`, `<-` and `->`.
#### Piping
```r
Pipe(rnorm(1000))$
density(kernel = "cosine")$
plot(col = "blue")
```
```r
Pipe(mtcars)$
.(mpg)$
summary()
```
```r
Pipe(mtcars)$
.(~ summary(.) -> summ)$
lm(formula = mpg ~ wt + cyl)$
summary()$
.(coefficients)
```
#### Subsetting and extracting
```r
pmtcars <- Pipe(mtcars)
pmtcars[c("mpg","wt")]$
lm(formula = mpg ~ wt)$
summary()
pmtcars[["mpg"]]$mean()
```
#### Assigning values
```r
plist <- Pipe(list(a=1,b=2))
plist$a <- 0
plist$b <- NULL
```
#### Side effect
```r
Pipe(mtcars)$
.(? ncol(.))$
.(~ plot(mpg ~ ., data = .))$ # side effect: plot
lm(formula = mpg ~ .)$
.(~ lm_mtcars)$ # side effect: assign
summary()$
```
#### Compatibility
* Working with dplyr:
```r
Pipe(mtcars)$
filter(mpg >= mean(mpg))$
select(mpg, wt, cyl)$
lm(formula = mpg ~ wt + cyl)$
summary()$
.(coefficients)$
value
```
* Working with ggvis:
```r
Pipe(mtcars)$
ggvis(~ mpg, ~ wt)$
layer_points()
```
* Working with rlist:
```r
Pipe(1:100)$
list.group(. %% 3)$
list.mapv(g ~ mean(g))$
value
```
### `pipeline()`
`pipeline()` provides argument-based and expression-based pipeline evaluation mechanisms. Its behavior depends on how its arguments are supplied. If only the first argument is supplied, it expects an expression enclosed in `{}` in which each line represents a pipeline step. If, instead, multiple arguments are supplied, it regards each argument as a pipeline step. For all pipeline steps, the expressions will be transformed to be connected by `%>>%` so that they behave exactly the same.
One notable difference is that in `pipeline()`'s argument or expression, the special symbols to perform specially defined pipeline tasks (e.g. side-effect) does not need to be enclosed within `()` because no operator priority issues arise as they do in using `%>>%`.
```r
pipeline({
mtcars
lm(formula = mpg ~ cyl + wt)
~ lmodel
summary
? .$r.squared
coef
})
```
Thanks [@hoxo_m](https://twitter.com/hoxo_m) for the idea presented in this [post](http://qiita.com/hoxo_m/items/3fd3d2520fa014a248cb).
## License
This package is under [MIT License](http://opensource.org/licenses/MIT).