README.Rmd

```{r knitsetup, echo=FALSE, results='hide', warning=FALSE, message=FALSE, cache=FALSE}
library(knitr)
opts_knit$set(base.dir='./', fig.path='', out.format='md')
opts_chunk$set(prompt=FALSE, comment='', results='markup')
library(pipeR)
```

# pipeR

[![Linux Build Status](https://travis-ci.org/renkun-ken/pipeR.png?branch=master)](https://travis-ci.org/renkun-ken/pipeR) 
[![Windows Build status](https://ci.appveyor.com/api/projects/status/github/renkun-ken/pipeR?svg=true)](https://ci.appveyor.com/project/renkun-ken/pipeR)
[![codecov.io](http://codecov.io/github/renkun-ken/pipeR/coverage.svg?branch=master)](http://codecov.io/github/renkun-ken/pipeR?branch=master)
[![CRAN Version](http://www.r-pkg.org/badges/version/pipeR)](https://cran.r-project.org/package=pipeR)

pipeR provides various styles of function chaining methods: 

* Pipe operator
* Pipe object
* pipeline function

Each of them represents a distinct pipeline model but they share almost a common set of features. A value can be piped to the next expression

* As the first unnamed argument of the function
* As dot symbol (`.`) in the expression
* As a named variable defined by a formula
* For side-effect that carries over the input to the next
* For assignment that saves an intermediate value

The syntax is designed to make the pipeline more readable and friendly to
a wide variety of operations.

**[pipeR Tutorial](http://renkun.me/pipeR-tutorial) is a highly recommended complete guide to pipeR.**

This document is also translated into [日本語](https://github.com/renkun-ken/pipeR/blob/master/README.ja.md) (by [@hoxo_m](https://github.com/hoxo-m/)).

## Installation

Install the latest development version from GitHub:

```r
devtools::install_github("renkun-ken/pipeR")
```

Install from [CRAN](https://cran.r-project.org/package=pipeR):

```r
install.packages("pipeR")
```

## Getting started

The following code is an example written in traditional approach:

It basically performs bootstrap on `mpg` values in built-in dataset `mtcars` and plots  its density function estimated by Gaussian kernel.

```r
plot(density(sample(mtcars$mpg, size = 10000, replace = TRUE), 
  kernel = "gaussian"), col = "red", main="density of mpg (bootstrap)")
```

The code is deeply nested and can be hard to read and maintain. In the following examples, the traditional code is rewritten by Pipe operator, `Pipe()` function and `pipeline()` function, respectively.

* Operator-based pipeline

```r
mtcars$mpg %>>%
  sample(size = 10000, replace = TRUE) %>>%
  density(kernel = "gaussian") %>>%
  plot(col = "red", main = "density of mpg (bootstrap)")
```

* Object-based pipeline (`Pipe()`)

```r
Pipe(mtcars$mpg)$
  sample(size = 10000, replace = TRUE)$
  density(kernel = "gaussian")$
  plot(col = "red", main = "density of mpg (bootstrap)")
```

* Argument-based pipeline

```r
pipeline(mtcars$mpg,
  sample(size = 10000, replace = TRUE),
  density(kernel = "gaussian"),
  plot(col = "red", main = "density of mpg (bootstrap)"))
```

* Expression-based pipeline

```r
pipeline({
  mtcars$mpg
  sample(size = 10000, replace = TRUE)
  density(kernel = "gaussian")
  plot(col = "red", main = "density of mpg (bootstrap)")  
})
```

## Usage

### `%>>%`

Pipe operator `%>>%` basically pipes the left-hand side value forward to the right-hand side expression which is evaluated according to its syntax.

#### Pipe to first-argument of function

Many R functions are pipe-friendly: they take some data by the first argument and transform it in a certain way. This arrangement allows operations to be streamlined by pipes, that is, one data source can be put to the first argument of a function, get transformed, and put to the first argument of the next function. In this way, a chain of commands are connected, and it is called a pipeline.

On the right-hand side of `%>>%`, whenever a function name or call is supplied, the left-hand side value will always be put to the first unnamed argument to that function.

```r
rnorm(100) %>>%
  plot
```

```r
rnorm(100) %>>%
  plot(col="red")
```

Sometimes the value on the left is needed at multiple places. One can use `.` to represent it anywhere in the function call.

```r
rnorm(100) %>>%
  plot(col="red", main=length(.))
```

There are situations where one calls a function in a namespace with `::`. In this case, the call must end up with `()`.

```r
rnorm(100) %>>%
  stats::median()
  
rnorm(100) %>>%
  graphics::plot(col = "red")
```

#### Pipe to `.` in an expression

Not all functions are pipe-friendly in every case: You may find some functions do not take your data produced by a pipeline as the first argument. In this case, you can enclose your expression by `{}` or `()` so that `%>>%` will use `.` to represent the value on the left.

```r
mtcars %>>%
  { lm(mpg ~ cyl + wt, data = .) }
```

```r
mtcars %>>%
  ( lm(mpg ~ cyl + wt, data = .) )
```

#### Pipe by formula as lambda expression

Sometimes, it may look confusing to use `.` to represent the value being piped. For example,

```r
mtcars %>>%
  (lm(mpg ~ ., data = .))
```

Although it works perfectly, it may look ambiguous if `.` has several meanings in one line of code. 

`%>>%` accepts lambda expression to direct its piping behavior. Lambda expression is characterized by a formula enclosed within `()`, for example, `(x ~ f(x))`. It contains a user-defined symbol to represent the value being piped and the expression to be evaluated.

```r
mtcars %>>%
  (df ~ lm(mpg ~ ., data = df))
```

```r
mtcars %>>%
  subset(select = c(mpg, wt, cyl)) %>>%
  (x ~ plot(mpg ~ ., data = x))
```

#### Pipe for side effect

In a pipeline, one may be interested not only in the final outcome but sometimes also in intermediate results. To print, plot or save the intermediate results, it must be a side-effect to avoid breaking the mainstream pipeline. For example, calling `plot()` to draw scatter plot returns `NULL`, and if one directly calls `plot()` in the middle of a pipeline, it would break the pipeline by changing the subsequent input to `NULL`.

One-sided formula that starts with `~` indicates that the right-hand side expression will only be evaluated for its side-effect, its value will be ignored, and the input value will be returned instead.

```r
mtcars %>>%
  subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>>%
  (~ cat("rows:",nrow(.),"\n")) %>>%   # cat() returns NULL
  summary
```

```r
mtcars %>>%
  subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>>%
  (~ plot(mpg ~ wt, data = .)) %>>%    # plot() returns NULL
  (lm(mpg ~ wt, data = .)) %>>%
  summary()
```

With `~`, side-effect operations can be easily distinguished from mainstream pipeline.

An easier way to print the intermediate value it to use `(? expr)` syntax like asking question.

```r
mtcars %>>% 
  (? ncol(.)) %>>%
  summary
```

#### Pipe with assignment

In addition to printing and plotting, one may need to save an intermediate value to the environment by assigning the value to a variable (symbol).

If one needs to assign the value to a symbol, just insert a step like `(~ symbol)`, then the input value of that step will be assigned to `symbol` in the current environment.

```r
mtcars %>>%
  (lm(formula = mpg ~ wt + cyl, data = .)) %>>%
  (~ lm_mtcars) %>>%
  summary
```

If the input value is not directly to be saved but after some transformation, then one can use `=`, `<-`, or more natural `->` to specify a lambda expression to tell what to be saved (thanks @yanlinlin82 for suggestion).

```r
mtcars %>>%
  (~ summ = summary(.)) %>>%  # side-effect assignment
  (lm(formula = mpg ~ wt + cyl, data = .)) %>>%
  (~ lm_mtcars) %>>%
  summary
```

```r
mtcars %>>%
  (~ summary(.) -> summ) %>>%
  
mtcars %>>%
  (~ summ <- summary(.)) %>>%
```

An easier way to saving intermediate value that is to be further piped is to use `(symbol = expression)` syntax:

```r
mtcars %>>%
  (~ summ = summary(.)) %>>%  # side-effect assignment
  (lm_mtcars = lm(formula = mpg ~ wt + cyl, data = .)) %>>%  # continue piping
  summary
```

or `(expression -> symbol)` syntax:

```r
mtcars %>>%
  (~ summary(.) -> summ) %>>%  # side-effect assignment
  (lm(formula = mpg ~ wt + cyl, data = .) -> lm_mtcars) %>>%  # continue piping
  summary
```

#### Extract element from an object

`x %>>% (y)` means extracting the element named `y` from object `x` where `y` must be a valid symbol name and `x` can be a vector, list, environment or anything else for which `[[]]` is defined, or S4 object.

```r
mtcars %>>%
  (lm(mpg ~ wt + cyl, data = .)) %>>%
  (~ lm_mtcars) %>>%
  summary %>>%
  (r.squared)
```

#### Compatibility

* Working with [dplyr](https://github.com/hadley/dplyr/):

```r
library(dplyr)
mtcars %>>%
  filter(mpg <= mean(mpg)) %>>%  
  select(mpg, wt, cyl) %>>%
  (~ plot(.)) %>>%
  (model = lm(mpg ~ wt + cyl, data = .)) %>>%
  (summ = summary(.)) %>>%
  (coefficients)
```

* Working with [ggvis](http://ggvis.rstudio.com/):

```r
library(ggvis)
mtcars %>>%
  ggvis(~mpg, ~wt) %>>%
  layer_points()
```

* Working with [rlist](http://renkun.me/rlist/):

```r
library(rlist)
1:100 %>>%
  list.group(. %% 3) %>>%
  list.mapv(g ~ mean(g))
```

### `Pipe()`

`Pipe()` creates a Pipe object that supports light-weight chaining without any external operator. Typically, start with `Pipe()` and end with `$value` or `[]` to extract the final value of the Pipe. 

Pipe object provides an internal function `.(...)` that work exactly in the same way with `x %>>% (...)`, and it has more features than `%>>%`.

> NOTE: `.()` does not support assignment with `=` but supports `~`, `<-` and `->`.

#### Piping

```r
Pipe(rnorm(1000))$
  density(kernel = "cosine")$
  plot(col = "blue")
```

```r
Pipe(mtcars)$
  .(mpg)$
  summary()
```

```r
Pipe(mtcars)$
  .(~ summary(.) -> summ)$
  lm(formula = mpg ~ wt + cyl)$
  summary()$
  .(coefficients)
```

#### Subsetting and extracting

```r
pmtcars <- Pipe(mtcars)
pmtcars[c("mpg","wt")]$
  lm(formula = mpg ~ wt)$
  summary()
pmtcars[["mpg"]]$mean()
```

#### Assigning values

```r
plist <- Pipe(list(a=1,b=2))
plist$a <- 0
plist$b <- NULL
```

#### Side effect

```r
Pipe(mtcars)$
  .(? ncol(.))$
  .(~ plot(mpg ~ ., data = .))$    # side effect: plot
  lm(formula = mpg ~ .)$
  .(~ lm_mtcars)$                  # side effect: assign
  summary()$
```

#### Compatibility

* Working with dplyr:

```r
Pipe(mtcars)$
  filter(mpg >= mean(mpg))$
  select(mpg, wt, cyl)$
  lm(formula = mpg ~ wt + cyl)$
  summary()$
  .(coefficients)$
  value
```

* Working with ggvis:

```r
Pipe(mtcars)$
  ggvis(~ mpg, ~ wt)$
  layer_points()
```

* Working with rlist:

```r
Pipe(1:100)$
  list.group(. %% 3)$
  list.mapv(g ~ mean(g))$
  value
```

### `pipeline()`

`pipeline()` provides argument-based and expression-based pipeline evaluation mechanisms. Its behavior depends on how its arguments are supplied. If only the first argument is supplied, it expects an expression enclosed in `{}` in which each line represents a pipeline step. If, instead, multiple arguments are supplied, it regards each argument as a pipeline step. For all pipeline steps, the expressions will be transformed to be connected by `%>>%` so that they behave exactly the same.

One notable difference is that in `pipeline()`'s argument or expression, the special symbols to perform specially defined pipeline tasks (e.g. side-effect) does not need to be enclosed within `()` because no operator priority issues arise as they do in using `%>>%`.

```r
pipeline({
  mtcars
  lm(formula = mpg ~ cyl + wt)
  ~ lmodel
  summary
  ? .$r.squared
  coef
})
```

Thanks [@hoxo_m](https://twitter.com/hoxo_m) for the idea presented in this [post](http://qiita.com/hoxo_m/items/3fd3d2520fa014a248cb).

## License

This package is under [MIT License](http://opensource.org/licenses/MIT).