Commit

building abstractions with pipes
KDruzhkin committed Jun 3, 2024
1 parent 5d2d5a7 commit 7091069
Showing 8 changed files with 107 additions and 43 deletions.
@@ -0,0 +1,26 @@
# --8<-- [start:setup]
import polars as pl
# --8<-- [end:setup]


# --8<-- [start:hypothenuse]
def hypothenuse(df: pl.DataFrame, col_x: str, col_y: str, col_r: str) -> pl.DataFrame:
    "Apply the Pythagorean theorem."
    x_squared = pl.col(col_x).pow(2)
    y_squared = pl.col(col_y).pow(2)
    r_squared = x_squared + y_squared
    r = r_squared.sqrt()
    return df.with_columns(r.alias(col_r))
# --8<-- [end:hypothenuse]


# --8<-- [start:pipe]
df = pl.DataFrame(
    {
        "x": [1.1, 2.2, 3.3],
        "y": [3.1, 2.2, 1.3],
    }
).pipe(hypothenuse, "x", "y", "r")

print(df)
# --8<-- [end:pipe]
42 changes: 0 additions & 42 deletions docs/user-guide/misc/debugging_with_pipes.md

This file was deleted.

26 changes: 26 additions & 0 deletions docs/user-guide/pipes/building_abstractions_with_pipes.md
@@ -0,0 +1,26 @@
# Building abstractions with pipes

All programming languages (e.g. Rust or Python) provide some _primitive operations_ (e.g. `+` or `sqrt`), some means of _combining_ them into complex pipelines, and some means of _hiding complexity behind abstractions_. An abstraction (e.g. a named function) is a simple name for a piece of complex code.

The API of Polars is a small domain-specific language. This language cannot (and should not) accommodate every need with an ever-growing vocabulary of primitive operations. Instead, it gives you the tools to build your own abstractions.

```python exec="on" session="user-guide/pipes/building_abstractions_with_pipes"
--8<-- "python/user-guide/pipes/building_abstractions_with_pipes.py:setup"
```

Suppose, for example, that you frequently have to apply the Pythagorean theorem to your data. Create a function for that:

{{code_block('user-guide/pipes/building_abstractions_with_pipes','hypothenuse',[])}}

```python exec="on" session="user-guide/pipes/building_abstractions_with_pipes"
--8<-- "python/user-guide/pipes/building_abstractions_with_pipes.py:hypothenuse"
```

... and apply it with `pipe`:

{{code_block('user-guide/pipes/building_abstractions_with_pipes','pipe',['pipe'])}}

```python exec="on" result="text" session="user-guide/pipes/building_abstractions_with_pipes"
--8<-- "python/user-guide/pipes/building_abstractions_with_pipes.py:pipe"
```

42 changes: 42 additions & 0 deletions docs/user-guide/pipes/debugging_with_pipes.md
@@ -0,0 +1,42 @@
# Debugging with pipes

Suppose that you write a long chain of transformations:

{{code_block('user-guide/pipes/debugging_with_pipes','pipeline1',[])}}

```python exec="on" session="user-guide/pipes/debugging_with_pipes"
--8<-- "python/user-guide/pipes/debugging_with_pipes.py:setup"
```

```python exec="on" session="user-guide/pipes/debugging_with_pipes"
--8<-- "python/user-guide/pipes/debugging_with_pipes.py:pipeline1"
```

... and in the middle of the chain something breaks.

How do you insert `print` and `assert` statements into the middle of the chain?

Consider writing your own helper functions and keeping them around,
as you will likely need them again. For example:

{{code_block('user-guide/pipes/debugging_with_pipes','assert_schema',[])}}

```python exec="on" session="user-guide/pipes/debugging_with_pipes"
--8<-- "python/user-guide/pipes/debugging_with_pipes.py:assert_schema"
```

{{code_block('user-guide/pipes/debugging_with_pipes','print_expr',[])}}

```python exec="on" session="user-guide/pipes/debugging_with_pipes"
--8<-- "python/user-guide/pipes/debugging_with_pipes.py:print_expr"
```

Now you can insert these helpers into the middle of the chain with `pipe`:

{{code_block('user-guide/pipes/debugging_with_pipes','pipeline2',[])}}

```python exec="on" result="text" session="user-guide/pipes/debugging_with_pipes"
--8<-- "python/user-guide/pipes/debugging_with_pipes.py:pipeline2"
```

When your debugging session is over, you can remove those lines.
8 changes: 8 additions & 0 deletions docs/user-guide/pipes/index.md
@@ -0,0 +1,8 @@
# Pipes

In the previous sections (`Expressions` and `Transformations`) you saw a variety of predefined tools for working with data.

In this section you will learn how to create your own tools.

- [Building abstractions with pipes](building_abstractions_with_pipes.md)
- [Debugging with pipes](debugging_with_pipes.md)
5 changes: 4 additions & 1 deletion mkdocs.yml
@@ -50,6 +50,10 @@ nav:
        - user-guide/transformations/time-series/rolling.md
        - user-guide/transformations/time-series/resampling.md
        - user-guide/transformations/time-series/timezones.md
    - Pipes:
      - user-guide/pipes/index.md
      - user-guide/pipes/building_abstractions_with_pipes.md
      - user-guide/pipes/debugging_with_pipes.md
    - Lazy API:
      - user-guide/lazy/index.md
      - user-guide/lazy/using.md
@@ -81,7 +85,6 @@ nav:
    - Misc:
      - user-guide/misc/multiprocessing.md
      - user-guide/misc/visualization.md
      - user-guide/misc/debugging_with_pipes.md
      - user-guide/misc/comparison.md

  - API reference: api/index.md
1 change: 1 addition & 0 deletions py-polars/docs/source/reference/dataframe/index.rst
@@ -17,6 +17,7 @@ This page gives an overview of all public DataFrame methods.
modify_select
miscellaneous
plot
pipe

.. currentmodule:: polars
