Commit 9ce6f21 (1 parent: 0170195)

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Showing 11 changed files with 396 additions and 0 deletions.
@@ -0,0 +1,32 @@

```yaml
name: mkdocs

on:
  push:
    branches:
      - main
permissions:
  contents: write
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Configure Git Credentials
        run: |
          git config user.name github-actions[bot]
          git config user.email 41898282+github-actions[bot]@users.noreply.github.com
      - uses: actions/setup-python@v4
        with:
          python-version: 3.x
      - run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV


      - uses: actions/cache@v3
        with:
          key: mkdocs-material-${{ env.cache_id }}
          path: .cache
          restore-keys: |
            mkdocs-material-
      - run: pip install -r docs/requirements-docs.txt -e . pandas polars

      - run: mkdocs gh-deploy --force
```
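The `cache_id` step above builds the `actions/cache` key from `date --utc '+%V'`, which is the ISO 8601 week number. The key therefore changes once per ISO week, and `restore-keys` lets a run in a new week fall back to the most recent earlier `mkdocs-material-` cache. The sketch below reproduces that key in Python for a given UTC date. The helper name `cache_key` is hypothetical and used only for illustration. The workflow itself computes the value in the shell.

```python
from datetime import date, datetime, timezone
from typing import Optional


def cache_key(day: Optional[date] = None) -> str:
    """Mirror the workflow's key, mkdocs-material-$(date --utc '+%V').

    Hypothetical helper for illustration only; the workflow computes the
    week number in the shell, not in Python.
    """
    if day is None:
        # The workflow uses `date --utc`, so default to today's UTC date.
        day = datetime.now(timezone.utc).date()
    # `%V` is the ISO 8601 week number, zero-padded to two digits.
    iso_week = day.isocalendar()[1]
    return f"mkdocs-material-{iso_week:02d}"


print(cache_key(date(2024, 1, 1)))  # mkdocs-material-01 (2024-01-01 is in ISO week 1)
```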
@@ -2,3 +2,4 @@

```text
*.pyc
todo.md
.coverage
site/
```
@@ -0,0 +1,94 @@

````markdown
# Column

In [dataframe.md](dataframe.md), you learned how to write a dataframe-agnostic function.

We only used DataFrame methods there, but what if we need to operate on its columns?

## Extracting a column


## Example 1: filter based on a column's values

```python exec="1" source="above" session="ex1"
import narwhals as nw

def my_func(df):
    df_s = nw.DataFrame(df)
    df_s = df_s.filter(nw.col('a') > 0)
    return nw.to_native(df_s)
```

=== "pandas"
    ```python exec="true" source="material-block" result="python" session="ex1"
    import pandas as pd

    df = pd.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
    print(my_func(df))
    ```

=== "Polars"
    ```python exec="true" source="material-block" result="python" session="ex1"
    import polars as pl

    df = pl.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
    print(my_func(df))
    ```


## Example 2: multiply a column's values by a constant

Let's write a dataframe-agnostic function which multiplies the values in column
`'a'` by 2.

```python exec="1" source="above" session="ex2"
import narwhals as nw

def my_func(df):
    df_s = nw.DataFrame(df)
    df_s = df_s.with_columns(nw.col('a')*2)
    return nw.to_native(df_s)
```

=== "pandas"
    ```python exec="true" source="material-block" result="python" session="ex2"
    import pandas as pd

    df = pd.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
    print(my_func(df))
    ```

=== "Polars"
    ```python exec="true" source="material-block" result="python" session="ex2"
    import polars as pl

    df = pl.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
    print(my_func(df))
    ```

Note that column `'a'` was overwritten. If we had wanted to add a new column called `'c'` containing column `'a'`'s
values multiplied by 2, we could have renamed the expression with `alias`:

```python exec="1" source="above" session="ex2.1"
import narwhals as nw

def my_func(df):
    df_s = nw.DataFrame(df)
    df_s = df_s.with_columns((nw.col('a')*2).alias('c'))
    return nw.to_native(df_s)
```

=== "pandas"
    ```python exec="true" source="material-block" result="python" session="ex2.1"
    import pandas as pd

    df = pd.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
    print(my_func(df))
    ```

=== "Polars"
    ```python exec="true" source="material-block" result="python" session="ex2.1"
    import polars as pl

    df = pl.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
    print(my_func(df))
    ```
````
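The column page above describes three operations: filtering on a column, overwriting a column, and writing a new column through `alias`. The plain-Python model below makes the expected results of those examples concrete without depending on pandas, Polars, or Narwhals.

It is a sketch of the semantics only, under these assumptions:
- A dict of lists stands in for a dataframe.
- The helpers `filter_rows` and `with_column` are hypothetical and are not part of the Narwhals API.

```python
# Plain-Python model of the column examples. A dict of lists stands in for a
# dataframe; filter_rows and with_column are hypothetical helpers, not Narwhals API.

data = {"a": [-1, 1, 3], "b": [3, 5, -3]}


def filter_rows(df, column, predicate):
    """Keep only the rows where predicate(value in `column`) is true."""
    keep = [predicate(v) for v in df[column]]
    return {name: [v for v, k in zip(values, keep) if k] for name, values in df.items()}


def with_column(df, name, values):
    """Return a copy of df with column `name` set to `values` (overwrites if present)."""
    out = dict(df)
    out[name] = values
    return out


# Example 1: filter(nw.col('a') > 0)
ex1 = filter_rows(data, "a", lambda v: v > 0)

# Example 2: with_columns(nw.col('a') * 2) overwrites column 'a'
ex2 = with_column(data, "a", [v * 2 for v in data["a"]])

# Example 2.1: with_columns((nw.col('a') * 2).alias('c')) adds column 'c'
ex2_1 = with_column(data, "c", [v * 2 for v in data["a"]])

print(ex1)    # {'a': [1, 3], 'b': [5, -3]}
print(ex2)    # {'a': [-2, 2, 6], 'b': [3, 5, -3]}
print(ex2_1)  # {'a': [-1, 1, 3], 'b': [3, 5, -3], 'c': [-2, 2, 6]}
```

The real pandas output may also show the original row index (for example `1` and `2` after filtering), which this model does not track.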
@@ -0,0 +1,106 @@

````markdown
# Complete example

We're going to write a dataframe-agnostic "Standard Scaler". This class will have
`fit` and `transform` methods (like `scikit-learn` transformers), and will work
agnostically for pandas and Polars.

We'll need to write two methods:

- `fit`: find the mean and standard deviation for each column from a given training set;
- `transform`: scale a given dataset with the mean and standard deviations calculated
  during `fit`.

The `fit` method is a bit complicated, so let's start with `transform`.
Suppose we've already calculated the mean and standard deviation of each column, and have
stored them in attributes `self._means` and `self._std_devs`.

## Transform method

The general strategy will be:

1. Initialise a Narwhals DataFrame by passing your dataframe to `nw.DataFrame`.
2. Express your logic using the subset of the Polars API supported by Narwhals.
3. If you need to return a dataframe to the user in its original library, call `narwhals.to_native`.

```python
import narwhals as nw

class StandardScaler:
    def transform(self, df):
        df = nw.DataFrame(df)
        df = df.with_columns(
            (nw.col(col) - self._means[col]) / self._std_devs[col]
            for col in df.columns
        )
        return nw.to_native(df)
```

Note that all the calculations here can stay lazy if the underlying library permits it.
If the input is a `polars.LazyFrame`, the return value is also a `polars.LazyFrame`, and it is the
caller's responsibility to call `.collect()` on the result if they want to materialise its values.

## Fit method

Unlike the `transform` method, `fit` cannot stay lazy, as we need to compute concrete values
for the means and standard deviations.

To be able to get `Series` out of our `DataFrame`, we'll need the `DataFrame` to be an
eager one, as Polars doesn't have a concept of lazy `Series`.
To do that, when we instantiate our `narwhals.DataFrame`, we pass `features=['eager']`,
which lets us access eager-only features.

```python
import narwhals as nw

class StandardScaler:
    def fit(self, df):
        df = nw.DataFrame(df, features=['eager'])
        # Dicts keyed by column name, so that `transform` can look values up by column
        self._means = {col: df[col].mean() for col in df.columns}
        self._std_devs = {col: df[col].std() for col in df.columns}
```

## Putting it all together

Here is our dataframe-agnostic standard scaler:

```python exec="1" source="above" session="tute-ex1"
import narwhals as nw

class StandardScaler:
    def fit(self, df):
        df = nw.DataFrame(df, features=["eager"])
        self._means = {col: df[col].mean() for col in df.columns}
        self._std_devs = {col: df[col].std() for col in df.columns}

    def transform(self, df):
        df = nw.DataFrame(df)
        df = df.with_columns(
            (nw.col(col) - self._means[col]) / self._std_devs[col]
            for col in df.columns
        )
        return nw.to_native(df)
```

Next, let's try running it. Notice that, because `transform` doesn't use
`features=['eager']`, we can pass a `polars.LazyFrame` to it without issues!

=== "pandas"
    ```python exec="true" source="material-block" result="python" session="tute-ex1"
    import pandas as pd

    df_train = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 7]})
    df_test = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 7]})
    scaler = StandardScaler()
    scaler.fit(df_train)
    print(scaler.transform(df_test))
    ```

=== "Polars"
    ```python exec="true" source="material-block" result="python" session="tute-ex1"
    import polars as pl

    df_train = pl.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 7]})
    df_test = pl.LazyFrame({'a': [1, 2, 3], 'b': [4, 5, 7]})
    scaler = StandardScaler()
    scaler.fit(df_train)
    print(scaler.transform(df_test).collect())
    ```
````
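You can check the arithmetic behind `fit` and `transform` without any dataframe library. The sketch below is a plain-Python stand-in under these assumptions:
- Dicts of lists stand in for dataframes.
- The class name `PlainStandardScaler` is hypothetical.
- `statistics.stdev` (the sample standard deviation, `ddof=1`) is assumed to match the default `std` in pandas and Polars.

As in the scaler above, the fitted statistics are stored in dicts keyed by column name, because `transform` looks them up by name.

```python
import statistics


class PlainStandardScaler:
    """Dict-of-lists stand-in for the StandardScaler above (illustration only)."""

    def fit(self, data):
        # Dict comprehensions keyed by column name, so transform can look them up.
        self._means = {col: statistics.mean(values) for col, values in data.items()}
        # Sample standard deviation (ddof=1).
        self._std_devs = {col: statistics.stdev(values) for col, values in data.items()}
        return self

    def transform(self, data):
        # Scale every column with the statistics learned in fit.
        return {
            col: [(v - self._means[col]) / self._std_devs[col] for v in values]
            for col, values in data.items()
        }


train = {"a": [1, 2, 3], "b": [4, 5, 7]}
scaler = PlainStandardScaler().fit(train)
print(scaler.transform({"a": [1, 2, 3]}))  # {'a': [-1.0, 0.0, 1.0]}
```

Column `'a'` has mean 2 and sample standard deviation 1, so it scales to `[-1.0, 0.0, 1.0]`.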
@@ -0,0 +1,41 @@

````markdown
# DataFrame

To write a dataframe-agnostic function, the steps you'll want to follow are:

1. Initialise a Narwhals DataFrame by passing your dataframe to `nw.DataFrame`.
2. Express your logic using the subset of the Polars API supported by Narwhals.
3. If you need to return a dataframe to the user in its original library, call `narwhals.to_native`.

Let's try writing a simple example.

## Example 1: group-by and mean

Make a Python file `t.py` with the following content:

```python exec="1" source="above" session="df_ex1"
import narwhals as nw

def func(df):
    # 1. Create a Narwhals dataframe
    df_s = nw.DataFrame(df)
    # 2. Use the subset of the Polars API supported by Narwhals
    df_s = df_s.group_by('a').agg(nw.col('b').mean())
    # 3. Return a dataframe from the user's original library
    return nw.to_native(df_s)
```

Let's try it out:

=== "pandas"
    ```python exec="true" source="material-block" result="python" session="df_ex1"
    import pandas as pd

    df = pd.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
    print(func(df))
    ```

=== "Polars"
    ```python exec="true" source="material-block" result="python" session="df_ex1"
    import polars as pl

    df = pl.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
    print(func(df))
    ```
````
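For reference, the group-by above computes the mean of `'b'` for each distinct value of `'a'`. A plain-Python sketch of that computation is below; the function name `group_mean` is hypothetical. Row order in the real outputs may differ between libraries: pandas sorts group keys by default, while Polars' `group_by` does not guarantee group order unless `maintain_order=True` is passed.

```python
from collections import defaultdict


def group_mean(keys, values):
    """Mean of `values` for each distinct key, in first-seen key order."""
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    return {k: sum(vs) / len(vs) for k, vs in groups.items()}


print(group_mean([1, 1, 2], [4, 5, 6]))  # {1: 4.5, 2: 6.0}
```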
@@ -0,0 +1,19 @@

```markdown
# Narwhals

Extremely lightweight compatibility layer between pandas and Polars:

- ✅ No dependencies.
- ✅ Lightweight: wheel is smaller than 30 kB.
- ✅ Simple, minimal, and predictable.

No need to choose: support both with ease!

## Who's this for?

Anyone who wants to write a library, application, or service which consumes dataframes and is
completely dataframe-agnostic.

## Let's get started!

- [Installation](installation.md)
- [Quick start](quick_start.md)
```
@@ -0,0 +1,16 @@

````markdown
# Installation

First, make sure you have [created and activated](https://docs.python.org/3/library/venv.html) a Python 3.8+ virtual environment.

Then, run:

```console
python -m pip install narwhals
```

Then, if you start the Python REPL and see the following:

```python
>>> import narwhals
>>> narwhals.__version__
'0.4.1'
```

then installation worked correctly!
````
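To check the installation from a script rather than the REPL, you can use the standard library's `importlib.metadata` (available from Python 3.8). It reports the installed distribution version without importing the package. This is a sketch; the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


print(installed_version("narwhals"))  # a version string such as '0.4.1', or None if not installed
```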
@@ -0,0 +1,43 @@

````markdown
# Quick start

## Prerequisites

Please start by following the [installation instructions](installation.md).

Then, please install the following:

- [pandas](https://pandas.pydata.org/docs/getting_started/install.html)
- [Polars](https://pola-rs.github.io/polars/user-guide/installation/)

## Simple example

Create a Python file `t.py` with the following content:

```python
import pandas as pd
import polars as pl
import narwhals as nw


def my_function(df_any):
    df = nw.DataFrame(df_any)
    column_names = df.columns
    return column_names


df_pandas = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
df_polars = pl.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

print('pandas result:', my_function(df_pandas))
print('Polars result:', my_function(df_polars))
```

If you run `python t.py` and your output looks like this:

```
pandas result: ['a', 'b']
Polars result: ['a', 'b']
```

then all your installations worked perfectly.

Let's learn about what you just did, and what Narwhals can do for you.
````
@@ -0,0 +1,12 @@

```markdown
# Reference

Here are some related projects.

## Dataframe Interchange Protocol

Standardised way of interchanging data between libraries, see
[here](https://data-apis.org/dataframe-protocol/latest/index.html).

## Array API

Array counterpart to the DataFrame API, see [here](https://data-apis.org/array-api/2022.12/index.html).
```
@@ -0,0 +1,5 @@

```text
markdown-exec[ansi]
mkdocs
mkdocs-material
mkdocstrings
mkdocstrings[python]
```