add docs page

narwhals-dev · Mar 16, 2024 · 9ce6f21
1 parent 0170195
commit 9ce6f21
Showing 11 changed files with 396 additions and 0 deletions.
diff --git a/.github/workflows/mkdocs.yml b/.github/workflows/mkdocs.yml
@@ -0,0 +1,32 @@
+name: mkdocs
+
+on:
+  push:
+    branches:
+      - main
+permissions:
+  contents: write
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Configure Git Credentials
+        run: |
+          git config user.name github-actions[bot]
+          git config user.email 41898282+github-actions[bot]@users.noreply.github.com
+      - uses: actions/setup-python@v4
+        with:
+          python-version: 3.x
+      - run: echo "cache_id=$(date --utc '+%V')" >> $GITHUB_ENV
+
+
+      - uses: actions/cache@v3
+        with:
+          key: mkdocs-material-${{ env.cache_id }}
+          path: .cache
+          restore-keys: |
+            mkdocs-material-
+      - run: pip install -r docs/requirements-docs.txt -e . pandas polars
+
+      - run: mkdocs gh-deploy --force
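Reviewer note on the cache key: `date --utc '+%V'` prints the ISO 8601 week number, so `cache_id` changes once a week and the `mkdocs-material-` cache is rebuilt weekly, with `restore-keys` falling back to an older entry in between. A minimal Python sketch of the same key derivation follows; the helper name `cache_key` is made up for this note and is not part of the workflow.

```python
from datetime import date


def cache_key(day: date) -> str:
    """Build the weekly cache key the workflow would produce for `day`."""
    # isocalendar()[1] is the ISO 8601 week number, matching `date +%V`,
    # which zero-pads the week to two digits.
    week = day.isocalendar()[1]
    return f"mkdocs-material-{week:02d}"


print(cache_key(date(2024, 3, 16)))  # a Saturday in ISO week 11
print(cache_key(date(2024, 3, 18)))  # the following Monday, ISO week 12
```

Two pushes in the same ISO week share a key and therefore a cache; the first push of a new week misses the exact key and restores from the prefix match instead.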
diff --git a/.gitignore b/.gitignore
@@ -2,3 +2,4 @@
 *.pyc
 todo.md
 .coverage
+site/
diff --git a/docs/basics/column.md b/docs/basics/column.md
@@ -0,0 +1,94 @@
+# Column
+
+In [dataframe.md](dataframe.md), you learned how to write a dataframe-agnostic function.
+
+We only used DataFrame methods there - but what if we need to operate on its columns?
+
+## Extracting a column
+
+
+## Example 1: filter based on a column's values
+
+```python exec="1" source="above" session="ex1"
+import narwhals as nw
+
+def my_func(df):
+    df_s = nw.DataFrame(df)
+    df_s = df_s.filter(nw.col('a') > 0)
+    return nw.to_native(df_s)
+```
+
+=== "pandas"
+    ```python exec="true" source="material-block" result="python" session="ex1"
+    import pandas as pd
+
+    df = pd.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
+    print(my_func(df))
+    ```
+
+=== "Polars"
+    ```python exec="true" source="material-block" result="python" session="ex1"
+    import polars as pl
+
+    df = pl.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
+    print(my_func(df))
+    ```
+
+
+## Example 2: multiply a column's values by a constant
+
+Let's write a dataframe-agnostic function which multiplies the values in column
+`'a'` by 2.
+
+```python exec="1" source="above" session="ex2"
+import narwhals as nw
+
+def my_func(df):
+    df_s = nw.DataFrame(df)
+    df_s = df_s.with_columns(nw.col('a')*2)
+    return nw.to_native(df_s)
+```
+
+=== "pandas"
+    ```python exec="true" source="material-block" result="python" session="ex2"
+    import pandas as pd
+
+    df = pd.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
+    print(my_func(df))
+    ```
+
+=== "Polars"
+    ```python exec="true" source="material-block" result="python" session="ex2"
+    import polars as pl
+
+    df = pl.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
+    print(my_func(df))
+    ```
+
+Note that column `'a'` was overwritten. If we had wanted to add a new column called `'c'` containing column `'a'`'s
+values multiplied by 2, we could have used `alias`:
+
+```python exec="1" source="above" session="ex2.1"
+import narwhals as nw
+
+def my_func(df):
+    df_s = nw.DataFrame(df)
+    df_s = df_s.with_columns((nw.col('a')*2).alias('c'))
+    return nw.to_native(df_s)
+```
+
+=== "pandas"
+    ```python exec="true" source="material-block" result="python" session="ex2.1"
+    import pandas as pd
+
+    df = pd.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
+    print(my_func(df))
+    ```
+
+=== "Polars"
+    ```python exec="true" source="material-block" result="python" session="ex2.1"
+    import polars as pl
+
+    df = pl.DataFrame({'a': [-1, 1, 3], 'b': [3, 5, -3]})
+    print(my_func(df))
+    ```
diff --git a/docs/basics/complete_example.md b/docs/basics/complete_example.md
@@ -0,0 +1,106 @@
+# Complete example
+
+We're going to write a dataframe-agnostic "Standard Scaler". This class will have
+`fit` and `transform` methods (like `scikit-learn` transformers), and will work
+agnostically for pandas and Polars.
+
+We'll need to write two methods:
+
+- `fit`: find the mean and standard deviation for each column from a given training set;
+- `transform`: scale a given dataset with the mean and standard deviations calculated
+  during `fit`.
+
+The `fit` method is a bit complicated, so let's start with `transform`.
+Suppose we've already calculated the mean and standard deviation of each column, and have
+stored them in attributes `self._means` and `self._std_devs`.
+
+## Transform method
+
+The general strategy will be:
+
+1. Initialise a Narwhals DataFrame by passing your dataframe to `nw.DataFrame`.
+2. Express your logic using the subset of the Polars API supported by Narwhals.
+3. If you need to return a dataframe to the user in its original library, call `narwhals.to_native`.
+
+```python
+import narwhals as nw
+
+class StandardScaler:
+    def transform(self, df):
+        df = nw.DataFrame(df)
+        df = df.with_columns(
+            (nw.col(col) - self._means[col]) / self._std_devs[col]
+            for col in df.columns
+        )
+        return nw.to_native(df)
+```
+
+Note that all the calculations here can stay lazy if the underlying library permits it.
+For Polars, the return value is a `polars.LazyFrame` - it is the caller's responsibility to
+call `.collect()` on the result if they want to materialise its values.
+
+## Fit method
+
+Unlike the `transform` method, `fit` cannot stay lazy, as we need to compute concrete values
+for the means and standard deviations.
+
+To be able to get `Series` out of our `DataFrame`, we'll need the `DataFrame` to be an
+eager one, as Polars doesn't have a concept of lazy `Series`.
+To do that, when we instantiate our `narwhals.DataFrame`, we pass `features=['eager']`,
+which lets us access eager-only features.
+
+```python
+import narwhals as nw
+
+class StandardScaler:
+    def fit(self, df):
+        df = nw.DataFrame(df, features=['eager'])
+        self._means = {col: df[col].mean() for col in df.columns}
+        self._std_devs = {col: df[col].std() for col in df.columns}
+```
+
+## Putting it all together
+
+Here is our dataframe-agnostic standard scaler:
+```python exec="1" source="above" session="tute-ex1"
+import narwhals as nw
+
+class StandardScaler:
+    def fit(self, df):
+        df = nw.DataFrame(df, features=["eager"])
+        self._means = {col: df[col].mean() for col in df.columns}
+        self._std_devs = {col: df[col].std() for col in df.columns}
+
+    def transform(self, df):
+        df = nw.DataFrame(df)
+        df = df.with_columns(
+            (nw.col(col) - self._means[col]) / self._std_devs[col]
+            for col in df.columns
+        )
+        return nw.to_native(df)
+```
+
+Next, let's try running it. Notice how, as `transform` doesn't use
+`features=['eager']`, we can pass a `polars.LazyFrame` to it without issues!
+
+=== "pandas"
+    ```python exec="true" source="material-block" result="python" session="tute-ex1"
+    import pandas as pd
+
+    df_train = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 7]})
+    df_test = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 7]})
+    scaler = StandardScaler()
+    scaler.fit(df_train)
+    print(scaler.transform(df_test))
+    ```
+
+=== "Polars"
+    ```python exec="true" source="material-block" result="python" session="tute-ex1"
+    import polars as pl
+
+    df_train = pl.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 7]})
+    df_test = pl.LazyFrame({'a': [1, 2, 3], 'b': [4, 5, 7]})
+    scaler = StandardScaler()
+    scaler.fit(df_train)
+    print(scaler.transform(df_test).collect())
+    ```
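Reviewer note: to sanity-check the numbers the tabs above should print, here is the same fit/transform arithmetic using only the standard library. It assumes the default `std()` of both pandas and Polars, which is the sample standard deviation (`ddof=1`), so `statistics.stdev` is the matching call. The function names `fit` and `transform` below are free functions written for this note, not the class from the docs.

```python
import statistics


def fit(columns):
    """Return per-column means and sample standard deviations (ddof=1)."""
    means = {name: statistics.mean(values) for name, values in columns.items()}
    std_devs = {name: statistics.stdev(values) for name, values in columns.items()}
    return means, std_devs


def transform(columns, means, std_devs):
    """Scale every value as (value - mean) / std_dev, column by column."""
    return {
        name: [(value - means[name]) / std_devs[name] for value in values]
        for name, values in columns.items()
    }


train = {'a': [1, 2, 3], 'b': [4, 5, 7]}
means, std_devs = fit(train)
scaled = transform(train, means, std_devs)

print(scaled['a'])  # [-1.0, 0.0, 1.0]
print([round(value, 4) for value in scaled['b']])
```

Column `'a'` has mean 2 and sample standard deviation 1, so it scales to exactly -1, 0, 1; each scaled column sums to zero up to floating-point error.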
diff --git a/docs/basics/dataframe.md b/docs/basics/dataframe.md
@@ -0,0 +1,41 @@
+# DataFrame
+
+To write a dataframe-agnostic function, the steps you'll want to follow are:
+
+1. Initialise a Narwhals DataFrame by passing your dataframe to `nw.DataFrame`.
+2. Express your logic using the subset of the Polars API supported by Narwhals.
+3. If you need to return a dataframe to the user in its original library, call `narwhals.to_native`.
+
+Let's try writing a simple example.
+
+## Example 1: group-by and mean
+
+Make a Python file `t.py` with the following content:
+```python exec="1" source="above" session="df_ex1"
+import narwhals as nw
+
+def func(df):
+    # 1. Create a Narwhals dataframe
+    df_s = nw.DataFrame(df)
+    # 2. Use the subset of the Polars API supported by Narwhals
+    df_s = df_s.group_by('a').agg(nw.col('b').mean())
+    # 3. Return a dataframe from the user's original library
+    return nw.to_native(df_s)
+```
+Let's try it out:
+
+=== "pandas"
+    ```python exec="true" source="material-block" result="python" session="df_ex1"
+    import pandas as pd
+
+    df = pd.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
+    print(func(df))
+    ```
+
+=== "Polars"
+    ```python exec="true" source="material-block" result="python" session="df_ex1"
+    import polars as pl
+
+    df = pl.DataFrame({'a': [1, 1, 2], 'b': [4, 5, 6]})
+    print(func(df))
+    ```
diff --git a/docs/index.md b/docs/index.md
@@ -0,0 +1,19 @@
+# Narwhals
+
+Extremely lightweight compatibility layer between pandas and Polars:
+
+- ✅ No dependencies.
+- ✅ Lightweight: wheel is smaller than 30 kB.
+- ✅ Simple, minimal, and predictable.
+
+No need to choose - support both with ease!
+
+## Who's this for?
+
+Anyone who wants to write a library, application, or service that consumes dataframes, and wants to make it
+completely dataframe-agnostic.
+
+## Let's get started!
+
+- [Installation](installation.md)
+- [Quick start](quick_start.md)
diff --git a/docs/installation.md b/docs/installation.md
@@ -0,0 +1,16 @@
+# Installation
+
+First, make sure you have [created and activated](https://docs.python.org/3/library/venv.html) a Python 3.8+ virtual environment.
+
+Then, run
+```console
+python -m pip install narwhals
+```
+
+Then, if you start the Python REPL and see the following:
+```python
+>>> import narwhals
+>>> narwhals.__version__
+'0.4.1'
+```
+then installation worked correctly!
diff --git a/docs/quick_start.md b/docs/quick_start.md
@@ -0,0 +1,43 @@
+# Quick start
+
+## Prerequisites
+
+Please start by following the [installation instructions](installation.md).
+
+Then, please install the following:
+
+- [pandas](https://pandas.pydata.org/docs/getting_started/install.html)
+- [Polars](https://pola-rs.github.io/polars/user-guide/installation/)
+
+## Simple example
+
+Create a Python file `t.py` with the following content:
+
+```python
+import pandas as pd
+import polars as pl
+import narwhals as nw
+
+
+def my_function(df_any):
+    df = nw.DataFrame(df_any)
+    column_names = df.columns
+    return column_names
+
+
+df_pandas = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
+df_polars = pl.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
+
+print('pandas result:', my_function(df_pandas))
+print('Polars result:', my_function(df_polars))
+```
+
+If you run `python t.py` and your output looks like this:
+```
+pandas result: ['a', 'b']
+Polars result: ['a', 'b']
+```
+
+then all your installations worked perfectly.
+
+Let's learn about what you just did, and what Narwhals can do for you.
diff --git a/docs/reference.md b/docs/reference.md
@@ -0,0 +1,12 @@
+# Reference
+
+Here are some related projects.
+
+## Dataframe Interchange Protocol
+
+Standardised way of interchanging data between libraries, see
+[here](https://data-apis.org/dataframe-protocol/latest/index.html).
+
+## Array API
+
+Array counterpart to the DataFrame API, see [here](https://data-apis.org/array-api/2022.12/index.html).
diff --git a/docs/requirements-docs.txt b/docs/requirements-docs.txt
@@ -0,0 +1,5 @@
+markdown-exec[ansi]
+mkdocs
+mkdocs-material
+mkdocstrings
+mkdocstrings[python]