Commit

merge main
FBruzzesi committed Nov 1, 2024
2 parents cf2366d + 5c3db5b commit 17e4309
Showing 203 changed files with 3,521 additions and 1,648 deletions.
4 changes: 2 additions & 2 deletions .github/dependabot.yml
@@ -5,8 +5,8 @@ updates:
- package-ecosystem: "github-actions"
directory: "/"
schedule:
# Check for updates to GitHub Actions every week
interval: "weekly"
# Check for updates to GitHub Actions every month
interval: "monthly"
commit-message:
prefix: "skip changelog" # So this PR will not be added to release-drafter
include: "scope" # List of the updated dependencies in the commit will be added
69 changes: 59 additions & 10 deletions .github/workflows/downstream_tests.yml
@@ -55,7 +55,7 @@ jobs:
matrix:
python-version: ["3.12"]
os: [ubuntu-latest]
dependencies: ["core", "core,optional"]
dependencies: ["core,optional"]

runs-on: ${{ matrix.os }}
steps:
@@ -73,19 +73,27 @@ jobs:
run: |
git clone https://github.com/marimo-team/marimo.git --depth=1
cd marimo
uv venv -p 3.12
git log
- name: install-basics
run: uv pip install --upgrade tox virtualenv setuptools hatch --system
- name: install-marimo-dev
run: |
cd marimo
uv pip install -e ".[dev]" --system
. .venv/bin/activate
uv pip install -e ".[dev]"
which python
- name: install-narwhals-dev
run: |
uv pip uninstall narwhals --system
uv pip install -e . --system
cd marimo
. .venv/bin/activate
uv pip uninstall narwhals
uv pip install -e ./..
- name: show-deps
run: uv pip freeze
run: |
cd marimo
. .venv/bin/activate
uv pip freeze
- name: Create assets directory, copy over index.html
continue-on-error: true
run: |
@@ -96,12 +104,13 @@ jobs:
if: ${{ matrix.dependencies == 'core,optional' }}
run: |
cd marimo
hatch run +py=${{ matrix.python-version }} test-optional:test-narwhals
. .venv/bin/activate
# make sure that we use the .venv when running tests, so that
# the local narwhals install is picked up
sed -i '/^\[tool.hatch.envs.default\]/a path = ".venv"' pyproject.toml
hatch run python -c "import narwhals; print(narwhals.__file__)"
hatch run test-optional:test-narwhals
timeout-minutes: 15
- name: Run typechecks
run: |
cd marimo
hatch run typecheck:check

scikit-lego:
strategy:
@@ -181,3 +190,43 @@ jobs:
run: |
cd py-shiny
make narwhals-test-integration
tubular:
strategy:
matrix:
python-version: ["3.12"]
os: [ubuntu-latest]

runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install uv
uses: astral-sh/setup-uv@v3
with:
enable-cache: "true"
cache-suffix: ${{ matrix.python-version }}
cache-dependency-glob: "**requirements*.txt"
- name: clone-tubular
run: |
git clone https://github.com/lvgig/tubular --depth=1
cd tubular
git log
- name: install-basics
run: uv pip install --upgrade tox virtualenv setuptools pytest-env --system
- name: install-tubular-dev
run: |
cd tubular
uv pip install -e .[dev] --system
- name: install-narwhals-dev
run: |
uv pip uninstall narwhals --system
uv pip install -e . --system
- name: show-deps
run: uv pip freeze
- name: Run pytest
run: |
cd tubular
pytest tests --config-file=pyproject.toml
2 changes: 1 addition & 1 deletion .github/workflows/extremes.yml
@@ -90,7 +90,7 @@ jobs:
nightlies:
strategy:
matrix:
python-version: ["3.12"]
python-version: ["3.13"]
os: [ubuntu-latest]
if: github.event.pull_request.head.repo.full_name == github.repository
runs-on: ${{ matrix.os }}
4 changes: 2 additions & 2 deletions .github/workflows/pytest.yml
@@ -34,7 +34,7 @@ jobs:
pytest-windows:
strategy:
matrix:
python-version: ["3.9", "3.10", "3.11", "3.12"]
python-version: ["3.10", "3.12"]
os: [windows-latest]

runs-on: ${{ matrix.os }}
@@ -61,7 +61,7 @@ jobs:
pytest-coverage:
strategy:
matrix:
python-version: ["3.9", "3.10", "3.11", "3.12"]
python-version: ["3.9", "3.11", "3.13"]
os: [ubuntu-latest]

runs-on: ${{ matrix.os }}
1 change: 1 addition & 0 deletions .gitignore
@@ -17,6 +17,7 @@ coverage.xml
# Documentation
site/
todo.md
docs/this.md
docs/api-completeness/*.md
!docs/api-completeness/index.md

8 changes: 5 additions & 3 deletions .pre-commit-config.yaml
@@ -1,15 +1,17 @@
ci:
autoupdate_schedule: monthly
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
# Ruff version.
rev: 'v0.6.9'
rev: 'v0.7.1'
hooks:
# Run the formatter.
- id: ruff-format
# Run the linter.
- id: ruff
args: [--fix]
- repo: https://github.com/pre-commit/mirrors-mypy
rev: 'v1.11.2'
rev: 'v1.13.0'
hooks:
- id: mypy
additional_dependencies: ['polars==1.4.1', 'pytest==8.3.2']
@@ -40,7 +42,7 @@ repos:
hooks:
- id: nbstripout
- repo: https://github.com/adamchainz/blacken-docs
rev: "1.19.0" # replace with latest tag on GitHub
rev: "1.19.1" # replace with latest tag on GitHub
hooks:
- id: blacken-docs
args: [--skip-errors]
4 changes: 4 additions & 0 deletions CONTRIBUTING.md
@@ -109,6 +109,10 @@ nox

Note that nox also requires all the Python versions defined in `noxfile.py` to be installed on your system.

#### Testing cuDF

We can't currently test in CI against cuDF, but you can test it manually in Kaggle using GPUs. Please follow this [Kaggle notebook](https://www.kaggle.com/code/marcogorelli/testing-cudf-in-narwhals) to run the tests.

### 7. Building docs

To build the docs, run `mkdocs serve`, and then open the link provided in a browser.
5 changes: 4 additions & 1 deletion README.md
@@ -43,10 +43,13 @@ Join the party!

- [Altair](https://github.com/vega/altair/)
- [Hamilton](https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/narwhals)
- [marimo](https://github.com/marimo-team/marimo)
- [pymarginaleffects](https://github.com/vincentarelbundock/pymarginaleffects)
- [scikit-lego](https://github.com/koaning/scikit-lego)
- [scikit-playtime](https://github.com/koaning/scikit-playtime)
- [timebasedcv](https://github.com/FBruzzesi/timebasedcv)
- [marimo](https://github.com/marimo-team/marimo)
- [tubular](https://github.com/lvgig/tubular)
- [wimsey](https://github.com/benrutter/wimsey)

Feel free to add your project to the list if it's missing, and/or
[chat with us on Discord](https://discord.gg/V3PqtB4VA4) if you'd like any support.
6 changes: 6 additions & 0 deletions docs/api-reference/dependencies.md
@@ -11,14 +11,20 @@
- get_polars
- get_pyarrow
- is_cudf_dataframe
- is_cudf_index
- is_cudf_series
- is_dask_dataframe
- is_ibis_table
- is_into_dataframe
- is_into_series
- is_modin_dataframe
- is_modin_index
- is_modin_series
- is_numpy_array
- is_pandas_dataframe
- is_pandas_index
- is_pandas_like_dataframe
- is_pandas_like_index
- is_pandas_like_series
- is_pandas_series
- is_polars_dataframe
19 changes: 10 additions & 9 deletions docs/api-reference/expr_dt.md
@@ -6,22 +6,23 @@
members:
- convert_time_zone
- date
- year
- month
- day
- ordinal_day
- hour
- minute
- second
- millisecond
- microsecond
- millisecond
- minute
- month
- nanosecond
- ordinal_day
- replace_time_zone
- total_minutes
- total_seconds
- total_milliseconds
- second
- timestamp
- total_microseconds
- total_milliseconds
- total_minutes
- total_nanoseconds
- total_seconds
- to_string
- year
show_source: false
show_bases: false
1 change: 1 addition & 0 deletions docs/api-reference/narwhals.md
@@ -15,6 +15,7 @@ Here are the top-level functions available in Narwhals.
- from_dict
- from_native
- from_arrow
- generate_temporary_column_name
- get_level
- get_native_namespace
- is_ordered_categorical
19 changes: 10 additions & 9 deletions docs/api-reference/series_dt.md
@@ -6,22 +6,23 @@
members:
- convert_time_zone
- date
- year
- month
- day
- ordinal_day
- hour
- minute
- second
- millisecond
- microsecond
- millisecond
- minute
- month
- nanosecond
- ordinal_day
- replace_time_zone
- total_minutes
- total_seconds
- total_milliseconds
- second
- timestamp
- total_microseconds
- total_milliseconds
- total_minutes
- total_nanoseconds
- total_seconds
- to_string
- year
show_source: false
show_bases: false
76 changes: 76 additions & 0 deletions docs/basics/dataframe_conversion.md
@@ -0,0 +1,76 @@
# Conversion between libraries

Some library maintainers must apply complex dataframe operations, using methods and functions that may not (yet) be implemented in Narwhals. In such cases, Narwhals can still be highly beneficial, by allowing easy dataframe conversion.

## Dataframe X in, pandas out

Imagine that you maintain a library with a function that operates on pandas dataframes to produce automated reports. You want to allow users to supply a dataframe in any format to that function (pandas, Polars, DuckDB, cuDF, Modin, etc.) without adding all those dependencies to your own project and without special-casing each input library's variation of `to_pandas` / `toPandas` / `to_pandas_df` / `df` ...
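
For contrast, the special-casing described above might look roughly like the following sketch. The classes are hypothetical stand-ins, not real dataframe libraries: each exposes one of the method names listed above, and a plain dict stands in for the pandas result so that the example runs with the standard library alone.

```python
from typing import Any


# Hypothetical stand-ins: each mimics one library's spelling of the conversion.
class FrameA:
    def to_pandas(self) -> dict:
        return {"source": "to_pandas"}


class FrameB:
    def toPandas(self) -> dict:
        return {"source": "toPandas"}


class FrameC:
    def df(self) -> dict:
        return {"source": "df"}


# Every newly supported input library means another entry to maintain here.
_CONVERSION_METHODS = ("to_pandas", "toPandas", "to_pandas_df", "df")


def special_cased_to_pandas(obj: Any) -> dict:
    """Try each known conversion method name in turn (the brittle approach)."""
    for name in _CONVERSION_METHODS:
        method = getattr(obj, name, None)
        if callable(method):
            return method()
    raise TypeError(f"Don't know how to convert {type(obj).__name__}")


results = [
    special_cased_to_pandas(frame)["source"]
    for frame in (FrameA(), FrameB(), FrameC())
]
print(results)
```

The lookup table has to grow with every input library, and an unsupported input only fails at call time.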

One solution is to use Narwhals as a thin dataframe ingestion layer that converts a user-supplied dataframe to the format your library uses internally. Since Narwhals is zero-dependency, this is much more lightweight than including all the dataframe libraries as dependencies, and easier to write than special-casing each input library's `to_pandas` method (if it even exists!).

To illustrate, we create dataframes in various formats:

```python exec="1" source="above" session="conversion"
import narwhals as nw
from narwhals.typing import IntoDataFrame

import duckdb
import polars as pl
import pandas as pd

df_polars = pl.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
    }
)
df_pandas = df_polars.to_pandas()
df_duckdb = duckdb.sql("SELECT * FROM df_polars")
```

Now, we define a function that can ingest any dataframe type supported by Narwhals, and convert it to a pandas DataFrame for internal use:

```python exec="1" source="above" session="conversion" result="python"
def df_to_pandas(df: IntoDataFrame) -> pd.DataFrame:
    return nw.from_native(df).to_pandas()


print(df_to_pandas(df_polars))
```

## Dataframe X in, Polars out

### Via PyCapsule Interface

Similarly, if your library uses Polars internally, you can convert any user-supplied dataframe to Polars format using Narwhals.

```python exec="1" source="above" session="conversion" result="python"
def df_to_polars(df: IntoDataFrame) -> pl.DataFrame:
    return nw.from_arrow(nw.from_native(df), native_namespace=pl).to_native()


print(df_to_polars(df_duckdb)) # You can only execute this line of code once.
```

It works to pass Polars to `native_namespace` here because Polars supports the [PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html) for import.

Note that the PyCapsule Interface makes no guarantee that you can call it repeatedly, so the approach above only works if you
only expect to perform the conversion a single time on each input object.
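
The single-use caveat can be illustrated with a standard-library analogy. This is not the PyCapsule Interface itself: a plain Python generator models a stream-like source that is exhausted by its first consumer, whereas materializing it once allows repeated reads. The names `one_shot_rows` and `consume` are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]


def one_shot_rows() -> Iterator[Row]:
    """Hypothetical stream of rows that can only be read once."""
    yield from [(1, "banana"), (2, "apple")]


def consume(rows: Iterable[Row]) -> List[Row]:
    """Stand-in for a conversion that reads every row from its input."""
    return list(rows)


stream = one_shot_rows()
first = consume(stream)  # reads all rows
second = consume(stream)  # the stream is already exhausted, so nothing is left

materialized = list(one_shot_rows())  # materialize once...
again_1 = consume(materialized)  # ...then read as often as needed
again_2 = consume(materialized)
```

Whether a given input object behaves like `stream` or like `materialized` depends on the producing library, which is why the conversion above is best treated as single-use.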

### Via PyArrow

If you need to ingest the same dataframe multiple times, then you may want to go via PyArrow instead.
This may be less efficient than the PyCapsule approach above (and always requires PyArrow!), but is more forgiving:

```python exec="1" source="above" session="conversion" result="python"
def df_to_polars(df: IntoDataFrame) -> pl.DataFrame:
    return pl.DataFrame(nw.from_native(df).to_arrow())


df_duckdb = duckdb.sql("SELECT * FROM df_polars")
print(df_to_polars(df_duckdb)) # We can execute this...
print(df_to_polars(df_duckdb)) # ...as many times as we like!
```