Merge pull request #215 from natmokval/doc-fill-in-groupby-agg--docstrings

DOC: fill in missing docstrings for GroupBy.agg
MarcoGorelli authored May 27, 2024
2 parents f8ead04 + a58b34f commit 21ea816
Showing 3 changed files with 90 additions and 0 deletions.
9 changes: 9 additions & 0 deletions docs/api-reference/group_by.md
@@ -0,0 +1,9 @@
# `narwhals.GroupBy`

::: narwhals.group_by.GroupBy
    handler: python
    options:
      members:
        - agg
      show_source: false
      show_bases: false
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -29,6 +29,7 @@ nav:
    - api-reference/dtypes.md
    - api-reference/dependencies.md
    - api-reference/selectors.md
    - api-reference/group_by.md
theme:
  name: material
  font: false
80 changes: 80 additions & 0 deletions narwhals/group_by.py
@@ -23,6 +23,86 @@ def __init__(self, df: DataFrame, *keys: str | Iterable[str]) -> None:
    def agg(
        self, *aggs: IntoExpr | Iterable[IntoExpr], **named_aggs: IntoExpr
    ) -> DataFrame:
        """
        Compute aggregations for each group of a group by operation.

        Arguments:
            *aggs: Aggregations to compute for each group of the group by operation,
                specified as positional arguments.
            **named_aggs: Additional aggregations, specified as keyword arguments.

        Examples:
            Group by one column or by multiple columns and call `agg` to compute
            the grouped sum of another column.

            >>> import pandas as pd
            >>> import polars as pl
            >>> import narwhals as nw
            >>> df_pd = pd.DataFrame(
            ...     {
            ...         "a": ["a", "b", "a", "b", "c"],
            ...         "b": [1, 2, 1, 3, 3],
            ...         "c": [5, 4, 3, 2, 1],
            ...     }
            ... )
            >>> df_pl = pl.DataFrame(
            ...     {
            ...         "a": ["a", "b", "a", "b", "c"],
            ...         "b": [1, 2, 1, 3, 3],
            ...         "c": [5, 4, 3, 2, 1],
            ...     }
            ... )

            We define library agnostic functions:

            >>> def func(df_any):
            ...     df = nw.from_native(df_any)
            ...     df = df.group_by("a").agg(nw.col("b").sum()).sort("a")
            ...     return nw.to_native(df)

            >>> def func_mult_col(df_any):
            ...     df = nw.from_native(df_any)
            ...     df = df.group_by("a", "b").agg(nw.sum("c")).sort("a", "b")
            ...     return nw.to_native(df)

            We can then pass either pandas or Polars to `func` and `func_mult_col`:

            >>> func(df_pd)
               a  b
            0  a  2
            1  b  5
            2  c  3
            >>> func(df_pl)
            shape: (3, 2)
            ┌─────┬─────┐
            │ a   ┆ b   │
            │ --- ┆ --- │
            │ str ┆ i64 │
            ╞═════╪═════╡
            │ a   ┆ 2   │
            │ b   ┆ 5   │
            │ c   ┆ 3   │
            └─────┴─────┘
            >>> func_mult_col(df_pd)
               a  b  c
            0  a  1  8
            1  b  2  4
            2  b  3  2
            3  c  3  1
            >>> func_mult_col(df_pl)
            shape: (4, 3)
            ┌─────┬─────┬─────┐
            │ a   ┆ b   ┆ c   │
            │ --- ┆ --- ┆ --- │
            │ str ┆ i64 ┆ i64 │
            ╞═════╪═════╪═════╡
            │ a   ┆ 1   ┆ 8   │
            │ b   ┆ 2   ┆ 4   │
            │ b   ┆ 3   ┆ 2   │
            │ c   ┆ 3   ┆ 1   │
            └─────┴─────┴─────┘
        """
        aggs, named_aggs = self._df._flatten_and_extract(*aggs, **named_aggs)
        return self._df.__class__(
            self._grouped.agg(*aggs, **named_aggs),
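As a cross-check on the expected outputs in the new docstring, the grouped sums can be recomputed by hand. The following is a minimal standard-library sketch and not part of the commit: `grouped_sum` is a hypothetical helper written for illustration, and it says nothing about how narwhals, pandas, or Polars implement `agg` internally. It uses the same data as the docstring and sorts by key to mirror the `.sort(...)` calls in `func` and `func_mult_col`.

```python
from collections import defaultdict

# Same data as the docstring example, stored column-wise.
data = {
    "a": ["a", "b", "a", "b", "c"],
    "b": [1, 2, 1, 3, 3],
    "c": [5, 4, 3, 2, 1],
}


def grouped_sum(columns, keys, value):
    """Hypothetical helper: sum `value` for each distinct combination of `keys`."""
    totals = defaultdict(int)
    # Walk the rows, pairing each row's key values with its value to sum.
    for row in zip(*(columns[k] for k in keys), columns[value]):
        *key, val = row
        totals[tuple(key)] += val
    # Sort by key, mirroring the `.sort(...)` calls in the docstring functions.
    return sorted(totals.items())


# Mirrors `func`: group by "a", sum "b".
single_key = grouped_sum(data, ["a"], "b")
# Mirrors `func_mult_col`: group by "a" and "b", sum "c".
multi_key = grouped_sum(data, ["a", "b"], "c")

print(single_key)
print(multi_key)
```

The totals agree with the pandas and Polars tables in the docstring: 2, 5, 3 for the single-key case and 8, 4, 2, 1 for the two-key case.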