docs: Add Development and Releases sections to the documentation #11932

Merged
merged 11 commits into from
Oct 24, 2023
1 change: 1 addition & 0 deletions .github/workflows/docs-global.yml
Original file line number Diff line number Diff line change
@@ -19,6 +19,7 @@ jobs:
- uses: actions/checkout@v4
- uses: gaurav-nelson/github-action-markdown-link-check@v1
with:
config-file: docs/mlc-config.json
folder-path: docs

lint:
285 changes: 285 additions & 0 deletions docs/development/contributing.md

Large diffs are not rendered by default.

74 changes: 74 additions & 0 deletions docs/development/versioning.md
@@ -0,0 +1,74 @@
# Versioning

## Version changes

Polars adheres to the [semantic versioning](https://semver.org/) specification.

As Polars has not released its `1.0.0` version yet, breaking releases lead to a minor version increase (e.g. from `0.18.15` to `0.19.0`), while all other releases increment the patch version (e.g. from `0.18.14` to `0.18.15`).
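
The pre-`1.0.0` rule can be expressed as a small check. This is a minimal sketch: `parse_version` and `is_breaking_release` are hypothetical helpers, not part of Polars, and they assume plain `MAJOR.MINOR.PATCH` version strings.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split a plain 'MAJOR.MINOR.PATCH' string into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_breaking_release(old: str, new: str) -> bool:
    """Return True if upgrading from `old` to `new` crosses a breaking release.

    Assumes the pre-1.0.0 convention: a minor version increase signals a
    breaking release, while a patch increase does not.
    """
    old_major, old_minor, _ = parse_version(old)
    new_major, new_minor, _ = parse_version(new)
    return (new_major, new_minor) > (old_major, old_minor)


print(is_breaking_release("0.18.15", "0.19.0"))   # True
print(is_breaking_release("0.18.14", "0.18.15"))  # False
```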

## Policy for breaking changes

Polars takes backwards compatibility seriously, but we are not afraid to change things if it leads to a better product.

!!! warning Rust users only

The Rust API for Polars is currently not considered stable.
Member


Can we word this less drastically? Maybe that we are more certain about the lazy API, but that the internals may expect more breaking API changes. Also because it is harder to hide the API than it is in python.

Member Author


Sure, will change the wording here!

Member Author

@stinodego stinodego Oct 24, 2023


We don't really have to say anything about the stability of the API. We just need to mention that we don't deprecate things in Rust. I rewrote it to:

Breaking changes to the Rust API are not deprecated first, but will be listed in the changelog.
Supporting deprecated functionality would slow down development too much at this point in time.

Functionality can be changed or removed without warning.

### Philosophy

We don't always get it right on the first try.
We learn as we go along and get feedback from our users.
Sometimes, we're a little too eager to get out a new feature and didn't ponder all the possible implications.

If this happens, we correct our mistakes and introduce a breaking change.
Most of the time, this is no big deal.
Users get a deprecation warning, they do a quick search-and-replace in their code base, and that's that.

At times, we run into an issue that requires more effort on our users' part to fix.
A change in the query engine can seriously impact the assumptions in a data pipeline.
We do not make such changes lightly, but we will make them if we believe it makes Polars better.

Freeing ourselves of past indiscretions is important to keep Polars moving forward.
We know it takes time and energy for our users to keep up with new releases, but in the end, it benefits everyone for Polars to be the best product possible.

### What qualifies as a breaking change

**A breaking change occurs when an existing component of the public API is changed or removed.**

A feature is part of the public API if it is documented in the [API reference](https://pola-rs.github.io/polars/py-polars/html/reference/).

Examples of breaking changes:

- A deprecated function or method is removed.
- The default value of a parameter is changed.
- The outcome of a query has changed due to changes to the query engine.

Examples of changes that are _not_ considered breaking:

- An undocumented function is removed.
- The module path of a public class is changed.
- An optional parameter is added to an existing method.

Bug fixes are not considered breaking changes, even though they may impact some users' [workflows](https://xkcd.com/1172/).

### Deprecation warnings

If we decide to introduce a breaking change, the existing behavior is deprecated _if possible_.
For example, if we choose to rename a function, the new function is added alongside the old function, and using the old function will result in a deprecation warning.
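
The rename pattern described here can be sketched with the standard-library `warnings` module. The names `old_name` and `new_name` are hypothetical and only illustrate the idea; this is not how Polars implements its own deprecations.

```python
import warnings


def new_name(x: int) -> int:
    """The function under its new name."""
    return x + 1


def old_name(x: int) -> int:
    """Deprecated alias kept alongside `new_name` during the deprecation period."""
    warnings.warn(
        "`old_name` is deprecated. Use `new_name` instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to this wrapper
    )
    return new_name(x)


# Record warnings so we can inspect what the old function emits.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_name(1)  # still works, but emits a DeprecationWarning
```

The old name keeps returning the same result, so existing code continues to run while users migrate.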

Not all changes can be deprecated nicely.
A change to the query engine may have effects across a large part of the API.
Such changes will not raise a deprecation warning, but they _will_ be included in the changelog and the migration guide.

### Deprecation period

As a rule, deprecated functionality is removed two breaking releases after the deprecation happens.
For example, a function deprecated in version `0.18.3` will be removed in version `0.20.0`.
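
The arithmetic behind this rule is sketched below. `removal_version` is a hypothetical helper, and it assumes the pre-`1.0.0` scheme in which every minor version increase is a breaking release.

```python
def removal_version(deprecated_in: str, breaking_releases: int = 2) -> str:
    """Return the version in which a deprecated feature is expected to be removed.

    Assumes 'MAJOR.MINOR.PATCH' strings and that each minor version
    increase corresponds to one breaking release.
    """
    major, minor, _patch = (int(part) for part in deprecated_in.split("."))
    return f"{major}.{minor + breaking_releases}.0"


print(removal_version("0.18.3"))  # 0.20.0
```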

This means that if your program does not raise any deprecation warnings, it should be mostly safe to upgrade to the next breaking release.
As breaking releases happen about once every three months, this allows three to six months to adjust to any pending breaking changes.
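
One way to check whether a program still raises deprecation warnings is to escalate them to errors, for example in a test run. The sketch below uses only the standard library; `legacy_call` is a hypothetical stand-in for code that uses deprecated functionality.

```python
import warnings


def legacy_call() -> str:
    """Hypothetical stand-in for code that still uses deprecated functionality."""
    warnings.warn("this function is deprecated", DeprecationWarning, stacklevel=2)
    return "ok"


def raises_deprecation_warning(func) -> bool:
    """Run `func` with deprecation warnings escalated to errors."""
    with warnings.catch_warnings():
        warnings.simplefilter("error", category=DeprecationWarning)
        try:
            func()
        except DeprecationWarning:
            return True
    return False


print(raises_deprecation_warning(legacy_call))   # True
print(raises_deprecation_warning(lambda: None))  # False
```

The same effect is available for a whole script through the interpreter option `python -W error::DeprecationWarning`.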

**In some cases, we may decide to adjust the deprecation period.**
If retaining the deprecated functionality blocks other improvements to Polars, we may shorten the deprecation period to a single breaking release. This will be mentioned in the warning message.
If the deprecation affects many users, we may extend the deprecation period.
4 changes: 2 additions & 2 deletions docs/index.md
@@ -58,9 +58,9 @@ See the results in h2oai's [db-benchmark](https://duckdblabs.github.io/db-benchm

--8<-- "docs/people.md"

## Contribute
## Contributing

Thanks for taking the time to contribute! We appreciate all contributions, from reporting bugs to implementing new features. If you're unclear on how to proceed read our [contribution guide](https://github.com/pola-rs/polars/blob/main/CONTRIBUTING.md) or contact us on [discord](https://discord.com/invite/4UfP5cfBE7).
We appreciate all contributions, from reporting bugs to implementing new features. Read our [contributing guide](development/contributing.md) to learn more.

## License

7 changes: 7 additions & 0 deletions docs/mlc-config.json
@@ -0,0 +1,7 @@
{
"ignorePatterns": [
{
"pattern": "^https://crates.io/"
}
]
}
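
Each `pattern` entry is a regular expression, and links that match it are skipped by the link checker. The Python sketch below only illustrates which URLs this particular pattern would match; the checker itself is a JavaScript tool, and `is_ignored` is a hypothetical helper.

```python
import json
import re

# The same configuration as in docs/mlc-config.json.
config = json.loads("""
{
  "ignorePatterns": [
    {"pattern": "^https://crates.io/"}
  ]
}
""")

patterns = [re.compile(entry["pattern"]) for entry in config["ignorePatterns"]]


def is_ignored(url: str) -> bool:
    """Return True if the URL matches any ignore pattern."""
    return any(pattern.search(url) for pattern in patterns)


print(is_ignored("https://crates.io/crates/polars"))  # True
print(is_ignored("https://docs.rs/polars"))           # False
```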
4 changes: 4 additions & 0 deletions docs/releases/changelog.md
@@ -0,0 +1,4 @@
# Changelog

This page is under construction.
Please refer to our [GitHub releases](https://github.com/pola-rs/polars/releases) in the meantime.
199 changes: 199 additions & 0 deletions docs/releases/upgrade/0.19.md
@@ -0,0 +1,199 @@
# 0.19

This document is intended to help you upgrade from an older Polars version to Polars version `0.19.*`.

For the full list of changes, please see the [release notes](https://github.com/pola-rs/polars/releases/tag/py-0.19.0).

## Breaking changes

While this section does not contain an exhaustive list of all breaking changes, we think these are most likely to impact your code.

### Aggregation functions no longer support horizontal computation

This impacts aggregation functions like `sum`, `min`, and `max`.
These functions were overloaded to support both vertical and horizontal computation.
Recently, new dedicated functionality for horizontal computation was released, and horizontal computation was deprecated.

Restore the old behavior by using the horizontal variant, e.g. `sum_horizontal`.

#### Example

Before:

```shell
>>> df = pl.DataFrame({'a': [1, 2], 'b': [11, 12]})
>>> df.select(pl.sum('a', 'b')) # horizontal computation
shape: (2, 1)
┌─────┐
│ sum │
│ --- │
│ i64 │
╞═════╡
│ 12 │
│ 14 │
└─────┘
```

After:

```shell
>>> df = pl.DataFrame({'a': [1, 2], 'b': [11, 12]})
>>> df.select(pl.sum('a', 'b')) # vertical computation
shape: (1, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 3 ┆ 23 │
└─────┴─────┘
```

### Update to `all` / `any`

`all` will now ignore null values by default, rather than treat them as `False`.

For both `any` and `all`, the `drop_nulls` parameter has been renamed to `ignore_nulls` and is now keyword-only.
We also fixed an issue where setting this parameter to `False` would erroneously result in `None` output in some cases.

To restore the old behavior, set `ignore_nulls` to `False` and check for `None` output.

#### Example

Before:

```shell
>>> pl.Series([True, None]).all()
False
```

After:

```shell
>>> pl.Series([True, None]).all()
True
```

### Improved error types for many methods

Improving our error messages is an ongoing effort.
We did a sweep of our Python code base and made many improvements to error messages and error types.
Most notably, many `ValueError`s were changed to `TypeError`s.

If your code relies on handling Polars exceptions, you may have to make some adjustments.

#### Example

Before:

```shell
>>> pl.Series(values=15)
...
ValueError: Series constructor called with unsupported type; got 'int'
```

After:

```shell
>>> pl.Series(values=15)
...
TypeError: Series constructor called with unsupported type 'int' for the `values` parameter
```

### Updates to expression input parsing

Methods like `select` and `with_columns` accept one or more expressions.
But they also accept strings, integers, lists, and other inputs that we try to interpret as expressions.
We updated our internal logic to parse inputs more consistently.

#### Example

Before:

```shell
>>> pl.DataFrame({'a': [1, 2]}).with_columns(None)
shape: (2, 1)
┌─────┐
│ a │
│ --- │
│ i64 │
╞═════╡
│ 1 │
│ 2 │
└─────┘
```

After:

```shell
>>> pl.DataFrame({'a': [1, 2]}).with_columns(None)
shape: (2, 2)
┌─────┬─────────┐
│ a ┆ literal │
│ --- ┆ --- │
│ i64 ┆ null │
╞═════╪═════════╡
│ 1 ┆ null │
│ 2 ┆ null │
└─────┴─────────┘
```

### `shuffle` / `sample` now use an internal Polars seed

If you used the built-in Python `random.seed` function to control the randomness of Polars expressions, this will no longer work.
Instead, use the new `set_random_seed` function.

#### Example

Before:

```python
import random

random.seed(1)
```

After:

```python
import polars as pl

pl.set_random_seed(1)
```

## Deprecations

Creating a consistent and intuitive API is hard, and finding the right name for each function, method, and parameter might be the hardest part.
The new version comes with quite a few naming changes, and you will most likely run into deprecation warnings when upgrading to `0.19`.

If you want to upgrade without worrying about deprecation warnings right now, you can add the following snippet to your code:

```python
import warnings

warnings.filterwarnings("ignore", category=DeprecationWarning)
```

### `groupby` renamed to `group_by`

This is not a change we make lightly, as it will impact almost all our users. But "group by" is really two separate words, and our naming strategy dictates that these should be separated by an underscore.

Most likely, a simple search and replace will be enough to take care of this update:

- Search: `.groupby(`
- Replace: `.group_by(`
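
The search and replace can also be scripted. The sketch below applies it to a string of source code; `rename_groupby` is a hypothetical helper, and applying it to real files is left to the reader.

```python
import re

# Matches the method call `.groupby(` only, not other names that contain "groupby".
GROUPBY_CALL = re.compile(r"\.groupby\(")


def rename_groupby(source: str) -> str:
    """Replace every `.groupby(` call in `source` with `.group_by(`."""
    return GROUPBY_CALL.sub(".group_by(", source)


before = 'df.groupby("a").agg(pl.col("b").sum())'
after = rename_groupby(before)
print(after)  # df.group_by("a").agg(pl.col("b").sum())
```

Note that this also rewrites `.groupby(` calls on objects from other libraries, such as pandas, so review the result if a file mixes both.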

### `apply` renamed to `map_*`

`apply` is probably the most misused part of our API. Many Polars users come from pandas, where `apply` has a completely different meaning.

We now consolidate all our functionality for user-defined functions under the name `map`. This results in the following renaming:

| Before | After |
| --------------------------- | -------------- |
| `Series/Expr.apply` | `map_elements` |
| `Series/Expr.rolling_apply` | `rolling_map` |
| `DataFrame.apply` | `map_rows` |
| `GroupBy.apply` | `map_groups` |
| `pl.apply` | `map_groups` |
| `map` | `map_batches` |
12 changes: 12 additions & 0 deletions mkdocs.yml
@@ -9,13 +9,15 @@ repo_name: pola-rs/polars
# Documentation layout
nav:
- Home: index.md

- Getting started:
- getting-started/intro.md
- getting-started/installation.md
- getting-started/series-dataframes.md
- getting-started/reading-writing.md
- getting-started/expressions.md
- getting-started/joins.md

- User guide:
- user-guide/index.md
- user-guide/installation.md
@@ -81,6 +83,16 @@ nav:
- user-guide/misc/alternatives.md
- user-guide/misc/reference-guides.md
- user-guide/misc/contributing.md

- Development:
- development/contributing.md
- development/versioning.md

- Releases:
- releases/changelog.md
- Upgrade guide:
- releases/upgrade/0.19.md

not_in_nav: |
/_build/
people.md