User-defined functions documentation doesn't tell reader how to write fast functions #14699
Comments
I will try to work on this. Depending on how long it gets, this might end up being a series of issues and PRs. |
There's a (somewhat) related PR open already: #13392 |
Thanks! I'll take the comments and info there into account. But at this point I'm contemplating a much more significant rewrite, given these APIs are so tricky to use correctly. |
More problems: the documented behavior of
Except the actual output in the documentation doesn't show the wrong results, and group "b" does not in fact have values from group "a"... |
yeah it needs updating since #13181 |
OK so what are the expected semantics of |
that looks right
(.venv) marcogorelli@DESKTOP-U8OKFP3:~/scratch$ cat t.py
import polars as pl
def func(x):
    print('batch is: ', x)
    return x
df = pl.DataFrame({'group': ['a', 'a', 'b'], 'value': [1, 2, 3]})
df.select(pl.col('value').map_batches(func))
df.select(pl.col('value').map_batches(func).over('group'))
(.venv) marcogorelli@DESKTOP-U8OKFP3:~/scratch$ python t.py
batch is: shape: (3,)
Series: 'value' [i64]
[
1
2
3
]
batch is: shape: (2,)
Series: '' [i64]
[
1
2
]
batch is: shape: (1,)
Series: '' [i64]
[
3
] |
If that's the case, my first inclination is to not document |
If you exclude groups and you have then
Is mostly the same as
So it's just a convenience shortcut really. |
Description
https://docs.pola.rs/user-guide/expressions/user-defined-functions/ talks about Python functions, the slowest option.
A later section, NumPy, does talk about using NumPy ufuncs, but the title of the section is "NumPy" so unless you already know that NumPy has this functionality you won't know to look there.
And the fast-and-flexible option of Numba isn't mentioned anywhere.
I therefore propose updating the user-defined functions page as follows:
This may involve merging or moving some of the NumPy content, not sure yet.
Link
https://docs.pola.rs/user-guide/expressions/user-defined-functions/