feat: add methods to Dask backend #637
Comments
I'll take |
I'll take |
Taking |
Take |
@MarcoGorelli I have a couple of questions:
|
hey! in pandas, some Series methods such as
pandas / Dask would then auto-align on the index, which is what we want to avoid. in pandas we can just check the index values, as it's already eager, but in dask we don't have a way to do this (though I have opened an issue about this: dask/dask-expr#1112)
yup, dataframe methods too 😎 |
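To make the auto-alignment point concrete, here is a minimal pandas-only sketch. The Series names and values are made up for illustration, and the filter is just one example of an operation that changes the index; it is not necessarily the method the comment above had in mind.

```python
import pandas as pd

# Two Series with the same labels in a different order (illustrative data).
left = pd.Series([1, 2, 3], index=[0, 1, 2])
right = pd.Series([10, 20, 30], index=[2, 1, 0])

# Arithmetic matches rows by index label, not by position:
# label 0 -> 1 + 30, label 1 -> 2 + 20, label 2 -> 3 + 10.
aligned = left + right

# Positional addition, which ignores the index entirely.
positional = left.to_numpy() + right.to_numpy()

# An operation that drops rows changes the index, so combining the result
# with the original Series silently introduces a missing value at label 0.
filtered = left[left > 1]
mixed = filtered + left

# pandas is eager, so the mismatch can be detected up front.
print(left.index.equals(right.index))
print(aligned.loc[0], positional[0])
print(int(mixed.isna().sum()))
```

Because pandas is eager, the `index.equals` check is cheap. On a lazy Dask collection the index values are not known until compute time, which is why the same guard is not available there.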
I'm assuming Expr.min/Expr.max also have to be done, so I'll take those |
(Asking for a friend 👀) how much cheating are we allowed to? Specifically, pandas-like |
😄 should be fine to reuse that one |
Just to give a sense of how far we have gone and what's still missing.
|
I'm working on null_count and quantile. |
will work on |
will work on |
I will take |
I'll pick up Will look at Double edit: I think GH Issues:
Theoretically it'd be possible to write an implementation of some kind, but that's probably outside of the scope of Narwhals. I think the same goes for Triple edit: Have taken |
I'll take |
As there is not much left: for anyone interested, we could use the DaskNamespace |
I'll take |
Woah, looks like a lot is done! Is any stuff left around this? Looks like the remaining expression implementations are dependent on #743. I know a lot have dataframe equivalent implementations though (i.e. filtering a single expression is risky, but there's already a filter method that applies to the whole frame). Also, happy to volunteer myself to compile a list if it'd be handy! 😁 |
Ok, here's the list as best I can figure for 'done' vs 'outstanding'. I've copied @FBruzzesi's one as well just so that they're all in one place! (stuck with the * for things waiting on #743 decision)
Seems like almost everything (aside from index-tricky stuff) is done! 🎉
DaskExpr
ExprDateTimeNamespace
ExprCatNamespace
Dataframe
DaskNamespace
Edit: Just looked at this again, and realised that:
So I think everything that doesn't require more discussion/thought is implemented? |
Since #272 we have really minimal support for Dask DataFrame
Expr methods should go in narwhals/_dask/expr.py. For tests, you should use existing tests, and just remove the part. If you can remove that, and the test passes, it means you've done it correctly.
Not too hard
Examples of Expr methods which we should add (see here for the full list):
Expr.__sub__
Expr.__mul__
Expr.shift
Expr.cum_sum
Expr.is_between
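For anyone new to the pattern, here is a rough, self-contained sketch of what "adding an Expr method" means in a lazy backend: each method returns a new expression wrapping a deferred computation and keeps the number of rows unchanged. `ToyExpr`, `col`, and the dict-of-lists "frame" are hypothetical stand-ins written for this illustration; this is not the actual code in narwhals/_dask/expr.py. The inclusive bounds in `is_between` are also an assumption of the sketch.

```python
from __future__ import annotations

import itertools
import operator
from typing import Any, Callable


class ToyExpr:
    """Hypothetical lazy expression; NOT the real Narwhals Dask expression class."""

    def __init__(self, call: Callable[[dict[str, list]], list]) -> None:
        # `call` is only run when `evaluate` is invoked, so building
        # expressions does no work on the data.
        self._call = call

    def evaluate(self, frame: dict[str, list]) -> list:
        return self._call(frame)

    def _binary(self, other: Any, op: Callable[[Any, Any], Any]) -> ToyExpr:
        def call(frame: dict[str, list]) -> list:
            left = self._call(frame)
            # Broadcast plain scalars to the length of the left-hand column.
            right = (
                other._call(frame)
                if isinstance(other, ToyExpr)
                else [other] * len(left)
            )
            return [op(a, b) for a, b in zip(left, right)]

        return ToyExpr(call)

    def __sub__(self, other: Any) -> ToyExpr:
        return self._binary(other, operator.sub)

    def __mul__(self, other: Any) -> ToyExpr:
        return self._binary(other, operator.mul)

    def cum_sum(self) -> ToyExpr:
        return ToyExpr(lambda frame: list(itertools.accumulate(self._call(frame))))

    def is_between(self, lower: Any, upper: Any) -> ToyExpr:
        # Both bounds are inclusive in this sketch.
        return ToyExpr(
            lambda frame: [lower <= v <= upper for v in self._call(frame)]
        )


def col(name: str) -> ToyExpr:
    """Hypothetical helper that selects a column by name."""
    return ToyExpr(lambda frame: frame[name])


frame = {"a": [1, 2, 3], "b": [10, 20, 30]}
print(((col("b") - col("a")) * 2).evaluate(frame))
print(col("a").cum_sum().evaluate(frame))
print(col("a").is_between(2, 3).evaluate(frame))
```

Every method in the sketch is element-wise or cumulative, so the output always lines up row for row with the input. That is the property that makes this group of methods the easy ones to port.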
Note: we should not add anything which modifies the index. So, the following should not be added, even though they appear on the list in the link above:
Harder
DataFrame.group_by
DataFrame.filter
(Expr.dt, Expr.str, ...)
General guidelines:
An easy way to check if something still needs doing is to look for tests with
Then: