feat: add methods to Dask backend #637

MarcoGorelli · 2024-07-27T13:39:35Z

Since #272, we have had only minimal support for Dask DataFrame.

Expr methods should go in narwhals/_dask/expr.py. For tests, use the existing tests and just remove the

    if "dask" in str(constructor):
        request.applymarker(pytest.mark.xfail)

part. If you can remove that and the test passes, you've done it correctly.

Not too hard

Examples of Expr methods which we should add (see here for the full list):

Expr.__sub__
Expr.__mul__
Expr.shift
Expr.cum_sum
Expr.is_between

Note: we should not add anything which modifies the index. So, the following should not be added, even though they appear on the list in the link above:

Expr.sort
Expr.head
Expr.tail

Harder

DataFrame.group_by
DataFrame.filter
get things started with namespaces (e.g. Expr.dt, Expr.str, ...)

General guidelines:

please don't ask for the issue to be assigned to you
please don't ask for permission to work on this issue
if you're a first-time contributor, please choose 1 method at a time to implement, and leave a comment noting which method you're working on (if you've contributed to Narwhals before, feel free to choose multiple)
have fun 🥳

Example pull request: https://github.com/narwhals-dev/narwhals/pull/731/files
Contributing guide: https://github.com/narwhals-dev/narwhals/blob/main/CONTRIBUTING.md

An easy way to check if something still needs doing is to look for tests with

    if "dask" in str(constructor):
        request.applymarker(pytest.mark.xfail)

Then:

try removing those two lines
run the test, check that it fails
implement this functionality for Dask
check the test passes


anopsy · 2024-07-27T16:00:16Z

I'll take Expr.__sub__

aidoskanapyanov · 2024-07-27T16:12:19Z

I'll take Expr.is_between

anopsy · 2024-07-27T16:35:09Z

Taking Expr.__mul__

DeaMariaLeon · 2024-07-28T16:25:19Z

Take Expr.sum

FBruzzesi · 2024-07-28T20:19:10Z

@MarcoGorelli I have a couple of questions:

Could you expand a little bit more, here or somewhere else, on the following: "we should not add anything which modifies the index"?

Should we refer to this issue for dataframe methods too, i.e. not only Expr?

MarcoGorelli · 2024-07-28T20:57:42Z

hey!

in pandas, some Series methods such as Series.sort_values change the index:

In [3]: s
Out[3]:
0    3
1    2
2    1
dtype: int64

In [4]: s.sort_values()
Out[4]:
2    1
1    2
0    3
dtype: int64

pandas / Dask would then auto-align on the index, which is what we want to avoid. in pandas we can just check the index values, as it's already eager, but in dask we don't have a way to do this (though I have opened an issue about this: dask/dask-expr#1112)
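To make the alignment problem concrete, here is a small pandas-only sketch that reuses the series from the session above. It is an illustration of the behaviour being described, not code from Narwhals.

```python
import pandas as pd

s = pd.Series([3, 2, 1])
sorted_s = s.sort_values()  # values 1, 2, 3 with index 2, 1, 0

# pandas aligns on index labels, so each value is added to itself
# and the sort is silently undone.
aligned = (s + sorted_s).sort_index()

# Adding by position (ignoring the index) gives a different answer.
positional = s.to_numpy() + sorted_s.to_numpy()

print(aligned.tolist())     # [6, 4, 2]
print(positional.tolist())  # [4, 4, 4]
```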

yup, dataframe methods too 😎

anopsy · 2024-07-29T08:46:36Z

I'm assuming Expr.min/Expr.max also have to be done, so I'll take those

FBruzzesi · 2024-07-30T21:04:10Z

(Asking for a friend 👀) how much cheating are we allowed? Specifically, the pandas-like translate_dtype should apply one-to-one for dask.

MarcoGorelli · 2024-07-30T21:10:17Z

😄 should be fine to reuse that one

FBruzzesi · 2024-08-11T15:39:26Z

anopsy · 2024-08-12T06:20:13Z

I'm working on null_count and quantile.

lucianosrp · 2024-08-12T11:13:31Z

will work on dt.to_string

luke396 · 2024-08-15T04:06:27Z

will work on total_microseconds

raisadz · 2024-08-15T12:21:32Z

I will take diff

benrutter · 2024-08-16T08:32:53Z

I'll pick up quantile! Edit: No I won't, sorry @anopsy!

Will look at is_duplicated instead 😅

Double edit: I think is_duplicated might be a candidate for "not_implemented". Looking at these GitHub issues, it was initially deemed out of scope (since it's a tricky thing to check in parallel) and now looks like it could be tabled again soon.

GH Issues:

Theoretically it'd be possible to write an implementation of some kind, but that's probably outside of the scope of Narwhals.

I think the same goes for is_unique as well.

Triple edit: Have taken is_in instead, PR incoming!

aidoskanapyanov · 2024-08-20T12:21:36Z

I'll take cast

FBruzzesi · 2024-08-21T07:10:54Z

As there is not much left: for anyone interested, we could use the DaskNamespace concat implementation 😇

benrutter · 2024-08-21T09:56:02Z

I'll take concat if nobody else has yet!

benrutter · 2024-09-18T11:59:13Z

Woah, looks like a lot is done! Is there anything left to do here? It looks like the remaining expression implementations depend on #743. I know a lot of them have dataframe-level equivalents though (i.e. filtering a single expression is risky, but there's already a filter method that applies to the whole frame).

Also, happy to volunteer myself to compile a list if it'd be handy! 😁

benrutter · 2024-10-22T15:15:23Z

MarcoGorelli added the enhancement, help wanted, and good first issue labels · Jul 27, 2024

aidoskanapyanov mentioned this issue · Jul 27, 2024
feat: add is_between for dask #642 (Merged)

anopsy mentioned this issue · Jul 27, 2024
feat: added dask Expr.__sub__ #643 (Merged)

anopsy mentioned this issue · Jul 27, 2024
feat: added dask Expr.__mul__ and test cases #644 (Merged)

aidoskanapyanov mentioned this issue · Jul 27, 2024
feat: add starts_with for dask #648 (Merged)

DeaMariaLeon mentioned this issue · Jul 28, 2024
feat: Added Dask Expr.sum #662 (Merged)

aidoskanapyanov mentioned this issue · Jul 28, 2024
feat: add ends_with, contains, slice for dask #664 (Merged)

aidoskanapyanov mentioned this issue · Jul 29, 2024
feat: add to_lowercase, to_uppercase, to_datetime for dask #665 (Merged)

aidoskanapyanov mentioned this issue · Jul 29, 2024
feat: add year, month, day, hour, minute, second, millisecond, micros… #666 (Merged)

FBruzzesi mentioned this issue · Jul 29, 2024
feat: dask select method #667 (Merged)

benrutter mentioned this issue · Jul 29, 2024
feat: dask columns property & filter method #670 (Merged)

MarcoGorelli pinned this issue · Jul 29, 2024

lucianosrp mentioned this issue · Jul 29, 2024
feat: add drop_nulls for dask #675 (Merged)

benrutter mentioned this issue · Jul 29, 2024
Feat: Dask filter and __and__ method added #676 (Merged)

lucianosrp mentioned this issue · Jul 30, 2024
feat: add fill_null to DaskExpr #685 (Merged)

FBruzzesi mentioned this issue · Jul 30, 2024
feat: dask schema and collect_schema #688 (Merged)

mistShard mentioned this issue · Jul 30, 2024
feat: Added max, min, and round to dask backend #689 (Closed)

This was referenced · Jul 31, 2024
feat: add dask Expr.min and test #690 (Merged)
feat: add dask Expr.max and test #691 (Merged)

This was referenced · Aug 10, 2024
feat: dask namespace lit method #772 (Merged)
feat: dask sum_horizontal #775 (Merged)

FBruzzesi mentioned this issue · Aug 11, 2024
feat: dask lazyframe remaining methods #778 (Merged)

anopsy mentioned this issue · Aug 14, 2024
feat: add dask Expr.null_count #792 (Merged)

raisadz mentioned this issue · Aug 15, 2024
feat: add diff() method to Dask backend #793 (Merged)

benrutter mentioned this issue · Aug 16, 2024
feat: add is_in to DaskExpr and mark is_duplicated and is_unique as not implemented #802 (Merged)

FBruzzesi mentioned this issue · Aug 16, 2024
feat: dask expr is_unique & is_duplicated #803 (Merged)

lucianosrp mentioned this issue · Aug 18, 2024
feat: add dt.to_string to DaskExpr #796 (Merged)

FBruzzesi mentioned this issue · Aug 18, 2024
feat: DaskExpr.over method #810 (Merged)

luke396 mentioned this issue · Aug 19, 2024
feat: add DaskExpr.total_minutes, total_seconds, total_milliseconds, total_microseconds, total_nanoseconds #811 (Merged)

aidoskanapyanov mentioned this issue · Aug 20, 2024
feat: dask expr cast #821 (Merged)

luke396 unpinned this issue · Aug 21, 2024

luke396 pinned this issue · Aug 21, 2024

anopsy mentioned this issue · Aug 21, 2024
feat: add DaskExpr.quantile #835 (Merged)

benrutter mentioned this issue · Aug 21, 2024
feat: dask namespace concat method #840 (Merged)

FBruzzesi removed the help wanted and good first issue labels · Sep 21, 2024

FBruzzesi unpinned this issue · Sep 21, 2024

FBruzzesi added the needs discussion label · Oct 19, 2024
