feat: Implement `round` and `truncate` for Duration columns #12597

rob-sil · 2023-11-20T22:53:16Z

I tried to match the behavior of dt.round and dt.truncate for dates and datetimes as closely as possible. Since dt.round is experimental, I wanted to highlight:

Rounding happens in the time unit of the series. This means that rounding any microseconds series to "1ns" causes a divide-by-zero error, as one nanosecond is converted to zero microseconds.
The offset shifts the windows after rounding, not before. dt.round(every="1h", offset="30m") rounds 25 minutes to 30 minutes, but rounds 35 minutes to 90 minutes.
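
A minimal sketch of the two points above, using plain integers rather than polars. The helper names (`to_microseconds`, `round_with_offset`) are made up for illustration, and the round-half-up tie rule is an assumption, not something the PR states.

```python
from datetime import timedelta


def to_microseconds(td: timedelta) -> int:
    """Express a timedelta as an integer count of microseconds."""
    return td // timedelta(microseconds=1)


def round_with_offset(value_us: int, every_us: int, offset_us: int = 0) -> int:
    """Round to the nearest multiple of `every_us`, then shift by `offset_us`.

    Ties round up here; that tie rule is an assumption of this sketch.
    """
    rounded = (value_us + every_us // 2) // every_us * every_us
    return rounded + offset_us


hour = to_microseconds(timedelta(hours=1))
half_hour = to_microseconds(timedelta(minutes=30))

# The offset is applied after rounding, so 25m -> 0h + 30m and 35m -> 1h + 30m.
r25 = round_with_offset(to_microseconds(timedelta(minutes=25)), hour, half_hour)
r35 = round_with_offset(to_microseconds(timedelta(minutes=35)), hour, half_hour)
print(timedelta(microseconds=r25))  # 0:30:00
print(timedelta(microseconds=r35))  # 1:30:00

# Converting "1ns" into whole microseconds truncates to zero ...
every_1ns_in_us = 1 // 1000
# ... so rounding a microsecond value by it divides by zero.
try:
    round_with_offset(r25, every_1ns_in_us)
except ZeroDivisionError as exc:
    print(f"rounding a microsecond value to 1ns fails: {exc}")
```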

crates/polars-time/src/round.rs

MarcoGorelli

This is off to a good start, but we need to be careful about some subtleties

For example, is this expected?

In [3]: pl.Series([timedelta(days=29)]).dt.round('1mo')
Out[3]: 
shape: (1,)
Series: '' [duration[μs]]
[
        28d
]

?

I'd suggest raising if offset isn't a constant duration
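
To illustrate what "constant duration" could mean here, this is a rough sketch that classifies a duration string by its units. The function name mirrors the Rust method used later in this PR, but this Python version is hypothetical. Treating days and weeks as fixed-length while months, quarters and years are calendar-dependent is an assumption of the sketch.

```python
import re

# Units whose length depends on the calendar (assumption for this sketch).
CALENDAR_UNITS = {"mo", "q", "y"}

# Multi-character units come first so "1mo" is not read as "1m" plus "o".
_PART = re.compile(r"(\d+)(ns|us|ms|mo|s|m|h|d|w|q|y)")


def is_constant_duration(every: str) -> bool:
    """Return True if every unit in the string has a fixed length."""
    body = every.lstrip("-")
    parts = _PART.findall(body)
    # Reject strings that the pattern does not fully account for.
    if not parts or "".join(num + unit for num, unit in parts) != body:
        raise ValueError(f"cannot parse duration string: {every!r}")
    return all(unit not in CALENDAR_UNITS for _, unit in parts)


for every in ("1h", "1h30m", "1mo", "1y"):
    print(every, is_constant_duration(every))
```

A check like this would let rounding by `'1mo'` raise instead of silently returning 28 days.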

crates/polars-time/src/round.rs

MarcoGorelli

sorry for the delay, just got a question about negative durations

py-polars/tests/unit/namespaces/test_datetime.py

crates/polars-time/src/truncate.rs

MarcoGorelli · 2024-01-03T16:36:33Z

thanks for updating - CI is red, could you fixup please then I'll take another look?

stinodego · 2024-02-14T13:04:01Z

@MarcoGorelli would you mind giving this PR another look? It looks to me like it's ready to be merged.

MarcoGorelli · 2024-02-14T13:20:09Z

nice! yup, taking a look this afternoon

UPDATE: spent some time trying to break this, even with hypothesis tests, and wasn't able to. Well done! Gonna try out some final things, then will approve

MarcoGorelli

Looks good!

Just some minor notes:

if you round with a zero-duration, then you get a mysterious panic exception:

In [7]: pl.Series([timedelta(seconds=30)]).dt.round('0m')

PanicException: attempt to calculate the remainder with a divisor of zero

Could an informative error be raised here instead?

instead of Err(polars_err!, I think you can just use polars_bail!
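
As a rough illustration of the first note, the guard below validates `every` before dividing by it. It uses `datetime.timedelta` rather than polars, and both the function name and the error message are made up for the example.

```python
from datetime import timedelta


def truncate_duration(value: timedelta, every: timedelta) -> timedelta:
    """Truncate `value` down to a multiple of `every`.

    The zero check runs first so the caller sees a readable error instead
    of a division-by-zero failure from the arithmetic below.
    """
    if every == timedelta(0):
        raise ValueError("`every` must be a non-zero duration")
    return (value // every) * every


print(truncate_duration(timedelta(seconds=90), timedelta(minutes=1)))  # 0:01:00

try:
    truncate_duration(timedelta(seconds=30), timedelta(0))
except ValueError as exc:
    print(f"error: {exc}")
```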

MarcoGorelli · 2024-02-19T10:20:49Z

thanks for updating - doing a final pass today, but this might be it

EDIT: just noticed some minor things, like that the sign of every is ignored - I'll add a commit to fixup

MarcoGorelli

Wait sorry, there is an actual issue here

If you try running

import polars as pl
from datetime import date, datetime, timedelta

df = pl.DataFrame({
    'a': [timedelta(1)]
}).with_columns(
    b=pl.col('a').dt.cast_time_unit('ms'),
    c=pl.col('a').dt.cast_time_unit('ns'),
)
print(df.select(pl.all().dt.truncate('1h', offset='1m')))
print(df.select(pl.all().dt.truncate('1h', offset='-1m')))

you'll see

shape: (1, 3)
┌──────────────┬──────────────┬──────────────┐
│ a            ┆ b            ┆ c            │
│ ---          ┆ ---          ┆ ---          │
│ duration[μs] ┆ duration[ms] ┆ duration[ns] │
╞══════════════╪══════════════╪══════════════╡
│ 1d 1m        ┆ 1d 1m        ┆ 1d 1m        │
└──────────────┴──────────────┴──────────────┘
shape: (1, 3)
┌──────────────┬──────────────┬──────────────┐
│ a            ┆ b            ┆ c            │
│ ---          ┆ ---          ┆ ---          │
│ duration[μs] ┆ duration[ms] ┆ duration[ns] │
╞══════════════╪══════════════╪══════════════╡
│ 1d 1m        ┆ 1d 1m        ┆ 1d 1m        │
└──────────────┴──────────────┴──────────────┘

The second one should be different?

codecov · 2024-03-17T18:50:05Z

Codecov Report

Attention: Patch coverage is 70.68966% with 34 lines in your changes missing coverage. Please review.

Project coverage is 80.68%. Comparing base (8bad2fd) to head (1554a57).

Files	Patch %	Lines
crates/polars-time/src/round.rs	55.76%	23 Missing ⚠️
...ates/polars-plan/src/dsl/function_expr/datetime.rs	50.00%	6 Missing ⚠️
crates/polars-time/src/truncate.rs	90.38%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #12597      +/-   ##
==========================================
- Coverage   80.69%   80.68%   -0.02%     
==========================================
  Files        1485     1485              
  Lines      195485   195546      +61     
  Branches     2782     2782              
==========================================
+ Hits       157747   157769      +22     
- Misses      37226    37265      +39     
  Partials      512      512

rob-sil · 2024-03-17T20:54:11Z

Hi, sorry for the delay!

The negative offset error should be fixed now. Turns out Duration's duration_ns doesn't include its sign and just returns the absolute value of the duration in nanoseconds. I made a note about it, but should it return negative values for negative durations instead?
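
A small sketch of how a magnitude-only accessor can produce the identical outputs shown earlier. `SignedDuration` is a hypothetical stand-in, not the polars `Duration` type, and applying the offset after truncation is an assumption that matches the printed results.

```python
from dataclasses import dataclass

NS_PER_MINUTE = 60 * 10**9
NS_PER_HOUR = 60 * NS_PER_MINUTE
NS_PER_DAY = 24 * NS_PER_HOUR


@dataclass(frozen=True)
class SignedDuration:
    """Hypothetical duration stored as a magnitude plus a sign flag."""

    magnitude_ns: int
    negative: bool = False

    def duration_ns(self) -> int:
        # Magnitude only: the sign is dropped, as described above.
        return self.magnitude_ns

    def signed_ns(self) -> int:
        # Magnitude with the sign applied.
        return -self.magnitude_ns if self.negative else self.magnitude_ns


def truncate_ns(value_ns: int, every_ns: int, offset_ns: int) -> int:
    """Truncate to a multiple of `every_ns`, then add the offset."""
    return value_ns // every_ns * every_ns + offset_ns


plus_1m = SignedDuration(NS_PER_MINUTE)
minus_1m = SignedDuration(NS_PER_MINUTE, negative=True)

# Using the magnitude, offset='1m' and offset='-1m' give the same result.
buggy_plus = truncate_ns(NS_PER_DAY, NS_PER_HOUR, plus_1m.duration_ns())
buggy_minus = truncate_ns(NS_PER_DAY, NS_PER_HOUR, minus_1m.duration_ns())
print(buggy_plus == buggy_minus)  # True

# Applying the sign makes the negative offset land one minute before 1d.
fixed_minus = truncate_ns(NS_PER_DAY, NS_PER_HOUR, minus_1m.signed_ns())
print(fixed_minus == NS_PER_DAY - NS_PER_MINUTE)  # True
```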

Regarding the sign of every: what's the expected outcome of rounding a duration to a negative duration?

1. Round symmetrically, so rounding with every=-1h is the same as every=1h. I assumed this option, which is what the PR currently does.
2. Raise an error, like if someone tries to round dates and datetimes to negative durations.
3. Round in the opposite direction. I think this is what pandas does, but it might be a bug.

Also, @MarcoGorelli, you mentioned using hypothesis tests. Should I add some?

MarcoGorelli · 2024-03-19T21:25:24Z

Thanks @rob-sil for updating!

Regarding the sign of every: what's the expected outcome of rounding a duration to a negative duration?

This doesn't sound well-defined. "round to the next month" I can understand what it means, but "round to the next minus one month"?

In [16]: pl.Series([datetime(2020, 2, 15)]).dt.round('-1mo')
Out[16]:
shape: (1,)
Series: '' [datetime[μs]]
[
        2020-02-01 00:00:00
]

My guess is that it only currently works by accident, and that it's gone unnoticed because it's not something most users would write anyway

Do you fancy making a separate precursor PR in which you disallow rounding by negative every for date/datetime? Then we can do the same in this PR, and then finally approve it 💪

MarcoGorelli

Looks solid, nice one @rob-sil

rob-sil · 2024-03-27T03:34:01Z

Thanks for all the help, @MarcoGorelli! Does this need anything else to merge?

MarcoGorelli · 2024-04-13T16:47:16Z

crates/polars-time/src/round.rs

+        if !offset.is_constant_duration() {
+            polars_bail!(InvalidOperation: "Cannot offset a Duration series by a non-constant duration.");
+        }


offset has been deprecated for round in a separate PR - rather than adding complexity here for something which is now deprecated anyway, can we just raise if someone passes offset when rounding a duration?

This would be backwards-compatible as rounding durations wasn't supported yet

crates/polars-time/src/round.rs

codspeed-hq · 2024-04-14T20:00:26Z

CodSpeed Performance Report

Merging #12597 will not alter performance

Comparing rob-sil:round-durations (1554a57) with main (8bad2fd)

Summary

✅ 37 untouched benchmarks

crates/polars-time/src/round.rs

MarcoGorelli

Looks good to me

A couple of things that I think should be discussed as follow-ups:

when should ComputeError vs InvalidOperationError be raised?
the case "scalar vs expression" is not covered here - but it's also not covered for rounding / truncating dates or datetimes, so I think it's OK to address it separately. I've opened a separate issue for that: truncate does not correctly broadcast scalar vs expression #15743

EDIT: marking as draft for a little: from discussion today, some errors should be InvalidOperationError. For consistency with truncating date/datetime, I think it's OK for the "cannot truncate by negative duration" error to raise a ComputeError. Maybe they can all be switched over at a later point, as it'd be a breaking change

MarcoGorelli · 2024-04-20T16:59:17Z

Hey @rob-sil - are you interested in updating this to use broadcast_try_binary_elementwise and a cache, as was done in #15768 ?

No worries if not, happy to add a commit if you like, I understand this PR has been taking some time

rob-sil · 2024-04-21T20:57:36Z

Hi @MarcoGorelli, I'm interested but I think this is pushing the limits of my polars/rust knowledge.

For example, I'm a bit confused by broadcast_try_binary_elementwise and how to handle series of different lengths.

>>> import polars as pl
>>> from datetime import datetime, timedelta
>>>
>>> df = pl.DataFrame(
...     {
...         "datetime": datetime(2000, 2, 3, 12, 30),
...         "every":  ["1y", "1mo", "1d", "1h"],
...         "duration": [timedelta(i) for i in range(4)],
...     }
... )
>>> df
shape: (4, 3)
┌─────────────────────┬───────┬──────────────┐
│ datetime            ┆ every ┆ duration     │
│ ---                 ┆ ---   ┆ ---          │
│ datetime[μs]        ┆ str   ┆ duration[μs] │
╞═════════════════════╪═══════╪══════════════╡
│ 2000-02-03 12:30:00 ┆ 1y    ┆ 0µs          │
│ 2000-02-03 12:30:00 ┆ 1mo   ┆ 1d           │
│ 2000-02-03 12:30:00 ┆ 1d    ┆ 2d           │
│ 2000-02-03 12:30:00 ┆ 1h    ┆ 3d           │
└─────────────────────┴───────┴──────────────┘

If I filter "datetime" to a single value but leave all four values of "every", then truncation broadcasts and returns four values:

>>> df.select(
...     pl.col("datetime").filter(pl.col("every") == "1mo")
...         .dt.truncate(pl.col("every"))
... )
shape: (4, 1)
┌─────────────────────┐
│ datetime            │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2000-01-01 00:00:00 │
│ 2000-02-01 00:00:00 │
│ 2000-02-03 00:00:00 │
│ 2000-02-03 12:00:00 │
└─────────────────────┘

If I filter "datetime" to two values, then truncation picks the first two of "every" and gives me two values back:

>>> df.select(
...     pl.col("datetime").filter(pl.col("every").is_in(["1mo", "1d"]))
...         .dt.truncate(pl.col("every"))
... )
shape: (2, 1)
┌─────────────────────┐
│ datetime            │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2000-01-01 00:00:00 │
│ 2000-02-01 00:00:00 │
└─────────────────────┘

But if I try other operations that use two series, like addition, I would get an error.

>>> df.select(
...     pl.col("datetime").filter(pl.col("every").is_in(["1mo", "1d"]))
...         + pl.col("duration")
... )
[...]
polars.exceptions.ComputeError: cannot evaluate two Series of different lengths (2 and 4)
[...]

Is this the intended behavior?

MarcoGorelli · 2024-04-26T08:38:28Z

Hey

If I filter "datetime" to a single value but leave all four values of "every", then truncation broadcasts and returns four values:

this looks correct, but

If I filter "datetime" to two values, then truncation picks the first two of "every" and gives me two values back:

I think this isn't. It should raise - length 2 vs length 4 shouldn't silently truncate the RHS to length 2. Thanks for spotting this! I think this can be handled within broadcast_try_binary_elementwise - @reswqa is this right?
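
The rule described above (a length-1 side broadcasts, any other length mismatch raises) can be sketched in plain Python. `broadcast_binary` is a hypothetical helper for illustration and is not the signature of `broadcast_try_binary_elementwise`.

```python
from typing import Callable, List, Sequence, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def broadcast_binary(
    lhs: Sequence[A], rhs: Sequence[B], op: Callable[[A, B], C]
) -> List[C]:
    """Apply `op` elementwise, broadcasting only a length-1 operand."""
    if len(lhs) == len(rhs):
        return [op(a, b) for a, b in zip(lhs, rhs)]
    if len(lhs) == 1:
        return [op(lhs[0], b) for b in rhs]
    if len(rhs) == 1:
        return [op(a, rhs[0]) for a in lhs]
    # Lengths differ and neither side is a scalar: refuse rather than
    # silently dropping the extra elements.
    raise ValueError(
        f"cannot evaluate two sequences of different lengths "
        f"({len(lhs)} and {len(rhs)})"
    )


def floor_to(value: int, every: int) -> int:
    """Round `value` down to a multiple of `every`."""
    return value // every * every


print(broadcast_binary([95], [100, 50, 10, 1], floor_to))  # [0, 50, 90, 95]

try:
    broadcast_binary([95, 96], [100, 50, 10, 1], floor_to)
except ValueError as exc:
    print(f"error: {exc}")
```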

crates/polars-time/src/truncate.rs

MarcoGorelli

Looks good to me, thanks @rob-sil

github-actions bot added the enhancement (New feature or an improvement of an existing feature), python (Related to Python Polars), and rust (Related to Rust Polars) labels Nov 20, 2023

rob-sil marked this pull request as ready for review November 21, 2023 14:18

rob-sil requested review from ritchie46, stinodego, alexander-beedie, MarcoGorelli and orlp as code owners November 21, 2023 14:18

ritchie46 reviewed Nov 22, 2023

crates/polars-time/src/round.rs Outdated

MarcoGorelli requested changes Nov 24, 2023

crates/polars-time/src/round.rs Outdated

MarcoGorelli reviewed Nov 29, 2023

crates/polars-time/src/round.rs Outdated

rob-sil requested a review from MarcoGorelli December 13, 2023 22:58

MarcoGorelli reviewed Dec 30, 2023

py-polars/tests/unit/namespaces/test_datetime.py Outdated

MarcoGorelli reviewed Dec 30, 2023

crates/polars-time/src/truncate.rs Outdated

rob-sil requested a review from c-peters as a code owner January 3, 2024 15:01

rob-sil force-pushed the round-durations branch from e675ac0 to 1bf4ac6 Compare January 3, 2024 17:43

rob-sil force-pushed the round-durations branch from 1bf4ac6 to 2de1d78 Compare February 9, 2024 02:25

rob-sil requested a review from MarcoGorelli February 9, 2024 02:48

stinodego changed the title ~~feat: round and truncate for durations~~ feat: Implement round and truncate for Duration columns Feb 14, 2024

MarcoGorelli reviewed Feb 14, 2024

rob-sil force-pushed the round-durations branch from 2de1d78 to 78b8601 Compare February 16, 2024 12:54

rob-sil requested a review from MarcoGorelli February 16, 2024 14:25

MarcoGorelli requested changes Feb 19, 2024

rob-sil force-pushed the round-durations branch from 6f604c2 to 614577a Compare March 17, 2024 18:20

rob-sil mentioned this pull request Mar 20, 2024

fix(python, rust): Block rounding/truncating to negative durations #15175

Merged

MarcoGorelli approved these changes Mar 21, 2024

MarcoGorelli requested changes Apr 13, 2024

rob-sil requested a review from reswqa as a code owner April 14, 2024 18:54

rob-sil requested a review from MarcoGorelli April 14, 2024 20:05

MarcoGorelli reviewed Apr 16, 2024

crates/polars-time/src/round.rs Outdated

rob-sil requested a review from MarcoGorelli April 17, 2024 02:47

MarcoGorelli approved these changes Apr 18, 2024

MarcoGorelli marked this pull request as draft April 19, 2024 14:20

MarcoGorelli and others added 2 commits July 15, 2024 15:51

feat: implement round and truncate for Duration

584d680

noop

da2199a

MarcoGorelli force-pushed the round-durations branch from 3c3147b to da2199a Compare July 15, 2024 14:54

MarcoGorelli added 3 commits July 15, 2024 15:56

consistency

fde75e9

consistency

fc7a3c7

move tests to more appropriate location

c72d3c4

MarcoGorelli reviewed Jul 15, 2024

crates/polars-time/src/truncate.rs Outdated

align formulae between datetime and duration

1554a57

MarcoGorelli marked this pull request as ready for review July 16, 2024 08:20

MarcoGorelli approved these changes Jul 16, 2024

ritchie46 force-pushed the main branch from 0a696ff to 9c29683 Compare July 28, 2024 08:11
