Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved business day support #20884

Open
MarcoGorelli opened this issue Jan 24, 2025 · 5 comments
Open

Improved business day support #20884

MarcoGorelli opened this issue Jan 24, 2025 · 5 comments
Labels
A-api Area: changes to the public API A-temporal Area: date/time functionality

Comments

@MarcoGorelli
Copy link
Collaborator

This comes up repeatedly when I teach Polars. add_business_days / business_day_count are a start, but people really want to be able to use business days in group_by_dynamic / rolling. Concrete requests people have brought up are:

  • Expr.is_business_day
  • business days in rolling_mean_by, such as "rolling mean over the last 3 business days"
  • business days in group_by_dynamic

Implementation-wise, I don't think this would be terrible - add_business_day could be added alongside

#[doc(hidden)]
fn add_month(ts: NaiveDateTime, n_months: i64, negative: bool) -> NaiveDateTime {
let mut months = n_months;
if negative {
months = -months;
}
// Retrieve the current date and increment the values
// based on the number of months
let mut year = ts.year();
let mut month = ts.month() as i32;
let mut day = ts.day();
year += (months / 12) as i32;
month += (months % 12) as i32;
// if the month overflowed or underflowed, adjust the year
// accordingly. Because we add the modulo for the months
// the year will only adjust by one
if month > 12 {
year += 1;
month -= 12;
} else if month <= 0 {
year -= 1;
month += 12;
}
// Normalize the day if we are past the end of the month.
let last_day_of_month =
DAYS_PER_MONTH[is_leap_year(year) as usize][(month - 1) as usize] as u32;
if day > last_day_of_month {
day = last_day_of_month
}
// Retrieve the original time and construct a data
// with the new year, month and day
let hour = ts.hour();
let minute = ts.minute();
let sec = ts.second();
let nsec = ts.nanosecond();
new_datetime(year, month as u32, day, hour, minute, sec, nsec).expect(
"Expected valid datetime, please open an issue at https://github.com/pola-rs/polars/issues"
)
}
, and 'bd' could be accepted into the Polars string language.

The main difficulty I see is, API-wise, how to specify custom weekday masks and holiday calendars. In Expr.dt.add_business_days, this is solved by having week_mask and holidays arguments. However, adding week_mask and holidays arguments to every temporal function which handles intervals (rolling_*_by, offset_by, group_by_dynamic, rolling, upsample, ...) seems infeasible - I don't think it's warranted to expand all these functions' APIs for functionality which sure is useful but only perhaps to 1% of users.

Would it work to instead have week_mask and holidays be configurable with pl.Config? So people could do something like

with pl.Config() as cfg:
    cfg.set_week_mask([True, True, True, True, False, False, True])  # Islamic calendar
    df = df.with_columns(pl.col('price').rolling_mean_by('date', '3bd').over('asset'))
@MarcoGorelli MarcoGorelli added A-temporal Area: date/time functionality A-api Area: changes to the public API labels Jan 24, 2025
@orlp
Copy link
Collaborator

orlp commented Jan 24, 2025

I'm not a fan of the global config whatsoever. We need to find a different solution.

The cleanest is probably by not (only) using '3bd' but instead doing something like (syntax completely up for debate):

bd = BusinessDays(week_mask = [True, True, True, True, False, False, True])
df = df.with_columns(pl.col('price').rolling_mean_by('date', bd(3)).over('asset'))

Then any temporal function that accepts business days we change

window_size: timedelta | str

to

window_size: timedelta | str | BusinessDays

@MarcoGorelli
Copy link
Collaborator Author

From discussion:

  • 👍 for BusinessDays class
  • initially at least, it should not be possible to mix business-day aware durations with non-business ones. E.g. you can't offset by 3 business days and 2 hours. If anyone has a use case for that, please do post a comment 🙏

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Jan 27, 2025

  • initially at least, it should not be possible to mix business-day aware durations with non-business ones. E.g. you can't offset by 3 business days and 2 hours. If anyone has a use case for that, please do post a comment 🙏

Example where you would like to mix time and business days:

  • "Market open on the next business day" (where this is going to be a different number of hours depending on the exchange). Can do it in two parts (first offset by business day, then offset by time), but it's a case where you might want to mix both in one expression.

@MarcoGorelli
Copy link
Collaborator Author

thanks @alexander-beedie ! in this case, would you have a table with the the market opening hour varies depending on the day? in which case, would a join_asof with such a table work as a solution?

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Jan 27, 2025

thanks @alexander-beedie ! in this case, would you have a table with the the market opening hour varies depending on the day? in which case, would a join_asof with such a table work as a solution?

If normalised, would more likely be an inner join against such a table (as you'd expect the answer to be exact), but it might also be something supplied from an API running outside of Polars, hence wanting to mix & match at the expression level (eg: "next business day according to [externally supplied calendar] at [externally supplied offset]").

Definitely workarounds available of course ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-api Area: changes to the public API A-temporal Area: date/time functionality
Projects
None yet
Development

No branches or pull requests

3 participants