Business date/calendar logic #5713

erinov1 · 2022-12-03T16:46:49Z

Problem description

It would nice to natively support certain business date operations. In particular, I envision polars expression counterparts versions of the vectorized numpy busday_offset, is_busday, busday_count functions. The numpy versions of these functions accept weekmask and holidays arguments (or a busdaycalendar object, which just stores a weekmask and list of holidays).

Given a dataframe with a date column, you could do something like

weekmask = "1111100"
holidays = [datetime.date(2000, 1, 1)]

df.with_columns(
    [
        pl("date").busday_offset(3, weekmask=weekmask, holidays=holidays),
        pl("date").is_busday(weekmask=weekmask, holidays=holidays),
        pl.busday_count(
            pl.col("date"), 
            datetime.date(2000, 1, 2), 
            weekmask=weekmask, 
            holidays=holidays
        )
    ]
)

It would also be nice to have a bdate_range function/expression so you could get all business dates between two columns by

df.select(
    pl.bdate_range(pl.col("start_date"), pl.col("end_date"), weekmask=weekmask, holidays=holidays)
)

The text was updated successfully, but these errors were encountered:

zundertj · 2022-12-23T11:40:44Z

Related issue, but specific for upsampling: #5516.

wangkev · 2023-04-01T01:16:33Z

+1 to have these features.

Also to note that this answer in this stackoverflow answer does not seem to do the right thing either for np.busday_count.

import polars as pl
import numpy as np

df = pl.DataFrame(
    {
        "Day1": [
            "2022-01-02",
            "2022-01-03",
            "2022-01-04",
        ],
        "Day2": [
            "2022-01-03",
            "2022-01-04",
            "2022-01-05",
        ],
    }
).with_columns(pl.col(["Day1", "Day2"]).str.strptime(pl.Date, "%Y-%m-%d"))

print(
    (
        df.with_columns(
            pl.struct([pl.col("Day1"), pl.col("Day2")])
            .map(
                lambda x: np.busday_count(
                    x.struct["Day1"], x.struct["Day2"], weekmask="1110000"
                )
            )
            .alias("Result")
        )
    )
)

shape: (6, 5)
┌────────┬──────────┬────────────┬────────────┬─────────────┐
│ Market ┆ Service  ┆ Day1       ┆ Day2       ┆ Result      │
│ ---    ┆ ---      ┆ ---        ┆ ---        ┆ ---         │
│ str    ┆ str      ┆ date       ┆ date       ┆ list[i64]   │
╞════════╪══════════╪════════════╪════════════╪═════════════╡
│ AT     ┆ Standard ┆ 2022-01-02 ┆ 2022-01-03 ┆ [0, 1, … 0] │
│ DE     ┆ Express  ┆ 2022-01-03 ┆ 2022-01-04 ┆ [0, 1, … 0] │
│ AT     ┆ Standard ┆ 2022-01-04 ┆ 2022-01-05 ┆ [0, 1, … 0] │
│ CZ     ┆ Standard ┆ 2022-01-05 ┆ 2022-01-06 ┆ [0, 1, … 0] │
│ GB     ┆ Standard ┆ 2022-01-06 ┆ 2022-01-07 ┆ [0, 1, … 0] │
│ CZ     ┆ Standard ┆ 2022-01-07 ┆ 2022-01-08 ┆ [0, 1, … 0] │
└────────┴──────────┴────────────┴────────────┴─────────────┘

And passing the expressions directly into the numpy functions (e.g. np.busday_count) does not work either.

df.select(np.busday_count(pl.col("Day1"), pl.col("Day2"), weekmask="1110000"))

ValueError: Could not convert object to NumPy datetime

zundertj · 2023-04-01T07:31:02Z

The error you are seeing is a Numpy limitation, busday_count is not a ufunc, see the ufunc list of functions.

Also, right now we do not support ufunc's with more than one expression. The recommendation is to use pl.reduce instead, although the naive way generates a single row list[i64] dataframe (despite the docs claiming it works horizontally):

>>> df.select(pl.reduce(lambda dt1, dt2: np.busday_count(dt1, dt2, weekmask="1110000"), [pl.col('Day1'), pl.col('Day2')]))
shape: (1, 1)
┌───────────┐
│ Day1      │
│ ---       │
│ list[i64] │
╞═══════════╡
│ [0, 1, 1] │
└───────────┘

Wrapping in a Series solves this:

>>> df.select(pl.reduce(lambda dt1, dt2: pl.Series(np.busday_count(dt1, dt2, weekmask="1110000")), [pl.col('Day1'), pl.col('Day2')]))
shape: (3, 1)
┌──────┐
│ Day1 │
│ ---  │
│ i64  │
╞══════╡
│ 0    │
│ 1    │
│ 1    │
└──────┘

We use the pl.reduce trick basically, where the limitation is that non-expressions have to be passed in as kwargs. That is probably the safest anyway. Related issues: pola-rs#6770 : brought up no support for multiple expression, have added a ValueError in response pola-rs#5713 : reminder that there is the `pl.reduce` trick

wangkev · 2023-04-01T11:50:33Z

@zundertj Thanks for the suggestion! That's a good workaround until there is native support for business date operations.

ion-elgreco · 2023-07-17T18:44:52Z

I would also expect some support with aliases

In Expr.dt.offset_by we could then use 1b for 1 business day offset or 1bh for a business hour. Such as in pandas. https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases

MarcoGorelli · 2023-09-10T11:58:06Z

Starting with adding '1bd' and '1bh' as offset aliases sounds good! weekmask and holidays can come after

Anyone here interested in giving this one a go?

EDIT: this may require more discussion #11568 (comment)

MarcoGorelli · 2023-10-21T18:46:24Z

taking this forwards in https://github.com/MarcoGorelli/polars-business , let's take requests and ideas over there

MarcoGorelli · 2023-10-28T20:07:17Z

@erinov1 just FYI, all the requests from your example are now available:

import polars as pl
import polars_business as plb
from datetime import date

weekend = ["Sat", "Sun"]
holidays = [date(2000, 1, 1)]

df = pl.DataFrame(
    {
        "start_date": [date(2000, 3, 1), date(2000, 4, 3)],
        "end_date": [date(2000, 3, 3), date(2000, 4, 19)],
    }
)

print(
    df.with_columns(
        start_plus_3bd=plb.col("start_date").bdt.offset_by(
            "3bd", weekend=weekend, holidays=holidays
        ),
        start_is_workday=plb.col("start_date").bdt.is_workday(
            weekend=weekend, holidays=holidays
        ),
        workday_count=plb.workday_count(
            "start_date",
            "end_date",
            weekend=weekend,
            holidays=holidays,
        ),
    )
)

shape: (2, 5)
┌────────────┬────────────┬────────────────┬──────────────────┬───────────────┐
│ start_date ┆ end_date   ┆ start_plus_3bd ┆ start_is_workday ┆ workday_count │
│ ---        ┆ ---        ┆ ---            ┆ ---              ┆ ---           │
│ date       ┆ date       ┆ date           ┆ bool             ┆ i32           │
╞════════════╪════════════╪════════════════╪══════════════════╪═══════════════╡
│ 2000-03-01 ┆ 2000-03-03 ┆ 2000-03-06     ┆ true             ┆ 2             │
│ 2000-04-03 ┆ 2000-04-19 ┆ 2000-04-06     ┆ true             ┆ 12            │
└────────────┴────────────┴────────────────┴──────────────────┴───────────────┘

erinov1 added the enhancement New feature or an improvement of an existing feature label Dec 3, 2022

zundertj mentioned this issue Apr 1, 2023

feat(python): Support Numpy ufunc with more than one expression #7924

Merged

MarcoGorelli added the A-temporal Area: date/time functionality label Sep 10, 2023

MarcoGorelli added the accepted Ready for implementation label Sep 10, 2023

github-project-automation bot added this to Backlog Sep 10, 2023

github-project-automation bot moved this to Ready in Backlog Sep 10, 2023

MarcoGorelli mentioned this issue Oct 6, 2023

Tracking issue: business day logic #11568

Closed

4 tasks

MarcoGorelli removed the accepted Ready for implementation label Oct 6, 2023

MarcoGorelli closed this as completed Oct 21, 2023

github-project-automation bot moved this from Ready to Done in Backlog Oct 21, 2023

JustinKurland mentioned this issue Oct 21, 2023

date_range with custom weekend / holidays pola-rs/polars-xdt#16

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Business date/calendar logic #5713

Business date/calendar logic #5713

erinov1 commented Dec 3, 2022

zundertj commented Dec 23, 2022

wangkev commented Apr 1, 2023 •

edited

Loading

zundertj commented Apr 1, 2023 •

edited

Loading

wangkev commented Apr 1, 2023

ion-elgreco commented Jul 17, 2023

MarcoGorelli commented Sep 10, 2023 •

edited

Loading

MarcoGorelli commented Oct 21, 2023

MarcoGorelli commented Oct 28, 2023

Business date/calendar logic #5713

Business date/calendar logic #5713

Comments

erinov1 commented Dec 3, 2022

Problem description

zundertj commented Dec 23, 2022

wangkev commented Apr 1, 2023 • edited Loading

zundertj commented Apr 1, 2023 • edited Loading

wangkev commented Apr 1, 2023

ion-elgreco commented Jul 17, 2023

MarcoGorelli commented Sep 10, 2023 • edited Loading

MarcoGorelli commented Oct 21, 2023

MarcoGorelli commented Oct 28, 2023

wangkev commented Apr 1, 2023 •

edited

Loading

zundertj commented Apr 1, 2023 •

edited

Loading

MarcoGorelli commented Sep 10, 2023 •

edited

Loading