Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Business date/calendar logic #5713

Closed
erinov1 opened this issue Dec 3, 2022 · 8 comments
Closed

Business date/calendar logic #5713

erinov1 opened this issue Dec 3, 2022 · 8 comments
Labels
A-temporal Area: date/time functionality enhancement New feature or an improvement of an existing feature

Comments

@erinov1
Copy link

erinov1 commented Dec 3, 2022

Problem description

It would nice to natively support certain business date operations. In particular, I envision polars expression counterparts versions of the vectorized numpy busday_offset, is_busday, busday_count functions. The numpy versions of these functions accept weekmask and holidays arguments (or a busdaycalendar object, which just stores a weekmask and list of holidays).

Given a dataframe with a date column, you could do something like

weekmask = "1111100"
holidays = [datetime.date(2000, 1, 1)]

df.with_columns(
    [
        pl("date").busday_offset(3, weekmask=weekmask, holidays=holidays),
        pl("date").is_busday(weekmask=weekmask, holidays=holidays),
        pl.busday_count(
            pl.col("date"), 
            datetime.date(2000, 1, 2), 
            weekmask=weekmask, 
            holidays=holidays
        )
    ]
)

It would also be nice to have a bdate_range function/expression so you could get all business dates between two columns by

df.select(
    pl.bdate_range(pl.col("start_date"), pl.col("end_date"), weekmask=weekmask, holidays=holidays)
)
@erinov1 erinov1 added the enhancement New feature or an improvement of an existing feature label Dec 3, 2022
@zundertj
Copy link
Collaborator

Related issue, but specific for upsampling: #5516.

@wangkev
Copy link

wangkev commented Apr 1, 2023

+1 to have these features.

Also to note that this answer in this stackoverflow answer does not seem to do the right thing either for np.busday_count.

import polars as pl
import numpy as np

df = pl.DataFrame(
    {
        "Day1": [
            "2022-01-02",
            "2022-01-03",
            "2022-01-04",
        ],
        "Day2": [
            "2022-01-03",
            "2022-01-04",
            "2022-01-05",
        ],
    }
).with_columns(pl.col(["Day1", "Day2"]).str.strptime(pl.Date, "%Y-%m-%d"))

print(
    (
        df.with_columns(
            pl.struct([pl.col("Day1"), pl.col("Day2")])
            .map(
                lambda x: np.busday_count(
                    x.struct["Day1"], x.struct["Day2"], weekmask="1110000"
                )
            )
            .alias("Result")
        )
    )
)
shape: (6, 5)
┌────────┬──────────┬────────────┬────────────┬─────────────┐
│ Market ┆ Service  ┆ Day1       ┆ Day2       ┆ Result      │
│ ---    ┆ ---      ┆ ---        ┆ ---        ┆ ---         │
│ str    ┆ str      ┆ date       ┆ date       ┆ list[i64]   │
╞════════╪══════════╪════════════╪════════════╪═════════════╡
│ AT     ┆ Standard ┆ 2022-01-02 ┆ 2022-01-03 ┆ [0, 1, … 0] │
│ DE     ┆ Express  ┆ 2022-01-03 ┆ 2022-01-04 ┆ [0, 1, … 0] │
│ AT     ┆ Standard ┆ 2022-01-04 ┆ 2022-01-05 ┆ [0, 1, … 0] │
│ CZ     ┆ Standard ┆ 2022-01-05 ┆ 2022-01-06 ┆ [0, 1, … 0] │
│ GB     ┆ Standard ┆ 2022-01-06 ┆ 2022-01-07 ┆ [0, 1, … 0] │
│ CZ     ┆ Standard ┆ 2022-01-07 ┆ 2022-01-08 ┆ [0, 1, … 0] │
└────────┴──────────┴────────────┴────────────┴─────────────┘

And passing the expressions directly into the numpy functions (e.g. np.busday_count) does not work either.

df.select(np.busday_count(pl.col("Day1"), pl.col("Day2"), weekmask="1110000"))

ValueError: Could not convert object to NumPy datetime

@zundertj
Copy link
Collaborator

zundertj commented Apr 1, 2023

The error you are seeing is a Numpy limitation, busday_count is not a ufunc, see the ufunc list of functions.

Also, right now we do not support ufunc's with more than one expression. The recommendation is to use pl.reduce instead, although the naive way generates a single row list[i64] dataframe (despite the docs claiming it works horizontally):

>>> df.select(pl.reduce(lambda dt1, dt2: np.busday_count(dt1, dt2, weekmask="1110000"), [pl.col('Day1'), pl.col('Day2')]))
shape: (1, 1)
┌───────────┐
│ Day1      │
│ ---       │
│ list[i64] │
╞═══════════╡
│ [0, 1, 1] │
└───────────┘

Wrapping in a Series solves this:

>>> df.select(pl.reduce(lambda dt1, dt2: pl.Series(np.busday_count(dt1, dt2, weekmask="1110000")), [pl.col('Day1'), pl.col('Day2')]))
shape: (3, 1)
┌──────┐
│ Day1 │
│ ---  │
│ i64  │
╞══════╡
│ 0    │
│ 1    │
│ 1    │
└──────┘

zundertj added a commit to zundertj/polars that referenced this issue Apr 1, 2023
We use the pl.reduce trick basically, where the limitation is that non-expressions have to be passed in as kwargs. That is probably the safest anyway.

Related issues:
pola-rs#6770 : brought up no support for multiple expression, have added a ValueError in response

pola-rs#5713 : reminder that there is the `pl.reduce` trick
@wangkev
Copy link

wangkev commented Apr 1, 2023

@zundertj Thanks for the suggestion! That's a good workaround until there is native support for business date operations.

@ion-elgreco
Copy link
Contributor

I would also expect some support with aliases

In Expr.dt.offset_by we could then use 1b for 1 business day offset or 1bh for a business hour. Such as in pandas. https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases

@MarcoGorelli MarcoGorelli added the A-temporal Area: date/time functionality label Sep 10, 2023
@MarcoGorelli
Copy link
Collaborator

MarcoGorelli commented Sep 10, 2023

Starting with adding '1bd' and '1bh' as offset aliases sounds good! weekmask and holidays can come after

Anyone here interested in giving this one a go?

EDIT: this may require more discussion #11568 (comment)

@MarcoGorelli MarcoGorelli added the accepted Ready for implementation label Sep 10, 2023
@github-project-automation github-project-automation bot moved this to Ready in Backlog Sep 10, 2023
@MarcoGorelli MarcoGorelli removed the accepted Ready for implementation label Oct 6, 2023
@MarcoGorelli
Copy link
Collaborator

taking this forwards in https://github.com/MarcoGorelli/polars-business , let's take requests and ideas over there

@MarcoGorelli
Copy link
Collaborator

@erinov1 just FYI, all the requests from your example are now available:

import polars as pl
import polars_business as plb
from datetime import date

weekend = ["Sat", "Sun"]
holidays = [date(2000, 1, 1)]

df = pl.DataFrame(
    {
        "start_date": [date(2000, 3, 1), date(2000, 4, 3)],
        "end_date": [date(2000, 3, 3), date(2000, 4, 19)],
    }
)

print(
    df.with_columns(
        start_plus_3bd=plb.col("start_date").bdt.offset_by(
            "3bd", weekend=weekend, holidays=holidays
        ),
        start_is_workday=plb.col("start_date").bdt.is_workday(
            weekend=weekend, holidays=holidays
        ),
        workday_count=plb.workday_count(
            "start_date",
            "end_date",
            weekend=weekend,
            holidays=holidays,
        ),
    )
)
shape: (2, 5)
┌────────────┬────────────┬────────────────┬──────────────────┬───────────────┐
│ start_date ┆ end_date   ┆ start_plus_3bd ┆ start_is_workday ┆ workday_count │
│ ---        ┆ ---        ┆ ---            ┆ ---              ┆ ---           │
│ date       ┆ date       ┆ date           ┆ bool             ┆ i32           │
╞════════════╪════════════╪════════════════╪══════════════════╪═══════════════╡
│ 2000-03-01 ┆ 2000-03-03 ┆ 2000-03-06     ┆ true             ┆ 2             │
│ 2000-04-03 ┆ 2000-04-19 ┆ 2000-04-06     ┆ true             ┆ 12            │
└────────────┴────────────┴────────────────┴──────────────────┴───────────────┘

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-temporal Area: date/time functionality enhancement New feature or an improvement of an existing feature
Projects
Archived in project
Development

No branches or pull requests

5 participants