Business date/calendar logic #5713
Related issue, but specific to upsampling: #5516.
+1 for these features. Note also that the approach in this Stack Overflow answer does not seem to do the right thing either:
import polars as pl
import numpy as np

df = pl.DataFrame(
    {
        "Day1": [
            "2022-01-02",
            "2022-01-03",
            "2022-01-04",
        ],
        "Day2": [
            "2022-01-03",
            "2022-01-04",
            "2022-01-05",
        ],
    }
).with_columns(pl.col(["Day1", "Day2"]).str.strptime(pl.Date, "%Y-%m-%d"))
print(
    (
        df.with_columns(
            pl.struct([pl.col("Day1"), pl.col("Day2")])
            .map(
                lambda x: np.busday_count(
                    x.struct["Day1"], x.struct["Day2"], weekmask="1110000"
                )
            )
            .alias("Result")
        )
    )
)
Passing the expressions directly into the numpy functions (e.g. df.select(np.busday_count(pl.col("Day1"), pl.col("Day2"), weekmask="1110000"))) raises an error.
The error you are seeing is a NumPy limitation. Also, right now we do not support ufuncs with more than one expression. The recommendation is to use pl.reduce:
>>> df.select(pl.reduce(lambda dt1, dt2: np.busday_count(dt1, dt2, weekmask="1110000"), [pl.col('Day1'), pl.col('Day2')]))
shape: (1, 1)
┌───────────┐
│ Day1      │
│ ---       │
│ list[i64] │
╞═══════════╡
│ [0, 1, 1] │
└───────────┘
Wrapping the numpy result in a pl.Series returns one row per input row instead of a single list:
>>> df.select(pl.reduce(lambda dt1, dt2: pl.Series(np.busday_count(dt1, dt2, weekmask="1110000")), [pl.col('Day1'), pl.col('Day2')]))
shape: (3, 1)
┌──────┐
│ Day1 │
│ ---  │
│ i64  │
╞══════╡
│ 0    │
│ 1    │
│ 1    │
└──────┘
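The values 0, 1, 1 shown above can be cross-checked without numpy or polars. The following is a minimal pure-Python sketch, assuming the semantics numpy documents for busday_count: valid days are counted over the half-open range [begin, end), and the weekmask string starts on Monday. The helper name busday_count is reused here only for illustration; it is not the numpy function and handles only begin <= end.

```python
from datetime import date, timedelta


def busday_count(begin: date, end: date, weekmask: str = "1111100") -> int:
    """Count valid days in the half-open range [begin, end).

    weekmask is a 7-character string, Monday first, where "1" marks a valid day.
    This is an illustrative re-implementation, not numpy's code.
    """
    if len(weekmask) != 7 or set(weekmask) - {"0", "1"}:
        raise ValueError("weekmask must be seven characters of '0' or '1'")
    if end < begin:
        raise ValueError("this sketch only handles begin <= end")

    count = 0
    day = begin
    while day < end:
        # date.weekday() returns 0 for Monday, matching the weekmask order.
        if weekmask[day.weekday()] == "1":
            count += 1
        day += timedelta(days=1)
    return count


# Same dates as the Day1/Day2 columns in the thread.
day1 = [date(2022, 1, 2), date(2022, 1, 3), date(2022, 1, 4)]
day2 = [date(2022, 1, 3), date(2022, 1, 4), date(2022, 1, 5)]

# weekmask="1110000" treats only Monday, Tuesday, and Wednesday as valid days.
result = [busday_count(a, b, weekmask="1110000") for a, b in zip(day1, day2)]
print(result)  # [0, 1, 1]
```

The first pair starts on a Sunday, which the mask excludes, so its count is 0; the other two pairs each cover a single valid weekday.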
We basically use the pl.reduce trick, where the limitation is that non-expressions have to be passed in as kwargs. That is probably the safest anyway.
Related issues:
pola-rs#6770: raised the lack of support for multiple expressions; a ValueError has been added in response.
pola-rs#5713: reminder that there is the `pl.reduce` trick.
@zundertj Thanks for the suggestion! That's a good workaround until there is native support for business date operations.
I would also expect some support with aliases In
Starting with adding Anyone here interested in giving this one a go? EDIT: this may require more discussion #11568 (comment)
Taking this forward in https://github.com/MarcoGorelli/polars-business; let's take requests and ideas over there.
@erinov1 just FYI, all the requests from your example are now available:
import polars as pl
import polars_business as plb
from datetime import date

weekend = ["Sat", "Sun"]
holidays = [date(2000, 1, 1)]
df = pl.DataFrame(
    {
        "start_date": [date(2000, 3, 1), date(2000, 4, 3)],
        "end_date": [date(2000, 3, 3), date(2000, 4, 19)],
    }
)
print(
    df.with_columns(
        start_plus_3bd=plb.col("start_date").bdt.offset_by(
            "3bd", weekend=weekend, holidays=holidays
        ),
        start_is_workday=plb.col("start_date").bdt.is_workday(
            weekend=weekend, holidays=holidays
        ),
        workday_count=plb.workday_count(
            "start_date",
            "end_date",
            weekend=weekend,
            holidays=holidays,
        ),
    )
)
Problem description
It would be nice to natively support certain business date operations. In particular, I envision polars expression counterparts of the vectorized numpy busday_offset, is_busday, and busday_count functions. The numpy versions of these functions accept weekmask and holidays arguments (or a busdaycalendar object, which just stores a weekmask and a list of holidays).
Given a dataframe with a date column, you could do something like
It would also be nice to have a bdate_range function/expression so you could get all business dates between two columns by
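For readers unfamiliar with the numpy functions named in this request, here is a rough pure-Python sketch of what "is a business day", "offset by n business days", and "business date range" mean once a weekmask and a holiday set are given. The function names is_business_day, offset_business_days, and business_date_range are hypothetical and are not the numpy, polars, or polars-business APIs. Two further assumptions: the offset helper rejects a non-business start date (similar in spirit to numpy's default roll="raise") and supports only n >= 0, and the range helper includes both endpoints.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def is_business_day(day, weekmask="1111100", holidays=frozenset()):
    """True if `day` is enabled in the Monday-first weekmask and is not a holiday."""
    return weekmask[day.weekday()] == "1" and day not in holidays


def offset_business_days(start, n, weekmask="1111100", holidays=frozenset()):
    """Move `start` forward by `n` business days (hypothetical helper, n >= 0)."""
    if n < 0:
        raise ValueError("this sketch only supports n >= 0")
    if not is_business_day(start, weekmask, holidays):
        raise ValueError("start must itself be a business day")
    day = start
    while n > 0:
        day += ONE_DAY
        # Only business days consume one unit of the offset.
        if is_business_day(day, weekmask, holidays):
            n -= 1
    return day


def business_date_range(start, end, weekmask="1111100", holidays=frozenset()):
    """All business days from `start` to `end`, both endpoints included (assumption)."""
    days = []
    day = start
    while day <= end:
        if is_business_day(day, weekmask, holidays):
            days.append(day)
        day += ONE_DAY
    return days


holidays = {date(2000, 1, 1)}

# 2000-03-01 is a Wednesday, so three business days later is Monday 2000-03-06.
shifted = offset_business_days(date(2000, 3, 1), 3, holidays=holidays)
span = business_date_range(date(2000, 3, 1), date(2000, 3, 7), holidays=holidays)
print(shifted)
print(span)
```

A native implementation would operate on whole columns rather than one date at a time; the loop form is used here only to keep the semantics easy to read.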