Window closed-ness defaults are inconsistent for temporal rolling functions #9193
Comments
In finance you would typically want
Pandas.rolling has
I wasn't aware there are inconsistencies in the defaults; we should definitely fix that. Happy to help.
@stinodego: I have tried to search for an issue on this, as I remember discussing this, but can't find it. Is your point that |
If I look at these examples, closed=right indeed makes for the most sensible default. I'd support making that consistent across methods. Unless there is something I am missing here... |
Grepping for our type annotation Rolling:
I can create a PR for where we currently default to Ranges:
For ranges, both is intuitive imo. |
Great, thanks!
Groupby_dynamic is ok to be closed on the left IMO, as it does something different, where left is more natural (e.g. resampling by day, you expect 2020-01-01T00:00 and 2020-01-01T00:01 to be part of the same group by default).
But the rolling functions all do the same kind of thing, so I'd make those consistent.
So, I'd suggest only changing the default for the expr rolling functions. |
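To illustrate the resampling point above, here is a minimal plain-Python sketch (not Polars code; `day_bucket` is a hypothetical helper written only for this illustration) of how left-closed and right-closed daily buckets treat a timestamp that sits exactly on midnight:

```python
from datetime import datetime, timedelta


def day_bucket(ts, closed="left"):
    """Return the start of the one-day bucket that `ts` falls into.

    Hypothetical helper for illustration only; it is not part of Polars.
    """
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    if closed == "left":
        # Buckets are [midnight, next midnight): midnight opens a new bucket.
        return midnight
    if closed == "right":
        # Buckets are (midnight, next midnight]: an exact midnight closes
        # the previous day's bucket instead of opening a new one.
        return midnight - timedelta(days=1) if ts == midnight else midnight
    raise ValueError("closed must be 'left' or 'right'")


a = datetime(2020, 1, 1, 0, 0)
b = datetime(2020, 1, 1, 0, 1)

same_group_left = day_bucket(a, "left") == day_bucket(b, "left")
same_group_right = day_bucket(a, "right") == day_bucket(b, "right")
print(same_group_left, same_group_right)
```

With left-closed buckets the two timestamps land in the same group; with right-closed buckets the midnight row is pushed into the previous day, which is probably not what someone resampling by day expects.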
The behaviour and docstring of
this basically says: we use "left" when using a by-column, and "right" if no by column. The odd thing is, closed=left or right does not seem to make a difference at all; to me this is "right" behaviour only:
>>> pl.Series([1,2,3,4]).to_frame().select(pl.all().rolling_sum(2, closed="left"))
shape: (4, 1)
┌──────┐
│ │
│ --- │
│ i64 │
╞══════╡
│ null │
│ 3 │
│ 5 │
│ 7 │
└──────┘
>>> pl.Series([1,2,3,4]).to_frame().select(pl.all().rolling_sum(2, closed="right"))
shape: (4, 1)
┌──────┐
│ │
│ --- │
│ i64 │
╞══════╡
│ null │
│ 3 │
│ 5 │
│ 7 │
└──────┘
Same for both and none values, and for other functions. Am I missing something here? Per the docstring, these should be different. |
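One way to read the identical outputs above: when the window is a fixed number of rows, there is no boundary timestamp for closed to include or exclude. The plain-Python sketch below (an assumption about the semantics, not the Polars implementation; `rolling_sum_fixed` is a hypothetical name) reproduces the printed result using row counts alone:

```python
def rolling_sum_fixed(values, window_size):
    """Sum over the last `window_size` rows, ending at the current row.

    Illustration only: the window is defined purely by row count, so no
    notion of closedness enters the computation.
    """
    out = []
    for i in range(len(values)):
        if i + 1 < window_size:
            # Not enough rows yet to fill the window.
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window_size : i + 1]))
    return out


result = rolling_sum_fixed([1, 2, 3, 4], 2)
print(result)  # [None, 3, 5, 7]
```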
Work in progress here: #9215. Not touching the |
Perhaps the docstring could be clarified, but closed only takes effect if you have a |
Ok, so that is to accommodate similar behaviour as
So do we want to change anything at all then, apart from docstrings? Maybe we should even raise an error if |
Nice! ⭐ I don't know what the original reasoning was, but I think it makes sense - if you say |
I have updated #9215 to purely update the docstrings. Reason being that when |
Sure - is the idea to then follow up with a PR to warn that the future default for |
Per ^, I don't know if we want to change at all? As you said |
I'd say that for These are both already the case, all that I'm suggesting is that It'd be natural to expect
Currently however, they're not - unless, that is, you change
In [128]: df.groupby_rolling('ts', period='3d', closed='left').agg(pl.col('a').mean())
Out[128]:
shape: (5, 2)
┌─────────────────────┬──────┐
│ ts ┆ a │
│ --- ┆ --- │
│ datetime[μs] ┆ f64 │
╞═════════════════════╪══════╡
│ 2020-01-01 00:00:00 ┆ null │
│ 2020-01-02 00:00:00 ┆ 1.0 │
│ 2020-01-03 00:00:00 ┆ 1.5 │
│ 2020-01-04 00:00:00 ┆ 2.0 │
│ 2020-01-05 00:00:00 ┆ 3.0 │
└─────────────────────┴──────┘
In [129]: df.with_columns(pl.col('a').rolling_mean('3d', by='ts'))
Out[129]:
shape: (5, 2)
┌─────────────────────┬──────┐
│ ts ┆ a │
│ --- ┆ --- │
│ datetime[μs] ┆ f64 │
╞═════════════════════╪══════╡
│ 2020-01-01 00:00:00 ┆ null │
│ 2020-01-02 00:00:00 ┆ 1.0 │
│ 2020-01-03 00:00:00 ┆ 1.5 │
│ 2020-01-04 00:00:00 ┆ 2.0 │
│ 2020-01-05 00:00:00 ┆ 3.0 │
└─────────────────────┴──────┘ |
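For readers who want to check the numbers by hand, here is a plain-Python sketch of a temporal rolling mean with a configurable closed argument. It is not the Polars implementation; `temporal_rolling_mean` is a hypothetical helper, and the input values 1 to 5 are an assumption (the original frame is not shown in the thread) chosen because they are consistent with the printed output.

```python
from datetime import datetime, timedelta


def temporal_rolling_mean(times, values, period, closed="right"):
    """Mean of `values` whose timestamp lies in the window ending at each row.

    The window spans from (t - period) to t; `closed` decides which of the
    two endpoints are included. Illustration only, not Polars code.
    """
    out = []
    for t in times:
        lower = t - period
        window = []
        for u, v in zip(times, values):
            above_lower = u >= lower if closed in ("left", "both") else u > lower
            below_upper = u <= t if closed in ("right", "both") else u < t
            if above_lower and below_upper:
                window.append(v)
        # An empty window yields None, mirroring the null in the output above.
        out.append(sum(window) / len(window) if window else None)
    return out


times = [datetime(2020, 1, day) for day in range(1, 6)]
values = [1, 2, 3, 4, 5]  # assumed data; the thread does not show the frame

left = temporal_rolling_mean(times, values, timedelta(days=3), closed="left")
right = temporal_rolling_mean(times, values, timedelta(days=3), closed="right")
print(left)   # [None, 1.0, 1.5, 2.0, 3.0]
print(right)  # [1.0, 1.5, 2.0, 3.0, 4.0]
```

Under these assumptions, closed="left" excludes the current row and so matches both outputs shown above, while closed="right" includes the current row and never produces a null.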
I am not an expert on the ergonomics of time series, but I just want to say we shouldn't use the same closedness for the sake of consistency between different functions per se. Especially I believe, I chose the closedness at the time by looking at what other tools did or what I found made sense. Let's use this opportunity to determine what makes sense for them in the following groups:
and then document that, so we later understand our own rationale and have a place where we can follow the logic. |
Yeah agree - here's what I'd suggest:
If this sounds sensible, then I'd back the original approach by @zundertj in #9215 |
Yep, that is updating the docstrings in #9215 indeed. |
I will pick that up for the |
legend 🙌 |
See #9470 |
This has been addressed; it just needs enforcing, but that'll happen as part of the usual version upgrade procedure. Closing then, thanks all.
Polars version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Issue description
Noticed this while trying to write some docs in #9192
Seems like the defaults are:
closed on the right: groupby_rolling
closed on the left: rolling_mean, rolling_sum, rolling_min, rolling_max, rolling_std, rolling_var
Is there a reason which I'm missing for this? If not, any objections on consistently setting the default to be closed='right'?
Reproducible example
Expected behavior
I think,
Installed versions