Rolling groupby should not maintain the by column in the resulting DataFrame #14013
A little note while digging through more code:
I can fix the issue if I set the group selection:
I think we need this function at the start. Seems similar to #12839.
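The "group selection" snippet above is elided; as a rough user-level illustration of the same idea, explicitly selecting the non-grouping columns keeps the by column out of the rolling result (the frame and names here are hypothetical):

import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1.0, 2.0, 3.0, 4.0]})

# Selecting only the non-grouping column(s) before rolling excludes
# the by column "A" from the result's columns.
df.groupby("A")[["B"]].rolling(2).sum()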
This is defined behavior, in that it is identical to:
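The comparison target is elided above; presumably it is the per-group apply spelling. A sketch of that equivalence under that assumption, with a hypothetical frame (this holds in current pandas):

import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1.0, 2.0, 3.0, 4.0]})

# groupby(...).rolling(...) is meant to match applying the same
# rolling operation within each group.
left = df.groupby("A")[["B"]].rolling(2).sum()
right = df.groupby("A")[["B"]].apply(lambda g: g.rolling(2).sum())
assert left.equals(right)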
You can look back at the issues; IIRC @jorisvandenbossche and I had a long conversation about this.
Hmm:
In addition to:
On reread this should be consistent, so marking as a bug.
A similar thing happens with index columns.

import numpy as np
from pandas import DataFrame, Timestamp

c = DataFrame({
    u'dl_payload': {('a', Timestamp('2016-11-01 06:00:00')): 14, ('a', Timestamp('2016-11-01 06:15:00')): 15,
                    ('a', Timestamp('2016-11-01 07:15:00')): 16, ('a', Timestamp('2016-11-01 07:30:00')): 17,
                    ('a', Timestamp('2016-11-01 07:45:00')): 18, ('a', Timestamp('2016-11-01 09:00:00')): 19},
    u'ul_payload': {('a', Timestamp('2016-11-01 06:00:00')): 4, ('a', Timestamp('2016-11-01 06:15:00')): 5,
                    ('a', Timestamp('2016-11-01 07:15:00')): 6, ('a', Timestamp('2016-11-01 07:30:00')): 7,
                    ('a', Timestamp('2016-11-01 07:45:00')): 8, ('a', Timestamp('2016-11-01 09:00:00')): 9}})
In [27]: c
Out[27]:
dl_payload ul_payload
a 2016-11-01 06:00:00 14 4
2016-11-01 06:15:00 15 5
2016-11-01 07:15:00 16 6
2016-11-01 07:30:00 17 7
2016-11-01 07:45:00 18 8
2016-11-01 09:00:00 19 9
In [29]: c.groupby(level=0).rolling(window=3).agg(np.sum)
Out[29]:
dl_payload ul_payload
a a 2016-11-01 06:00:00 NaN NaN
2016-11-01 06:15:00 NaN NaN
2016-11-01 07:15:00 45.0 15.0
2016-11-01 07:30:00 48.0 18.0
2016-11-01 07:45:00 51.0 21.0
2016-11-01 09:00:00 54.0 24.0

But not with group_keys=False:

In [48]: c.groupby(level=0, group_keys=False).rolling(window=3).agg(np.sum)
Out[48]:
dl_payload ul_payload
a 2016-11-01 06:00:00 NaN NaN
2016-11-01 06:15:00 NaN NaN
2016-11-01 07:15:00 45.0 15.0
2016-11-01 07:30:00 48.0 18.0
2016-11-01 07:45:00 51.0 21.0
2016-11-01 09:00:00 54.0 24.0
Why is the issue closed? The problem is still there (pandas 0.24.2).
This is closed in 0.25, coming soon.
Still the same problem in 0.25. Workaround:
The problem still exists in v1.0.1.
Still an issue in v2. So for future travellers...
df["RollingMean"] = df.groupby(["A", "B"]).rolling(2).Value.mean().reset_index(level=[0, 1], drop=True)
# OR, if your data is already sorted by A, B
df["RollingMean"] = df.groupby(["A", "B"]).rolling(2).Value.mean().tolist() |
Still an issue in v2.2.1:

import pandas as pd

df = pd.DataFrame(data=range(12),
                  index=pd.MultiIndex.from_product([['one', 'two', 'three'],
                                                    ['a', 'b', 'c', 'd']]),
                  columns=['vals'])
df
Out[108]:
vals
one a 0
b 1
c 2
d 3
two a 4
b 5
c 6
d 7
three a 8
b 9
c 10
d 11

A regular groupby:

df.groupby(level=0).sum()
Out[103]:
vals
one 6
three 38
two 22

With rolling:

df.groupby(level=0).rolling(3).sum()
Out[104]:
vals
one one a NaN
b NaN
c 3.0
d 6.0
three three a NaN
b NaN
c 27.0
d 30.0
two two a NaN
b NaN
c 15.0
d 18.0

Is there a fix planned for this issue? From the discussion above it's not clear if this is intended behavior or a bug. If it's intended, can someone explain why?

df.groupby(level=0).rolling(3).sum().droplevel(0)
Out[110]:
vals
one a NaN
b NaN
c 3.0
d 6.0
three a NaN
b NaN
c 27.0
d 30.0
two a NaN
b NaN
c 15.0
d 18.0

This is the output I want, but ideally without having to do the droplevel(0) step.
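For what it's worth, a sketch of one way to get that shape directly with the df above: a group-wise apply with group_keys=False keeps the original index, so no level needs dropping afterwards.

# group_keys=False stops the group labels from being prepended to the
# index, so the result keeps df's original MultiIndex.
df.groupby(level=0, group_keys=False).apply(lambda g: g.rolling(3).sum())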
I found another oddity while digging through #13966.
Begin with the initial DataFrame in that issue:
Save the grouping:
Compute the rolling sum:
It maintains the by column (A)! That column should not be in the resulting DataFrame.

It gets weirder if I compute the sum over the entire grouping and then re-do the rolling calculation. Now the by column is gone, as expected:

So the grouping summation has some sort of side effect.
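The code blocks from the original report are not reproduced above, so here is a minimal sketch of the pattern being described; the frame and column names are hypothetical stand-ins for the ones in #13966:

import pandas as pd

# Hypothetical frame; "A" is the by column.
df = pd.DataFrame({"A": [1, 1, 2, 2], "B": [1.0, 2.0, 3.0, 4.0]})

g = df.groupby("A")      # save the grouping
r1 = g.rolling(2).sum()  # at the time of the report, the by column "A"
                         # showed up among r1's columns

g.sum()                  # sum over the entire grouping ...
r2 = g.rolling(2).sum()  # ... after which "A" no longer appeared,
                         # the side effect described above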