AssertionError with df.resample(how="median") #1688

Closed
eloraburns opened this issue Jul 27, 2012 · 3 comments

@eloraburns
Contributor

I've reproduced a failure using how="median", possibly the same problem as #1648. It seems to occur when there are gaps in the resampled range (i.e. minutes with no records when downsampling).

Both pandas 0.8.1 and 0.8.2.dev-f5a74d4 don't like it:

import pandas as pd
from datetime import datetime
df = pd.DataFrame([1, 2], index=[datetime(2012,1,1,0,0,0), datetime(2012,1,1,0,5,0)])
df.resample("T", how="median")
# Throws AssertionError
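
For readers on a recent pandas release, where the `how=` keyword is no longer accepted, the same reproduction can be sketched with the method-chaining form. This is only a sketch and not part of the original report: it assumes a current pandas install and uses the `"min"` alias for one-minute bins.

```python
from datetime import datetime

import pandas as pd

# Same two-row frame as in the report: records at 00:00 and 00:05 only.
df = pd.DataFrame(
    [1, 2],
    index=[datetime(2012, 1, 1, 0, 0, 0), datetime(2012, 1, 1, 0, 5, 0)],
)

# Resample to one-minute bins; the four empty minutes in between
# are expected to come back as NaN rather than raising.
result = df.resample("min").median()
print(result)
```
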
@eloraburns
Contributor Author

how="mean" works:

In [5]: df.resample("T", how="mean")
Out[5]: 
                      0
2012-01-01 00:00:00   1
2012-01-01 00:01:00 NaN
2012-01-01 00:02:00 NaN
2012-01-01 00:03:00 NaN
2012-01-01 00:04:00 NaN
2012-01-01 00:05:00   2

I have no idea what numpy is up to, though:

In [6]: import numpy as np

In [7]: df.resample("T", how=lambda x: np.percentile(x, 50))
Out[7]: 
Empty DataFrame
Columns: array([], dtype=int64)
Index: array([], dtype=object)

I can't seem to make it misbehave here, but with some Actual Data with data absent for several of the resampled intervals, I can get numpy to explode with the same stack trace as:

np.percentile([], 50)
# ...
# ValueError: operands could not be broadcast together with shapes (0) (2) 

I can certainly work around that in the lambda by short-circuiting on an empty list-like thing being passed in, but I don't think I should have to.
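
A minimal sketch of that short-circuit workaround, assuming the aggregation function receives one array-like per bin. The name `safe_percentile` is hypothetical and exists only for illustration:

```python
import numpy as np


def safe_percentile(values, q=50):
    """Return the q-th percentile of values, or NaN for an empty bin."""
    arr = np.asarray(values, dtype=float)
    if arr.size == 0:
        # Short-circuit so np.percentile never sees an empty array.
        return np.nan
    return float(np.percentile(arr, q))


print(safe_percentile([]))
print(safe_percentile([1, 2]))
```

With the API used in this thread it would presumably be passed as `how=safe_percentile`; that call is not verified here.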

For this particular case, I can just do this with expected results:

In [8]: df.resample("T", how=np.median)
Out[8]: 
                      0
2012-01-01 00:00:00   1
2012-01-01 00:01:00 NaN
2012-01-01 00:02:00 NaN
2012-01-01 00:03:00 NaN
2012-01-01 00:04:00 NaN
2012-01-01 00:05:00   2

I can't fathom why np.median(…) is different from np.percentile(…, 50). Maybe that's just a numpy thing? :)
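
As a quick sanity check, the two calls should agree on non-empty input, which would suggest the difference seen here is confined to empty bins. This is a sketch with made-up sample values, not data from the report:

```python
import numpy as np

# Hypothetical sample values, chosen only for illustration.
samples = [1, 2, 3, 4]

median_value = np.median(samples)
percentile_value = np.percentile(samples, 50)

# Both use linear interpolation between the two middle values here.
print(median_value, percentile_value)
```
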

Thanks!

@wesm closed this as completed in 2279c92 on Aug 11, 2012
@wesm
Member

wesm commented Aug 11, 2012

thanks for the test case. fixed the underlying cause

@eloraburns
Contributor Author

Awesome, glad it helped. Thanks for fixing it! :) When trying to learn numpy/pandas concepts, sometimes it's hard to tell user error from a True bug. :)
