Bug/inconsistent result when indexing DataFrame by YY-MM-DD #2306
Comments
I've noticed this bug as well. I'll have to see how difficult it is to fix.
Probably not an issue with string parsing, as I just tried
More info that may help you track it down: Slicing with
I then tried `df.ix['1/2012':'1/2012']` to see what the "correct" response for this sort of slice is, and was surprised to see the same bug! Only this time the output seems... a bit random:
Why on earth would slicing from the year to itself return a single value from the middle of the range? Hope this helps! Correction:
(Sorry for the comment spam!) I've just found another, possibly related, bug. Let me know if you'd like me to break this off into a separate GitHub issue.
Here,
However, watch what happens when we change the datetimes to timestamps:
Produces:
I thought this might be because the days were aligning to the hour I chose (1:00), so I tried:
This worked as expected, so I tried one last thing:
The only change being the last data point, moved from 0:00 to 1:00, but remaining on the same day. And behold:
So... uh... I feel like I've gone down a rabbit hole! Am I in bug central, or am I misunderstanding how pandas date ranges work?
Do you have any recommendations on how to work around this issue, and on when this bug can be expected to trigger?
DataFrame.resample has keywords
Now your last example seems to be a bug in the parser. If you examine the index, '1-2-2012 0:01' is actually being parsed as midnight for some reason.
We'll put in a fix for this. As a workaround, use
Should have broken this into two issues. This commit does not fix the original issue, but it does fix the datetime parsing bug in the section starting "(Sorry for the comment spam!)".
@pikeas as for the original issue: Python's dateutil parses '2012-01-01' as a datetime at midnight, which is why it returns the first entry. There is special handling inside pandas for YYYY and YYYY-MM strings, which are treated as "partial date slices". I'll have to do more digging, but I don't think there's an easy way to do that for YYYY-MM-DD that both retains performance and doesn't impact other use cases.
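To make the difference concrete, here is a rough standard-library sketch of the two interpretations of a YYYY-MM-DD string: an exact match against the midnight timestamp versus a whole-day range. It is not how pandas implements anything. The hourly `index` list and the helpers `exact_lookup` and `day_slice` are hypothetical names made up for illustration, and `datetime.strptime` stands in for dateutil.

```python
from datetime import datetime, timedelta

# Hypothetical hourly index covering two days; a stand-in for a DatetimeIndex.
index = [datetime(2012, 1, 1) + timedelta(hours=h) for h in range(48)]


def exact_lookup(index, text):
    """Parse a date-only string (which yields midnight) and match it exactly."""
    stamp = datetime.strptime(text, "%Y-%m-%d")
    return [ts for ts in index if ts == stamp]


def day_slice(index, text):
    """Treat the string as a whole day: the half-open range [start, start + 1 day)."""
    start = datetime.strptime(text, "%Y-%m-%d")
    end = start + timedelta(days=1)
    return [ts for ts in index if start <= ts < end]


exact = exact_lookup(index, "2012-01-01")
whole_day = day_slice(index, "2012-01-01")
print(len(exact), len(whole_day))  # 1 24
```

The exact lookup finds only the midnight entry, which matches the "first entry only" behaviour reported here, while the range interpretation returns all 24 hourly entries for that day.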
Note: indexing time series with strings is purely a convenience, so we could sacrifice performance for correctness here. The main issue I saw is distinguishing daily-only time series from intraday time series. In this case you might want to check that all timestamps are at midnight and cache that result, so you know whether to return scalar values or a sub-TimeSeries.
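A minimal sketch of that idea, assuming a plain dict of `datetime -> value` as a stand-in for a TimeSeries. The helpers `all_midnight` and `select_day` are hypothetical and are not pandas functions. The check is recomputed on every call here, whereas the suggestion above is to cache it.

```python
from datetime import datetime, time, timedelta


def all_midnight(series):
    """True when every timestamp falls exactly on midnight (daily-only data)."""
    return all(ts.time() == time(0, 0) for ts in series)


def select_day(series, text):
    """Select by a YYYY-MM-DD string from a dict mapping datetime -> value."""
    start = datetime.strptime(text, "%Y-%m-%d")
    if all_midnight(series):
        # Daily data: the string names exactly one entry, so return a scalar.
        return series[start]
    # Intraday data: return every entry that falls on that day.
    end = start + timedelta(days=1)
    return {ts: v for ts, v in series.items() if start <= ts < end}


daily = {datetime(2012, 1, d): d for d in range(1, 4)}
hourly = {datetime(2012, 1, 1) + timedelta(hours=h): h for h in range(48)}

daily_value = select_day(daily, "2012-01-02")   # scalar
hourly_rows = select_day(hourly, "2012-01-01")  # sub-series with 24 entries
```

With this split, a day string on daily data behaves like an ordinary key lookup, and on intraday data it behaves like the year and year-month partial slices.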
Oh, you know what, `is_normalized` is already cached, so we can use that.
On Nov 27, 2012, at 11:39 AM, Wes McKinney notifications@github.com wrote:
`df.ix['2012']` and `df['2012-05']` (selection by year or year and month) work as expected. However:
The output from indexing by year/month/day is unexpected: we should receive either all values matching the day or an error (if indexing by day has not been implemented). Instead, we get only the first match.