-
-
Notifications
You must be signed in to change notification settings - Fork 18.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Out-by-one error when slicing a TimeSeries with date-strings #3905
Comments
@dhirschfeld I believe the string slicing is correct here, while the datetime object slicing is off by one. Labels by definition are INCLUSIVE of the endpoints, while integer are not. Since slicing a datetimeindex is never 'integer' based, I think it is appropraite to always include endpoints. Though a case could be made for the reverse if a frequency is defined. What do you think? |
Since each datetime in the index has hour = 9, the end date of i.e. the returned slice has included a value which is 9 hours after my specified end point! If I had instead specified @jreback, surely this is a bug? |
this issue added this, IOW, the last day is included, regardless of the time: #3546 |
@dhirschfeld when does the last day end (11:59:59.9999999) ? your example implies it should end exactly 1 day before |
IMHO that's exactly the problem with having an inclusive end point on a continuously defined interval - there is no inclusive end point as there is an infinite number of reals between any two reals. That's beside the point in this example though as Perhaps the misunderstanding is that I'm interpreting the string |
but that's exactly the problem the string '04-jan-2014' is ambiguous you think it should not include anything on that day if you really mean the Instant in time then you need to be more specific '04-jan-2014 00:00:00' and that why the datetime example does what u want, it IS specific |
I'd argue that it is a strongly held convention that any unspecified hours/minutes/seconds are equal to zero. It holds in Excel, dateutil, numpy and even in the rest of pandas: In [1]: from dateutil.parser import parser
In [2]: parser().parse('04-Jan-2014')
Out[2]: datetime.datetime(2014, 1, 4, 0, 0)
In [3]: np.datetime64('2014-01-04').astype(pd.datetime)
Out[3]: datetime.date(2014, 1, 4)
In [4]: pd.DatetimeIndex(['01-Jan-2014'])[0]
Out[4]: <Timestamp: 2014-01-01 00:00:00> Looking at #3546 I can see the logic - it's handy to be able to get a whole period out without having to explicitly specify the start and end points of the slice e.g. I'm not sure that this utility extends to slices though - it seems extremely surprising and far too magical to me to have a date string in a slice interpreted as anything other than a datetime instance as would be parsed by pandas itself in other circumstances. I'll post a link on the mailing list to solicit other feedback as it could be that I'm in the minority by being surprised by the current behaviour. If so it may still be worthwhile to have a note in the documentation. |
you can easily have your slice interpreted exactly as you want, either string or datetime, just be specific (to seconds level). I don't think is all that magical and is quite useful; we are debating whether if you specifiy
Your examples are not on point as they are single points in time, not slices, which is exactly how this is interpreted. |
@dhirschfeld I put up #3939, please take a look and let me know if this makes sense (I know we disagree about the actual semantics, but I am documenting the current behavior, if it is changed, these will change too) |
Sorry, I'm mostly offline for the next week and haven't had a chance to really look at it properly. My interpretation of the current behaviour is that slicing with date strings works differently to slicing with date objects. When slicing with date strings the both the start and end dates are treated as period objects whose frequency is determined by the "accuracy" of the passed date object. e.g. if Treating the slice arguments as periods has the important consequence that because the end point is inclusive all timestamps falling within the last period will be included in the slice. e.g. the slice |
I had put this section up in the main docs: your interpretation is exactly right....date strings are different than datetime objects I think I should include an example using datetime objects (as well), as well as some of your language from above |
Great - with the new documentation I'm happy for this issue to be closed. If you were going to expand the description further it may even deserve a mention or link from the gotchas section. |
great...thanks for the discussion! |
...when the hour is not zero.
Example below:
The text was updated successfully, but these errors were encountered: