date_range does not capture right timezone from input dates #7901

Closed
rockg opened this issue Aug 2, 2014 · 19 comments · Fixed by #7909
Labels
Bug Timezones Timezone data dtype

Comments

@rockg
Contributor

rockg commented Aug 2, 2014

Example is below. I would expect that if dates have a timezone on them, date_range would then use that timezone to fill in the rest of the period. However, something goes awry (notice the 01:00 below). If I have dates with a timezone and pass a timezone (case 2), it still doesn't work. Only when I remove the timezone from the dates does it work (case 3). I would expect all these to work the same.

import pandas as pd
import pytz
from datetime import datetime

tz = pytz.timezone('US/Eastern')
sd = tz.localize(datetime(2014, 3, 6))
ed = tz.localize(datetime(2014, 3, 12))
list(pd.date_range(sd, ed, freq='D'))
Out[41]: 
[Timestamp('2014-03-06 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-07 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-08 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-09 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-10 01:00:00-0400', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-11 01:00:00-0400', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-12 01:00:00-0400', tz='US/Eastern', offset='D')]

list(pd.date_range(sd, ed, freq='D', tz='US/Eastern'))
Out[42]: 
[Timestamp('2014-03-06 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-07 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-08 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-09 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-10 01:00:00-0400', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-11 01:00:00-0400', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-12 01:00:00-0400', tz='US/Eastern', offset='D')]

list(pd.date_range(sd.replace(tzinfo=None), ed.replace(tzinfo=None), freq='D', tz='US/Eastern'))
Out[43]: 
[Timestamp('2014-03-06 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-07 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-08 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-09 00:00:00-0500', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-10 00:00:00-0400', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-11 00:00:00-0400', tz='US/Eastern', offset='D'),
 Timestamp('2014-03-12 00:00:00-0400', tz='US/Eastern', offset='D')]
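
The 01:00 values in cases 1 and 2 are consistent with stepping by a fixed 24 hours in UTC and converting back, rather than stepping by calendar day in local time. The sketch below reproduces that pattern with the standard library only. It assumes Python 3.9+ with `zoneinfo` and system tz data, uses `America/New_York` as the equivalent of `US/Eastern`, and is an illustration of the symptom, not the pandas implementation.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: America/New_York carries the same rules as US/Eastern.
eastern = ZoneInfo("America/New_York")
start = datetime(2014, 3, 6, tzinfo=eastern)

# Absolute stepping: add multiples of 24 hours in UTC, then convert back.
start_utc = start.astimezone(timezone.utc)
absolute = [(start_utc + timedelta(days=i)).astimezone(eastern) for i in range(7)]

# Wall-clock stepping: add calendar days to the naive time, then attach the zone.
wall = [(datetime(2014, 3, 6) + timedelta(days=i)).replace(tzinfo=eastern)
        for i in range(7)]

absolute_hours = [d.hour for d in absolute]
wall_hours = [d.hour for d in wall]

print(absolute_hours)  # [0, 0, 0, 0, 1, 1, 1] (DST began on 2014-03-09)
print(wall_hours)      # [0, 0, 0, 0, 0, 0, 0]
```

The first list matches the hours in Out[41] and Out[42]; the second matches Out[43].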
@jreback
Contributor

jreback commented Aug 2, 2014

is this the same as #7835 ?

@rockg
Contributor Author

rockg commented Aug 2, 2014

Now that I look more closely, it probably is. I'd prefer to leave it open for some additional test cases at least.
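
A regression check along these lines might look like the sketch below. It is not the test added in the linked fix; the helper `hours_of` is a hypothetical name, and the check assumes a pandas version that already includes the fix for this issue.

```python
from datetime import datetime

import pandas as pd
import pytz

eastern = pytz.timezone("US/Eastern")
start = eastern.localize(datetime(2014, 3, 6))
end = eastern.localize(datetime(2014, 3, 12))


def hours_of(index):
    """Hypothetical helper: the local hour of every timestamp in the index."""
    return [ts.hour for ts in index]


# Case 1: timezone taken from the localized endpoints.
aware = pd.date_range(start, end, freq="D")

# Case 3: naive endpoints plus an explicit tz argument.
naive_plus_tz = pd.date_range(
    start.replace(tzinfo=None), end.replace(tzinfo=None), freq="D", tz=eastern
)

# Both ranges should stay at local midnight across the DST change.
assert hours_of(aware) == [0] * 7
assert hours_of(naive_plus_tz) == [0] * 7
```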

@rockg
Contributor Author

rockg commented Aug 4, 2014

The Python 3.4 test failed because Timestamp and datetime produce different offsets. I recall seeing this a few days ago in one of the issues but can't easily find it. Any ideas what's going on?

from datetime import datetime
import pandas as pd
from pytz import timezone as tz

pd.Timestamp('1/1/2011', tz='US/Eastern')
# 2011-01-01 00:00:00-05:00
datetime(2011, 1, 1, tzinfo=tz('US/Eastern'))
# 2011-01-01 00:00:00-04:56

@seth-p
Contributor

seth-p commented Aug 4, 2014

FWIW, this is what I see using 64-bit Python 3.4.1 on Windows, with pytz 2014.4:

In [105]: from datetime import datetime

In [106]: datetime(2011, 1, 1, tzinfo=tz('US/Eastern'))
Out[106]: datetime.datetime(2011, 1, 1, 0, 0, tzinfo=<DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>)

I don't know if it has anything to do with anything, but I just noticed that there's no egg for 3.4 on https://pypi.python.org/pypi/pytz.

@rockg
Contributor Author

rockg commented Aug 4, 2014

And I don't know how this particular test test_daterange.py(TestDateRange.test_range_tz_pytz) ever passes on 3.4 (I didn't add it and my change doesn't impact it).

@jreback
Contributor

jreback commented Aug 4, 2014

@rockg pytz will fall back to a generic installer for 3.4, and since it's pure Python, this works.

@rockg
Contributor Author

rockg commented Aug 4, 2014

I don't understand completely what that means. Is this a bug in itself (why does the Timestamp have a different offset than the datetime) or are they doing something different with pytz?

@sinhrks
Member

sinhrks commented Aug 4, 2014

You should use pytz's localize for timezones with DST to get the correct localized offset.

import datetime
import pytz

datetime.datetime(2011, 1, 1, tzinfo=pytz.timezone('US/Eastern'))
# 2011-01-01 00:00:00-04:56

pytz.timezone('US/Eastern').localize(datetime.datetime(2011, 1, 1))
# 2011-01-01 00:00:00-05:00

Before your fix, both date_range and datetime had the non-localized offset <DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>, so the test passed.
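
The difference between the two constructions can be checked directly on the offsets. This is a minimal sketch assuming pytz 2014.3 or later; the variable names are illustrative only.

```python
from datetime import datetime, timedelta

import pytz

eastern = pytz.timezone("US/Eastern")
naive = datetime(2011, 1, 1)

# Attaching the zone via tzinfo uses the zone's default entry, which is
# local mean time (LMT) in pytz 2014.3 and later.
attached = naive.replace(tzinfo=eastern)

# localize() picks the offset that actually applies on that date.
localized = eastern.localize(naive)

print(attached.utcoffset())   # LMT offset, about -4:56, with recent pytz
print(localized.utcoffset())  # -5:00, i.e. EST

assert localized.utcoffset() == timedelta(hours=-5)
```

For arithmetic that crosses a DST boundary, pytz also provides `normalize()` to correct the offset of the result.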

@rockg
Contributor Author

rockg commented Aug 4, 2014

@sinhrks Why would this only have to happen for 3.4? All other versions of tests passed fine. And I swear that I tried localize to the same effect, but you show otherwise...I will confirm later this evening.

@sinhrks
Member

sinhrks commented Aug 4, 2014

In my understanding, the issue is caused by pytz 2014.4, which is used in the 3.4 test run, and is unrelated to the Python version. I checked the behaviour above on 2.7.6, but it may be better to confirm on 3.4 as well.

http://stackoverflow.com/questions/24188060/in-pandas-why-does-tz-convert-change-the-timezone-used-from-est-to-lmt

@jreback
Contributor

jreback commented Aug 4, 2014

If it's with pytz 2014.4, then the problem is with the comparison itself. See for example a fix here: https://github.com/pydata/pandas/blob/master/pandas/tseries/tests/test_timezones.py#L386

You have to be very explicit with the expected case; in other words, you have to normalize it correctly. There were a few cases that 'worked' because US/Eastern was the same through all pytz versions, but that changed in pytz 2014.3 (when the actual timezone definition changed to be LMT).

3.4 fails because it uses the current pytz. The other builds use a pytz older than 2014.3.
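
Being explicit with the expected value could look like the following sketch: build the expected endpoint with `localize` and compare the range against it. This is an illustration under the assumption of a pandas version that includes the fix, not the code at the linked line.

```python
from datetime import datetime, timedelta

import pandas as pd
import pytz

eastern = pytz.timezone("US/Eastern")

# Expected value built with localize(), so it carries the EST offset
# regardless of which pytz release is installed.
expected_start = eastern.localize(datetime(2011, 1, 1))

dr = pd.date_range(start=expected_start, periods=3, freq="D")

assert dr[0] == expected_start
assert dr[0].utcoffset() == timedelta(hours=-5)
```

An expected value built with `datetime(2011, 1, 1, tzinfo=eastern)` would instead carry whatever default offset the installed pytz assigns to the zone, which is why such comparisons depended on the pytz version.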

@rockg
Contributor Author

rockg commented Aug 4, 2014

Okay, now this is all making sense. I thought all travis tests were using the latest pytz version. I will update my test. Thanks @sinhrks, @jreback. I will add to the release note that simply passing in tzinfo is not enough and that localize is the right way to create localized times.

@jreback
Contributor

jreback commented Aug 4, 2014

@rockg what do you mean, passing in 'tzinfo' is not enough? You ALWAYS have to localize.

@rockg
Contributor Author

rockg commented Aug 4, 2014

I know, but the pandas tests themselves don't localize (and I'm sure other people have done the same thing and it was fine until the latest release of pytz).

@jreback
Contributor

jreback commented Aug 4, 2014

no, the pandas tests are WRONG if they don't localize. I appreciate that you want to fix the docs, ok. But pandas tests themselves need to be fixed if they are wrong (as some were when we shifted to 2014.3)

@jreback
Contributor

jreback commented Aug 4, 2014

#7343

@rockg
Contributor Author

rockg commented Aug 4, 2014

That's what I'm saying...the pandas tests are wrong. Of course I'm going to fix the tests in addition to the docs.

@jreback
Contributor

jreback commented Aug 4, 2014

@rockg perfect, thanks!

and a doc-warning (actually in the timezone section) might not be a bad idea as well.

@ischwabacher
Contributor

I don't think the user should ever have to call localize or normalize. Those are datetime.datetime implementation details that somehow datetime.datetime failed to implement, so pytz had to graft them on somewhere in order to work. (There's more detail on this in my Stack Overflow answer.) But Timestamp is a datetime.datetime subclass that does know about them, so it should take care of calling localize and normalize as appropriate and leave the user none the wiser.

Again, I am strongly in favor of a view of time zones as immutable objects, at least modulo the ability of governments to screw up our predictions of the future.

jreback added a commit that referenced this issue Aug 5, 2014
Remove from start/end dates if tz is not None (#7901, #7835)
5 participants