Problem with datetime and HDFStore #809
I get something else. What's also weird is that DST starts on 3/13 in 2011, not 3/27. What `pandas.__version__` do you have, and what OS and Python? If Linux, what are your LC_ variables? (i.e., run `set | grep "LC"` in bash)
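The same details can also be collected from inside Python. This is a small sketch; the `environment_report` helper is hypothetical, not part of pandas. Note that `os.environ` only shows exported variables, whereas `set` in bash also lists unexported shell variables.

```python
import os
import platform
import sys
import time


def environment_report():
    """Hypothetical helper: collect versions, OS and time/locale settings."""
    try:
        import pandas
        pandas_version = pandas.__version__
    except ImportError:  # pandas is not installed in this interpreter
        pandas_version = None
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "pandas": pandas_version,
        # TZ (if set) and the zone names the C library resolved for local time
        "TZ": os.environ.get("TZ"),
        "tzname": time.tzname,
        # Rough equivalent of `set | grep "LC"` (exported variables only)
        "LC_vars": {k: v for k, v in os.environ.items() if k.startswith("LC_")},
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```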
Aha, I thought it was the same date in all countries, but that is not the case. According to http://www.timeanddate.com/time/dst/2011.html there are several different dates. I'm running Ubuntu 11.04, Python 2.7.1 and pandas-0.7.0rc1-py2.7-linux-x86_64.egg.
My use case is that I read data stored in CSV files and load it into DataFrames. So far, so good. The problem occurs when persisting a DataFrame to an h5 file: when I load the data back from the h5 file, I get a DataFrame whose index contains duplicate entries.
You are right about DST being different where you are :) Since there is no time zone information attached, the principle of least surprise suggests it should return exactly what you stored. However, looking into pandas/io/pytables.py, going into storage it does:
And coming out,
OK, so the problem is this: 2:02 on 3/27 is a nonexistent local time (the clocks jump from 2:00 straight to 3:00), so effectively 2:02 == 3:02. How your locale knows you are in Sweden and your POSIX API takes advantage of this, I have no idea. Can you prefilter the data before storing? Even stranger, I cannot reproduce the behavior on 3/13 in my time zone (EST5EDT), where I should see exactly the same thing.
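A minimal sketch of the effect, under stated assumptions: it assumes the storage path encodes naive datetimes with `time.mktime` and decodes them with `datetime.fromtimestamp` (the actual pytables.py code is not reproduced here), and it uses a hypothetical POSIX TZ rule for Central European Time so no tz database is needed. `time.tzset` is Unix-only, and setting `TZ` changes the process-wide local time zone. Three distinct wall-clock times around the 2011-03-27 switch come back as only two distinct values, which is one way duplicate index entries can appear.

```python
import os
import time
from datetime import datetime

# Assumption: a POSIX TZ rule for Central European Time (used in Sweden).
# DST starts on the last Sunday of March at 02:00 local time (2011-03-27).
os.environ["TZ"] = "CET-1CEST,M3.5.0,M10.5.0/3"
time.tzset()  # Unix only: make the C library re-read TZ


def local_roundtrip(dt):
    """Naive datetime -> POSIX timestamp -> naive datetime, via local time.

    Assumed (not confirmed) to mirror a mktime/fromtimestamp storage path.
    """
    ts = time.mktime(dt.timetuple())  # dt interpreted as local wall-clock time
    return datetime.fromtimestamp(ts)  # converted back to local wall-clock time


stored = [datetime(2011, 3, 27, h, 2) for h in (1, 2, 3)]  # 02:02 never happened
loaded = [local_roundtrip(x) for x in stored]
for before, after in zip(stored, loaded):
    print(before, "->", after)
print("distinct values:", len(set(stored)), "->", len(set(loaded)))
```

The exact value 02:02 lands on (01:02 or 03:02) depends on how the C library's `mktime` resolves the gap; either way it collides with a real timestamp.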
I assume one would want an option to save data without DST adjustment (i.e., no daylight saving time even during the daylight saving period). I'll see whether this is easy.
Thank you for looking into this. I would definitely benefit from an option to store and read data in time zones other than my local one.
I think the best way to deal with this for now is to set the time zone information on your original dates, i.e., if `x` is a datetime, `x = x.replace(tzinfo=pytz.UTC)`. When it comes out the other side, it should convert to your local time properly. We should have improved time zone handling in 0.8, along with the datetime64 type.
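For comparison, a minimal sketch of why UTC-tagged values are safer, using the standard library's `timezone.utc` in place of `pytz.UTC`: an aware UTC datetime converts to a POSIX timestamp and back without consulting local DST rules, so 02:02 stays 02:02. This only illustrates the principle; it does not claim that HDFStore in pandas 0.7 preserved `tzinfo`.

```python
from datetime import datetime, timezone


def utc_roundtrip(dt):
    """Aware UTC datetime -> POSIX timestamp -> aware UTC datetime.

    Neither step consults the local time zone, so DST rules cannot shift
    or merge values.
    """
    ts = dt.timestamp()  # exact offset-based conversion for aware datetimes
    return datetime.fromtimestamp(ts, tz=timezone.utc)


# Same wall-clock values as in the problem report, tagged as UTC
# (timezone.utc plays the role of pytz.UTC here).
stamps = [datetime(2011, 3, 27, h, 2, tzinfo=timezone.utc) for h in (1, 2, 3)]
back = [utc_roundtrip(x) for x in stamps]
print(back == stamps)  # True
```

To display a result in local time afterwards, call `.astimezone()` on it.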
Timestamp data is all represented internally as UTC (even though it may appear to be in one time zone or another) and should not have any locale issues in pandas 0.8.0. See #1232 regarding storing time zones in HDFStore; that will be done soon.
When storing a DataFrame using HDFStore the datetime information is altered. My guess is that there is some problem with daylight saving and time zones when the DataFrame is loaded from the h5 file. An example: