DataFrame combine_first() loses timezone information for datetime columns #10567

Closed
iyer opened this issue Jul 14, 2015 · 11 comments
Labels: Bug, Indexing, Internals, MultiIndex, Timezones

@iyer

iyer commented Jul 14, 2015

xref: additional example in #13650

combine_first() loses timezone information for datetime columns

import pandas as pd

# Two overlapping tz-aware (UTC) date ranges, one per frame
dts1 = pd.date_range('20150101', '20150105', tz='UTC')
df1 = pd.DataFrame({'DATE': dts1})
dts2 = pd.date_range('20150103', '20150105', tz='UTC')
df2 = pd.DataFrame({'DATE': dts2})
df = df1.combine_first(df2)
df.DATE[0].tz  # this shows up as None
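
A quick way to check whether a given pandas installation is affected is to inspect the dtype of the combined column rather than a single element. This is only a sketch built from the report above; the names `combined`, `date_dtype` and `tz_preserved` are illustrative and not part of the original report.

```python
import pandas as pd

dts1 = pd.date_range('20150101', '20150105', tz='UTC')
df1 = pd.DataFrame({'DATE': dts1})
dts2 = pd.date_range('20150103', '20150105', tz='UTC')
df2 = pd.DataFrame({'DATE': dts2})

combined = df1.combine_first(df2)

# A tz-aware column has a DatetimeTZDtype; a column that lost its
# timezone falls back to a naive datetime64 dtype (or object).
date_dtype = combined['DATE'].dtype
tz_preserved = isinstance(date_dtype, pd.DatetimeTZDtype)
print(date_dtype, tz_preserved)
```

On a version where the bug is present, `tz_preserved` would be expected to come out `False`.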
jreback added the Bug and Timezones labels Jul 14, 2015
@jreback
Contributor

jreback commented Jul 14, 2015

This goes through some older code (combine) which really needs to be pushed into the internal classes (see e.g. #3025), so a fix is a bit non-trivial. That said, pull requests are welcome!

jreback added the Internals label Jul 14, 2015
jreback added this to the Next Major Release milestone Jul 14, 2015
@iyer
Author

iyer commented Jul 15, 2015

Thanks. I'll wait for the change when it happens.
In the meantime I'll move DATE into my index to work around the issue.
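
The index-based workaround mentioned here might look roughly like the following. This is a sketch under assumptions: the `value` column and its numbers are made up for illustration, and the date ranges are adapted from the original report.

```python
import pandas as pd

dts1 = pd.date_range('20150101', '20150105', tz='UTC')
dts2 = pd.date_range('20150103', '20150107', tz='UTC')

# Hypothetical payload column; DATE is moved into the index
left = pd.DataFrame({'DATE': dts1,
                     'value': [1.0, None, 3.0, None, 5.0]}).set_index('DATE')
right = pd.DataFrame({'DATE': dts2,
                      'value': [30.0, 40.0, 50.0, 60.0, 70.0]}).set_index('DATE')

# combine_first aligns on the tz-aware index instead of combining
# the datetime values themselves
combined = left.combine_first(right)
print(combined.index.tz)

# Bring DATE back as a regular column if needed
restored = combined.reset_index()
```

The idea is that the datetimes only take part in index alignment, so they never pass through the column-combining code path that drops the timezone.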

@jreback
Contributor

jreback commented Jul 15, 2015

Well, pull requests are always welcome. There are a ton of open issues, so this will probably not be addressed for quite some time.

@terrytangyuan
Contributor

@jreback Could you give some instructions on how to fix this bug?

@jreback
Contributor

jreback commented Sep 4, 2015

Well, this needs to be pushed down to the block manager (e.g. we need a method Block.combine); then all of this dtype handling can be dispatched via the internal block types. Further, #10477 will allow this to be handled nicely.

Doing it now is going to be a bit hacky.

So if you want to take a look, starting to move it into the internals as in #3025 would be a good start.
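
For readers unfamiliar with what a dtype-preserving combine has to do, the per-column semantics can be sketched with public Series operations. This is not the proposed Block.combine and says nothing about pandas internals; `combine_first_column` is a hypothetical helper written only to illustrate the behaviour a fix would need to keep.

```python
import pandas as pd


def combine_first_column(left: pd.Series, right: pd.Series) -> pd.Series:
    """Hypothetical helper: keep left's values, fill its gaps from right."""
    index = left.index.union(right.index)
    left_aligned = left.reindex(index)
    right_aligned = right.reindex(index)
    # where() keeps left where the condition holds and takes right elsewhere
    return left_aligned.where(left_aligned.notna(), right_aligned)


left = pd.Series(pd.DatetimeIndex(['2011-01-01', 'NaT'], tz='US/Eastern'),
                 index=[1, 3])
right = pd.Series(pd.DatetimeIndex(['2012-01-01', '2012-01-02'], tz='US/Eastern'),
                  index=[2, 3])

result = combine_first_column(left, right)
print(result.dtype)
```

Because every step stays on tz-aware Series, no conversion to a raw array takes place, which is the kind of step where timezone information tends to be lost.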

@phretor

phretor commented Nov 9, 2015

FWIW, I noticed that, in certain cases (i.e., when you are well aware of the TZs), this workaround can be used:

In [16]: n['ts'].index.tz
Out[16]: <DstTzInfo 'America/Los_Angeles' PST-1 day, 16:00:00 STD>

In [17]: n['ev'].index.tz
Out[17]: <DstTzInfo 'America/Los_Angeles' LMT-1 day, 16:07:00 STD>

In [18]: n['ts'].combine_first(n['ts']).index.tz
Out[18]: <DstTzInfo 'America/Los_Angeles' PST-1 day, 16:00:00 STD>

In [19]: n['ts'].combine_first(n['ev']).index.tz
Out[19]: <UTC>

In [20]: n['ts'].combine_first(n['ev']).tz_convert('America/Los_Angeles').index.tz
Out[20]: <DstTzInfo 'America/Los_Angeles' LMT-1 day, 16:07:00 STD>

Warning: this is my first dig into pandas.
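
A self-contained variant of this idea, for Series whose indexes are in different timezones, is sketched below. Note that it differs from the session above: both inputs are converted to UTC explicitly before combining, so the result does not depend on how combine_first merges mismatched timezones. The data and the names `ts`, `ev`, `combined_utc` and `restored` are illustrative assumptions.

```python
import pandas as pd

ts = pd.Series(
    [1.0, 2.0],
    index=pd.DatetimeIndex(['2015-11-09 10:00', '2015-11-09 12:00'],
                           tz='America/Los_Angeles'),
)
ev = pd.Series(
    [10.0, 20.0],
    index=pd.DatetimeIndex(['2015-11-09 12:00', '2015-11-09 14:00'],
                           tz='Europe/Rome'),
)

# Normalize both indexes to UTC, combine, then convert back to the
# timezone you actually want to work in
combined_utc = ts.tz_convert('UTC').combine_first(ev.tz_convert('UTC'))
restored = combined_utc.tz_convert('America/Los_Angeles')
print(restored.index.tz)
```

As with the original workaround, this only makes sense when you already know which timezone the result should be in.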

@sinhrks
Member

sinhrks commented Apr 30, 2016

This looks to be fixed on master (I haven't looked into the details, but maybe by the changes in concat_compat?)

Adding tests.

@sinhrks
Member

sinhrks commented Apr 30, 2016

Ah, even though tz is preserved, the datetimes are incorrectly shifted (maybe the same as #12619).

import pandas as pd

dts1 = pd.DatetimeIndex(['2011-01-01', 'NaT', '2011-01-03', '2011-01-04'], tz='US/Eastern')
df1 = pd.DataFrame({'DATE': dts1}, index=[1, 3, 5, 7])
dts2 = pd.DatetimeIndex(['2012-01-01', '2012-01-02', '2012-01-03'], tz='US/Eastern')
df2 = pd.DataFrame({'DATE': dts2}, index=[2, 4, 5])
df = df1.combine_first(df2)
df
# Observed output: tz is kept, but the times read 05:00 where 00:00 is expected
#                        DATE
# 1 2011-01-01 05:00:00-05:00
# 2 2012-01-01 05:00:00-05:00
# 3                       NaT
# 4 2012-01-02 05:00:00-05:00
# 5 2011-01-03 05:00:00-05:00
# 7 2011-01-04 05:00:00-05:00
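
A regression check for this case could compare the result against an explicitly constructed expectation, so that both the timezone and the wall-clock values are verified. The sketch below reuses the frames from the example; `expected_dts` and `expected` are names introduced here, and `pd.testing.assert_frame_equal` raises an AssertionError on any mismatch.

```python
import pandas as pd

dts1 = pd.DatetimeIndex(['2011-01-01', 'NaT', '2011-01-03', '2011-01-04'],
                        tz='US/Eastern')
df1 = pd.DataFrame({'DATE': dts1}, index=[1, 3, 5, 7])
dts2 = pd.DatetimeIndex(['2012-01-01', '2012-01-02', '2012-01-03'],
                        tz='US/Eastern')
df2 = pd.DataFrame({'DATE': dts2}, index=[2, 4, 5])

result = df1.combine_first(df2)

# Expected: df1 wins where it has a value, df2 fills the new labels,
# and label 3 stays NaT because df2 has no row for it
expected_dts = pd.DatetimeIndex(
    ['2011-01-01', '2012-01-01', 'NaT',
     '2012-01-02', '2011-01-03', '2011-01-04'],
    tz='US/Eastern',
)
expected = pd.DataFrame({'DATE': expected_dts}, index=[1, 2, 3, 4, 5, 7])

pd.testing.assert_frame_equal(result, expected)
```

On a build showing the shifted output above, the comparison would be expected to fail on the values even though the dtype matches.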

@jorisvandenbossche
Member

Note that the example at the top does not really show the issue (it seems to work on master).

jreback modified the milestones: 0.19.0, Next Major Release Aug 11, 2016
@jackalack

Why was this closed? This still appears to be an issue.

@jorisvandenbossche
Member

@jackalack Can you provide a reproducible example that shows this?
