API: Should extended dtype work as the same as np.dtype? #12619

Closed
sinhrks opened this issue Mar 14, 2016 · 3 comments · Fixed by #21674
Labels
API Design Dtype Conversions Unexpected or buggy dtype conversions Timezones Timezone data dtype

@sinhrks
Member

sinhrks commented Mar 14, 2016

We can create and convert data using the datetimetz dtype, but it doesn't work in some cases.

s = pd.Series([pd.Timestamp('2011-01-31', tz='US/Eastern')])
s
#0   2011-01-31 00:00:00-05:00
# dtype: datetime64[ns, US/Eastern]

# OK
s.astype('datetime64[ns, Asia/Tokyo]')
#0   2011-01-31 14:00:00+09:00
# dtype: datetime64[ns, Asia/Tokyo]
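
As a rough cross-check of the conversion above, the same arithmetic can be sketched with the standard library alone. The fixed offsets below are an assumption standing in for US/Eastern and Asia/Tokyo on this date; this is not how pandas implements `astype`, only an illustration that the instant is preserved while the displayed wall time changes.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for the named zones on this date
# (US/Eastern is UTC-05:00 in January; Asia/Tokyo is UTC+09:00).
EASTERN = timezone(timedelta(hours=-5))
TOKYO = timezone(timedelta(hours=9))

# Midnight Eastern on 2011-01-31, as in the Series above.
eastern_midnight = datetime(2011, 1, 31, tzinfo=EASTERN)

# Converting between zones keeps the instant and changes only the wall time.
tokyo_view = eastern_midnight.astimezone(TOKYO)

print(tokyo_view.isoformat())          # 2011-01-31T14:00:00+09:00
print(tokyo_view == eastern_midnight)  # True: same absolute instant
```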

astype

# numpy (OK)
pd.Series([1296432000000000000]).astype('datetime64[ns]')
#0   2011-01-31
# dtype: datetime64[ns]

# extended (NG)
pd.Series([1296432000000000000]).astype('datetime64[ns, Asia/Tokyo]')
# TypeError: Invalid datetime unit in metadata string "[ns, Asia/Tokyo]"

dtype arg

# extended (OK ? I think the result should be 2011-01-31 00:00:00-05:00... see below)
pd.Series([1296432000000000000], dtype='datetime64[ns, US/Eastern]')
#0   2011-01-31 05:00:00-05:00
# dtype: datetime64[ns, US/Eastern]

# ref
pd.Series([1296432000000000000], dtype='datetime64[ns]').dt.tz_localize('US/Eastern')
#0   2011-01-31 00:00:00-05:00
# dtype: datetime64[ns, US/Eastern]

# extended (NG)
pd.Series([pd.Timestamp('2011-01-01', tz='US/Eastern')], dtype='datetime64[ns, US/Eastern]')
# TypeError: data type not understood
@sinhrks sinhrks added Dtype Conversions Unexpected or buggy dtype conversions API Design Timezones Timezone data dtype labels Mar 14, 2016
@sinhrks sinhrks added this to the 0.18.1 milestone Mar 14, 2016
@sinhrks
Member Author

sinhrks commented Apr 2, 2016

Found that the dtype arg issue affects the boxing issue (#12752). This must be fixed first.

s = pd.Series([pd.Timestamp('2011-01-01 09:00', tz='US/Eastern')])
s
# 0   2011-01-01 09:00:00-05:00
# dtype: datetime64[ns, US/Eastern]

s._data._block.values.asi8
# array([1293890400000000000])

# NG, must be 2011-01-01 09:00:00-05:00
pd.Series(s._data._block.values.asi8, dtype='datetime64[ns, US/Eastern]')
# 0   2011-01-01 14:00:00-05:00
# dtype: datetime64[ns, US/Eastern]

@sinhrks
Member Author

sinhrks commented Apr 2, 2016

Timestamp/TDI hold their internal repr as an int, which refers to absolute time in GMT.

int(pd.Timestamp('2011-01-01').asm8)
# 1293840000000000000

int(pd.Timestamp('2011-01-01', tz='US/Eastern').asm8)
# 1293858000000000000

Thus, a Timestamp created from an int should have the same internal repr, whether or not tz is given.

pd.Timestamp(1293858000000000000)
# Timestamp('2011-01-01 05:00:00')

int(pd.Timestamp(1293858000000000000).asm8)
# 1293858000000000000

pd.Timestamp(1293858000000000000, tz='US/Eastern')
# Timestamp('2011-01-01 00:00:00-0500', tz='US/Eastern')

int(pd.Timestamp(1293858000000000000, tz='US/Eastern').asm8)
# 1293858000000000000

However, this rule is not applied to DTI. DTI must work the same way as Timestamp; otherwise boxing a scalar and boxing an array give different results.

# OK, without TZ
pd.DatetimeIndex([1293858000000000000])
# DatetimeIndex(['2011-01-01 05:00:00'], dtype='datetime64[ns]', freq=None)

pd.DatetimeIndex([1293858000000000000]).asi8
# array([1293858000000000000])

# NG, passing tz shifts the internal repr
pd.DatetimeIndex([1293858000000000000], tz='US/Eastern')
# DatetimeIndex(['2011-01-01 05:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq=None)

pd.DatetimeIndex([1293858000000000000], tz='US/Eastern').asi8
# array([1293876000000000000])
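
The difference between the two results above can be reproduced with plain `datetime` arithmetic. The sketch below is only an illustration of the two possible readings of the int, not the pandas code path; the helper names `from_epoch_ns` and `to_epoch_ns` are hypothetical, and a fixed UTC-05:00 offset is assumed in place of US/Eastern.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-05:00 offset stands in for US/Eastern in January.
EST = timezone(timedelta(hours=-5), "EST")
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_ns(ns):
    """Hypothetical helper: read ns as an absolute UTC instant."""
    return EPOCH + timedelta(microseconds=ns // 1000)


def to_epoch_ns(dt):
    """Hypothetical helper: UTC epoch nanoseconds of an aware datetime."""
    return ((dt - EPOCH) // timedelta(microseconds=1)) * 1000


value = 1293858000000000000

# Reading A (what Timestamp does above): the int is a UTC instant,
# and tz only changes how that instant is displayed.
as_instant = from_epoch_ns(value).astimezone(EST)

# Reading B (what the DatetimeIndex result above corresponds to): the int is
# read as a naive wall time, which is then localized to the zone.
naive_wall = from_epoch_ns(value).replace(tzinfo=None)
as_wall = naive_wall.replace(tzinfo=EST)

print(as_instant.isoformat(), to_epoch_ns(as_instant))
# 2011-01-01T00:00:00-05:00 1293858000000000000
print(as_wall.isoformat(), to_epoch_ns(as_wall))
# 2011-01-01T05:00:00-05:00 1293876000000000000
```

Reading B moves the stored value by the zone offset (5 hours, or 18000000000000 ns), which matches the shifted `asi8` shown above.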

@jreback
Contributor

jreback commented Apr 3, 2016

yeah, I think there's a bug somewhere where I am converting on a localized UTC somewhere

3 participants