implement assert_tzawareness_compat for DatetimeIndex #18376
@@ -649,6 +653,20 @@ def _simple_new(cls, values, name=None, freq=None, tz=None,
        result._reset_identity()
        return result

    def _assert_tzawareness_compat(self, other):
        # adapted from _Timestamp._assert_tzawareness_compat
        other_tz = getattr(other, 'tzinfo', None)
this is odd to use tzinfo, these are already wrapped scalars, or an index type, so .tz is appropriate
That was my first thought too, but other could be a raw datetime.
use .tz, you can simply wrap other = pd.Timestamp(other)
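A minimal sketch of the suggestion above, assuming the goal is only to read the timezone of an arbitrary datetime-like scalar. The helper name tz_of is hypothetical and is not part of pandas or of this PR; it relies only on pd.Timestamp accepting a raw datetime.datetime and exposing .tz.

```python
import datetime as dt

import pandas as pd


def tz_of(other):
    """Return the timezone of a datetime-like scalar, or None if it is naive.

    Hypothetical helper: wrapping in pd.Timestamp makes `.tz` available
    even when `other` is a raw datetime.datetime.
    """
    return pd.Timestamp(other).tz


naive = dt.datetime(2011, 1, 1)
aware = dt.datetime(2011, 1, 1, tzinfo=dt.timezone.utc)

print(tz_of(naive))               # naive input: no timezone
print(tz_of(aware) is not None)   # aware input: timezone is preserved
```

With this wrapping, the tzinfo fallback for raw datetime objects should no longer be needed, since every scalar is inspected through the same .tz attribute.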
pandas/core/internals.py
if isna(s):
    return isna(values)
return _maybe_compare(values, getattr(s, 'asm8', s), operator.eq)
if is_datetime64tz_dtype(self):
this is not the right place to fix if this is even an issue
I tend to agree, could use advice on where is the right place. Without this change there is a test failure in tests.indexing.test_coercion.TestReplaceSeriesCoercion.test_replace_series_datetime64tz with traceback
def test_replace_series_datetime64tz(self):
from_key = 'datetime64[ns, US/Eastern]'
for to_key in self.rep:
> self._assert_replace_conversion(from_key, to_key, how='dict')
pandas/tests/indexing/test_coercion.py:1317:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/tests/indexing/test_coercion.py:1239: in _assert_replace_conversion
result = obj.replace(replacer)
pandas/core/generic.py:4514: in replace
limit=limit, regex=regex)
pandas/core/generic.py:4563: in replace
regex=regex)
pandas/core/internals.py:3504: in replace_list
masks = [comp(s) for i, s in enumerate(src_list)]
pandas/core/internals.py:3502: in comp
operator.eq)
pandas/core/internals.py:4954: in _maybe_compare
result = op(a, b)
pandas/core/indexes/datetimes.py:122: in wrapper
self._assert_tzawareness_compat(other)
where the relevant obj.replace(replacer) call has
index = pd.Index([3, 4], name='xxx')
data = [Timestamp('2011-01-01 00:00:00-0500', tz='US/Eastern'), Timestamp('2011-01-03 00:00:00-0500', tz='US/Eastern')]
obj = pd.Series(data, index=index, name='yyy')
replacer = {Timestamp('2011-01-01 00:00:00-0500', tz='US/Eastern'): 1.1, Timestamp('2011-01-03 00:00:00-0500', tz='US/Eastern'): 2.2}
The problem is that replace_list in internals is converting to M8 (via asm8), which drops the tz.
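To make the tz-dropping step concrete, here is a small sketch using only Timestamp.asm8 and the values from the failing test. It is an illustration of the conversion itself, not of the replace_list code path.

```python
import numpy as np
import pandas as pd

ts = pd.Timestamp('2011-01-01', tz='US/Eastern')

# asm8 returns a numpy.datetime64 holding the UTC instant; numpy's
# datetime64 has no notion of a timezone, so the tz is lost here.
raw = ts.asm8

# Wrapping the raw value again yields a tz-naive Timestamp.
round_trip = pd.Timestamp(raw)

print(type(raw).__name__)
print(ts.tz is not None, round_trip.tz is None)
```

Once the scalar has been reduced this way, comparing it against the tz-aware values is a naive-vs-aware comparison, which is what the new check rejects.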
this is not the place for this, might need to fix .replace in the block itself.
#17920 makes some design decisions about tzawareness strictness. Conditional on those decisions, comparisons should probably use the same conventions.
#17920 is orthogonal. That is purely string based. When you use Timestamps, they must be the same timezone (or None) or should raise |
See the OP. The unresolved error here is in comparing a tzaware DatetimeIndex against naive-like strings.
Under the status quo, the following are equivalent:
#17920 affects the behavior of |
@jbrockmendel Honestly, I used some parts of pandas for my project and focused on that part, but I do not have deep insight into the internals or where else the changes I suggested could be made. Your comparison seems like an edge case to me, but people have different tasks to do with this great library. I would rather go for the split between naive and timezoned datetime-like objects and my
Increasingly I think we should be strict about this, i.e. be Technically Correct. Anything else is going to require keeping track of what rules are loosened in which special cases, and will inevitably lead to headaches down the road. That said, an idea for a workaround for slicing tzaware indexes: interpret a trailing "TZ" at the end of a string as "interpret this naive datetime-like string with this DatetimeIndex's timezone". Is #17920 motivated by a pressing use case where you've got a tzaware DatetimeIndex? I expect the large majority of indexing use cases are tznaive. |
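The trailing "TZ" idea floated above could look roughly like the following. This is purely a sketch of the proposal: parse_for_index is a hypothetical helper, and nothing like it exists in pandas or in this PR.

```python
import pandas as pd


def parse_for_index(key, index):
    """Parse a datetime-like string for use against `index`.

    Hypothetical rule from the discussion: a trailing "TZ" means
    "interpret this naive string in the index's own timezone".
    Without the suffix the string is parsed as-is.
    """
    if key.endswith('TZ'):
        naive = pd.Timestamp(key[:-2].strip())
        return naive.tz_localize(index.tz)
    return pd.Timestamp(key)


idx = pd.date_range('2011-01-01', periods=3, tz='US/Eastern')

print(parse_for_index('2011-01-02 TZ', idx))  # localized to the index's tz
print(parse_for_index('2011-01-02', idx))     # stays tz-naive
```

Under such a rule the strict check would still reject the second form against a tz-aware index, while the first form opts in to the index's timezone explicitly.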
It is motivated by a use case with data from several different data sources. Some are UTC, some are German Winter Time (without daylight saving time, definitely not a standard time), some are CET etc. I believe that mixing different data sources often has timezone issues as a consequence, but I cannot offer proof of that. Adding a
Let's move the discussion over to #18435. Closing this PR until the design issue is resolved. |
Reopening. If we're going to introduce inconsistencies, might as well go full-speed ahead and get them sorted out sooner rather than later. Will push update shortly. |
Codecov Report
@@ Coverage Diff @@
## master #18376 +/- ##
==========================================
- Coverage 91.36% 91.34% -0.02%
==========================================
Files 163 163
Lines 49704 49723 +19
==========================================
+ Hits 45411 45420 +9
- Misses 4293 4303 +10
Codecov Report
@@ Coverage Diff @@
## master #18376 +/- ##
==========================================
+ Coverage 91.53% 91.53% +<.01%
==========================================
Files 148 148
Lines 48688 48701 +13
==========================================
+ Hits 44566 44579 +13
Misses 4122 4122
with pytest.raises(TypeError):
    op(left, right)

# Check that there isn't a problem aware-aware and naive-naive do not
these tests should be in series (the latter ones). Alternatively, a better place might be in test_base for these, where we already handle series & index tests.
That was the idea behind #18435. Feel free to edit that if the OP doesn't provide enough overview for your tastes.
How to treat it in the whatsnew notes is above my pay grade. But over the course of this PR's history I've become increasingly convinced that the current comparison behavior is Just Plain Wrong and should be treated like a bug. The three options on hand are 1) this PR which makes AFAICT the main objection to 1) is that in conjunction with #17920 it breaks the equivalence between
... and most of all, I am not remotely confident that this list is complete. How many places across the code-base do comparisons with A tz-aware Any of the available options introduces an inconsistency somewhere. AFAICT Option1 breaks a convenience equivalency, will do so loudly, and as a result will not snowball into other inconsistencies. Special-casing string comparisons generates a whole mess of other potential (often silent) problems that can be avoided by enforcing behavior that is already canonical. |
Ah, yes, forgot there was already quite some discussion there. Can you put your last comment there as well? Then I will answer there |
Just pushed a commit that lets tzawareness compat slide for strings, enforces it for everything else. |
Closes #12601 |
@jorisvandenbossche with the most recent change this allows strings through without changing current behavior, only checks tzawareness-compat for datetime and vectors. i.e. hopefully this fixes the part that we all agree is a bug without taking a stand on the rest. |
Thanks!
Should this also be tested for series/series and series/index comparisons?
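A sketch of what such Series-level checks might exercise, assuming the same rule applies as for DatetimeIndex. It uses an ordering comparison because that is the case that raises TypeError; the helper lt_raises is only for illustration.

```python
import pandas as pd

aware = pd.Series(pd.date_range('2011-01-01', periods=2, tz='US/Eastern'))
naive = pd.Series(pd.date_range('2011-01-01', periods=2))


def lt_raises(left, right):
    """Return True if `left < right` raises TypeError (illustrative helper)."""
    try:
        left < right
    except TypeError:
        return True
    return False


print(lt_raises(aware, naive))            # Series vs Series, mixed awareness
print(lt_raises(aware, pd.Index(naive)))  # Series vs Index, mixed awareness
print(lt_raises(aware, aware))            # aware vs aware is allowed
print(lt_raises(naive, naive))            # naive vs naive is allowed
```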
@@ -821,6 +821,9 @@ def test_replace_series(self, how, to_key, from_key):
        if (from_key.startswith('datetime') and to_key.startswith('datetime')):
            pytest.xfail("different tz, currently mask_missing "
                         "raises SystemError")
        elif from_key in ['datetime64[ns, US/Eastern]', 'datetime64[ns, UTC]']:
            pytest.xfail(reason='GH #18376, tzawareness-compat bug '
                                'in BlockManager.replace_list')
might be discussed before, but since you need to add a xfail here, is this introducing a regression?
I guess? See above:
so replace_list is a giant hack ATM and needs to be fixed. maybe this is the impetus. However, don't really want to add your hack on top. What exactly is failing w/o you touching replace? let's isolate and x-fail those tests for now.
Can you give an explicit code example that works now and will fail after this PR?
@jorisvandenbossche good catch. Two things here:
- Does the fact that this is not immediately obvious by inspection suggest that test parametrization may have been taken a step too far?
- In putting together an answer to this question I found that any non-datetime comparison raises, whereas I expect we want DatetimeIndex(...) == "baz" to just be Falsey (following the behavior of Timestamp.__richcmp__ and Timedelta.__richcmp__). So I need to fix this, will answer the original question after that.
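The behaviour described in the second point can be illustrated as follows. This sketch assumes a pandas version in which the fix has landed, so that an unparseable string compares as unequal rather than raising.

```python
import pandas as pd

dti = pd.DatetimeIndex(['2011-01-01', '2011-01-03'])

# Scalar case: Timestamp does not recognise the string, so equality
# falls back to plain "not equal".
scalar_result = pd.Timestamp('2011-01-01') == "baz"

# Index case: the comparison is elementwise and every entry is unequal.
index_result = dti == "baz"

print(scalar_result)
print(index_result)
```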
@jorisvandenbossche An example of a case that xfails after this PR:
index = pd.Index([3, 4], dtype='int64', name=u'xxx')
obj = pd.Series([pd.Timestamp('2011-01-01', tz='US/Eastern'),
pd.Timestamp('2011-01-03', tz='US/Eastern')],
index=index, name='yyy')
replacer = pd.Series([pd.Timedelta(days=1), pd.Timedelta(days=2)],
index=obj)
>>> result = obj.replace(replacer)
[...]
TypeError: Cannot compare tz-naive and tz-aware datetime-like objects
Separately, I've just started looking at Series vs Index arithmetic/comparisons and that's going to be a long process (see #18824). This PR has been a tough slog; I'd really like to get it over with.
this looks fine now that you have eliminated the controversial string coercing.
rebase |
doc/source/whatsnew/v0.23.0.txt
@@ -395,4 +395,5 @@ Categorical
Other
^^^^^

- Fixed bug where comparing :class:`DatetimeIndex` failed to raise ``TypeError`` when attempting to compare timezone-aware and timezone-naive datetimelike objects (:issue:`18162`)
move to conversion
whatsnew change, otherwise lgtm. |
Ping |
thanks! |
closes #18162
ATM comparing tzaware DatetimeIndex with tznaive DTI fails to raise. This PR implements _assert_tzawareness_compat (which currently exists in _Timestamp) in DatetimeIndex to fix that.
That in turn causes breakage in Series.replace. There's a small edit in core.internals to fix that, but I'm not sure that's the best way to make that work.
There is still one remaining test error in tests.series.test_indexing.TestSeriesIndexing.test_getitem_setitem_datetimeindex, which may either represent a bug or a case where tznaive/tzaware rules are relaxed.
Later in the same tests it tries to compare ts.index to naive datetime objects. Is this a special case where we are intentionally less strict?
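For reference, a minimal sketch of the behaviour this PR aims for, written against a pandas version that includes the check. It uses an ordering comparison, since equality between mixed-awareness objects has not behaved identically across pandas versions and is therefore not asserted here. The helper raises_on_lt is illustrative only.

```python
import datetime as dt

import pandas as pd

aware_dti = pd.date_range('2011-01-01', periods=2, tz='US/Eastern')
naive_dti = pd.date_range('2011-01-01', periods=2)


def raises_on_lt(left, right):
    """Return True if `left < right` raises TypeError (illustrative helper)."""
    try:
        left < right
    except TypeError:
        return True
    return False


print(raises_on_lt(aware_dti, naive_dti))                   # index vs index
print(raises_on_lt(aware_dti, pd.Timestamp('2011-01-01')))  # index vs naive Timestamp
print(raises_on_lt(aware_dti, dt.datetime(2011, 1, 1)))     # index vs raw naive datetime
print(raises_on_lt(aware_dti, aware_dti))                   # matching awareness is fine
```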