
Recognize timezoned labels when accessing dataframes. #17920

Closed

Conversation


@1kastner 1kastner commented Oct 19, 2017

@1kastner
Author

Ouch, looks like some work for me...

@@ -123,6 +123,30 @@ def test_consistency_with_tz_aware_scalar(self):
result = df[0].at[0]
assert result == expected

def test_access_datetimeindex_with_timezoned_label(self):

Member

Can you add the github issue number here as a comment? (see the test after this one for an example)

Author

Of course

Author

Added now.

@@ -1273,52 +1273,57 @@ def _parsed_string_to_bounds(self, reso, parsed):
lower, upper: pd.Timestamp

"""
if parsed.tzinfo is None:
Contributor

so you need to do something different here. Leave the dates that are generated in the current tz (e.g. self.tz). Then you need to convert it to the parsed tz, BUT then you need to localize back into the original timezone. There are a number of cases; here's an example.

# index in US/Eastern, parsed in UTC
In [22]: pd.Timestamp('20130101', tz='US/Eastern')
Out[22]: Timestamp('2013-01-01 00:00:00-0500', tz='US/Eastern')

In [23]: pd.Timestamp('20130101', tz='US/Eastern').tz_convert('UTC')
Out[23]: Timestamp('2013-01-01 05:00:00+0000', tz='UTC')

In [24]: pd.Timestamp('20130101', tz='US/Eastern').tz_convert('UTC').tz_localize(None).tz_localize('US/Eastern')
Out[24]: Timestamp('2013-01-01 05:00:00-0500', tz='US/Eastern')

# index is naive, parsed is UTC, in effect no change here
In [25]: pd.Timestamp('20130101')
Out[25]: Timestamp('2013-01-01 00:00:00')

In [26]: pd.Timestamp('20130101').tz_localize('UTC').tz_localize(None)
Out[26]: Timestamp('2013-01-01 00:00:00')

As I am writing this, it looks overly complicated. I might choose instead to raise if the timezones don't match (they can be same tz or both None).

Author

As I elaborated in #16785, the timezones do not need to match. There are quite common cases with daylight saving time that require some flexibility here. So my idea is to change target_tz = parsed.tzinfo to the kind of conversion you mentioned. Next week I might give it a shot. Dealing only with the timezone of the DatetimeIndex seems reasonable.

Author

I am really tired but tried to find a solution. If it works, great; if not, at least it shows the idea. I will come back to it as soon as possible. Right now I am having some issues with installing the development environment under Windows 10, and all the solutions I found look quite time-consuming. Sorry if I spam you with non-working code.

@jreback added the labels Indexing (Related to indexing on series/frames, not to indexes themselves), Datetime (Datetime data dtype), and Timezones (Timezone data dtype) on Oct 20, 2017
@jreback
Contributor

jreback commented Oct 28, 2017

pls rebase when you can

@pep8speaks

pep8speaks commented Oct 31, 2017

Hello @1kastner! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on November 26, 2017 at 22:11 UTC

@1kastner
Author

I did a rebase and some more minor adjustments.

jbrockmendel and others added 17 commits November 5, 2017 11:00
closes pandas-dev#17979

Author: sfoo <sfoohei@gmail.com>
Author: Jeff Reback <jeff@reback.net>

Closes pandas-dev#17996 from GuessWhoSamFoo/groupby_tuples and squashes the following commits:

afb0031 [Jeff Reback] TST: separate out grouping-type tests
c52b2a8 [sfoo] Moved notes to 0.22; created is_axis_multiindex var - pending internal use
fb52c1c [sfoo] Added whatsnew; checked match_axis_length
99ebc4e [sfoo] Cast groupby tuple as list when multiindex
@1kastner
Author

@jreback ping on update

@1kastner
Author

The failing test I accidentally got from the master branch.

@jbrockmendel
Member

@jreback pls hold off on this pending resolution of #18435.

@jreback
Contributor

jreback commented Nov 24, 2017

@jbrockmendel your discussion w.r.t. this issue is misplaced. We are talking about how to interpret partial strings when you already know the underlying details about the tz's in question. This is just user convenience and is completely orthogonal to the other issues you have raised.

@jbrockmendel
Member

This is just user convenience and is completely orthogonal to other issues you have raised.

I understand the convenience issue. Consider the two types of slicing/indexing: series[lower:upper] and series[(series.index >= lower) & (series.index <= upper)]. This equivalence breaks the orthogonality. If you want to break the equivalence that's fine, I just want that design decision to be made explicit.
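The equivalence being discussed can be sketched as a small runnable example (the index, bounds, and values here are illustrative, not taken from the PR):

```python
import pandas as pd

# A tz-naive hourly index; the two selection styles should agree.
idx = pd.date_range("2016-01-01", periods=10, freq="h")
s = pd.Series(range(10), index=idx)

lower = pd.Timestamp("2016-01-01 02:00")
upper = pd.Timestamp("2016-01-01 05:00")

# Label-based slicing (inclusive of both end-points) ...
sliced = s[lower:upper]
# ... selects the same rows as an explicit boolean mask.
masked = s[(s.index >= lower) & (s.index <= upper)]

assert sliced.equals(masked)
```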

@1kastner
Author

@jreback Is this pull request ready to go?

@@ -236,7 +236,8 @@ def test_stringified_slice_with_tz(self):
start = datetime.datetime.now()
idx = DatetimeIndex(start=start, freq="1d", periods=10)
df = DataFrame(lrange(10), index=idx)
df["2013-01-14 23:44:34.437768-05:00":] # no exception here
with tm.assert_produces_warning(UserWarning):
df["2013-01-14 23:44:34.437768-05:00":] # no exception here
Contributor

I would remove the # no exception here and add a comment about the warning that is produced

Author

I am not sure whether that is a good idea. That comment is not mine and it is not related to my code. The proposed refactoring is beyond the scope of this pull request.

@1kastner
Author

1kastner commented Nov 26, 2017 via email

@jreback
Contributor

jreback commented Nov 26, 2017

I am not sure whether that is a good idea. That comment is not mine and it is not related to my code. The proposed refactoring is beyond the scope of this pull request.

@1kastner not sure what you are referring to. I had only 1 more comment (small). @jbrockmendel's concerns will be addressed elsewhere.

@jbrockmendel
Member

@jreback I think the comment @1kastner was referring to was the # no exception code comment you asked to have removed. Ambiguity between code comment and GH comment.

I would remove the # no exception here

@jreback
Contributor

jreback commented Nov 26, 2017

if this is what you were referring to

I would remove the # no exception here

then let's add a comparison for that test (to ensure it returns the correct thing).
and add a comment as to why there is a warning.

as to being 'out of scope', well that's just it. Since this code needed to be touched, it also needs updating. Just because there is a comment that is nonsensical doesn't mean we leave technical debt in place.

@1kastner
Author

@jreback This test checks whether no exception is thrown. Actually it is better to completely remove this test because whatever it tests is covered by my new tests as well

@jreback
Contributor

jreback commented Nov 26, 2017

@jreback This test checks whether no exception is thrown. Actually it is better to completely remove this test because whatever it tests is covered by my new tests as well

that would be fine

@@ -557,6 +557,50 @@ We are stopping on the included end-point as it is part of the index
dft2 = dft2.swaplevel(0, 1).sort_index()
dft2.loc[idx[:, '2013-01-05'], :]

.. versionadded:: 0.21.1
Contributor

add a sub-section label here (with a ref), call it something like slicing with timezones.

# GH 6785
# timezone was ignored when string was provided as a label

first_january = pd.date_range('2016-01-01T00:00', '2016-01-01T23:59',
Contributor

I would really like to parametrize these to avoid the code repetition. So I think you can do it with 2 test functions: one which slices and compares with an expected, and a 2nd which checks for the warnings (you can actually do it with one if you add some more parameters)

something like

@pytest.mark.parametrize("tz, start, end",......)
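A hedged sketch of what such a parametrized test could look like; the test name, timezones, and index layout are illustrative, and Timestamps are used for the bounds so the example does not depend on this PR's string parsing:

```python
import pandas as pd
import pytest

# Hypothetical parametrized test: slice a tz-aware frame and compare
# against an independently constructed 'expected'.
@pytest.mark.parametrize("tz", ["UTC", "US/Eastern"])
def test_slice_with_tz_aware_bounds(tz):
    idx = pd.date_range("2016-01-01", periods=240, freq="min", tz=tz)
    df = pd.DataFrame({"value": range(240)}, index=idx)
    start = pd.Timestamp("2016-01-01 02:00", tz=tz)
    end = pd.Timestamp("2016-01-01 02:03", tz=tz)
    result = df.loc[start:end]  # label slicing is end-inclusive
    expected = df.iloc[120:124]  # same four rows, built positionally
    pd.testing.assert_frame_equal(result, expected)
```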

Author

I think this contradicts the idea of having the df.iloc check suggested by @jorisvandenbossche because that is rather specific. I would rather delete the non-naive UTC test because the CET test shows much more.

Member

You should be able to write the different strings so that they give the same expected frame I think

@jorisvandenbossche (Member) left a comment

I would leave this for 0.22.0 instead of 0.21.1 (since it raises some discussion, I think it is good to have it in master for some time longer)

I think I am in general fine with the changes in this PR (as it is aligning df[lb:up] with df[df.index >= lb & df.index <= up], which already worked with timezones), but I agree with @jbrockmendel that the discussion in #18435 is relevant. I think that we should try to have comparisons, scalar element accessing and partial slicing all work consistently (which is the case in this PR, so I am fine with going forward with this PR if it is for 0.22.0, and we can continue to discuss the broader aspect on the other issue).
Something else related to this:

  • If we add a warning when doing partial string indexing with aware string on naive index, we should do the same for accessing a scalar with a string (eg df.loc["2016-01-01T00:00-02:00"] with your example in the docs)


.. note::

This both works with ``pd.Timestamp`` and strings
Member

I think this is a bit confusing here. This section is about "partial datetime string indexing", so for me it is confusing to mention Timestamp

Author

Please talk to @jreback who suggested to mention it. Actually it also works for datetime.datetime.

.. ipython:: python
:okwarning:

first_january_implicit_utc = pd.date_range('2016-01-01T00:00', '2016-01-01T23:59',
Member

Can you make this a much shorter index? (you only need the first 10 to show the actual behaviour)
I would also try to use a shorter variable name here (eg idx_naive)

Author

It can be shortened but I would keep it a bit longer than the first 10 because of the comparison in the end.

Member

Which comparison?

Author

I thought (without carefully checking) that maybe in the end I would just compare two empty dataframes which would accidentally happen to be equal. To avoid such a false positive in the test, I thought having a somewhat longer df could be helpful.

Member

These are the docs, not tests. And you fully control what you do in the example, so you can just make it a bit longer than needed for the slicing to see the effect.


df

four_minute_slice = df["2016-01-01T00:00-02:00":"2016-01-01T02:03"]
Member

This is actually not an example of partial datetime string indexing. The dataframe index has a frequency of minutes, and you provide strings with a minute resolution

Author

Yes you are right. What is the consequence in your eyes? I just want the timezones to work, that is my only desire.

@jorisvandenbossche (Member) commented Nov 27, 2017

There is no consequence for the behaviour, so this PR will fix your use case. But for the example in the docs, we should make a clear one. So I would either make this actual partial slicing, or move this section somewhere else.

Author

Then better move it, because the timezones cannot always be parsed; e.g. for months UTC will still be assumed, as that goes through another path.

Member

No, you can just edit the example a little bit. For example, keep the minute resolution and use strings with only hours (instead of the minutes now; that still provides the ability to specify a time zone), or change the resolution of the df to seconds and keep the strings as they are. Note you can use e.g. every 30s to avoid that selecting some minutes results in many rows.
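The first suggested variant (minute-resolution index, hour-resolution strings) could look like this; the index length, timezone, and values are illustrative:

```python
import pandas as pd

# Minute-resolution tz-aware index; the strings below are coarser (hours),
# so this is genuine partial string slicing. Naive partial strings are
# interpreted in the timezone of the index.
idx = pd.date_range("2016-01-01", periods=240, freq="min", tz="CET")
df = pd.DataFrame({"value": range(240)}, index=idx)

# Slice at hour resolution; the end hour "03" is included in full.
res = df["2016-01-01 02":"2016-01-01 03"]
```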

parsed = parsed.tz_convert(utc)
else:
if parsed.tz is None: # treat like in same timezone
parsed = parsed.tz_localize(self.tz)
Member

This case already worked before AFAIK, do you know why this is needed? (although the code seems logical)

Author

The code is not necessarily needed when it is done somewhere else


result = df[
"2016-01-01T00:00-02:00":"2016-01-01T02:03"
]
Member

Can you put this all on a single line?

pd.Timestamp("2016-01-01T02:03")
]

tm.assert_frame_equal(result, expected)
Member

Can you assert both results (with strings or with Timestamps) against an independently constructed one? (e.g. df.iloc[...])

Author

I am not sure what you mean with df.iloc[...] but generally speaking yes

Member

I mean to create 'expected' with something like df.iloc[120:124] (but then with the correct numbers)
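The idea of an independently constructed 'expected' can be sketched like this (tz-naive to stay within behaviour that exists independently of this PR; positions 120:124 are from this sketch, not from the PR's test data):

```python
import pandas as pd

# Minute-resolution naive index.
idx = pd.date_range("2016-01-01", periods=240, freq="min")
df = pd.DataFrame({"value": range(240)}, index=idx)

# String slicing is end-inclusive: minutes 02:00 through 02:03.
result = df["2016-01-01 02:00":"2016-01-01 02:03"]
# 'expected' built independently via positional indexing.
expected = df.iloc[120:124]

pd.testing.assert_frame_equal(result, expected)
```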

@@ -637,3 +637,66 @@ def test_partial_set_empty_frame_empty_consistencies(self):
df.loc[0, 'x'] = 1
expected = DataFrame(dict(x=[1], y=[np.nan]))
tm.assert_frame_equal(df, expected, check_dtype=False)

def test_access_timezoned_datetimeindex_with_timezoned_label_utc(self):
Member

I think this is the incorrect file for the tests, as the 'partial' in test_partial.py is not referring to partial datetime string indexing, but to partial setting or something.
There is an indexes/datetimes/test_partial_slicing.py that seems a better fit.

Author

Possible, this was a suggestion of @jreback.

Member

Yes, but I think it was a typo of @jreback and he forgot the _slicing part. You can look in the current file and see that there is no test related to datetime index slicing, so please make the change


def test_access_timezoned_datetimeindex_with_timezoned_label_utc(self):

# GH 6785
Member

I think this is an incorrect issue number

Author

oh let me have a look how that could happen

@jorisvandenbossche
Member

@1kastner Can you also test with an implicit partial string slice? (I mean a single string that represents a slice, without actually slicing, so eg df["2016-01-01T00-02:00"])

@jbrockmendel
Member

The slicing code goes through a different branch for monotonic vs non-monotonic DatetimeIndexes. Is there a test for this that goes through that path?
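A minimal sketch of what exercising the non-monotonic branch could look like (the dates and values are illustrative): on an unsorted DatetimeIndex, datetime-label lookup cannot use the binary-search slice path and instead returns all matching rows in their original order.

```python
import pandas as pd

# Deliberately unsorted index with a duplicate date.
idx = pd.DatetimeIndex(
    ["2016-01-02", "2016-01-01", "2016-01-03", "2016-01-01"]
)
s = pd.Series([1, 2, 3, 4], index=idx)
assert not idx.is_monotonic_increasing

# Goes through the non-monotonic lookup path: both matching rows,
# in original positional order.
matches = s.loc["2016-01-01"]
```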

@@ -1364,7 +1378,7 @@ def _parsed_string_to_bounds(self, reso, parsed):
st = datetime(parsed.year, parsed.month, parsed.day,
parsed.hour, parsed.minute, parsed.second,
parsed.microsecond)
return (Timestamp(st, tz=self.tz), Timestamp(st, tz=self.tz))
return Timestamp(st, tz=self.tz), Timestamp(st, tz=self.tz)
Member

Since you're already making edits here, there's a small bug-like issue that might be worth fixing. The day, hour, minute, and second cases don't have tz=self.tz passed to the upper half of the returned tuple.

Author

Thanks for pointing that out! For now the timezone is more or less ignored (or at least not considered well enough) in the later stages, but maybe one day that will change.

@1kastner
Author

As I need to focus on my work for now, I will not continue for a while. I have set that maintainers are allowed to edit the pull request in case of urgent need for action.

@jreback
Contributor

jreback commented Feb 11, 2018

can you rebase. let's see where we are on this.

@1kastner
Author

I just checked and it means a lot of merging. I do not have the resources to deal with this subject even though the problem itself still bugs me.

Labels: Datetime (Datetime data dtype), Indexing (Related to indexing on series/frames, not to indexes themselves), Timezones (Timezone data dtype)
Successfully merging this pull request may close these issues.

ERR: validate partial string indexing with tz-aware end-points