Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix indexing with datetime64[ns] with pandas=1.1 #4292

Merged
merged 9 commits into from
Sep 16, 2020

Conversation

shoyer
Copy link
Member

@shoyer shoyer commented Jul 31, 2020

Fixes #4283

The underlying issue is that calling .item() on a NumPy array with
dtype=datetime64[ns] returns an integer, rather than an np.datetime64
scalar. This is somewhat baffling but works this way because .item()
returns native Python types, but datetime.datetime doesn't support
nanosecond precision.

pandas.Index.get_loc used to support these integers, but now is more strict.
Hence we get errors.

We can fix this by using array[()] to convert 0d arrays into NumPy scalars
instead of calling array.item().

I've added a crude regression test. There may well be a better way to test this
but I haven't figured it out yet.

  • Tests added
  • Passes isort . && black . && mypy . && flake8

Fixes pydata#4283

The underlying issue is that calling `.item()` on a NumPy array with
`dtype=datetime64[ns]` returns an _integer_, rather than an `np.datetime64
scalar. This is somewhat baffling but works this way because `.item()`
returns native Python types, but `datetime.datetime` doesn't support
nanosecond precision.

`pandas.Index.get_loc` used to support these integers, but now is more strict.
Hence we get errors.

We can fix this by using `array[()]` to convert 0d arrays into NumPy scalars
instead of calling `array.item()`.

I've added a crude regression test. There may well be a better way to test this
but I haven't figured it out yet.
@pep8speaks
Copy link

pep8speaks commented Jul 31, 2020

Hello @shoyer! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-09-16 00:31:04 UTC

@keewis
Copy link
Collaborator

keewis commented Aug 5, 2020

you should be able to isolate this to just indexing.convert_label_indexer, e.g. with this in test_indexing.py:

    def test_convert_label_indexer_datetime(self):
        index = pd.to_datetime(["2000-01-01", "2001-01-01", "2002-01-01"])
        actual = indexing.convert_label_indexer(index, "2001-01-01")
        expected = (1, None)
        assert actual == expected

        actual = indexing.convert_label_indexer(index, index.to_numpy()[1])
        assert actual == expected

The failing tests are due to label[()] returning a numpy.str_ instead of a plain python str. Maybe we can fix that by using item as long as the dtype is not "datetime64" or "timedelta64":

if label.dtype.kind in "mM":
    label_value = label[()]
else:
    label_value = label.item()

Edit: stable is affected by this, too

@keewis
Copy link
Collaborator

keewis commented Aug 28, 2020

there are lots of people that stumble into this, so I think it might be good to get this to work as soon as possible and issue a bugfix release.

@keewis
Copy link
Collaborator

keewis commented Sep 6, 2020

gentle ping, @shoyer. Any updates on this?

@keewis
Copy link
Collaborator

keewis commented Sep 13, 2020

it seems we have to cast because label may also be something like a Variable object, which is not accepted by index.get_loc. I pushed my original fix (falling back to .item() for non-datetime / timedelta dtypes), I hope that's okay?

@keewis
Copy link
Collaborator

keewis commented Sep 13, 2020

it seems pandas warns about our usage of pandas.Grouper:

/home/docs/checkouts/readthedocs.org/user_builds/xray/checkouts/4292/xarray/core/common.py:1134: FutureWarning: 'base' in .resample() and in Grouper() is deprecated.
The new arguments that you should use are 'offset' or 'origin'.

>>> df.resample(freq="3s", base=2)

becomes:

>>> df.resample(freq="3s", offset="2s")

  grouper = pd.Grouper(

this was introduced in 1.1.0. We're supporting pandas>=0.25 (maybe even >=0.24) so we can't switch yet. I added a warning filter and a todo comment, but we might also need a tracking issue.

@dcherian
Copy link
Contributor

I am +1 on merging this quickly and issuing a bugfix release.

We can always make cleanups later...

max-sixty and others added 2 commits September 15, 2020 17:22
Co-authored-by: keewis <keewis@users.noreply.github.com>
@shoyer shoyer merged commit 59f57f3 into pydata:master Sep 16, 2020
@shoyer
Copy link
Member Author

shoyer commented Sep 16, 2020

OK, submitting this!

Thanks @keewis for making this fix actually work :)

@seth-p
Copy link
Contributor

seth-p commented Sep 16, 2020

Does this fix #4363?

@shoyer
Copy link
Member Author

shoyer commented Sep 16, 2020

Does this fix #4363?

No, that seems to be unrelated

@max-sixty
Copy link
Collaborator

I can do a patch release this weekend

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Selection with datetime64[ns] fails with Pandas 1.1.0
6 participants