Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Ellipsis typehint to reductions #7048

Merged
merged 12 commits into from
Sep 28, 2022
Merged

Add Ellipsis typehint to reductions #7048

merged 12 commits into from
Sep 28, 2022

Conversation

headtr1ck
Copy link
Collaborator

This PR adds the ellipsis typehint to reductions (only where they behave differently from None to reduce overhead).
Follow up on #7017 (comment)

Additionally I was changing a lot of "one or more dimensions" typehints to str | Iterable[Hashable] (See #6142).
Some code changes were necessary to support this fully. Before several things were not working with actual hashable dimensions that are not strings.

@headtr1ck headtr1ck changed the title Ellipsis Add Ellipsis typehint to reductions Sep 16, 2022
@max-sixty
Copy link
Collaborator

Excellent @headtr1ck !

Do we need to run pytest --accept to get the docstrings? It looks like we lost lots...

@headtr1ck
Copy link
Collaborator Author

Could any dev that uses linux rerun the generate_reductions and pytest --doctest-modules xarray/core/_reductions.py --accept?
On windows I still get different results (maybe that should be fixed at some point...)

@headtr1ck
Copy link
Collaborator Author

Turns out that the buildin ellipsis works now with mypy.
Did not test it for older python versions, may require some special casing (Lets see if the tests pass)?

@max-sixty
Copy link
Collaborator

Here's the diff from pytest-accept (it is weird that it's slightly different on windows...)

commit 83615a94a6b7c0ae0cf0e0240d7705d9ce6c21e5
Author: Maximilian Roos <m@maxroos.com>
Date:   Sat Sep 17 12:36:45 2022 -0700

    pytest-accept

diff --git a/xarray/core/_reductions.py b/xarray/core/_reductions.py
index a7cf7ec2..d0c2a9d7 100644
--- a/xarray/core/_reductions.py
+++ b/xarray/core/_reductions.py
@@ -97,7 +97,7 @@ def count(
         <xarray.Dataset>
         Dimensions:  ()
         Data variables:
-            da       int32 5
+            da       int64 5
         """
         return self.reduce(
             duck_array_ops.count,
@@ -4400,7 +4400,7 @@ def count(
 
         >>> da.groupby("labels").count()
         <xarray.DataArray (labels: 3)>
-        array([1, 2, 2], dtype=int64)
+        array([1, 2, 2])
         Coordinates:
           * labels   (labels) object 'a' 'b' 'c'
         """
@@ -5485,7 +5485,7 @@ def count(
 
         >>> da.resample(time="3M").count()
         <xarray.DataArray (time: 3)>
-        array([1, 3, 1], dtype=int64)
+        array([1, 3, 1])
         Coordinates:
           * time     (time) datetime64[ns] 2001-01-31 2001-04-30 2001-07-31
         """

Copy link
Contributor

@Illviljan Illviljan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if Dims should include ellipsis as well? The few times it's missing might be issues with the functions?

xarray/core/dataarray.py Outdated Show resolved Hide resolved
xarray/core/dataarray.py Outdated Show resolved Hide resolved
xarray/core/dataarray.py Outdated Show resolved Hide resolved
xarray/core/dataarray.py Outdated Show resolved Hide resolved
xarray/core/dataset.py Outdated Show resolved Hide resolved
xarray/core/variable.py Outdated Show resolved Hide resolved
xarray/core/variable.py Outdated Show resolved Hide resolved
xarray/core/variable.py Outdated Show resolved Hide resolved
xarray/core/variable.py Outdated Show resolved Hide resolved
xarray/core/weighted.py Outdated Show resolved Hide resolved
@Illviljan
Copy link
Contributor

Illviljan commented Sep 18, 2022

@@ -4400,7 +4400,7 @@ def count(
 
         >>> da.groupby("labels").count()
         <xarray.DataArray (labels: 3)>
-        array([1, 2, 2], dtype=int64)
+        array([1, 2, 2])
         Coordinates:
           * labels   (labels) object 'a' 'b' 'c'
         """

Is it just me that this example crashes the second time I run it?

import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.array([1, 2, 3, 1, 2, np.nan]),
    dims="time",
    coords=dict(
        time=("time", pd.date_range("01-01-2001", freq="M", periods=6)),
        labels=("time", np.array(["a", "b", "c", "c", "b", "a"])),
    ),
)
da.groupby("labels").count()
Traceback (most recent call last):

  File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\spyder_kernels\py3compat.py", line 356, in compat_exec
    exec(code, globals, locals)

  File "g:\program\dropbox\python\xarray_groupby_windows_diff.py", line 34, in <module>
    da.groupby("labels").count()

  File "c:\users\j.w\documents\github\xarray\xarray\core\_reductions.py", line 4384, in count
    return self._flox_reduce(

  File "c:\users\j.w\documents\github\xarray\xarray\core\groupby.py", line 738, in _flox_reduce
    result = xarray_reduce(

  File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\flox\xarray.py", line 240, in xarray_reduce
    ds, *by = xr.broadcast(ds, *by, exclude=exclude_dims)

  File "c:\users\j.w\documents\github\xarray\xarray\core\alignment.py", line 1046, in broadcast
    args = align(*args, join="outer", copy=False, exclude=exclude)

  File "c:\users\j.w\documents\github\xarray\xarray\core\alignment.py", line 765, in align
    aligner.align()

  File "c:\users\j.w\documents\github\xarray\xarray\core\alignment.py", line 549, in align
    self.find_matching_indexes()

  File "c:\users\j.w\documents\github\xarray\xarray\core\alignment.py", line 256, in find_matching_indexes
    obj_indexes, obj_index_vars = self._normalize_indexes(obj.xindexes)

  File "c:\users\j.w\documents\github\xarray\xarray\core\alignment.py", line 205, in _normalize_indexes
    pd_idx = safe_cast_to_index(data)

  File "c:\users\j.w\documents\github\xarray\xarray\core\utils.py", line 140, in safe_cast_to_index
    index = pd.Index(np.asarray(array), **kwargs)

  File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\pandas\core\indexes\base.py", line 483, in __new__
    data = sanitize_array(data, None, dtype=dtype, copy=copy)

  File "C:\Users\J.W\anaconda3\envs\xarray-tests\lib\site-packages\pandas\core\construction.py", line 524, in sanitize_array
    raise ValueError("index must be specified when data is not list-like")

ValueError: index must be specified when data is not list-like

@headtr1ck
Copy link
Collaborator Author

Is it just me that this example crashes the second time I run it?

Could you specify what you mean by "second time I run it"?
Executing the groupby twice?

@Illviljan
Copy link
Contributor

Just running that script file several times without restarting the console. It might be a Spyder bug though since I can't reproduce it in a stand alone ipython console.

This for example works (the first time):

import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.array([1, 2, 3, 1, 2, np.nan]),
    dims="time",
    coords=dict(
        time=("time", pd.date_range("01-01-2001", freq="M", periods=6)),
        labels=("time", np.array(["a", "b", "c", "c", "b", "a"])),
    ),
)
da.groupby("labels").count()

da = xr.DataArray(
    np.array([1, 2, 3, 1, 2, np.nan]),
    dims="time",
    coords=dict(
        time=("time", pd.date_range("01-01-2001", freq="M", periods=6)),
        labels=("time", np.array(["a", "b", "c", "c", "b", "a"])),
    ),
)
da.groupby("labels").count()

@headtr1ck
Copy link
Collaborator Author

import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.array([1, 2, 3, 1, 2, np.nan]),
    dims="time",
    coords=dict(
        time=("time", pd.date_range("01-01-2001", freq="M", periods=6)),
        labels=("time", np.array(["a", "b", "c", "c", "b", "a"])),
    ),
)
da.groupby("labels").count()

da = xr.DataArray(
    np.array([1, 2, 3, 1, 2, np.nan]),
    dims="time",
    coords=dict(
        time=("time", pd.date_range("01-01-2001", freq="M", periods=6)),
        labels=("time", np.array(["a", "b", "c", "c", "b", "a"])),
    ),
)
da.groupby("labels").count()

For me this works in a python terminal, python script and jupyter notebook (I don't use spyder but vscode).

@headtr1ck headtr1ck added the plan to merge Final call for comments label Sep 25, 2022
@dcherian dcherian merged commit 226c23b into pydata:main Sep 28, 2022
@headtr1ck headtr1ck deleted the ellipsis branch September 28, 2022 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants