Rely on NEP-18 to dispatch to dask in duck_array_ops #5571
Hello @TomNicholas! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-09-29 17:13:58 UTC
Right I understand the failure now - everywhere where
A net-negative pull request 🤯
Looks great! I don't know the details of this code well, but at the conceptual level it looks good!
The best type!
xarray/core/duck_array_ops.py
Outdated
from numpy import take, tensordot, transpose, unravel_index  # noqa
from numpy import where as _where
from numpy import zeros_like  # noqa
from numpy.ma import masked_invalid  # noqa
do masked arrays support NEP-18?
Hmm, I don't see an __array_function__ method in MaskedArray, but it does inherit from ndarray?
It does have an __array_function__ attribute.
Right, it does support NEP-18. However, for some functions (e.g. np.median) it returns a strange MaskedConstant object if the result would be a constant. Not sure if it needs special care, but something to be aware of.
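As a rough check of the points raised in this thread, the sketch below inspects MaskedArray for the NEP-18 hook and shows one case where a reduction yields the masked constant. It is only an illustration: the thread mentions np.median, while this sketch uses the .mean() method of a fully masked array as an assumed stand-in, since that is a case where the reduction has no valid data to return. The variable names are made up for the example.

```python
import numpy as np

# A masked array built the same way duck_array_ops does it (masked_invalid).
masked = np.ma.masked_invalid([1.0, np.nan, 3.0])

# The NEP-18 hook is visible on the instance ...
has_protocol = hasattr(masked, "__array_function__")
# ... and MaskedArray is an ndarray subclass, which is where the hook can come from.
inherits = issubclass(np.ma.MaskedArray, np.ndarray)
# True only if MaskedArray defines its own override instead of inheriting one.
defined_on_class = "__array_function__" in vars(np.ma.MaskedArray)

# Reducing a fully masked array gives the np.ma.masked singleton,
# an instance of MaskedConstant, rather than a plain float.
fully_masked = np.ma.masked_all((3,))
reduced = fully_masked.mean()
is_masked_constant = reduced is np.ma.masked

print(has_protocol, inherits, defined_on_class, is_masked_constant)
```

Code that expects a float from a reduction may therefore want an explicit check for np.ma.masked before using the result.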
xarray/core/duck_array_ops.py
Outdated
zeros_like = _dask_or_eager_func("zeros_like")
# Requires special-casing because pandas won't automatically dispatch to dask.isnull via NEP-18
def _dask_or_eager_isnull(obj):
    if is_duck_dask_array(obj):
Does dask_array.isnull dispatch for duck dask arrays, or does this also create e.g. dask(pint(dask)) objects?
It does this:
d = dask_array.array([1, 2, 3])
q = pint.Quantity(d, units='m')
dask.array.isnull(q)
raises warnings
/home/tegn500/miniconda3/envs/xarray-testing-min-all-deps-py37/lib/python3.7/site-packages/dask/array/core.py:2756: UserWarning: Passing an object to dask.array.from_array which is already a Dask collection. This can lead to unexpected behavior.
"Passing an object to dask.array.from_array which is already a "
/home/tegn500/miniconda3/envs/xarray-testing-min-all-deps-py37/lib/python3.7/site-packages/numpy/core/_asarray.py:85: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
return array(a, dtype, copy=False, order=order)
and returns the result
dask.array<_asarray_isnull, shape=(3,), dtype=bool, chunksize=(3,), chunktype=numpy.ndarray>
I think that's a sensible result of calling isnull on a chunked pint array?
Now there is one more mystery: why do some of the tests for padding fail with output that is 0.5 away from what's expected??:
We work around some dask bugs with EDIT: though I don't see this kind of thing being fixed there.
The issue mentioned in
@TomNicholas try merging main now. Let's see if there are any errors left.
@Illviljan that actually solved those padding errors! Awesome! The tests still fail because of something going on with
I think you just need to fix the spaces in the expected output
___________ [doctest] xarray.core._typed_ops.DataArrayOpsMixin.round ___________
EXAMPLE LOCATION UNKNOWN, not showing all tests of that example
??? >>> np.around([0.37, 1.64])
Expected:
array([0.,  2.])
Got:
array([0., 2.])
/home/runner/work/xarray/xarray/xarray/core/_typed_ops.py:None: DocTestFailure
The expected had 2 spaces after the comma for some reason. I think the actual output makes more sense now so I think it's fine to just accept this difference and change the expected output.
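To make the failure mode concrete, here is a small sketch using only the standard library's doctest module. It shows that the default comparison is exact, so a single extra space in the expected output is enough to fail. The NORMALIZE_WHITESPACE flag shown at the end is an option doctest provides, not something this PR uses; it is included only as a point of comparison.

```python
import doctest

checker = doctest.OutputChecker()

# Expected output as written in the upstream docstring (two spaces after the comma)
# versus what the interpreter actually prints (one space).
want = "array([0.,  2.])\n"
got = "array([0., 2.])\n"

# With no option flags the strings must match exactly, so this is False.
strict_match = checker.check_output(want, got, 0)

# NORMALIZE_WHITESPACE treats any run of whitespace as equal, so this is True.
lenient_match = checker.check_output(want, got, doctest.NORMALIZE_WHITESPACE)

print(strict_match, lenient_match)
```

Changing the expected output, as suggested above, keeps the doctest strict, whereas a whitespace-normalizing flag would also hide other spacing differences.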
Seems to be a problem in numpy. I wonder if the docstring can be ignored if it is copied from somewhere else? I think an easy workaround is simply replacing the docstring:
xarray/xarray/core/duck_array_ops.py Line 75 in 4fd81b5
around = _dask_or_eager_func("around")
# np.around has failing doctests, overwrite it so they pass:
around.__doc__ = """
Evenly round to the given number of decimals.
Parameters
----------
a : array_like
Input data.
decimals : int, optional
Number of decimal places to round to (default: 0). If
decimals is negative, it specifies the number of positions to
the left of the decimal point.
out : ndarray, optional
Alternative output array in which to place the result. It must have
the same shape as the expected output, but the type of the output
values will be cast if necessary. See :ref:`ufuncs-output-type` for more
details.
Returns
-------
rounded_array : ndarray
An array of the same type as `a`, containing the rounded values.
Unless `out` was specified, a new array is created. A reference to
the result is returned.
The real and imaginary parts of complex numbers are rounded
separately. The result of rounding a float is a float.
See Also
--------
ndarray.round : equivalent method
ceil, fix, floor, rint, trunc
Notes
-----
For values exactly halfway between rounded decimal values, NumPy
rounds to the nearest even value. Thus 1.5 and 2.5 round to 2.0,
-0.5 and 0.5 round to 0.0, etc.
``np.around`` uses a fast but sometimes inexact algorithm to round
floating-point datatypes. For positive `decimals` it is equivalent to
``np.true_divide(np.rint(a * 10**decimals), 10**decimals)``, which has
error due to the inexact representation of decimal fractions in the IEEE
floating point standard [1]_ and errors introduced when scaling by powers
of ten. For instance, note the extra "1" in the following:
>>> np.round(56294995342131.5, 3)
56294995342131.51
If your goal is to print such values with a fixed number of decimals, it is
preferable to use numpy's float printing routines to limit the number of
printed decimals:
>>> np.format_float_positional(56294995342131.5, precision=3)
'56294995342131.5'
The float printing routines use an accurate but much more computationally
demanding algorithm to compute the number of digits after the decimal
point.
Alternatively, Python's builtin `round` function uses a more accurate
but slower algorithm for 64-bit floating point values:
>>> round(56294995342131.5, 3)
56294995342131.5
>>> np.round(16.055, 2), round(16.055, 2) # equals 16.0549999999999997
(16.06, 16.05)
References
----------
.. [1] "Lecture Notes on the Status of IEEE 754", William Kahan,
https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
.. [2] "How Futile are Mindless Assessments of
Roundoff in Floating-Point Computation?", William Kahan,
https://people.eecs.berkeley.edu/~wkahan/Mindless.pdf
Examples
--------
>>> np.around([0.37, 1.64])
array([0., 2.])
>>> np.around([0.37, 1.64], decimals=1)
array([0.4, 1.6])
>>> np.around([.5, 1.5, 2.5, 3.5, 4.5]) # rounds to nearest even value
array([0., 2., 2., 4., 4.])
>>> np.around([1,2,3,11], decimals=1) # ndarray of ints is returned
array([ 1, 2, 3, 11])
>>> np.around([1,2,3,11], decimals=-1)
array([ 0, 0, 0, 10])
"""
Saw dask does similar fixes too. Here's a version inspired by that one:
around = _dask_or_eager_func("around")
# np.around has failing doctests, overwrite it so they pass:
# https://github.com/numpy/numpy/issues/19759
# Each replace starts from the current around.__doc__ so the fixes accumulate.
around.__doc__ = around.__doc__.replace(
    "array([0.,  2.])",
    "array([0., 2.])",
)
around.__doc__ = around.__doc__.replace(
    "array([0.4,  1.6])",
    "array([0.4, 1.6])",
)
around.__doc__ = around.__doc__.replace(
    "array([0.,  2.,  2.,  4.,  4.])",
    "array([0., 2., 2., 4., 4.])",
)
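The same idea can be sketched without numpy or xarray. In the example below, around_stub and patch_doc are hypothetical names invented for illustration; the stub's docstring imitates the double-spaced upstream output. The point is that every replacement has to be applied to the result of the previous one, otherwise only the last fix survives.

```python
def around_stub(values):
    """Hypothetical stand-in for a function whose docstring is copied from upstream.

    Examples
    --------
    >>> around_stub([0.37, 1.64])
    array([0.,  2.])
    >>> around_stub([0.37, 1.64], decimals=1)
    array([0.4,  1.6])
    """
    return values


def patch_doc(func, replacements):
    """Apply (old, new) replacements to func.__doc__ cumulatively."""
    doc = func.__doc__ or ""  # __doc__ is None when Python runs with -OO
    for old, new in replacements:
        doc = doc.replace(old, new)
    func.__doc__ = doc
    return func


patch_doc(
    around_stub,
    [
        ("array([0.,  2.])", "array([0., 2.])"),
        ("array([0.4,  1.6])", "array([0.4, 1.6])"),
    ],
)
print(around_stub.__doc__)
```

Guarding against a missing docstring matters because str.replace would otherwise raise AttributeError on None.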
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
Thanks @Illviljan
I must say it gets kind of annoying having such pedantic doctests and doc generation when upstream modules aren't as picky.
I don't understand why the doctests don't go through the if path, is it using tricks that can make
Anyway, I think just ignoring these typing errors might be easier. Readthedocs error:
Doctests passed! Thanks so much @Illviljan!
LGTM, though maybe we should merge main again before merging.
Thanks for the reminder @dcherian - I merged main and all the tests pass so I'll merge this PR now!
The what's new entry for this went in under the wrong edition - I fixed it in ebfc6a3
* basic test for the mean
* minimum to get mean working
* don't even need to call dask specifically
* remove reference to dask when dispatching to modules
* fixed special case of pandas vs dask isnull
* removed _dask_or_eager_func completely
* noqa
* pre-commit
* what's new
* linting
* properly import dask for test
* fix iris conversion error by rolling back treatment of np.ma.masked_invalid
* linting
* Update xarray/core/duck_array_ops.py
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
* Update xarray/core/duck_array_ops.py
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
* Update xarray/core/duck_array_ops.py
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
Removes special-casing for dask in duck_array_ops.py, instead relying on NEP-18 to call it when the input is a dask array.
Probably actually don't need the _dask_or_eager_func() (now _module_func()) helper function at all, because all remaining instances look like pandas_isnull = _module_func("isnull", module=pd), which could just be pandas_isnull = pd.isnull.
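For readers unfamiliar with NEP-18, the following is a minimal sketch of the dispatch mechanism this PR relies on. LoggingArray is a hypothetical duck array written for this example (it is not part of xarray, dask, or numpy); it plays the role a dask array would play, implementing __array_function__ so that a plain np.mean call is routed to it without any special-casing in the caller.

```python
import numpy as np


class LoggingArray:
    """Hypothetical duck array that records which NumPy functions reach it."""

    def __init__(self, data):
        self.data = np.asarray(data)
        self.calls = []

    def __array_function__(self, func, types, args, kwargs):
        # NumPy hands over the public function that was called (e.g. np.mean).
        self.calls.append(func.__name__)
        # Unwrap our own instances and defer to NumPy's implementation.
        # A real duck array such as dask would run its own implementation here.
        unwrapped = tuple(
            arg.data if isinstance(arg, LoggingArray) else arg for arg in args
        )
        return func(*unwrapped, **kwargs)


arr = LoggingArray([1.0, 2.0, 3.0])

# The caller uses the plain NumPy API; no "is this a dask array?" branch is needed.
result = np.mean(arr)

print(result, arr.calls)
```

This only works for functions that take part in the protocol, which is why the pandas isnull case mentioned in the thread still needs explicit handling.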
Only problem is that I seem to have broken one (parameterized) test: test_duck_array_ops.py::test_min_count[True-True-None-sum-True-bool_-1] fails with
pre-commit run --all-files
whats-new.rst