UserWarning when wrapping pint & dask arrays together #5559

Closed
TomNicholas opened this issue Jul 1, 2021 · 4 comments · Fixed by #5571
Labels: bug, topic-arrays (related to flexible array support)

Comments

@TomNicholas
Member

With pint-xarray you can create a chunked, unit-aware xarray object, but calling a calculation method and then computing doesn't appear to behave as hoped.

import xarray as xr
import pint_xarray  # registers the .pint accessor

da = xr.DataArray([1, 2, 3], attrs={'units': 'metres'})

chunked = da.chunk(1).pint.quantify()
print(chunked.compute())
<xarray.DataArray (dim_0: 3)>
<Quantity([1 2 3], 'meter')>
Dimensions without coordinates: dim_0

So far this is fine, but if we try to take a mean before computing we get

print(chunked.mean().compute())
<xarray.DataArray ()>
<Quantity(dask.array<true_divide, shape=(), dtype=float64, chunksize=(), chunktype=numpy.ndarray>, 'meter')>
/home/tegn500/miniconda3/envs/py38-mamba/lib/python3.8/site-packages/dask/array/core.py:3139: UserWarning: Passing an object to dask.array.from_array which is already a Dask collection. This can lead to unexpected behavior.
  warnings.warn(

This is not correct: as well as emitting the UserWarning, compute returns an object that still wraps a dask array, so we need to compute a second time to actually get the answer:

print(chunked.mean().compute().compute())
<xarray.DataArray ()>
<Quantity(2.0, 'meter')>
/home/tegn500/miniconda3/envs/py38-mamba/lib/python3.8/site-packages/dask/array/core.py:3139: UserWarning: Passing an object to dask.array.from_array which is already a Dask collection. This can lead to unexpected behavior.
  warnings.warn(

If we try chunking the other way (chunked = da.pint.quantify().pint.chunk(1)) then we get all the same results.

xref xarray-contrib/pint-xarray#116 and #4972 @keewis

TomNicholas added the bug and topic-arrays (related to flexible array support) labels Jul 1, 2021
@jthielen
Contributor

jthielen commented Jul 1, 2021

Is it correct that xarray ends up calling dask.array.mean() on the pint.Quantity(dask.Array) object inside the DataArray? I took a guess at that since I can replicate what is happening inside the DataArray with

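import dask.array as dask_array  # not aliased to "da", which would be shadowed by the DataArray below
import xarray as xr
import pint_xarray  # registers the .pint accessor

da = xr.DataArray([1, 2, 3], attrs={'units': 'metres'})

chunked = da.chunk(1).pint.quantify()

# call dask's mean directly on the pint.Quantity(dask.Array) held by the DataArray
dask_array.mean(chunked.variable._data)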

Also, the Dask warning "Passing an object to dask.array.from_array which is already a Dask collection. This can lead to unexpected behavior." is a big red flag that the Pint Quantity is making its way into Dask internals where it should not end up.

If so, I think this gets into a thorny issue with duck array handling in Dask. It was decided in dask/dask#6393 that deliberately calling Dask array operations like elemwise (so, presumably by extension, blockwise and the reductions in dask.array.reductions like mean()) on a non-Dask array implies that the user wants to turn that array into a dask array. This gets problematic, however, for upcast types like Pint Quantities that wrap Dask Arrays, since then you can get dask.Array(pint.Quantity(dask.Array)), which is what I think is going on here?

If this all checks out, I believe this becomes a Dask issue to improve upcast type/duck Dask array handling.
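
The nesting described above can be modelled without dask or pint. What follows is only a toy sketch under loud assumptions: `Lazy`, `Quantity`, `from_array` and `lazy_module_mean` are hypothetical stand-ins invented for illustration, and they do not reproduce how either library is implemented. The sketch only shows why coercing a wrapper that already holds a lazy array into a new lazy array means a single compute still leaves a lazy array inside the result.

```python
class Lazy:
    """Hypothetical stand-in for a dask array: stores a zero-argument callable."""

    def __init__(self, thunk):
        self._thunk = thunk

    def compute(self):
        return self._thunk()

    def mean(self):
        # Stays lazy: the arithmetic only runs when compute() is called.
        def thunk():
            values = self.compute()
            return sum(values) / len(values)
        return Lazy(thunk)


class Quantity:
    """Hypothetical stand-in for a pint Quantity wrapping another array."""

    def __init__(self, magnitude, units):
        self.magnitude = magnitude
        self.units = units

    def mean(self):
        # Delegates to the wrapped array and re-attaches the units.
        return Quantity(self.magnitude.mean(), self.units)

    def compute(self):
        m = self.magnitude
        return Quantity(m.compute() if isinstance(m, Lazy) else m, self.units)


def from_array(obj):
    """Wraps any object as a lazy leaf, without looking inside it."""
    return Lazy(lambda: obj)


def lazy_module_mean(x):
    """Models the coercion policy: a non-Lazy input is turned into a Lazy one."""
    if not isinstance(x, Lazy):
        x = from_array(x)  # yields Lazy(Quantity(Lazy(...)))
    return Lazy(lambda: x.compute().mean())


q = Quantity(Lazy(lambda: [1, 2, 3]), "meter")
once = lazy_module_mean(q).compute()   # Quantity whose magnitude is still Lazy
twice = once.compute()                 # Quantity holding a plain number
print(type(once.magnitude).__name__, twice.magnitude, twice.units)
```

In this model the first compute only unwraps the outer lazy layer, which mirrors the double `.compute()` reported in the issue.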

@dcherian
Contributor

dcherian commented Jul 1, 2021

> Is it correct that xarray ends up calling dask.array.mean() on the pint.Quantity(dask.Array) object inside the DataArray?

Yes that's correct. See

if any(is_duck_dask_array(a) for a in dispatch_args):
    try:
        wrapped = getattr(dask_module, name)

It may be time to update this method since we now depend on a minimum numpy version that supports NEP-18.

cc @shoyer

EDIT: You get there from _create_nan_agg_method:

func = _dask_or_eager_func(name, dask_module=dask_module)
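
As a rough illustration of the type-checking dispatch pattern quoted above, here is a self-contained sketch. It is not xarray's code: `FakeLazy`, `FakeQuantity`, `is_duck_lazy`, `lazy_or_eager_func` and the two namespace "modules" are all hypothetical names, and the check is only loosely modelled on the idea that a wrapper around a lazy array is itself treated as lazy. The point is that the wrapper, not the inner lazy array, is what gets handed to the lazy module's function.

```python
from types import SimpleNamespace


class FakeLazy:
    """Hypothetical stand-in for a dask array."""

    def __init__(self, values):
        self.values = values


class FakeQuantity:
    """Hypothetical stand-in for a pint Quantity wrapping another array."""

    def __init__(self, magnitude):
        self.magnitude = magnitude


def is_duck_lazy(obj):
    """True for a lazy array, or for a wrapper whose magnitude is lazy."""
    if isinstance(obj, FakeLazy):
        return True
    inner = getattr(obj, "magnitude", None)
    return inner is not None and is_duck_lazy(inner)


# Each fake "module" just reports which implementation ran and what it received.
lazy_module = SimpleNamespace(mean=lambda x: ("lazy_module.mean", type(x).__name__))
eager_module = SimpleNamespace(mean=lambda x: ("eager_module.mean", type(x).__name__))


def lazy_or_eager_func(name, lazy_module, eager_module):
    """Builds a function that picks a module by inspecting argument types."""
    def f(*args, **kwargs):
        use_lazy = any(is_duck_lazy(a) for a in args)
        module = lazy_module if use_lazy else eager_module
        return getattr(module, name)(*args, **kwargs)
    return f


mean = lazy_or_eager_func("mean", lazy_module, eager_module)
print(mean(FakeQuantity(FakeLazy([1, 2, 3]))))  # lazy module receives the wrapper
print(mean([1, 2, 3]))                          # eager module receives the list
```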

@TomNicholas
Member Author

So this is actually an xarray problem, not a dask/pint problem? And the solution would be to just call the method on the duck array without any kind of type checking first?

@dcherian
Contributor

dcherian commented Jul 1, 2021

> And the solution would be to just call the method on the duck array without any kind of type checking first?

Or np.method(array)
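
For reference, a minimal sketch of what calling `np.method(array)` relies on under NEP-18: NumPy hands the call to the argument's `__array_function__`, so the outermost duck array decides how to handle it. `UnitArray` below is a hypothetical toy class, not pint's `Quantity`, and it only handles `np.mean`.

```python
import numpy as np


class UnitArray:
    """Hypothetical minimal duck array: wraps some data plus a unit string."""

    def __init__(self, data, units):
        self.data = data
        self.units = units

    def __array_function__(self, func, types, args, kwargs):
        if func is np.mean:
            # Unwrap our own type, delegate to the wrapped data, re-wrap the result.
            unwrapped = [a.data if isinstance(a, UnitArray) else a for a in args]
            return UnitArray(func(*unwrapped, **kwargs), self.units)
        return NotImplemented  # any other NumPy function is unsupported here


# np.mean is called on the wrapper; the wrapper decides what happens next.
result = np.mean(UnitArray(np.array([1, 2, 3]), "meter"))
print(type(result).__name__, result.data, result.units)
```

No type checking happens at the call site: the wrapper stays outermost and forwards the operation to whatever it wraps.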

3 participants