Provide wrapper for nan_array to give lazy object correct dtype #2530

Closed
wants to merge 1 commit into from

djkirkham
Contributor

As I suggested yesterday, this adds a wrapper class around an array to make dask think it has a particular dtype. This will make it significantly easier to fix the issue of getting the correct dtype for cube operations (#2528).

Currently the tests fail.

@djkirkham
Contributor Author

@pp-mo @dkillick @bjlittle

@djkirkham
Contributor Author

This causes concatenating cubes to give incorrect results in some cases. Here's some code that demonstrates what goes wrong at the dask level:

import dask.array as da
import numpy as np

class _DtypeWrapper(object):
    def __init__(self, array, dtype):
        self.dtype = dtype
        self.array = array

    @property
    def shape(self):
        return self.array.shape

    @property
    def ndim(self):
        return self.array.ndim

    def __getitem__(self, item):
        return self.array.__getitem__(item)


a = np.array([1, 2])
b = np.array([3., np.nan])
# Wrap the float array so that dask is told its dtype is int.
b_wrapped = _DtypeWrapper(b, dtype=int)
a_dask = da.from_array(a, chunks=1)
b_dask = da.from_array(b_wrapped, chunks=1)
c = da.concatenate([a_dask, b_dask])

print(c.compute())

yields:

[                   1                    2                    3
 -9223372036854775808]

This is because the second array is being interpreted as integer, so its NaN ends up as -9223372036854775808 instead of surviving as a float (compare with b.astype(int)).
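
For comparison, here is a minimal NumPy-only sketch of what I would expect the concatenation to give, next to the integer cast mentioned above. It does not use dask or the wrapper. The names `expected` and `as_int` are just for illustration, and the value NaN turns into after the cast is platform-dependent, so it is not asserted here.

```python
import numpy as np

a = np.array([1, 2])
b = np.array([3.0, np.nan])

# Plain NumPy promotes int + float to a floating dtype, so the NaN survives.
expected = np.concatenate([a, b])
print(expected.dtype)
print(expected)

# Casting the float data to int cannot represent NaN. The resulting value
# is platform-dependent; newer NumPy versions also warn about the invalid
# cast, which is silenced here.
with np.errstate(invalid="ignore"):
    as_int = b.astype(int)
print(as_int)
```

The first result keeps the missing value, while the second matches the kind of output seen in the dask example, which suggests the declared dtype needs to be one that can hold the real data.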
