quantile with Dask arrays #3326

Closed
jkmacc-LANL opened this issue Sep 20, 2019 · 0 comments · Fixed by #3559


jkmacc-LANL commented Sep 20, 2019

Currently the quantile method raises an exception when it encounters a Dask array.

        if isinstance(self.data, dask_array_type):
            raise TypeError(
                "quantile does not work for arrays stored as dask "
                "arrays. Load the data via .compute() or .load() "
                "prior to calling this method."
            )

I think this is either because taking a quantile needs to see all the data along the dimension it's quantile-ing, or because blocked/approximate methods weren't on hand when the feature was added. A Dask array whose quantile dimension is exactly one chunk in extent seems like a special case where no blocked algorithm is needed.
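To make the special case concrete, here is a minimal sketch, assuming plain dask and NumPy rather than xarray. The helper `quantile_single_chunk` is a hypothetical name, not part of either library, and this is not necessarily how a fix in xarray would be written. It only applies `np.quantile` to each block through `map_blocks`, which is exact when the reduced axis sits in a single chunk.

```python
import dask.array as da
import numpy as np


def quantile_single_chunk(arr, q, axis):
    """Quantile along an axis that is stored in exactly one chunk.

    Hypothetical helper for illustration; not an xarray or dask API.
    """
    axis = axis % arr.ndim  # normalize negative axes
    if len(arr.chunks[axis]) != 1:
        raise ValueError("axis must span exactly one chunk; rechunk first")
    # Every block contains the full extent of `axis`, so an ordinary
    # np.quantile per block gives the exact answer for that block.
    return arr.map_blocks(
        np.quantile, q, axis=axis, drop_axis=axis, dtype=np.float64
    )


rng = np.random.default_rng(0)
data = rng.normal(size=(50, 1000))

# "Shallow" axis 0 is one chunk; "wide" axis 1 is split into 10 chunks.
lazy = da.from_array(data, chunks=(50, 100))

median = quantile_single_chunk(lazy, 0.5, axis=0)  # still lazy
result = median.compute()
```

The result stays lazy until `.compute()` is called, so only one block at a time needs to be in memory per worker. If the quantile axis is split across chunks, the helper raises instead of silently returning per-chunk quantiles.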

The problem with following the exception's suggestion (loading the array into memory) is that "wide and shallow" arrays can be too big to load into memory, yet each chunk can be reduced independently of the others if the quantile dimension is the "shallow" dimension.
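The independence claim can be checked with NumPy alone. The following is a small sketch with made-up array sizes: a "wide and shallow" array is split only along the wide axis, the quantile is taken per block along the shallow axis, and the pieces are stitched back together.

```python
import numpy as np

rng = np.random.default_rng(42)
wide = rng.normal(size=(20, 600))  # 20 "shallow" rows, 600 "wide" columns

# Split only along the wide axis, so every block keeps all 20 rows.
blocks = np.array_split(wide, 6, axis=1)

# Quantile along the shallow axis, one block at a time.
per_block = [np.quantile(block, 0.9, axis=0) for block in blocks]
blockwise = np.concatenate(per_block)

# Reference: the same quantile computed on the whole array at once.
full = np.quantile(wide, 0.9, axis=0)
```

Because each column's quantile depends only on the values in that column, `blockwise` and `full` agree, and no block ever needs to see data from another block.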

I'm not necessarily proposing delegating to Dask's quantile (unless it's super easy), but I wanted to explore the special case described above.

Related links:

Thank you!

EDIT: added stackoverflow link
