[dask] preserve chunks in results of multi-class pred_contrib predictions on sparse matrices #4438

Closed
jameslamb opened this issue Jul 4, 2021 · 1 comment

@jameslamb
Collaborator

Summary

As of #4378, DaskLGBMClassifier.predict(X, pred_contrib=True) returns a list of Dask Arrays, one per class, if the model is a multiclass classification model and X is a Dask Array of scipy sparse matrices.

However, those Dask Arrays only have a single chunk. That code should be updated to preserve the original chunking from X.
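The regrouping that a chunk-preserving fix needs can be sketched in plain Python. This is a minimal illustration, not LightGBM or Dask code: it assumes each chunk of X produces a list of n_classes per-class contribution blocks (plain nested lists stand in for scipy sparse matrices), and the helpers regroup_by_class() and row_chunks() are hypothetical. Reshaping the per-chunk results into per-class lists keeps one output block per input chunk, so every class inherits the row chunk sizes of X instead of collapsing into a single chunk.

```python
from typing import List, Sequence, Tuple

# Assumption for illustration: one "block" is a list of rows standing in for a scipy sparse matrix.
Block = List[List[float]]


def regroup_by_class(per_chunk_results: Sequence[Sequence[Block]]) -> List[List[Block]]:
    """Hypothetical helper: reshape [chunk][class] results into [class][chunk]."""
    if not per_chunk_results:
        return []
    n_classes = len(per_chunk_results[0])
    for i, chunk_result in enumerate(per_chunk_results):
        if len(chunk_result) != n_classes:
            raise ValueError(
                f"chunk {i} returned {len(chunk_result)} classes, expected {n_classes}"
            )
    # For every class, keep one block per input chunk, in the original chunk order.
    return [[chunk_result[k] for chunk_result in per_chunk_results] for k in range(n_classes)]


def row_chunks(blocks: Sequence[Block]) -> Tuple[int, ...]:
    """Rows per block, analogous to the row entry of a Dask Array's .chunks."""
    return tuple(len(block) for block in blocks)


# Example: X split into row chunks of 2, 3 and 1 rows, with 2 features and 3 classes.
# Each per-class block has n_features + 1 = 3 columns (contributions plus the expected value).
input_chunks = (2, 3, 1)
n_classes = 3
per_chunk = [
    [[[float(k)] * 3 for _ in range(n_rows)] for k in range(n_classes)]
    for n_rows in input_chunks
]
per_class = regroup_by_class(per_chunk)
print([row_chunks(blocks) for blocks in per_class])  # prints [(2, 3, 1), (2, 3, 1), (2, 3, 1)]
```

In a real fix, each per-class list of blocks would presumably be wrapped with dask.array.from_delayed() and joined with dask.array.concatenate(); joining scipy sparse chunks is presumably where the concatenate_lookup() approach proposed in #4378 comes in.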

Motivation

Preserving the chunking would let any postprocessing of the prediction results with other Dask Array operations run in parallel across chunks. It would also reduce the risk of out-of-memory errors, since no single task would need to hold the full contributions matrix for a class.

Description

See #4378 (comment) for a proposed solution using dask.array.core.concatenate_lookup().

References

Created from #4378 (comment) and #4378 (comment).

This issue is only relevant once #4378 is merged.

The different output format for the multiclass + pred_contrib + sparse X case is described in detail in #3881.

@jameslamb
Collaborator Author

Per this project's process, I've added this to #2302, the issue where all feature requests are tracked. Anyone is welcome to contribute this feature. Please leave a comment here if you're interested in contributing and this issue can be re-opened.
