Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dask] determine output shape of array in predict (fixes #4285) #4351

Merged
merged 4 commits into from
Jul 8, 2021

Conversation

jmoralez
Copy link
Collaborator

@jmoralez jmoralez commented Jun 8, 2021

In more recent versions of dask (>2021.4.0) the tests for the predict method with dask arrays are failing. Following #4307 (comment), I believe a change in dask array caused the current implementation to stop working.

This PR proposes determining the output shape of the array containing the predictions and being explicit about it in the call to map_blocks. Right now I'm computing the predictions for the first row of data, however this is failing for test_classifier_pred_contrib[multiclass-classification-scipy_csr_matrix] and as pointed out in #4307 (comment) there could be better approaches.

@jameslamb
Copy link
Collaborator

@jmoralez what is your plan for this PR? I noticed it is still a draft...are you planning to propose a change that will get CI passing, or are you waiting for me to suggest one?

If lightgbm is currently incompatible with dask=2021.5.0+, I definitely want to get a fix in for v3.3.0 (#4310).

@jmoralez
Copy link
Collaborator Author

I wanted to discuss my current approach, I think its more robust but I'm very confused about how the current implementation managed to convert a list to coo in the sparse matrices case.

@jameslamb
Copy link
Collaborator

Good morning, @jmoralez ! Now that #4378 has been merged to handle the multiclass pred_contrib case, can you update this PR? I support the approach you're taking where we get the shape by predicting on one row.

@jmoralez jmoralez marked this pull request as ready for review July 8, 2021 02:43
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach, thanks! And I appreciate you also making the decision to thread the client through and use client.compute() instead of calling the Dask Array's .compute() method, to make sure that the expected client is being used.

I checked the logs of some CI jobs and I can see that the tests are running successfully against newer versions of Dask, as expected.

For example, linux regular: https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=10548&view=logs&j=c28dceab-947a-5848-c21f-eef3695e5f11&t=fa158246-17e2-53d4-5936-86070edbaacf

dask-2021.6.2              |     pyhd3eb1b0_0           5 KB
dask-core-2021.6.2     |     pyhd3eb1b0_0         692 KB
distributed-2021.6.2   |   py39h06a4308_0         1.1 MB

And these changes are working for older versions of Dask as well. For example, Linux_latest regular uses Python 3.6, so conda gives it dask / distributed 2021.3.0

https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=10548&view=logs&j=275189f9-c769-596a-7ef9-49fb48a9ab70&t=3a9e7a4a-04e6-52a0-67ea-6e8f6cfda74f

dask-2021.3.0              |     pyhd3eb1b0_0           5 KB
dask-core-2021.3.0         |     pyhd3eb1b0_0         659 KB
distributed-2021.3.0       |   py36h06a4308_0         1.0 MB

I'm very happy that we'll be able to get this fix into the 3.3.0 release (#4310). Thanks as always for your help!

@jameslamb jameslamb changed the title [dask] determine output shape of array in predict [dask] determine output shape of array in predict (fixes #4285) Jul 8, 2021
@jameslamb jameslamb merged commit f836fe0 into microsoft:master Jul 8, 2021
@jmoralez jmoralez deleted the dask/array-predict branch July 8, 2021 17:07
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants