-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dask] determine output shape of array in predict (fixes #4285) #4351
Conversation
I wanted to discuss my current approach, I think its more robust but I'm very confused about how the current implementation managed to convert a list to coo in the sparse matrices case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach, thanks! And I appreciate you also making the decision to thread the client through and use client.compute()
instead of calling the Dask Array's .compute()
method, to make sure that the expected client is being used.
I checked the logs of some CI jobs and I can see that the tests are running successfully against newer versions of Dask, as expected.
For example, linux regular
: https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=10548&view=logs&j=c28dceab-947a-5848-c21f-eef3695e5f11&t=fa158246-17e2-53d4-5936-86070edbaacf
dask-2021.6.2 | pyhd3eb1b0_0 5 KB
dask-core-2021.6.2 | pyhd3eb1b0_0 692 KB
distributed-2021.6.2 | py39h06a4308_0 1.1 MB
And these changes are working for older versions of Dask as well. For example, Linux_latest regular
uses Python 3.6, so conda gives it dask
/ distributed
2021.3.0
dask-2021.3.0 | pyhd3eb1b0_0 5 KB
dask-core-2021.3.0 | pyhd3eb1b0_0 659 KB
distributed-2021.3.0 | py36h06a4308_0 1.0 MB
I'm very happy that we'll be able to get this fix into the 3.3.0 release (#4310). Thanks as always for your help!
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
In more recent versions of dask (>2021.4.0) the tests for the predict method with dask arrays are failing. Following #4307 (comment), I believe a change in dask array caused the current implementation to stop working.
This PR proposes determining the output shape of the array containing the predictions and being explicit about it in the call to
map_blocks
. Right now I'm computing the predictions for the first row of data, however this is failing fortest_classifier_pred_contrib[multiclass-classification-scipy_csr_matrix]
and as pointed out in #4307 (comment) there could be better approaches.