[dask] [python] client.rebalance on dask ranker test #3892

ffineis · 2021-02-02T04:56:05Z

Further addresses #3817 by making test_dask.py a little more predictable. Previously, test_ranker with output='array' produced very uneven group allocations across its two test workers, unlike the dask.dataframe case. Because the dataset is so small (e.g. 100 rows), a worker that receives much less data than the other(s) can cause significant discrepancies between the Dask ranker and the standard LGBMRanker. See comment for more background.

Applying client.rebalance in this case (small data and uneven worker data distributions) makes the tests more predictable by tightening the distribution of observed Spearman correlations with LGBMRanker. Thanks to the maintainers for their patience!
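For readers unfamiliar with the call, here is a minimal sketch of the rebalancing pattern described above. It is not the code from this PR: the array shape, the chunk size, the in-process two-worker `LocalCluster`, and the function name `rebalance_demo` are all assumptions made for illustration. Whether any chunks actually move depends on the scheduler's memory heuristics and the installed `distributed` version, so the sketch only reports key counts and checks that the data is unchanged.

```python
import dask.array as da
import numpy as np
from dask.distributed import Client, LocalCluster, wait


def rebalance_demo():
    """Persist a small dask array on two workers, then rebalance it."""
    # Hypothetical stand-in for the roughly 100-row test data.
    X = np.random.default_rng(0).normal(size=(100, 4))

    # Assumption: two in-process workers, similar in spirit to the test cluster.
    with LocalCluster(n_workers=2, threads_per_worker=1, processes=False) as cluster:
        with Client(cluster) as client:
            dX = da.from_array(X, chunks=(10, 4))

            # Materialize the chunks on the workers and wait until they are in memory.
            dX = dX.persist()
            wait(dX)
            keys_before = {w: len(k) for w, k in client.has_what().items()}

            # Ask the scheduler to even out the data held by each worker.
            client.rebalance()
            keys_after = {w: len(k) for w, k in client.has_what().items()}

            # Rebalancing only moves data between workers; values must not change.
            result = dX.compute()
    return X, result, keys_before, keys_after


if __name__ == "__main__":
    X, result, before, after = rebalance_demo()
    print("keys per worker before:", before)
    print("keys per worker after: ", after)
    print("data unchanged:", np.allclose(X, result))
```

Persisting and waiting first matters because `client.rebalance` can only move data that is already in worker memory.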

jameslamb

This is awesome! This is one of those fun changes that looks super simple, but I know a ton of good research went into it in #3817 (comment).

Thanks very much 🚀

jameslamb · 2021-02-02T05:13:10Z

tests/python_package_test/test_dask.py

@@ -409,7 +418,7 @@ def test_ranker(output, client, listen_port, group):
    # have high rank correlation with scores from serial ranker.
    dcor = spearmanr(rnkvec_dask, y).correlation
    assert dcor > 0.6
-    assert spearmanr(rnkvec_dask, rnkvec_local).correlation > 0.75
+    assert spearmanr(rnkvec_dask, rnkvec_local).correlation > 0.8


Nice! I'm comfortable with this, given the distribution you saw in #3817 (comment)
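To make the threshold above concrete, here is a small standard-library sketch of what a Spearman rank correlation measures: the Pearson correlation of the two score vectors' ranks, with tied values sharing an average rank. The helper names and the score values are hypothetical; the test itself uses `scipy.stats.spearmanr`.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = tied_rank
        start = end + 1
    return ranks


def spearman(a, b):
    """Pearson correlation of the average ranks of a and b."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical scores standing in for rnkvec_dask and rnkvec_local.
scores_dask = [0.1, 0.4, 0.35, 0.8, 0.9]
scores_local = [0.2, 0.3, 0.5, 0.7, 0.95]
rho = spearman(scores_dask, scores_local)
print(rho)  # one swapped pair out of five items gives 0.9
```

With only five items, a single swapped pair already lowers the correlation to 0.9, which suggests how sensitive such a threshold can be on very small data.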

jameslamb · 2021-02-02T05:13:37Z

tests/python_package_test/test_dask.py

 from distributed.utils_test import client, cluster_fixture, gen_cluster, loop
 from scipy.sparse import csr_matrix
 from sklearn.datasets import make_blobs, make_regression
-from sklearn.utils import check_random_state


oh thanks for removing this. Didn't realize it was unused.

all credit goes to pycharm

github-actions · 2023-08-24T01:27:28Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

rebalance dask.array ranker input

f78eac6

ffineis requested a review from jameslamb as a code owner February 2, 2021 04:56

ffineis changed the title ~~client.rebalance on dask ranker test~~ [dask] [python] client.rebalance on dask ranker test Feb 2, 2021

jameslamb added the maintenance label Feb 2, 2021

jameslamb approved these changes Feb 2, 2021


jameslamb merged commit a4cae37 into microsoft:master Feb 2, 2021

jameslamb mentioned this pull request Feb 2, 2021

[dask] flaky test_ranker test #3817

Closed

github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
