[tests][dask] replace client fixture with cluster fixture #4159
Conversation
Thanks so much for doing this!
It looks like this might save on the order of 16 minutes in our CI jobs, that is an AMAZING improvement. 🙌
Linux regular on master (logs):
===== 510 passed, 6 skipped, 2 xfailed, 435 warnings in 1221.07s (0:20:21) =====

Linux regular on this PR (logs):
===== 510 passed, 6 skipped, 2 xfailed, 340 warnings in 204.23s (0:03:24) ======
I'm +1 on this PR, just left one small suggestion. I've also provided some more answers and information below.
I just thought that this could also be achieved by defining a new client fixture that just creates a client from the cluster fixture defined here. This would involve minimal changes to the existing code (only the client that the tests receive would be different, but the code would stay the same). Should I go this way?
If this is just for the sake of code organization but wouldn't have any functional or performance benefit, I'd rather not. Using fixtures of fixtures makes me (maybe irrationally) nervous. I'm +1 on the changes you've made in this PR: passing in a cluster and then creating a client from it. I'm also ok with using the client context manager to handle teardowns instead of `Client.close()`.
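For readers following along, the pattern being discussed can be sketched like this. This is a minimal illustration, not the actual LightGBM test code: the fixture and test names are made up, and the worker counts are arbitrary. A single cluster is created once per test session, and each test opens (and automatically tears down) its own `Client` via the context manager.

```python
import pytest
from distributed import Client, LocalCluster


@pytest.fixture(scope="session")
def cluster():
    # One LocalCluster shared by every test, instead of creating and
    # tearing down a cluster per test (the expensive part).
    with LocalCluster(n_workers=2, threads_per_worker=1) as c:
        yield c


def test_example(cluster):
    # The context manager closes the client on exit, replacing explicit
    # client.close(timeout=...) calls at the end of each test.
    with Client(cluster) as client:
        assert client.submit(lambda x: x + 1, 1).result() == 2
```

The cluster startup cost is paid once, while each test still gets a fresh `Client`, which is what yields the large CI speedup reported above.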
In the future, when you open a PR or issue that comes from some previous discussion, please link that discussion. That helps anyone who finds this PR from search to find all of the relevant context, and helps reviewers who might not have been part of that original conversation to learn what they need to give an effective review.
In this case, I'm linking #3829 (comment).
You don't have to close and re-open PRs on GitHub to re-trigger CI. In most projects, you can just push an empty commit. In LightGBM we have a slight preference for that because a new commit means new builds, which means it is possible to view the logs from the previous builds, which can be useful in debugging. Some CI services overwrite builds if you close and re-open. This is why, for example, only the most recent build of this PR is currently available on GitHub Actions, even though there have been two.
You can push an empty commit like this:
git commit --allow-empty -m "empty commit"
git push tests/cluster-fixture
Yes, this was just to avoid the (seemingly) huge diff.
Yeah, that's my bad. I saw you do it on some previous PRs and forgot to do it here.
I usually do that, but when I opened this it only triggered four actions (WIP, license/cla, continuous-integration/appveyor/pr and Microsoft.LightGBM) and only one ran (WIP); the other three got stuck in "queued", which seemed weird. I didn't try the empty commit because I thought something had gone wrong when opening this PR (it took about 3 minutes for me to be able to see it, and I thought I had misclicked somewhere). So I don't believe anything was lost in this case, but I'll definitely try the empty commit first from now on.
You can safely ignore the R-package CI failures; they're not related to this PR's changes. They are part of #4160.
Looks great, thanks! Once #4161 is merged, the R tests should be fixed and we'll be able to merge this.
Ok @jmoralez, I think CI should be ok now. At your earliest convenience, can you update this to the latest
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
Motivation: #4088 (comment)
This replaces the `client` fixture (which creates and tears down a `LocalCluster` for every test) with a `cluster` fixture that creates a cluster once for all the tests; every test then creates a client based on it. This significantly reduces the time it takes for `test_dask` to run.

I replaced all the `client.close(timeout=CLIENT_CLOSE_TIMEOUT)` calls with a context manager for the `client`, so the diff looks awful, but there's an option to hide whitespace changes in GitHub, which can be toggled in the little gear when reviewing the files. If this is too much I can just add a `client = Client(cluster)` at the top of every test instead, but I thought it'd be nice to unify all the tests to use this context manager.

I modified `test_errors` to use a regular cluster (instead of the async one) as well.

I also had to add a `client.restart` in `test_machines_should_be_used_if_provided`, because that test triggers an error on the worker which had `local_listen_port` already bound while the other one keeps waiting, so adding the restart resets that worker.

In `test_error_on_feature_parallel_tree_learner` I had to add a `client.rebalance` because the error wasn't always being triggered. I assume that's because the test data had two partitions and there was a chance that they both went to the same worker, so we weren't actually performing distributed training, which inspired #4149.
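The two fixes described above can be illustrated with a small standalone sketch. This is not the actual test code, just the relevant `distributed.Client` calls under made-up data; the worker counts and scattered values are arbitrary.

```python
from distributed import Client, LocalCluster

if __name__ == "__main__":
    with LocalCluster(n_workers=2, threads_per_worker=1) as cluster:
        with Client(cluster) as client:
            # client.restart() resets every worker, releasing any port
            # (e.g. a previously bound local_listen_port) that a prior
            # test left in use.
            client.restart()

            # Scatter some data, then rebalance so that the pieces are
            # spread across workers; without this, both partitions can
            # land on one worker and training would not actually be
            # distributed.
            futures = client.scatter(list(range(4)))
            client.rebalance(futures)
            assert sorted(client.gather(futures)) == [0, 1, 2, 3]
```

Since a shared cluster now persists across tests, state left behind by one test (bound ports, data placement) can leak into the next, which is why these explicit `restart`/`rebalance` calls became necessary.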