[dask] Use client to persist collections #6722

Merged 2 commits on Feb 25, 2021

jose-moralez
Contributor

This attempts to solve #6712 by using the client object when persisting collections. This ensures that the futures are computed by the client passed to xgb.dask.DaskDMatrix, and allows several training runs to proceed concurrently on different clusters.
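
To illustrate the idea, here is a minimal sketch, assuming two in-process `LocalCluster` instances as stand-ins for two real clusters. It is not the code from this PR: `persist_on`, `owned_by`, and `demo` are hypothetical helpers introduced only for this example. It shows that persisting through an explicit client yields futures owned by that client, even when a different client is the current default.

```python
import dask.array as da
from dask.distributed import Client, LocalCluster, futures_of


def persist_on(client, *collections):
    """Persist dask collections through an explicit client.

    Hypothetical helper for illustration; not taken from the XGBoost source.
    """
    # Client.persist accepts a sequence of collections and returns a list of
    # collections whose graphs are backed by futures owned by `client`.
    return client.persist(list(collections))


def owned_by(client, collections):
    """Return True if every future behind `collections` belongs to `client`."""
    futures = futures_of(collections)
    return len(futures) > 0 and all(f.client is client for f in futures)


def demo():
    # Two independent in-process clusters stand in for two real clusters.
    local = dict(n_workers=1, threads_per_worker=1, processes=False,
                 dashboard_address=None)
    with LocalCluster(**local) as cluster_a, LocalCluster(**local) as cluster_b:
        # client_b is created last, so it is the current default client.
        with Client(cluster_a) as client_a, Client(cluster_b) as client_b:
            X = da.ones((100, 4), chunks=(50, 4))
            y = da.zeros(100, chunks=50)
            on_a = persist_on(client_a, X, y)
            on_b = persist_on(client_b, X, y)
            return owned_by(client_a, on_a), owned_by(client_b, on_b)
```

Calling `demo()` should return a pair of booleans, one per client, indicating whether each set of persisted collections is backed only by futures from the client that persisted it.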

@jose-moralez
Contributor Author

Hi @trivialfis, here's my proposal for solving #6712. I ran the included test against the current master branch and was able to reproduce the error "ValueError: Inputs contain futures that were created by another client." Is including weights enough to test that meta is being computed correctly as well? Looking forward to your thoughts on this.

Member

@trivialfis left a comment

Thanks for the fix! There is a small error in the test.

#6726 will fix the CI issue.

        asynchronous=True,
        dashboard_address=0) as cluster:
    async with Client(cluster, asynchronous=True) as client:
        X, y, w = generate_array()
Member

@trivialfis commented Feb 23, 2021

generate_array(with_weights=True).

@jose-moralez
Contributor Author

Thanks for the fix. Should I wait for #6726 to get merged to update this?

@trivialfis
Member

> Should I wait for #6726 to get merged to update this?

Yup. Waiting for review.

@codecov-io commented Feb 25, 2021

Codecov Report

Merging #6722 (0579029) into master (a4101de) will decrease coverage by 0.02%.
The diff coverage is 86.90%.

@@            Coverage Diff             @@
##           master    #6722      +/-   ##
==========================================
- Coverage   81.55%   81.53%   -0.03%     
==========================================
  Files          13       13              
  Lines        3719     3769      +50     
==========================================
+ Hits         3033     3073      +40     
- Misses        686      696      +10     
Impacted Files                        Coverage Δ
python-package/xgboost/data.py        62.63% <77.77%> (+0.07%) ⬆️
python-package/xgboost/sklearn.py     90.08% <78.78%> (-0.62%) ⬇️
python-package/xgboost/core.py        81.94% <87.23%> (+0.09%) ⬆️
python-package/xgboost/dask.py        82.57% <90.90%> (-0.10%) ⬇️
python-package/xgboost/training.py    95.60% <100.00%> (+0.28%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c375173...0579029.

@trivialfis merged commit b6167cd into dmlc:master on Feb 25, 2021
@jose-moralez deleted the persist-with-client branch on February 25, 2021 16:26