Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix UCX examples for InfiniBand #556

Merged
merged 3 commits into from
Mar 26, 2021

Conversation

charlesbluca
Copy link
Member

Some small changes to the UCX examples to fix some IB-related issues I discussed with @pentschev:

  • Disables RDMACM for all examples; right now this isn't working, but when it is we can uncomment out the original lines
  • Use --scheduler-file to link dask-scheduler to dask-cuda-workers, as localhost cannot be used when IB is enabled
  • Add a small workload for the client and shut it down in the cluster and client setup examples

@charlesbluca charlesbluca added python python code needed 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Mar 25, 2021
@github-actions github-actions bot removed the python python code needed label Mar 25, 2021
@codecov-io
Copy link

codecov-io commented Mar 25, 2021

Codecov Report

Merging #556 (3f81213) into branch-0.19 (9747c96) will increase coverage by 31.39%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff                @@
##           branch-0.19     #556       +/-   ##
================================================
+ Coverage        61.06%   92.46%   +31.39%     
================================================
  Files               22       16        -6     
  Lines             2571     1605      -966     
================================================
- Hits              1570     1484       -86     
+ Misses            1001      121      -880     
Impacted Files Coverage Δ
dask_cuda/benchmarks/local_cupy_map_overlap.py
dask_cuda/benchmarks/local_cupy.py
dask_cuda/benchmarks/local_cudf_shuffle.py
dask_cuda/benchmarks/utils.py
dask_cuda/_version.py
dask_cuda/benchmarks/local_cudf_merge.py
dask_cuda/explicit_comms/dataframe/shuffle.py 97.94% <0.00%> (+0.68%) ⬆️
dask_cuda/utils.py 90.86% <0.00%> (+1.02%) ⬆️
dask_cuda/proxy_object.py 91.36% <0.00%> (+1.11%) ⬆️
dask_cuda/cli/dask_cuda_worker.py 96.96% <0.00%> (+1.51%) ⬆️
... and 5 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9747c96...3f81213. Read the comment docs.

x = rs.random((10000, 10000), chunks=1000)
x.sum().compute()

# shutdown client
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# shutdown client
# shutdown cluster

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shuts down the entire cluster, not the client: https://distributed.dask.org/en/latest/api.html#distributed.Client.shutdown .

client = Client(address) # noqa F841
client = Client(address)

# client code here
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit misleading, I would rename that to user code. Saying client code may sound like it's executed on the client-side only.

x = rs.random((10000, 10000), chunks=1000)
x.sum().compute()

# shutdown client
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# shutdown client
# shutdown cluster

@charlesbluca
Copy link
Member Author

Thanks for the review @pentschev!

Copy link
Member

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @charlesbluca !

@pentschev pentschev added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 3 - Ready for Review Ready for review by team labels Mar 26, 2021
@pentschev
Copy link
Member

@gpucibot merge

@rapids-bot rapids-bot bot merged commit cb9bf35 into rapidsai:branch-0.19 Mar 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants