Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spilling on demand #756

Merged
merged 8 commits into from
Oct 29, 2021
Merged

Conversation

madsbk
Copy link
Member

@madsbk madsbk commented Oct 15, 2021

Use rapidsai/rmm#892 to implement spilling on demand. Requires use of RMM and JIT-unspill enabled.

The device_memory_limit still works as usual -- when known allocations gets to device_memory_limit, Dask-CUDA starts spilling preemptively. However, with this PR it is should be possible to increase device_memory_limit significantly since memory spikes will be handled by spilling on demand.

Closes #755

@github-actions github-actions bot added the python python code needed label Oct 15, 2021
@madsbk madsbk added 2 - In Progress Currently a work in progress improvement Improvement / enhancement to an existing function non-breaking Non-breaking change and removed python python code needed labels Oct 15, 2021
@pentschev
Copy link
Member

The failure is an OOM in the new test_spill_on_demand. Perhaps we should look for a way to do some sort of mock test? Maybe we can find a way to rewrite RMM's allocation function or the callback function to trigger in a programatic way?

@madsbk madsbk force-pushed the spilling_on_demand branch from 181e1e5 to b858620 Compare October 27, 2021 14:05
@github-actions github-actions bot added the python python code needed label Oct 27, 2021
@codecov-commenter
Copy link

codecov-commenter commented Oct 27, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.12@ee58ad5). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head 8cefb89 differs from pull request most recent head 9ab0f42. Consider uploading reports for the commit 9ab0f42 to get more accurate results
Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.12     #756   +/-   ##
===============================================
  Coverage                ?   58.20%           
===============================================
  Files                   ?       21           
  Lines                   ?     2991           
  Branches                ?        0           
===============================================
  Hits                    ?     1741           
  Misses                  ?     1250           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ee58ad5...9ab0f42. Read the comment docs.

@madsbk madsbk force-pushed the spilling_on_demand branch from 41d8c19 to 7146d26 Compare October 28, 2021 11:28
@madsbk madsbk force-pushed the spilling_on_demand branch 2 times, most recently from 2ab25da to a4d2b6f Compare October 28, 2021 13:27
@madsbk madsbk marked this pull request as ready for review October 28, 2021 13:50
@madsbk madsbk requested a review from a team as a code owner October 28, 2021 13:50
@madsbk madsbk changed the title [WIP] Spilling on demand [REVIEW] Spilling on demand Oct 28, 2021
@madsbk madsbk added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Oct 28, 2021
@madsbk madsbk changed the title [REVIEW] Spilling on demand Spilling on demand Oct 28, 2021
Copy link
Member

@pentschev pentschev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM @madsbk , thanks for the great work. I added a couple of minor suggestions only.

madsbk and others added 2 commits October 28, 2021 17:07
Co-authored-by: Peter Andreas Entschev <peter@entschev.com>
@VibhuJawa
Copy link
Member

VibhuJawa commented Oct 29, 2021

I tested this on the spilling related workflow mentioned here and was able to increase the memory limit from 0.6 to 0.9 but some how we take more time 2min 32s with the PR vs 1min 46s on main. Any reason for that to happen ?

CC: @randerzander

@madsbk
Copy link
Member Author

madsbk commented Oct 29, 2021

I tested this on the spilling related workflow mentioned here and was able to increase the memory limit from 0.6 to 0.9 but some how we take more time 2min 32s with the PR vs 1min 46s on main. Any reason for that to happen ?

In principle yes, a memory limit of 0.6 could trigger favorable spilling by chance but generally spilling-on-demand should be better. @VibhuJawa how many runs did you do? In my experiment the memory requirement and runtime of a shuffle workflow fluctuate drastically.
Can I ask you to try comparing main vs this PR setting the a memory limit to 0.6 in both cases? If this PR doesn't slowdown the execute by itself, I think we are good to merge it.
Afterwards, we should investigate why setting the memory limit to 0.9 might be a disadvantageous in some cases.

@VibhuJawa
Copy link
Member

Can I ask you to try comparing main vs this PR setting the a memory limit to 0.6 in both cases? If this PR doesn't slowdown the execute by itself, I think we are good to merge it.

We get similar performence in this case on PR and main. Agreed that PR itself does not cause a slowdown.

@VibhuJawa how many runs did you do?

I did two runs so not a comprehensive test by any means.

@jakirkham
Copy link
Member

@gpucibot merge

@rapids-bot rapids-bot bot merged commit b52d1d6 into rapidsai:branch-21.12 Oct 29, 2021
@jakirkham
Copy link
Member

Thanks Mads for the PR! Also Peter and Vibhu for the review and testing 😄

@EvenOldridge
Copy link

Excited to try this out!

@jakirkham
Copy link
Member

Nightlies are up. So should be good to go Even 😀

@madsbk madsbk deleted the spilling_on_demand branch November 1, 2021 07:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change python python code needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] On demand memory spilling
6 participants