Spilling on demand #756

madsbk · 2021-10-15T06:58:49Z

Use rapidsai/rmm#892 to implement spilling on demand. Requires RMM and JIT-unspill to be enabled.

The device_memory_limit still works as usual: when known allocations reach device_memory_limit, Dask-CUDA starts spilling preemptively. However, with this PR it should be possible to increase device_memory_limit significantly, since memory spikes will be handled by spilling on demand.
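
As a rough illustration of the idea only (this is a pure-Python toy, not the code in this PR and not RMM's API), spilling on demand can be pictured as an allocator that calls a callback when an allocation fails and retries if the callback managed to free something. The names `ToyDevice`, `ToySpiller`, and `spill_one` are hypothetical, and the "return True to retry" convention is an assumption of this sketch.

```python
class ToyDevice:
    """Toy stand-in for a GPU memory pool; sizes are in arbitrary units."""

    def __init__(self, capacity, on_failure):
        self.capacity = capacity
        self.used = 0
        # Hypothetical hook: called with the requested size when an
        # allocation fails; returns True if the allocation should be retried.
        self.on_failure = on_failure

    def _raw_allocate(self, nbytes):
        if self.used + nbytes > self.capacity:
            raise MemoryError(f"cannot allocate {nbytes} units")
        self.used += nbytes

    def allocate(self, nbytes):
        # Keep retrying for as long as the callback reports progress.
        while True:
            try:
                return self._raw_allocate(nbytes)
            except MemoryError:
                if not self.on_failure(nbytes):
                    raise

    def free(self, nbytes):
        self.used -= nbytes


class ToySpiller:
    """Tracks spillable buffers and evicts the oldest one on demand."""

    def __init__(self, capacity):
        self.device = ToyDevice(capacity, self.spill_one)
        self.on_device = []  # sizes of spillable buffers, oldest first
        self.on_host = []    # sizes of buffers that were spilled

    def spill_one(self, nbytes):
        if not self.on_device:
            return False  # nothing left to spill, let the error propagate
        size = self.on_device.pop(0)
        self.device.free(size)
        self.on_host.append(size)
        return True

    def add(self, nbytes):
        self.device.allocate(nbytes)
        self.on_device.append(nbytes)


spiller = ToySpiller(capacity=10)
for size in (4, 4, 4):
    spiller.add(size)  # the third allocation only fits after one spill
```

In this toy, no limit has to be tuned ahead of time: the spill happens only at the moment an allocation would otherwise fail, which is why a preemptive limit could in principle be set much higher.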

Closes #755

pentschev · 2021-10-15T15:20:52Z

The failure is an OOM in the new test_spill_on_demand. Perhaps we should look for a way to do some sort of mock test? Maybe we can find a way to rewrite RMM's allocation function or the callback function so that it triggers in a programmatic way?
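
One possible shape for such a mock test, sketched with only the standard library: the allocation function is replaced by a `unittest.mock.Mock` that fails once with a simulated OOM and then succeeds, so the callback path runs without exhausting real device memory. The helper `allocate_with_retry` is a hypothetical stand-in for the retry logic, not a function from RMM or Dask-CUDA.

```python
from unittest import mock


def allocate_with_retry(raw_allocate, on_failure, nbytes):
    """Hypothetical retry loop: call on_failure on OOM, retry if it returns True."""
    while True:
        try:
            return raw_allocate(nbytes)
        except MemoryError:
            if not on_failure(nbytes):
                raise


# First call raises a simulated OOM, second call returns a fake buffer.
raw_allocate = mock.Mock(side_effect=[MemoryError("simulated OOM"), "buffer"])
# The callback pretends it freed memory, so the allocation is retried.
on_failure = mock.Mock(return_value=True)

result = allocate_with_retry(raw_allocate, on_failure, 1024)
```

A test along these lines would then check that the callback was called exactly once with the requested size and that the second allocation attempt succeeded.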

codecov-commenter · 2021-10-27T14:16:04Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.12@ee58ad5).
The diff coverage is n/a.

❗ Current head 8cefb89 differs from pull request most recent head 9ab0f42. Consider uploading reports for the commit 9ab0f42 to get more accurate results

@@               Coverage Diff               @@
##             branch-21.12     #756   +/-   ##
===============================================
  Coverage                ?   58.20%           
===============================================
  Files                   ?       21           
  Lines                   ?     2991           
  Branches                ?        0           
===============================================
  Hits                    ?     1741           
  Misses                  ?     1250           
  Partials                ?        0

pentschev

LGTM @madsbk, thanks for the great work. I added only a couple of minor suggestions.

dask_cuda/tests/test_proxify_host_file.py

Co-authored-by: Peter Andreas Entschev <peter@entschev.com>

VibhuJawa · 2021-10-29T05:40:48Z

I tested this on the spilling-related workflow mentioned here and was able to increase the memory limit from 0.6 to 0.9, but somehow it takes more time: 2min 32s with the PR vs 1min 46s on main. Is there any reason for that to happen?

CC: @randerzander

madsbk · 2021-10-29T12:03:41Z

> I tested this on the spilling-related workflow mentioned here and was able to increase the memory limit from 0.6 to 0.9, but somehow it takes more time: 2min 32s with the PR vs 1min 46s on main. Is there any reason for that to happen?

In principle yes, a memory limit of 0.6 could trigger favorable spilling by chance, but generally spilling on demand should be better. @VibhuJawa how many runs did you do? In my experiments, the memory requirement and runtime of a shuffle workflow fluctuate drastically.
Can I ask you to try comparing main vs this PR with the memory limit set to 0.6 in both cases? If this PR doesn't slow down the execution by itself, I think we are good to merge it.
Afterwards, we should investigate why setting the memory limit to 0.9 might be disadvantageous in some cases.

VibhuJawa · 2021-10-29T16:07:56Z

> Can I ask you to try comparing main vs this PR with the memory limit set to 0.6 in both cases? If this PR doesn't slow down the execution by itself, I think we are good to merge it.

We get similar performance in this case on the PR and on main. Agreed that the PR itself does not cause a slowdown.

> @VibhuJawa how many runs did you do?

I did two runs, so not a comprehensive test by any means.
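
Since runtimes of this kind of workflow reportedly fluctuate a lot, a small harness like the following could help compare configurations over more than two runs. It is only a sketch: `time_runs` and `workload` are hypothetical names, and the workload here is a trivial placeholder rather than the actual shuffle workflow.

```python
import statistics
import time


def time_runs(workload, runs=5):
    """Run `workload` several times and summarize the wall-clock durations."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return {
        "runs": runs,
        "mean": statistics.mean(durations),
        # stdev needs at least two samples
        "stdev": statistics.stdev(durations) if runs > 1 else 0.0,
        "min": min(durations),
        "max": max(durations),
    }


def workload():
    # Placeholder for the real workflow (e.g. a shuffle on a given cluster).
    sum(range(10_000))


summary = time_runs(workload, runs=5)
```

Comparing the mean and spread for main vs the PR at the same memory limit would give a clearer picture than a single pair of timings.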

jakirkham · 2021-10-29T18:51:24Z

@gpucibot merge

jakirkham · 2021-10-29T18:52:06Z

Thanks Mads for the PR! Also Peter and Vibhu for the review and testing 😄

EvenOldridge · 2021-10-29T20:08:39Z

Excited to try this out!

jakirkham · 2021-10-29T20:32:22Z

Nightlies are up. So should be good to go Even 😀

github-actions bot added the python python code needed label Oct 15, 2021

madsbk added 2 - In Progress Currently a work in progress improvement Improvement / enhancement to an existing function non-breaking Non-breaking change and removed python python code needed labels Oct 15, 2021

madsbk force-pushed the spilling_on_demand branch from 181e1e5 to b858620 Compare October 27, 2021 14:05

github-actions bot added the python python code needed label Oct 27, 2021

madsbk added 2 commits October 28, 2021 13:26

Implement force_evict_from_device()

e00b738

Implement spill on demand, disabled by default

7146d26

madsbk force-pushed the spilling_on_demand branch from 41d8c19 to 7146d26 Compare October 28, 2021 11:28

madsbk added 3 commits October 28, 2021 13:42

Adding test: test_spill_on_demand()

bf6b756

doc

3de8235

enable spill-on-demand by default

a4d2b6f

madsbk force-pushed the spilling_on_demand branch 2 times, most recently from 2ab25da to a4d2b6f Compare October 28, 2021 13:27

madsbk marked this pull request as ready for review October 28, 2021 13:50

madsbk requested a review from a team as a code owner October 28, 2021 13:50

madsbk changed the title ~~[WIP] Spilling on demand~~ [REVIEW] Spilling on demand Oct 28, 2021

madsbk added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Oct 28, 2021

madsbk changed the title ~~[REVIEW] Spilling on demand~~ Spilling on demand Oct 28, 2021

pentschev approved these changes Oct 28, 2021

madsbk and others added 2 commits October 28, 2021 17:07

doc

f608f74

Co-authored-by: Peter Andreas Entschev <peter@entschev.com>

test_spill_on_demand(): use get_device_total_memory()

8cefb89

test_spill_on_demand(): clean up

9ab0f42

rapids-bot bot merged commit b52d1d6 into rapidsai:branch-21.12 Oct 29, 2021

madsbk deleted the spilling_on_demand branch November 1, 2021 07:59
