Skip to content

Commit

Permalink
Amortize transfer cost
Browse files Browse the repository at this point in the history
As discussed in dask#5325. The idea is that if a key we need has many dependents, we should amortize the cost of transferring it to a new worker, since those other dependencies could then run on the new worker more cheaply. "We'll probably have to move this at some point anyway, might as well do it now."

This isn't actually intended to encourage transfers though. It's more meant to discourage transferring keys that could have just stayed in one place. The goal is that if A and B are on different workers, and we're the only task that will ever need A, but plenty of other tasks will need B, we should schedule alongside A even if B is a bit larger to move.

But this is all a theory and needs some tests.
  • Loading branch information
gjoseph92 committed Sep 17, 2021
1 parent 4c67b0b commit b4ebbee
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions distributed/scheduler.py
Original file line number Diff line number Diff line change
Expand Up @@ -3415,12 +3415,13 @@ def worker_objective(self, ts: TaskState, ws: WorkerState) -> tuple:
"""
dts: TaskState
nbytes: Py_ssize_t
comm_bytes: Py_ssize_t = 0
comm_bytes: double = 0
xfers: Py_ssize_t = 0
for dts in ts._dependencies:
if ws not in dts._who_has:
nbytes = dts.get_nbytes()
comm_bytes += nbytes
# amortize transfer cost over all waiters
comm_bytes += nbytes / len(dts._waiters)
xfers += 1

stack_time: double = ws._occupancy / ws._nthreads
Expand Down

0 comments on commit b4ebbee

Please sign in to comment.