Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encapsulate spill buffer and memory_monitor #5891

Closed
Tracked by #5736 ...
fjetter opened this issue Mar 2, 2022 · 3 comments · Fixed by #5904
Closed
Tracked by #5736 ...

Encapsulate spill buffer and memory_monitor #5891

fjetter opened this issue Mar 2, 2022 · 3 comments · Fixed by #5904
Assignees
Labels

Comments

@fjetter
Copy link
Member

fjetter commented Mar 2, 2022

A few mechanics in worker.py are very well self-contained, and it would make a lot of sense to encapsulate them into a separate module/class in order to reduce complexity:

  • memory limit and thresholds
  • initialisation of the SpillBuffer
  • memory_monitor(), which handles pause, unpause, and force evict from spillbuffer

From the Worker's perspective, Worker.data should be treated as an opaque MutableMapping.

This is just a refactoring, with no functional impact.

Consider deprecating init arguments to Worker using target, spill, pause threshold etc. in favour of dask.config.

This change is part of the wider epic

Related issues:

@gjoseph92
Copy link
Collaborator

From the Worker's perspective, Worker.data should be treated as an opaque MutableMapping.

This is just a refactoring, with no functional impact.

I see the value in doing this refactoring right now just to reduce code complexity. But as discussed in #4424 (comment), I think streaming data from disk to other workers will be a critical part of the new spilling design, and Worker.data will not be treatable as an opaque MutableMapping.

Spilling will need to be integrated into worker data transfer logic (and may even require changes to serialization and comms), which makes me wonder if spilling will actually make sense as a separate extension, or whether it needs to be a core part of the worker.

Or, maybe data transfer logic needs to move into some extensible WorkerDataStore interface? Either way, I think spilling and data transfer have to be somewhat coupled.

@fjetter
Copy link
Member Author

fjetter commented Mar 4, 2022

think streaming data from disk to other workers will be a critical part of the new spilling design, and Worker.data will not be treatable as an opaque MutableMapping.

shuffling is out of scope for this. The shuffling plan right now also does not include the mutable mapping anyhow so I would suggest to postpone this until it becomes relevant again

@crusaderky
Copy link
Collaborator

As described in #5900, encapsulation is still highly desirable even if a MutableMapping is no longer fit for purpose, as the Worker needs to store/retrieve explicitly pickled or explicitly unpickled data.

@crusaderky crusaderky changed the title Worker Extension to encapsulate spill buffer and memory_monitor Encapsulate spill buffer and memory_monitor Mar 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants