Encapsulate spill buffer and memory_monitor #5891
Comments
I see the value in doing this refactoring right now just to reduce code complexity. But as discussed in #4424 (comment), I think streaming data from disk to other workers will be a critical part of the new spilling design, and spilling will need to be integrated into the worker data transfer logic (and may even require changes to serialization and comms). That makes me wonder whether spilling will actually make sense as a separate extension, or whether it needs to be a core part of the worker. Or, maybe data transfer logic needs to move into some extensible
Shuffling is out of scope for this. The shuffling plan right now also does not include the mutable mapping anyhow, so I would suggest postponing this until it becomes relevant again.
As described in #5900, encapsulation is still highly desirable even if a MutableMapping is no longer fit for purpose, as the Worker needs to store/retrieve explicitly pickled or explicitly unpickled data.
A few mechanics in worker.py are very well self-contained, and it would make a lot of sense to encapsulate them into a separate module/class in order to reduce complexity:
From the Worker's perspective, Worker.data should be treated as an opaque MutableMapping.
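To make the "opaque MutableMapping" idea concrete, here is a minimal sketch of a mapping that spills least recently used values once a byte target is exceeded. It is not the actual distributed implementation: the class name `SimpleSpillBuffer`, the `target_bytes` and `sizeof` parameters, and the in-memory `slow` dict standing in for disk are all hypothetical, chosen only for illustration.

```python
import pickle
import sys
from collections import OrderedDict
from collections.abc import MutableMapping


class SimpleSpillBuffer(MutableMapping):
    """Hypothetical stand-in for the object behind Worker.data."""

    def __init__(self, target_bytes, sizeof=sys.getsizeof):
        self.target_bytes = target_bytes
        self.sizeof = sizeof
        self.fast = OrderedDict()  # key -> live object, least recently used first
        self.slow = {}  # key -> pickled bytes (assumption: stands in for disk)
        self.fast_bytes = 0

    def __contains__(self, key):
        # Membership checks must not trigger an un-spill
        return key in self.fast or key in self.slow

    def __setitem__(self, key, value):
        if key in self:
            del self[key]  # drop any previous copy, in memory or spilled
        self.fast[key] = value
        self.fast_bytes += self.sizeof(value)
        self._evict()

    def __getitem__(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)  # mark as most recently used
            return self.fast[key]
        value = pickle.loads(self.slow.pop(key))  # KeyError if key is unknown
        self[key] = value  # bring the value back into memory
        return value

    def __delitem__(self, key):
        if key in self.fast:
            self.fast_bytes -= self.sizeof(self.fast.pop(key))
        else:
            del self.slow[key]  # KeyError if key is unknown

    def __iter__(self):
        yield from self.fast
        yield from self.slow

    def __len__(self):
        return len(self.fast) + len(self.slow)

    def _evict(self):
        # Spill oldest entries until under target; always keep the newest one
        while self.fast_bytes > self.target_bytes and len(self.fast) > 1:
            key, value = self.fast.popitem(last=False)
            self.fast_bytes -= self.sizeof(value)
            self.slow[key] = pickle.dumps(value)


# The caller only ever sees plain mapping operations
data = SimpleSpillBuffer(target_bytes=10, sizeof=len)
data["x"] = b"12345678"
data["y"] = b"1234"  # pushes "x" over the target, so "x" is spilled
restored = data["x"]  # transparently unpickled and moved back into memory
```

The point of the sketch is that the caller never needs to know where a value currently lives; whether that stays true once explicitly pickled data has to be served to other workers is exactly the question raised in the comments.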
This is just a refactoring, with no functional impact.
Consider deprecating the Worker init arguments for the target, spill, and pause thresholds etc. in favour of dask.config.
This change is part of the wider epic
Related issues: