reconciler: Stopping allocs by name is error prone #12797

Open
DerekStrickland opened this issue Apr 27, 2022 · 1 comment


@DerekStrickland
Contributor

Proposal

The reconciler relies on the allocation name (task name + index) when deciding which allocs to stop. This is a non-deterministic value that has led to numerous defects over time. The name is useful for identifying allocations that conflict, but on its own it does not tell us which allocation to keep and which to discard. We should design a more resilient approach to reconciling allocations that share a name.
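
To make the gap concrete, here is a minimal sketch of what a name-plus-tie-break selection could look like. It is not the reconciler's actual code: the `Alloc` struct, the `pickStops` function, and the ordering policy (prefer healthy, then the highest `CreateIndex`, then the lowest ID) are all assumptions chosen only for illustration. The point is that the name only groups the conflicting allocs; some additional, fully ordered rule has to choose the survivor.

```go
package main

import "sort"

// Alloc is a hypothetical, trimmed-down stand-in for an allocation record.
// It is not the real Nomad struct.
type Alloc struct {
	ID          string
	Name        string // e.g. "web[0]"; shared by conflicting allocs
	CreateIndex uint64
	Healthy     bool
}

// pickStops groups allocs by name and, for every name held by more than one
// alloc, keeps exactly one and returns the IDs of the rest.
// The keep rule is an assumed policy: healthy first, then newest
// CreateIndex, then lowest ID as a final tie-break so the result does not
// depend on input order.
func pickStops(allocs []Alloc) []string {
	byName := make(map[string][]Alloc)
	for _, a := range allocs {
		byName[a.Name] = append(byName[a.Name], a)
	}

	var stop []string
	for _, group := range byName {
		if len(group) < 2 {
			continue // no name conflict, nothing to decide
		}
		sort.Slice(group, func(i, j int) bool {
			a, b := group[i], group[j]
			if a.Healthy != b.Healthy {
				return a.Healthy
			}
			if a.CreateIndex != b.CreateIndex {
				return a.CreateIndex > b.CreateIndex
			}
			return a.ID < b.ID
		})
		// group[0] is the keeper; everything after it is stopped.
		for _, a := range group[1:] {
			stop = append(stop, a.ID)
		}
	}
	sort.Strings(stop) // stable output regardless of map iteration order
	return stop
}
```

Whatever rule is ultimately chosen, it should be a total order like the one above, so that two evaluations over the same set of allocs always stop the same ones.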

Use-cases

This issue was identified during work on PR #12795. Code review brought to light that stopping allocs by name has been a source of historical problems and that we should look for an alternative approach.

Attempted Solutions

So far, all effort has gone into making the surrounding code filter allocs correctly into specific sets and then managing those sets. This approach, compounded by change over time, has led to sprawling logic spread across numerous functions, which creates significant cognitive load for anyone working on this aspect of the scheduler.

@tgross
Member

tgross commented Apr 27, 2022

Potentially related issues: #12768 #10727
