address issue with max concurrent and work fetch #5755

Merged
AenBleidd merged 1 commit into master from dpa_max_concurrent2
Aug 13, 2024

Conversation

davidpanderson
Contributor

@davidpanderson davidpanderson commented Aug 13, 2024

Fixes #5749 (hopefully)

Max concurrent is a limit on jobs, not processor instances.
The work fetch logic made the erroneous implicit assumption
that all jobs use 1 CPU.
So e.g. if a project has max concurrent 4,
and the client has two 2-CPU jobs,
it will think (if the work buffer is zero) that there's
no point in fetching more work.
But in fact the project could run four such jobs and use 8 CPUs,
so (on a host with at least 8 CPUs) 4 are idle.
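
To make the arithmetic in this example concrete, here is a minimal sketch. It is not BOINC's actual code: `Job`, `cpus_in_use`, `looks_full_if_one_cpu_per_job`, `at_job_limit` and `idle_cpus` are hypothetical names, and the "flawed" check is only one plausible way the 1-CPU assumption could show up.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a running job; not BOINC's actual data structure.
struct Job {
    double ncpus;  // CPU instances this job occupies
};

// Total CPU instances occupied by the given jobs.
double cpus_in_use(const std::vector<Job>& jobs) {
    double total = 0.0;
    for (const Job& j : jobs) {
        assert(j.ncpus > 0.0);
        total += j.ncpus;
    }
    return total;
}

// Flawed check: compares instances in use with the job limit,
// which is only valid if every job uses exactly 1 CPU.
bool looks_full_if_one_cpu_per_job(const std::vector<Job>& jobs, int max_concurrent) {
    return cpus_in_use(jobs) >= max_concurrent;
}

// Job-based check: max concurrent limits the number of jobs, not instances.
bool at_job_limit(const std::vector<Job>& jobs, int max_concurrent) {
    return jobs.size() >= static_cast<std::size_t>(max_concurrent);
}

// CPU instances left unused on a host with host_ncpus CPUs.
double idle_cpus(const std::vector<Job>& jobs, double host_ncpus) {
    double idle = host_ncpus - cpus_in_use(jobs);
    return idle > 0.0 ? idle : 0.0;
}
```

With two 2-CPU jobs and max concurrent 4, the instance-based check reports the project as full while the job-based check does not, and an 8-CPU host (an assumed size) is left with 4 idle CPUs.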

Fix: if a project has max concurrent (MC) constraints,
then for each resource compute 'mc_max_could_use':
the max # of instances the project could use, given its MC constraints.
Use this to compute the project's shortfall,
and hence to decide whether to fetch work from it.

Note: the way mc_max_could_use is computed is crude;
it takes the max over all apps,
when it's possible that only one of them has an MC constraint.
This could result in limited over-fetching,
but that's preferable to under-fetching and starvation.
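
A rough sketch of the idea, under stated assumptions: this is not the code in the commit. `AppUsage`, `idle_instances` and the signature of `mc_max_could_use` are hypothetical, the bound is assumed to be "largest per-job usage over all apps, times the job limit, capped at the host's instance count", and the time dimension of the work buffer is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-app summary; not BOINC's actual APP or APP_VERSION.
struct AppUsage {
    double instances_per_job;  // instances of the resource one job uses
};

// Upper bound on the instances a project could use when it may run
// at most max_concurrent jobs. Taking the max over all apps is crude:
// it can overestimate when only some apps are actually constrained.
double mc_max_could_use(const std::vector<AppUsage>& apps,
                        int max_concurrent,
                        double host_instances) {
    assert(max_concurrent > 0);
    double max_per_job = 0.0;
    for (const AppUsage& a : apps) {
        max_per_job = std::max(max_per_job, a.instances_per_job);
    }
    return std::min(host_instances, max_per_job * max_concurrent);
}

// Instances the project could still fill; a simplified stand-in for
// the shortfall used to decide whether to fetch more work.
double idle_instances(double could_use, double in_use) {
    return std::max(0.0, could_use - in_use);
}
```

For a project with a 1-CPU app and a 2-CPU app, max concurrent 4, and a 16-CPU host, this bound is 8 instances. With 4 instances in use, 4 remain to be filled, so fetching work is worthwhile. If the jobs that arrive are 1-CPU jobs, the bound was too generous, which is the limited over-fetching mentioned above.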

Sim: show app name in timeline

@AenBleidd
Member

@davidpanderson, are you sure this is the fix for #5743, not for #5749?

@davidpanderson
Contributor Author

Oops! Fixed.

@AenBleidd AenBleidd merged commit af6c23c into master Aug 13, 2024
146 checks passed
@AenBleidd AenBleidd deleted the dpa_max_concurrent2 branch August 13, 2024 19:30
Successfully merging this pull request may close these issues.

Work fetch stops before per-app concurrency limit is reached even when there are CPUs are idling