
ADR: Changing semantics of min runners to be min idle runners #3040

Merged: 4 commits into master on Nov 30, 2023

Conversation

nikola-jokic (Contributor)

Proposing a design for two ways of handling the `minRunners` field.

Addresses #2707


With the "lazy" strategy, the current behavior of `minRunners` will be preserved. The `minRunners` field will specify the minimum number of runners running in a cluster regardless of their state ("running" or "idle").

With the "eager" strategy, `minRunners` will be treated as `minIdleRunners`. This strategy will calculate the number of runners based on the number of workflows acquired plus the number of minRunners. If the `maxRunners` field is specified, it will be respected, so the number of idle runners can be less than the number of idle `minRunners` when the number of acquired jobs plus the number of min runners is greater than `maxRunners`.
Member

Since we don't stop acquiring jobs after reaching `maxRunners`, could this cause us to create too many runners?
Also, since two runner scale sets can acquire the same job but only one of them will actually run it, could we end up wasting too many resources?

nikola-jokic (Contributor, Author)

  1. It should not, because the patch from the listener is capped at `maxRunners` regardless of how many jobs are acquired.
  2. Here we can over-provision if another scale set "steals" the job, but eventually we will scale down on the next message. The unlucky scenario is that no events arrive after the job is stolen, so we may not scale down to the desired number of min runners. Good point...

Member

We might also want to fix the scale-down process if we start doing this, since it might make the extra scale up/down even worse.


## Context

The current implementation treats the `minRunners` field as the number of runners that should be running in your cluster. They can be busy running a job, or they can be idle. This ensures a faster cold start when workflows are acquired, while using the minimum number of runners needed to fulfill the scaling requirement.
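For comparison, a minimal Go sketch of these current ("lazy") semantics, under the same illustrative assumptions as the eager sketch above (hypothetical names, not the actual controller code):

```go
// desiredRunnersLazy is an illustrative sketch of the current semantics:
// minRunners is a floor on the total runner count, busy or idle.
// A negative maxRunners means "no cap" here.
func desiredRunnersLazy(acquiredJobs, minRunners, maxRunners int) int {
	desired := acquiredJobs
	if desired < minRunners {
		desired = minRunners // busy runners count toward the minimum
	}
	if maxRunners >= 0 && desired > maxRunners {
		desired = maxRunners
	}
	return desired
}
```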
Member

I think a runner pod has three stages:

  • Running a job
  • Idle: the runner is online with the service
  • Starting: the runner pod is starting up and the runner is trying to come online with the service
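(As an illustrative aside, these three stages could be modeled as a small enum; the names below are hypothetical, not ARC's actual types:)

```go
// RunnerStage models the three pod stages listed above.
// Illustrative names only, not ARC's actual API.
type RunnerStage int

const (
	StageStarting RunnerStage = iota // pod starting, runner trying to come online with the service
	StageIdle                        // runner online with the service, waiting for a job
	StageRunning                     // runner executing a job
)
```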

nikola-jokic (Contributor, Author)

Good point! I'll add it. (I merged this case with the idle one since it is not busy, but I should have been more precise.)

@nikola-jokic changed the title from "ADR: Introducing min runner strategies" to "ADR: Changing semantics of min runners to be min idle runners" on Nov 28, 2023
nikola-jokic (Contributor, Author)

Hey everyone,

The ADR has been changed not to introduce multiple scaling strategies, but rather to change the semantics of the field.

Link- (Member)

Let's ship it

@nikola-jokic merged commit 5347e2c into master on Nov 30, 2023. 12 checks passed.
@nikola-jokic deleted the nikola-jokic/adr-min-runner-strategy branch on Nov 30, 2023 at 10:59.