Allowing list based fleet autoscaler to scale up from 0 replicas #4016

Merged

Conversation

geopaulm
Contributor

@geopaulm geopaulm commented Oct 12, 2024

What type of PR is this?

/kind bug

What this PR does / Why we need it:
Updated the controller to initialize the list status map with empty AggregatedListStatus entries taken from the fleet spec.
This allows scaling up when replicas is set to 0, which is usually the case when deploying via GitOps systems such as Flux.
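
To illustrate the idea, here is a minimal sketch of seeding an aggregate map from a spec. The `ListSpec` and `AggregatedListStatus` types below are simplified stand-ins with assumed fields, and `seedListStatus` is a hypothetical helper, not the controller code from this PR. The point is only that a fleet with 0 replicas still ends up with a zero-valued entry for each list named in its spec, so a later lookup by list name finds something.

```go
package main

import "fmt"

// ListSpec is a simplified stand-in for a list definition in the fleet spec
// (assumed shape, for illustration only).
type ListSpec struct {
	Capacity int64
	Values   []string
}

// AggregatedListStatus is a simplified stand-in for the fleet-level aggregate
// (field names are assumptions).
type AggregatedListStatus struct {
	Capacity int64
	Count    int64
}

// seedListStatus is a hypothetical helper: it returns a map holding a
// zero-valued aggregate for every list name present in the spec.
func seedListStatus(spec map[string]ListSpec) map[string]AggregatedListStatus {
	out := make(map[string]AggregatedListStatus, len(spec))
	for name := range spec {
		out[name] = AggregatedListStatus{}
	}
	return out
}

func main() {
	// No game servers exist yet, but the spec still names the list.
	spec := map[string]ListSpec{"rooms": {Capacity: 10}}
	status := seedListStatus(spec)

	agg, ok := status["rooms"]
	fmt.Println(ok, agg.Capacity, agg.Count) // prints: true 0 0
}
```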

Additionally, updated the scaleUp function so that it also does a limited scale to MinCapacity. I added this because I saw the desired capacity being set to Max first and then brought back to Min! (This could be dangerous if Max is set to a high value.)

I see that in most places we round up when calculating the number of replicas. This was missing for limited scaling using lists, which caused values to flap a bit, so I added a fix. Some of the tests were failing. For example, to reach a target max capacity of 45 with a capacity of 10, the system was calculating 4 instead of 5. Let me know if that change is okay.
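
The 45 and 10 example can be checked with a small sketch. Both function names are hypothetical, and the truncating version is only one way a result of 4 could arise (the pre-fix code is not shown here); the rounded-up version mirrors the `math.Ceil` pattern used elsewhere in this PR.

```go
package main

import (
	"fmt"
	"math"
)

// replicasTruncated divides with integer division, which drops the remainder.
// Assumed here as an illustration of how a result of 4 can come about.
func replicasTruncated(targetCapacity, capacity int64) int32 {
	return int32(targetCapacity / capacity)
}

// replicasRoundedUp rounds the quotient up, so a partial replica's worth of
// capacity still counts as one whole replica.
func replicasRoundedUp(targetCapacity, capacity int64) int32 {
	return int32(math.Ceil(float64(targetCapacity) / float64(capacity)))
}

func main() {
	fmt.Println(replicasTruncated(45, 10)) // prints: 4
	fmt.Println(replicasRoundedUp(45, 10)) // prints: 5
}
```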

Which issue(s) this PR fixes:

Closes #3943

Special notes for your reviewer:

@github-actions bot added the kind/bug and size/M labels Oct 12, 2024
@agones-bot
Collaborator

Build Failed 😭

Build Id: 25e4d80b-55cc-4532-8766-327c7aef4f6b

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Collaborator

Build Failed 😭

Build Id: 331c643b-c73d-4936-bb87-5c807f6ed5b9

Status: FAILURE

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@geopaulm geopaulm marked this pull request as draft October 12, 2024 23:03
@agones-bot
Collaborator

Build Succeeded 🥳

Build Id: 3dace961-2d7e-4b5e-a39c-67940acfadb7

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4016/head:pr_4016 && git checkout pr_4016
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.45.0-dev-5b38729

@geopaulm geopaulm marked this pull request as ready for review October 13, 2024 18:06
Collaborator

@igooch igooch left a comment

Thanks for submitting the PR and tests!

pkg/fleets/controller.go (outdated, resolved)
Comment on lines 589 to 591
} else {
additionalReplicas = int32(math.Ceil((float64(minCapacity) - float64(aggCapacity)) / float64(capacity)))
}
Collaborator

I'm not sure we need this here. The case of scaling up to minCapacity should already be caught here:

limited, scale := isLimited(aggCapacity, minCapacity, maxCapacity)
which eventually will call
func scaleUpLimited(replicas int32, capacity, aggCapacity, minCapacity int64) (int32, bool, error) {

Contributor Author

In this case limited = true, since aggCapacity = 0 is less than minCapacity.
Since the scaleUp function only limits against MaxCapacity, the desired capacity is set to the max value.
On the next run it is set back to the min value! I saw this in the fleet autoscaler log. Isn't that dangerous?

Collaborator

I mean that the scaleUp function itself shouldn't be called when aggCapacity = 0. The function called when aggCapacity = 0 is scaleUpLimited.
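
For readers following this thread, here is a rough sketch of the flow being described, under assumptions. `checkLimits` and `replicasForMin` are hypothetical stand-ins for the `isLimited` and `scaleUpLimited` calls quoted above; their bodies are inferred from the discussion and are not the project's code. The sketch shows that when the aggregate capacity is 0 and below the minimum, the limited path adds only enough replicas to reach the minimum, rather than jumping to the maximum.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// checkLimits is a hypothetical stand-in for the limit check. It reports
// whether aggCapacity lies outside [minCapacity, maxCapacity] and, if so,
// whether the correction is a scale up (true) or a scale down (false).
func checkLimits(aggCapacity, minCapacity, maxCapacity int64) (limited, scaleUp bool) {
	if aggCapacity < minCapacity {
		return true, true
	}
	if aggCapacity > maxCapacity {
		return true, false
	}
	return false, false
}

// replicasForMin is a hypothetical stand-in for the limited scale up: it adds
// just enough replicas to reach minCapacity, rounding up.
func replicasForMin(replicas int32, capacity, aggCapacity, minCapacity int64) (int32, error) {
	if capacity <= 0 {
		return 0, errors.New("capacity must be positive")
	}
	missing := float64(minCapacity - aggCapacity)
	return replicas + int32(math.Ceil(missing/float64(capacity))), nil
}

func main() {
	// A fleet with 0 replicas has an aggregate capacity of 0,
	// which is below the minimum of 25.
	limited, up := checkLimits(0, 25, 100)
	fmt.Println(limited, up) // prints: true true

	n, err := replicasForMin(0, 10, 0, 25)
	fmt.Println(n, err) // prints: 3 <nil>
}
```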

pkg/fleetautoscalers/fleetautoscalers_test.go (outdated, resolved)
@agones-bot
Collaborator

Build Succeeded 🥳

Build Id: f259b1da-ce0c-4710-b641-d87f4cc659b4

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4016/head:pr_4016 && git checkout pr_4016
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.45.0-dev-72ae1c2

@geopaulm geopaulm requested a review from igooch October 24, 2024 21:14
@agones-bot
Collaborator

Build Succeeded 🥳

Build Id: 0148e88a-48e7-42a2-929d-80dbe09607fe

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4016/head:pr_4016 && git checkout pr_4016
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.45.0-dev-67c11a0

Collaborator

@igooch igooch left a comment

LGTM, thanks for all your work on this!

pkg/fleetautoscalers/fleetautoscalers.go (outdated, resolved)
@igooch igooch enabled auto-merge (squash) October 31, 2024 22:12
@agones-bot
Collaborator

Build Succeeded 🥳

Build Id: aaaab275-0af0-41e1-a3be-4b2d40010c48

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

git fetch https://github.com/googleforgames/agones.git pull/4016/head:pr_4016 && git checkout pr_4016
helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.45.0-dev-87a7689

@igooch igooch merged commit 0056256 into googleforgames:main Oct 31, 2024
4 checks passed
Labels
kind/bug, size/M
Development

Successfully merging this pull request may close these issues.

Fleet autoscaler with "List" policy throws an error if configured with a fleet with no replicas
3 participants