fix: rollout strategy issues #3762

Merged
merged 6 commits into from
May 2, 2024

Conversation

nrwiersma
Contributor

What type of PR is this?

Uncomment only one /kind <> line, press enter to put that in a new line, and remove leading whitespace from that line:

/kind breaking
/kind bug
/kind cleanup
/kind documentation
/kind feature
/kind hotfix
/kind release

What this PR does / Why we need it:

This PR fixes issues with the fleet rollout strategy that we have observed in production. In clusters that are very allocation-heavy, rollout does not behave as expected or as documented. These patches have been confirmed to work in our production environment, using a custom build.

Each issue is in a separate commit, to allow them to be evaluated individually.

Which issue(s) this PR fixes:

Closes #3688

Special notes for your reviewer:

In addition to the 2 bug fixes (the first 2 commits), we observe that once an inactive GSS reaches the point where Allocated == Replicas, it can still spawn GSes if a new GS comes up fast enough and there is enough churn in the controller. Once this point is reached, there is no real reason for the GSS to keep its Spec Replicas; it would be better to set its Spec Replicas to 0 and let it lose its allocations until it is cleaned up. This patch makes up the last commit.
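The rule described for the last commit can be sketched roughly as follows. This is only an illustration, not the code in this PR: `gameServerSet`, its fields, and `desiredInactiveReplicas` are hypothetical stand-ins, and the sketch assumes that "Replicas" in the note above refers to the set's Spec Replicas.

```go
package main

import "fmt"

// gameServerSet is a hypothetical, pared-down stand-in for a GameServerSet.
// Only the two counters needed for this sketch are modelled.
type gameServerSet struct {
	Name              string
	Replicas          int32 // Spec Replicas (assumed meaning of "Replicas")
	AllocatedReplicas int32 // GameServers currently Allocated
}

// desiredInactiveReplicas returns the Spec Replicas an inactive (old) set
// should carry. Once every remaining replica is Allocated, the set has no
// reason to request replicas any more, so the target drops to 0 and the set
// can simply drain as its allocations end.
func desiredInactiveReplicas(gss gameServerSet) int32 {
	if gss.AllocatedReplicas == gss.Replicas {
		return 0
	}
	return gss.Replicas
}

func main() {
	draining := gameServerSet{Name: "old-gss", Replicas: 5, AllocatedReplicas: 5}
	scaling := gameServerSet{Name: "old-gss", Replicas: 5, AllocatedReplicas: 3}

	fmt.Println(desiredInactiveReplicas(draining)) // fully allocated: target becomes 0
	fmt.Println(desiredInactiveReplicas(scaling))  // still has unallocated replicas: unchanged
}
```

With the target at 0, a GameServer that shuts down is not replaced, which is the respawning behaviour the note above wants to avoid.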

@github-actions github-actions bot added kind/bug These are bugs. size/S labels Apr 15, 2024
@agones-bot
Collaborator

Build Failed 😱

Build Id: 04b542f9-9a13-4947-bbd7-a3b8364d9b4c

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

Member

@markmandel markmandel left a comment

I have some suggestions for how to feature gate this change.

Is there an e2e-type scenario (akin to what we see in #3438) you can think of that would highlight the issues you were running into? That would also mean we don't run into it again in the future.
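As a rough illustration of the feature gating suggested above, the sketch below guards a behaviour change behind a named gate. The gate name `RollingUpdateFix` comes from the documentation change in this PR; everything else (`parseFeatureGates`, `rolloutPath`, and the `Name=true&Other=false` query-string format) is an assumption made for the example, not the project's actual API.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// featureRollingUpdateFix is the gate name added to the feature gate table.
const featureRollingUpdateFix = "RollingUpdateFix"

// parseFeatureGates turns a string such as "A=true&B=false" into a lookup
// map. The query-string format is an assumption for this sketch. Gates that
// are not listed are treated as disabled by callers.
func parseFeatureGates(raw string) (map[string]bool, error) {
	values, err := url.ParseQuery(raw)
	if err != nil {
		return nil, err
	}
	gates := make(map[string]bool, len(values))
	for name, v := range values {
		// Only the first value for a repeated key is considered.
		enabled, err := strconv.ParseBool(v[0])
		if err != nil {
			return nil, fmt.Errorf("feature gate %q: %w", name, err)
		}
		gates[name] = enabled
	}
	return gates, nil
}

// rolloutPath is a hypothetical switch point: the new behaviour only runs
// when the gate is explicitly enabled, so the default stays unchanged.
func rolloutPath(gates map[string]bool) string {
	if gates[featureRollingUpdateFix] {
		return "fixed"
	}
	return "legacy"
}

func main() {
	gates, err := parseFeatureGates("RollingUpdateFix=true")
	if err != nil {
		panic(err)
	}
	fmt.Println(rolloutPath(gates))
}
```

Keeping the old code path as the default means the fix can be released as `Alpha` and `Disabled`, and enabled per cluster once it has been exercised.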

pkg/fleets/controller.go (outdated, resolved)
@nrwiersma nrwiersma marked this pull request as ready for review April 18, 2024 08:16
@agones-bot
Collaborator

Build Failed 😱

Build Id: 0ce99505-3c74-45ec-aaa4-a8033f1d7f7d

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Collaborator

Build Failed 😱

Build Id: 1193d3f4-b99a-4bc9-9bf4-a4896d0f6ac3

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 621017df-61f9-4a38-a663-85d710f0cbc4

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/3762/head:pr_3762 && git checkout pr_3762
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.40.0-dev-1c0bc18-amd64

@markmandel
Member

markmandel commented Apr 19, 2024

Thanks for making these changes - I'm probably going to opt to let this one slide past the next release on Tuesday, and give me (someone? 😄 ) some time to sit down with this, and make sure everything is good (which really means running some rolling updates and watching them happen 😄 )

@nrwiersma
Contributor Author

We have production data of this patch if that would help.

@markmandel
Member

> We have production data of this patch if that would help.

TBH - totally trust that you do. I just want to know that I've spent some time with this as an extra verification step, and give it time to bake between releases as well.

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 8981dcfc-9e41-418c-b309-09dffc754cc9

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/3762/head:pr_3762 && git checkout pr_3762
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.41.0-dev-a2f39cf-amd64

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 1d95bf2c-5e7e-4b13-ab31-569d414ea084

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/3762/head:pr_3762 && git checkout pr_3762
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.41.0-dev-3a8aee7-amd64

@markmandel
Member

Just a heads up - will take this for a spin this week 👍🏻

@@ -31,6 +31,7 @@ The current set of `alpha` and `beta` feature gates:
| [Support for Extended Duration Pods on GKE Autopilot (*1.28+ only*)](https://github.com/googleforgames/agones/issues/3386) | `GKEAutopilotExtendedDurationPods` | Disabled | `Alpha` | 1.37.0 |
| [GameServer player capacity filtering on GameServerAllocations](https://github.com/googleforgames/agones/issues/1239) | `PlayerAllocationFilter` | Disabled | `Alpha` | 1.14.0 |
| [Player Tracking]({{< ref "/docs/Guides/player-tracking.md" >}}) | `PlayerTracking` | Disabled | `Alpha` | 1.6.0 |
| [Rolling Update Fixes](https://github.com/googleforgames/agones/issues/3688) | `RollingUpdateFix` | Disabled | `Alpha` | 1.41.0 |
Member

Small thing, can you make a copy of this table, one with the new feature and one without, and use the feature shortcodes to keep this feature flag hidden until the next release please.

See: https://agones.dev/site/docs/contribute/#within-a-page for details

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 5fca4a12-50e0-4401-9193-f5a92d9137a2

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/3762/head:pr_3762 && git checkout pr_3762
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.41.0-dev-06b190f-amd64

Member

@markmandel markmandel left a comment

Finally took this for a spin, basically just wanted to see it in action - looks great, let's go ahead with it!

(Rolling update on a Fleet of 50 gameservers)
[Screen recording: Peek 2024-05-02 09-50]

@markmandel markmandel enabled auto-merge (squash) May 2, 2024 16:56
@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 819f3e3b-9f05-4011-b38c-22c9dbb01880

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/3762/head:pr_3762 && git checkout pr_3762
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.41.0-dev-26860d7-amd64

@markmandel markmandel merged commit 0500efa into googleforgames:main May 2, 2024
4 checks passed
@nrwiersma nrwiersma deleted the rollout-issues branch May 2, 2024 20:20
spiceratops referenced this pull request in spiceratops/k8s-gitops Jun 8, 2024
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [agones](https://agones.dev) ([source](https://github.com/googleforgames/agones)) | minor | `1.40.0` -> `1.41.0` |

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency
Dashboard for more information.

---

### Release Notes

<details>
<summary>googleforgames/agones (agones)</summary>

### [`v1.41.0`](https://github.com/googleforgames/agones/blob/HEAD/CHANGELOG.md#v1410-2024-06-04)

[Compare Source](https://github.com/googleforgames/agones/compare/v1.40.0...v1.41.0)

[Full Changelog](https://github.com/googleforgames/agones/compare/v1.40.0...v1.41.0)

**Implemented enhancements:**

- Configure Allocator Status Code by
[@&#8203;Kalaiselvi84](https://github.com/Kalaiselvi84) in
[https://github.com/googleforgames/agones/pull/3782](https://github.com/googleforgames/agones/pull/3782)
- Graduate Counters and Lists to Beta by
[@&#8203;Kalaiselvi84](https://github.com/Kalaiselvi84) in
[https://github.com/googleforgames/agones/pull/3801](https://github.com/googleforgames/agones/pull/3801)
- Passthrough autopilot - Adds an AutopilotPassthroughPort Feature Gate
and new pod label by [@&#8203;vicentefb](https://github.com/vicentefb)
in
[https://github.com/googleforgames/agones/pull/3809](https://github.com/googleforgames/agones/pull/3809)
- CountsAndLists: Move to Beta Protobuf by
[@&#8203;Kalaiselvi84](https://github.com/Kalaiselvi84) in
[https://github.com/googleforgames/agones/pull/3806](https://github.com/googleforgames/agones/pull/3806)
- feat: support multiple port ranges by
[@&#8203;nrwiersma](https://github.com/nrwiersma) in
[https://github.com/googleforgames/agones/pull/3747](https://github.com/googleforgames/agones/pull/3747)
- Changes `sdk-server` to Patch instead of Update by
[@&#8203;igooch](https://github.com/igooch) in
[https://github.com/googleforgames/agones/pull/3803](https://github.com/googleforgames/agones/pull/3803)
- Generate grpc for nodejs from alpha to beta by
[@&#8203;lacroixthomas](https://github.com/lacroixthomas) in
[https://github.com/googleforgames/agones/pull/3825](https://github.com/googleforgames/agones/pull/3825)
- Update CountsAndLists from Alpha to Beta by
[@&#8203;Kalaiselvi84](https://github.com/Kalaiselvi84) in
[https://github.com/googleforgames/agones/pull/3824](https://github.com/googleforgames/agones/pull/3824)
- feat(gameserver): New DirectToGameServer PortPolicy allows direct
traffic to a GameServer by
[@&#8203;daniellee](https://github.com/daniellee) in
[https://github.com/googleforgames/agones/pull/3807](https://github.com/googleforgames/agones/pull/3807)
- Passthrough autopilot - Adds mutating webhook by
[@&#8203;vicentefb](https://github.com/vicentefb) in
[https://github.com/googleforgames/agones/pull/3833](https://github.com/googleforgames/agones/pull/3833)
- Passthrough autopilot - added ports array case and updated unit tests
by [@&#8203;vicentefb](https://github.com/vicentefb) in
[https://github.com/googleforgames/agones/pull/3842](https://github.com/googleforgames/agones/pull/3842)
- Nodejs counters and lists by
[@&#8203;steven-supersolid](https://github.com/steven-supersolid) in
[https://github.com/googleforgames/agones/pull/3726](https://github.com/googleforgames/agones/pull/3726)
- Promote AutopilotPassthroughPort feature gate to Alpha by
[@&#8203;vicentefb](https://github.com/vicentefb) in
[https://github.com/googleforgames/agones/pull/3849](https://github.com/googleforgames/agones/pull/3849)

**Fixed bugs:**

- Helm Param Update: Default to agones.controller if agones.extensions
is Missing by [@&#8203;Kalaiselvi84](https://github.com/Kalaiselvi84)
in
[https://github.com/googleforgames/agones/pull/3773](https://github.com/googleforgames/agones/pull/3773)
- fix: rollout strategy issues by
[@&#8203;nrwiersma](https://github.com/nrwiersma) in
[https://github.com/googleforgames/agones/pull/3762](https://github.com/googleforgames/agones/pull/3762)
- Set Minimum Buffer Size to 1 by
[@&#8203;Kalaiselvi84](https://github.com/Kalaiselvi84) in
[https://github.com/googleforgames/agones/pull/3749](https://github.com/googleforgames/agones/pull/3749)
- Pin ltsc2019 to older SHA by
[@&#8203;zmerlynn](https://github.com/zmerlynn) in
[https://github.com/googleforgames/agones/pull/3829](https://github.com/googleforgames/agones/pull/3829)
- TestGameServerAllocationDuringMultipleAllocationClients: Readdress
flake by [@&#8203;zmerlynn](https://github.com/zmerlynn) in
[https://github.com/googleforgames/agones/pull/3831](https://github.com/googleforgames/agones/pull/3831)
- Refactor finalizer name to include valid domain name and path by
[@&#8203;indexjoseph](https://github.com/indexjoseph) in
[https://github.com/googleforgames/agones/pull/3840](https://github.com/googleforgames/agones/pull/3840)
- agones-{extensions,allocator}: Be more defensive about draining by
[@&#8203;zmerlynn](https://github.com/zmerlynn) in
[https://github.com/googleforgames/agones/pull/3839](https://github.com/googleforgames/agones/pull/3839)
- agones-{extensions,allocator}: Pause after cancelling context by
[@&#8203;zmerlynn](https://github.com/zmerlynn) in
[https://github.com/googleforgames/agones/pull/3843](https://github.com/googleforgames/agones/pull/3843)
- Change the line to modify in Quickstart: Edit a Game Server by
[@&#8203;peterzhongyi](https://github.com/peterzhongyi) in
[https://github.com/googleforgames/agones/pull/3844](https://github.com/googleforgames/agones/pull/3844)

**Other:**

- Prep for Release v1.41.0 by
[@&#8203;Kalaiselvi84](https://github.com/Kalaiselvi84) in
[https://github.com/googleforgames/agones/pull/3800](https://github.com/googleforgames/agones/pull/3800)
- Update site documentation to reflect firewall prefix and default to
Autopilot cluster creation for Agones by
[@&#8203;vicentefb](https://github.com/vicentefb) in
[https://github.com/googleforgames/agones/pull/3769](https://github.com/googleforgames/agones/pull/3769)
- Add a System Diagram and overview page by
[@&#8203;zmerlynn](https://github.com/zmerlynn) in
[https://github.com/googleforgames/agones/pull/3792](https://github.com/googleforgames/agones/pull/3792)
- Update Side Menu: Preserve and Restore Scroll Position by
[@&#8203;Kalaiselvi84](https://github.com/Kalaiselvi84) in
[https://github.com/googleforgames/agones/pull/3805](https://github.com/googleforgames/agones/pull/3805)
- fix: typo by [@&#8203;skmpf](https://github.com/skmpf) in
[https://github.com/googleforgames/agones/pull/3808](https://github.com/googleforgames/agones/pull/3808)
- Helm Config: Add httpUnallocatedStatusCode in Allocator Service by
[@&#8203;Kalaiselvi84](https://github.com/Kalaiselvi84) in
[https://github.com/googleforgames/agones/pull/3802](https://github.com/googleforgames/agones/pull/3802)
- Update Docs: CountersAndLists to Beta by
[@&#8203;Kalaiselvi84](https://github.com/Kalaiselvi84) in
[https://github.com/googleforgames/agones/pull/3810](https://github.com/googleforgames/agones/pull/3810)
- Disable Dev feature FeatureAutopilotPassthroughPort by
[@&#8203;vicentefb](https://github.com/vicentefb) in
[https://github.com/googleforgames/agones/pull/3815](https://github.com/googleforgames/agones/pull/3815)
- Disable FeatureAutopilotPassthroughPort in features.go by
[@&#8203;vicentefb](https://github.com/vicentefb) in
[https://github.com/googleforgames/agones/pull/3816](https://github.com/googleforgames/agones/pull/3816)
- SDK proto compatibility guarantees and deprecation policies
documentation by [@&#8203;igooch](https://github.com/igooch) in
[https://github.com/googleforgames/agones/pull/3774](https://github.com/googleforgames/agones/pull/3774)
- Fix dangling "as of" by
[@&#8203;zmerlynn](https://github.com/zmerlynn) in
[https://github.com/googleforgames/agones/pull/3827](https://github.com/googleforgames/agones/pull/3827)
- Steps to Promote SDK Features from Alpha to Beta by
[@&#8203;Kalaiselvi84](https://github.com/Kalaiselvi84) in
[https://github.com/googleforgames/agones/pull/3814](https://github.com/googleforgames/agones/pull/3814)
- Adds comment for help troubleshooting issues with terraform tfstate by
[@&#8203;igooch](https://github.com/igooch) in
[https://github.com/googleforgames/agones/pull/3822](https://github.com/googleforgames/agones/pull/3822)
- docs: improve counter and list example comments by
[@&#8203;yonbh](https://github.com/yonbh) in
[https://github.com/googleforgames/agones/pull/3818](https://github.com/googleforgames/agones/pull/3818)
- Skip /tmp/ on yamllint by
[@&#8203;zmerlynn](https://github.com/zmerlynn) in
[https://github.com/googleforgames/agones/pull/3838](https://github.com/googleforgames/agones/pull/3838)
- TestAllocatorAfterDeleteReplica: More logging by
[@&#8203;zmerlynn](https://github.com/zmerlynn) in
[https://github.com/googleforgames/agones/pull/3837](https://github.com/googleforgames/agones/pull/3837)
- Instructions for upgrading golang version by
[@&#8203;gongmax](https://github.com/gongmax) in
[https://github.com/googleforgames/agones/pull/3819](https://github.com/googleforgames/agones/pull/3819)
- Remove unused function FindGameServerContainer by
[@&#8203;zmerlynn](https://github.com/zmerlynn) in
[https://github.com/googleforgames/agones/pull/3841](https://github.com/googleforgames/agones/pull/3841)
- Adds Unreal to the List of URL Links to Not Check by
[@&#8203;igooch](https://github.com/igooch) in
[https://github.com/googleforgames/agones/pull/3847](https://github.com/googleforgames/agones/pull/3847)
- docs: clarify virtualization setup for Windows versions by
[@&#8203;andresromerodev](https://github.com/andresromerodev) in
[https://github.com/googleforgames/agones/pull/3850](https://github.com/googleforgames/agones/pull/3850)

**New Contributors:**

- [@&#8203;skmpf](https://github.com/skmpf) made their first
contribution in
[https://github.com/googleforgames/agones/pull/3808](https://github.com/googleforgames/agones/pull/3808)
- [@&#8203;yonbh](https://github.com/yonbh) made their first
contribution in
[https://github.com/googleforgames/agones/pull/3818](https://github.com/googleforgames/agones/pull/3818)
- [@&#8203;peterzhongyi](https://github.com/peterzhongyi) made their
first contribution in
[https://github.com/googleforgames/agones/pull/3844](https://github.com/googleforgames/agones/pull/3844)
- [@&#8203;andresromerodev](https://github.com/andresromerodev) made
their first contribution in
[https://github.com/googleforgames/agones/pull/3850](https://github.com/googleforgames/agones/pull/3850)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Renovate
Bot](https://github.com/renovatebot/renovate).

Development

Successfully merging this pull request may close these issues.

MaxSurge and MinAvailable calculated incorrectly
3 participants