Optimise GameServer Sub-Controller Queues #3781

Merged 2 commits on Apr 19, 2024

markmandel

What type of PR is this?

Uncomment only one /kind <> line, press enter to put that in a new line, and remove leading whitespace from that line:

/kind breaking

/kind bug

/kind cleanup
/kind documentation
/kind feature
/kind hotfix
/kind release

What this PR does / Why we need it:

Looking at pprof profiles of the Agones Controller, we see a lot of memory held in the Migration Controller and Missing Controller queues.

This implements improved checking at the event layer, before placing items in the workerqueue, which drops the memory footprint, and also allows for faster processing of the edge cases that Missing and Migration Controllers implement.
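To illustrate the general idea of checking at the event layer, here is a minimal sketch in plain Go. It is not the code from this PR: `gameServer`, `keyQueue`, `needsMigration`, and `onPodEvent` are hypothetical stand-ins, and the condition checked (the Pod's node differs from the node recorded on the GameServer) is an assumed example of an edge case. The point is only that the cheap predicate runs in the event handler, so keys that need no work never occupy the queue.

```go
package main

import "fmt"

// gameServer is a hypothetical, trimmed-down stand-in for a GameServer resource.
type gameServer struct {
	Key      string // "namespace/name" key that would be placed on the queue
	NodeName string // node recorded on the GameServer
	Deleting bool   // true once deletion has started
}

// keyQueue is a toy stand-in for a rate-limited work queue.
type keyQueue struct{ keys []string }

// Enqueue appends a key; a real queue would also de-duplicate and rate limit.
func (q *keyQueue) Enqueue(key string) { q.keys = append(q.keys, key) }

// needsMigration is an assumed predicate: it reports whether a Pod event
// describes the edge case worth queueing, here a Pod that is scheduled on a
// different node than the one recorded on its GameServer.
func needsMigration(gs gameServer, podNodeName string) bool {
	if gs.Deleting {
		return false // nothing to reconcile for a GameServer being deleted
	}
	if podNodeName == "" {
		return false // Pod is not scheduled yet
	}
	return gs.NodeName != podNodeName
}

// onPodEvent is the event handler: filter first, enqueue only when needed.
func onPodEvent(q *keyQueue, gs gameServer, podNodeName string) {
	if needsMigration(gs, podNodeName) {
		q.Enqueue(gs.Key)
	}
}

func main() {
	q := &keyQueue{}
	onPodEvent(q, gameServer{Key: "default/gs-a", NodeName: "node-1"}, "node-1") // filtered out
	onPodEvent(q, gameServer{Key: "default/gs-b", NodeName: "node-1"}, "node-2") // enqueued
	onPodEvent(q, gameServer{Key: "default/gs-c", NodeName: "node-1", Deleting: true}, "node-2") // filtered out
	fmt.Println(len(q.keys), q.keys)
}
```

With the filter in the handler, only one of the three events above reaches the queue; without it, all three keys would be queued and then discarded by the worker.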

Also did a review of the Health Controller and made a small improvement there as well.

Looking at the memory pprof profile and load tests, we see a decrease in memory usage, and also less of a slow "step up" of memory over time.

Which issue(s) this PR fixes:

Closes #3748

Special notes for your reviewer:

Here are some pretty pictures!

Here are the two tests run side by side. I only ran them for 30 minutes, but you can see a drop in memory, and less of the slow "step up" in memory usage that the original code was beginning to show.
[image: two-tests]

I also took an updated pprof profile: it is basically all JSON de/serialisation now, which is pretty much what we would want to see.

[image: pprof-flame-fixed]
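For anyone wanting to capture a comparable memory profile in their own Go service, a minimal sketch using the standard `runtime/pprof` package follows. `captureHeapProfile` is a hypothetical helper, and this is not a description of how the Agones controller exposes its profiles.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// captureHeapProfile returns a heap profile in the format read by `go tool pprof`.
func captureHeapProfile() ([]byte, error) {
	runtime.GC() // run a collection first so the profile reflects live objects
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	profile, err := captureHeapProfile()
	if err != nil {
		panic(err)
	}
	fmt.Printf("captured %d bytes of heap profile\n", len(profile))
}
```

Writing the returned bytes to a file and opening it with `go tool pprof` gives the usual top, graph, and flame graph views for comparing runs.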

@github-actions (bot) added the kind/bug and size/M labels on Apr 18, 2024
@markmandel added the area/performance and area/operations labels on Apr 18, 2024

@zmerlynn zmerlynn left a comment


total nit on a log line but o/w LGTM

pkg/gameservers/migration.go (outdated, resolved)
@agones-bot

Build Failed 😱

Build Id: 3ee3fd37-5b52-4519-93dd-24db3d90c475

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@markmandel

Aaah, didn't have enough time to become leader!

23: e2e-wait-to-become-leader

@agones-bot

Build Succeeded 👏

Build Id: 19c72705-1899-45c0-9eb1-96d46abfe65f

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/3781/head:pr_3781 && git checkout pr_3781
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.40.0-dev-13d2be1-amd64

@agones-bot

Build Succeeded 👏

Build Id: 39adee09-897b-4873-8465-f1d098eb616d

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/3781/head:pr_3781 && git checkout pr_3781
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.40.0-dev-7b51704-amd64

@zmerlynn zmerlynn merged commit f0c205a into googleforgames:main Apr 19, 2024
4 checks passed
@markmandel markmandel deleted the perf/memory-decrease branch April 22, 2024 23:29
spiceratops referenced this pull request in spiceratops/k8s-gitops May 13, 2024
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [agones](https://agones.dev) ([source](https://github.com/googleforgames/agones)) | minor | `1.39.0` -> `1.40.0` |

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency
Dashboard for more information.

---

### Release Notes

<details>
<summary>googleforgames/agones (agones)</summary>

### [`v1.40.0`](https://github.com/googleforgames/agones/blob/HEAD/CHANGELOG.md#v1400-2024-04-23)

[Compare Source](https://github.com/googleforgames/agones/compare/v1.39.0...v1.40.0)

[Full Changelog](https://github.com/googleforgames/agones/compare/v1.39.0...v1.40.0)

**Breaking changes:**

- Counters and Lists: Remove Bool Returns by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3738](https://github.com/googleforgames/agones/pull/3738)

**Implemented enhancements:**

- Leader Election in Custom Controller by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3696](https://github.com/googleforgames/agones/pull/3696)
- Migrating from generate-groups.sh to kube_codegen.sh by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3722](https://github.com/googleforgames/agones/pull/3722)
- Move GKEAutopilotExtendedDurationPods to Alpha in 1.28+ by [@zmerlynn](https://github.com/zmerlynn) in [https://github.com/googleforgames/agones/pull/3729](https://github.com/googleforgames/agones/pull/3729)
- Move DisableResyncOnSDKServer to Beta by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3732](https://github.com/googleforgames/agones/pull/3732)
- Counters & Lists landing page and doc improvements by [@markmandel](https://github.com/markmandel) in [https://github.com/googleforgames/agones/pull/3649](https://github.com/googleforgames/agones/pull/3649)
- Graduate FleetAllocationOverflow to Stable by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3733](https://github.com/googleforgames/agones/pull/3733)
- Adds Counters and Lists to CSharp SDK by [@igooch](https://github.com/igooch) in [https://github.com/googleforgames/agones/pull/3581](https://github.com/googleforgames/agones/pull/3581)
- Feat/counter and list defaulting order to ascending by [@lacroixthomas](https://github.com/lacroixthomas) in [https://github.com/googleforgames/agones/pull/3734](https://github.com/googleforgames/agones/pull/3734)
- Add handling for StatusAddresses in GameServerStatus for the Unity SDK by [@charlesvien](https://github.com/charlesvien) in [https://github.com/googleforgames/agones/pull/3739](https://github.com/googleforgames/agones/pull/3739)
- Feat(gameservers): Shared pod IPs with GameServer Addresses by [@lacroixthomas](https://github.com/lacroixthomas) in [https://github.com/googleforgames/agones/pull/3764](https://github.com/googleforgames/agones/pull/3764)
- Be prescriptive about rotating regions when updating Kubernetes versions by [@zmerlynn](https://github.com/zmerlynn) in [https://github.com/googleforgames/agones/pull/3716](https://github.com/googleforgames/agones/pull/3716)
- Fix ensure-e2e-infra-state-bucket by [@zmerlynn](https://github.com/zmerlynn) in [https://github.com/googleforgames/agones/pull/3719](https://github.com/googleforgames/agones/pull/3719)
- Create Performance Cluster 1.28 by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3720](https://github.com/googleforgames/agones/pull/3720)
- Optimise GameServer Sub-Controller Queues by [@markmandel](https://github.com/markmandel) in [https://github.com/googleforgames/agones/pull/3781](https://github.com/googleforgames/agones/pull/3781)

**Fixed bugs:**

- Counters & Lists: Consolidate `priorities` sorting by [@markmandel](https://github.com/markmandel) in [https://github.com/googleforgames/agones/pull/3690](https://github.com/googleforgames/agones/pull/3690)
- Fix(Counter & Lists): Add validation for `priorities` by [@lacroixthomas](https://github.com/lacroixthomas) in [https://github.com/googleforgames/agones/pull/3714](https://github.com/googleforgames/agones/pull/3714)
- fix: [#3607](https://github.com/googleforgames/agones/issues/3607) Metrics data loss in K8S controller by [@alvin-7](https://github.com/alvin-7) in [https://github.com/googleforgames/agones/pull/3692](https://github.com/googleforgames/agones/pull/3692)
- Deflake GameServerAllocationDuringMultipleAllocationClients by allowing errors by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3750](https://github.com/googleforgames/agones/pull/3750)

**Security fixes:**

- Bump protobufjs from 7.2.4 to 7.2.6 in /sdks/nodejs by [@dependabot](https://github.com/dependabot) in [https://github.com/googleforgames/agones/pull/3755](https://github.com/googleforgames/agones/pull/3755)
- Bump golang.org/x/net from 0.19.0 to 0.23.0 by [@zmerlynn](https://github.com/zmerlynn) in [https://github.com/googleforgames/agones/pull/3793](https://github.com/googleforgames/agones/pull/3793)

**Other:**

- Flaky: TestGameServerCreationAfterDeletingOneExtensionsPod by [@markmandel](https://github.com/markmandel) in [https://github.com/googleforgames/agones/pull/3699](https://github.com/googleforgames/agones/pull/3699)
- Prep for release v1.40.0 by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3700](https://github.com/googleforgames/agones/pull/3700)
- Bumps cpp-simple Image and Refactoring Example Makefiles by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3695](https://github.com/googleforgames/agones/pull/3695)
- Upgrade Protobuf to 1.33.0 by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3711](https://github.com/googleforgames/agones/pull/3711)
- Modify Script for Makefile Version Updates in Examples Directory by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3712](https://github.com/googleforgames/agones/pull/3712)
- Adds simple genai server example documentation to the Agones site by [@igooch](https://github.com/igooch) in [https://github.com/googleforgames/agones/pull/3713](https://github.com/googleforgames/agones/pull/3713)
- Update Supported Kubernetes to 1.27, 1.28, 1.29 by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3654](https://github.com/googleforgames/agones/pull/3654)
- fix: typo in docs by [@qhyun2](https://github.com/qhyun2) in [https://github.com/googleforgames/agones/pull/3723](https://github.com/googleforgames/agones/pull/3723)
- Tweak: Setting up the Game Server by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3717](https://github.com/googleforgames/agones/pull/3717)
- Docs: gke.md - spelling by [@daniellee](https://github.com/daniellee) in [https://github.com/googleforgames/agones/pull/3740](https://github.com/googleforgames/agones/pull/3740)
- Aesthetic rearrangement of cloudbuild.yaml by [@zmerlynn](https://github.com/zmerlynn) in [https://github.com/googleforgames/agones/pull/3741](https://github.com/googleforgames/agones/pull/3741)
- Docs: Make hitting <enter> on connection explicit by [@markmandel](https://github.com/markmandel) in [https://github.com/googleforgames/agones/pull/3743](https://github.com/googleforgames/agones/pull/3743)
- CI: Don't check Unreal Link by [@markmandel](https://github.com/markmandel) in [https://github.com/googleforgames/agones/pull/3745](https://github.com/googleforgames/agones/pull/3745)
- New recommendation for multi-cluster allocation by [@markmandel](https://github.com/markmandel) in [https://github.com/googleforgames/agones/pull/3744](https://github.com/googleforgames/agones/pull/3744)
- Custom Controller Example Page on Agones Website by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3725](https://github.com/googleforgames/agones/pull/3725)
- Add Nitrado logo by [@towolf](https://github.com/towolf) in [https://github.com/googleforgames/agones/pull/3753](https://github.com/googleforgames/agones/pull/3753)
- Remove unnecessary args from e2e-test-cloudbuild by [@zmerlynn](https://github.com/zmerlynn) in [https://github.com/googleforgames/agones/pull/3754](https://github.com/googleforgames/agones/pull/3754)
- Update Allocation from Fleet Documentation by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3761](https://github.com/googleforgames/agones/pull/3761)
- Transform Lint Warnings into Errors by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3756](https://github.com/googleforgames/agones/pull/3756)
- Update Canary Testing Documentation by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3760](https://github.com/googleforgames/agones/pull/3760)
- Supertuxkart Example on Agones Site by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3728](https://github.com/googleforgames/agones/pull/3728)
- Xonotic Example on Agones Site by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3742](https://github.com/googleforgames/agones/pull/3742)
- nit documentation fix in kind cluster section when building Agones by [@vicentefb](https://github.com/vicentefb) in [https://github.com/googleforgames/agones/pull/3770](https://github.com/googleforgames/agones/pull/3770)
- Merged steps inside documentation about webhook certificate creation by [@vicentefb](https://github.com/vicentefb) in [https://github.com/googleforgames/agones/pull/3768](https://github.com/googleforgames/agones/pull/3768)
- Example Images: Increment Tags by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in [https://github.com/googleforgames/agones/pull/3796](https://github.com/googleforgames/agones/pull/3796)
- Update simple game server example documentation by [@vicentefb](https://github.com/vicentefb) in [https://github.com/googleforgames/agones/pull/3776](https://github.com/googleforgames/agones/pull/3776)

**New Contributors:**

- [@lacroixthomas](https://github.com/lacroixthomas) made their first contribution in [https://github.com/googleforgames/agones/pull/3714](https://github.com/googleforgames/agones/pull/3714)
- [@daniellee](https://github.com/daniellee) made their first contribution in [https://github.com/googleforgames/agones/pull/3740](https://github.com/googleforgames/agones/pull/3740)
- [@charlesvien](https://github.com/charlesvien) made their first contribution in [https://github.com/googleforgames/agones/pull/3739](https://github.com/googleforgames/agones/pull/3739)
- [@vicentefb](https://github.com/vicentefb) made their first contribution in [https://github.com/googleforgames/agones/pull/3770](https://github.com/googleforgames/agones/pull/3770)

</details>

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
Development

Successfully merging this pull request may close these issues.

GameServer Sub-Controllers queue more than they should
3 participants