agones-{extensions,allocator}: Pause after cancelling context #3843

Merged
3 commits merged on May 30, 2024


zmerlynn
Collaborator

After #3839 went in, the flakes in `TestAllocatorAfterDeleteReplica` disappeared, but the flakes in `TestGameServerCreationRightAfterDeletingOneExtensionsPod` remained. I looked more closely at this and developed a theory: `kube-apiserver` connections to webhooks are actually "sticky" because of HTTP(S) keep-alives. If the TCP connection never closes, we'd see exactly the behavior we do: the `EOF` comes from kube-apiserver writing to a dead socket.
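To illustrate the keep-alive theory, here is a minimal standalone sketch using only Go's standard library; it is not Agones or kube-apiserver code, and it uses plain HTTP rather than the HTTPS that webhooks use. The hypothetical helper `countNewConns` counts how many TCP connections a test server accepts while one client sends several sequential requests. With keep-alives enabled, `http.Transport` reuses a single connection; with `SetKeepAlivesEnabled(false)` the server closes the connection after each response, so every request dials a new one.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// countNewConns sends n sequential GET requests through one client and
// returns how many new TCP connections the server accepted.
// (Hypothetical helper for illustration only.)
func countNewConns(n int, keepAlive bool) (int64, error) {
	var newConns atomic.Int64
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	// ConnState fires with StateNew once per accepted connection.
	srv.Config.ConnState = func(c net.Conn, s http.ConnState) {
		if s == http.StateNew {
			newConns.Add(1)
		}
	}
	srv.Config.SetKeepAlivesEnabled(keepAlive)
	srv.Start()
	defer srv.Close()

	client := srv.Client()
	for i := 0; i < n; i++ {
		resp, err := client.Get(srv.URL)
		if err != nil {
			return 0, err
		}
		// Reading the body to EOF and closing it lets the transport put the
		// connection back into its idle pool for the next request.
		if _, err := io.Copy(io.Discard, resp.Body); err != nil {
			resp.Body.Close()
			return 0, err
		}
		resp.Body.Close()
	}
	return newConns.Load(), nil
}
```

In this sketch `countNewConns(3, true)` reports a single connection, while `countNewConns(3, false)` reports three. That is consistent with the theory: a long-lived client keeps writing to the same socket until something on the server side closes it.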

I looked at how to close sockets on the server side to "finish" the drain, but realized that we already do this by cancelling the context. The problem is that we cancel the context and exit immediately, which leaves no time for anything to shut down.

Of course, when I then looked into this further ... I'm not sure why this helps. But I was able to reproduce the e2e flake locally, and with this small change I can see it go away, so 🤷.

Reverts #3839

I tested this and it seems to drive away the flake, without adding in
the extra delay.

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 8679ca3f-4c75-46fe-b0bd-9a1408b22e87

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/3843/head:pr_3843 && git checkout pr_3843
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.41.0-dev-6dda265-amd64

@markmandel (Member) left a comment


I'm down to give it a shot 😁

@markmandel enabled auto-merge (squash) May 30, 2024 22:03
@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 06c30836-c078-47ed-9094-84555411c0f7

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/3843/head:pr_3843 && git checkout pr_3843
  • helm install agones ./install/helm/agones --namespace agones-system --set agones.image.registry=us-docker.pkg.dev/agones-images/ci --set agones.image.tag=1.41.0-dev-90dcbf2-amd64

@markmandel merged commit 80dd2ea into googleforgames:main May 30, 2024
4 checks passed
zmerlynn added a commit to zmerlynn/agones that referenced this pull request May 31, 2024
I ran the HA tests on googleforgames#3843 overnight in a loop and still noticed a
very low-grade flake in the allocator, so I decided to go ahead and
clean up the issues I noticed while working on the previous PR:

* adds an `httpserver` utility package to handle the `Run` function
that controller/extensions use, and makes it context-aware using the same
method as https.Run:
https://github.com/googleforgames/agones/blob/dfa414e5e4da37798833bbf8c33919acb5f3c2ea/pkg/util/https/server.go#L127-L130
(note that I think this ^ is why the extensions flake disappeared
after I added a pause).

* also plumbs context-awareness through the allocator
run{Mux,REST,GRPC} functions; the lack of it is, I suspect, why we're
still seeing a low-grade flake in the allocator. (I think the delay I
added previously was sufficient to drive it off, too, but this PR should
be a more robust fix.)
@Kalaiselvi84 added the kind/bug label and removed the kind/other label Jun 4, 2024
spiceratops added a commit to spiceratops/k8s-gitops that referenced this pull request Jun 8, 2024
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [agones](https://agones.dev) ([source](https://github.com/googleforgames/agones)) | minor | `1.40.0` -> `1.41.0` |

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency Dashboard for more information.

---

### Release Notes

<details>
<summary>googleforgames/agones (agones)</summary>

### [`v1.41.0`](https://github.com/googleforgames/agones/blob/HEAD/CHANGELOG.md#v1410-2024-06-04)

[Compare Source](https://github.com/googleforgames/agones/compare/v1.40.0...v1.41.0)

[Full Changelog](https://github.com/googleforgames/agones/compare/v1.40.0...v1.41.0)

**Implemented enhancements:**

- Configure Allocator Status Code by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in googleforgames/agones#3782
- Graduate Counters and Lists to Beta by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in googleforgames/agones#3801
- Passthrough autopilot - Adds an AutopilotPassthroughPort Feature Gate and new pod label by [@vicentefb](https://github.com/vicentefb) in googleforgames/agones#3809
- CountsAndLists: Move to Beta Protobuf by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in googleforgames/agones#3806
- feat: support multiple port ranges by [@nrwiersma](https://github.com/nrwiersma) in googleforgames/agones#3747
- Changes `sdk-server` to Patch instead of Update by [@igooch](https://github.com/igooch) in googleforgames/agones#3803
- Generate grpc for nodejs from alpha to beta by [@lacroixthomas](https://github.com/lacroixthomas) in googleforgames/agones#3825
- Update CountsAndLists from Alpha to Beta by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in googleforgames/agones#3824
- feat(gameserver): New DirectToGameServer PortPolicy allows direct traffic to a GameServer by [@daniellee](https://github.com/daniellee) in googleforgames/agones#3807
- Passthrough autopilot - Adds mutating webhook by [@vicentefb](https://github.com/vicentefb) in googleforgames/agones#3833
- Passthrough autopilot - added ports array case and updated unit tests by [@vicentefb](https://github.com/vicentefb) in googleforgames/agones#3842
- Nodejs counters and lists by [@steven-supersolid](https://github.com/steven-supersolid) in googleforgames/agones#3726
- Promote AutopilotPassthroughPort feature gate to Alpha by [@vicentefb](https://github.com/vicentefb) in googleforgames/agones#3849

**Fixed bugs:**

- Helm Param Update: Default to agones.controller if agones.extensions is Missing by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in googleforgames/agones#3773
- fix: rollout strategy issues by [@nrwiersma](https://github.com/nrwiersma) in googleforgames/agones#3762
- Set Minimum Buffer Size to 1 by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in googleforgames/agones#3749
- Pin ltsc2019 to older SHA by [@zmerlynn](https://github.com/zmerlynn) in googleforgames/agones#3829
- TestGameServerAllocationDuringMultipleAllocationClients: Readdress flake by [@zmerlynn](https://github.com/zmerlynn) in googleforgames/agones#3831
- Refactor finalizer name to include valid domain name and path by [@indexjoseph](https://github.com/indexjoseph) in googleforgames/agones#3840
- agones-{extensions,allocator}: Be more defensive about draining by [@zmerlynn](https://github.com/zmerlynn) in googleforgames/agones#3839
- agones-{extensions,allocator}: Pause after cancelling context by [@zmerlynn](https://github.com/zmerlynn) in googleforgames/agones#3843
- Change the line to modify in Quickstart: Edit a Game Server by [@peterzhongyi](https://github.com/peterzhongyi) in googleforgames/agones#3844

**Other:**

- Prep for Release v1.41.0 by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in googleforgames/agones#3800
- Update site documentation to reflect firewall prefix and default to Autopilot cluster creation for Agones by [@vicentefb](https://github.com/vicentefb) in googleforgames/agones#3769
- Add a System Diagram and overview page by [@zmerlynn](https://github.com/zmerlynn) in googleforgames/agones#3792
- Update Side Menu: Preserve and Restore Scroll Position by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in googleforgames/agones#3805
- fix: typo by [@skmpf](https://github.com/skmpf) in googleforgames/agones#3808
- Helm Config: Add httpUnallocatedStatusCode in Allocator Service by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in googleforgames/agones#3802
- Update Docs: CountersAndLists to Beta by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in googleforgames/agones#3810
- Disable Dev feature FeatureAutopilotPassthroughPort by [@vicentefb](https://github.com/vicentefb) in googleforgames/agones#3815
- Disable FeatureAutopilotPassthroughPort in features.go by [@vicentefb](https://github.com/vicentefb) in googleforgames/agones#3816
- SDK proto compatibility guarantees and deprecation policies documentation by [@igooch](https://github.com/igooch) in googleforgames/agones#3774
- Fix dangling "as of" by [@zmerlynn](https://github.com/zmerlynn) in googleforgames/agones#3827
- Steps to Promote SDK Features from Alpha to Beta by [@Kalaiselvi84](https://github.com/Kalaiselvi84) in googleforgames/agones#3814
- Adds comment for help troubleshooting issues with terraform tfstate by [@igooch](https://github.com/igooch) in googleforgames/agones#3822
- docs: improve counter and list example comments by [@yonbh](https://github.com/yonbh) in googleforgames/agones#3818
- Skip /tmp/ on yamllint by [@zmerlynn](https://github.com/zmerlynn) in googleforgames/agones#3838
- TestAllocatorAfterDeleteReplica: More logging by [@zmerlynn](https://github.com/zmerlynn) in googleforgames/agones#3837
- Instructions for upgrading golang version by [@gongmax](https://github.com/gongmax) in googleforgames/agones#3819
- Remove unused function FindGameServerContainer by [@zmerlynn](https://github.com/zmerlynn) in googleforgames/agones#3841
- Adds Unreal to the List of URL Links to Not Check by [@igooch](https://github.com/igooch) in googleforgames/agones#3847
- docs: clarify virtualization setup for Windows versions by [@andresromerodev](https://github.com/andresromerodev) in googleforgames/agones#3850

**New Contributors:**

- [@skmpf](https://github.com/skmpf) made their first contribution in googleforgames/agones#3808
- [@yonbh](https://github.com/yonbh) made their first contribution in googleforgames/agones#3818
- [@peterzhongyi](https://github.com/peterzhongyi) made their first contribution in googleforgames/agones#3844
- [@andresromerodev](https://github.com/andresromerodev) made their first contribution in googleforgames/agones#3850

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---


This PR has been generated by [Renovate
Bot](https://github.com/renovatebot/renovate).

Labels
kind/bug, size/S

4 participants