Fix validation bug in FleetAutoscaler #2242

Merged
markmandel merged 5 commits into googleforgames:main on Aug 31, 2021

yoshd
Contributor

@yoshd yoshd commented Aug 27, 2021

What type of PR is this?

Uncomment only one /kind <> line, press enter to put that in a new line, and remove leading whitespace from that line:

/kind breaking
/kind bug
/kind cleanup
/kind documentation
/kind feature
/kind hotfix

What this PR does / Why we need it:

Relates to #2171.

Congratulations on the release of v1.17.0-rc!
I tried this version and found a behavior that seems to be a bug.

I think the validation code assumes that the sync field in FleetAutoscaler is left unset when the CustomFasSyncInterval feature gate is disabled.
However, when the sync field is not specified, validation panics by dereferencing a nil pointer.

This is the output from applying a manifest that does not set sync to a cluster with CustomFasSyncInterval disabled:

Error from server (InternalError): error when creating "path/to/fleet_autoscaler.yaml": Internal error occurred: failed calling webhook "validations.agones.dev": Post "https://agones-controller-service.agones-system.svc:443/validate?timeout=10s": EOF

agones-controller:

agones-controller-7dcfcf75c-9q8v8 agones-controller 2021/08/27 10:18:29 http: panic serving x.x.x.x:58204: runtime error: invalid memory address or nil pointer dereference
agones-controller-7dcfcf75c-9q8v8 agones-controller goroutine 2553 [running]:
agones-controller-7dcfcf75c-9q8v8 agones-controller net/http.(*conn).serve.func1(0xc001266be0)
agones-controller-7dcfcf75c-9q8v8 agones-controller 	/usr/local/go/src/net/http/server.go:1801 +0x147
agones-controller-7dcfcf75c-9q8v8 agones-controller panic(0x1995bc0, 0x29c20b0)
agones-controller-7dcfcf75c-9q8v8 agones-controller 	/usr/local/go/src/runtime/panic.go:975 +0x47a
agones-controller-7dcfcf75c-9q8v8 agones-controller agones.dev/agones/pkg/apis/autoscaling/v1.(*FleetAutoscaler).Validate(0xc000502900, 0x0, 0x0, 0x0, 0xc000502900, 0x0, 0x0)
agones-controller-7dcfcf75c-9q8v8 agones-controller 	/go/src/agones.dev/agones/pkg/apis/autoscaling/v1/fleetautoscaler.go:207 +0x6e
agones-controller-7dcfcf75c-9q8v8 agones-controller agones.dev/agones/pkg/fleetautoscalers.(*Controller).validationHandler(0xc000252000, 0xc0017e3230, 0xf, 0xc001895220, 0x13, 0xc000b02340, 0xc000f53f20, 0x0, 0x0, 0x0, ...)
agones-controller-7dcfcf75c-9q8v8 agones-controller 	/go/src/agones.dev/agones/pkg/fleetautoscalers/controller.go:180 +0x52b
agones-controller-7dcfcf75c-9q8v8 agones-controller agones.dev/agones/pkg/util/webhooks.(*WebHook).handle(0xc00020b680, 0x1be5d6c, 0x9, 0x1e5fc40, 0xc0007d6d20, 0xc0013d3300, 0x1be5d6c, 0x9)
agones-controller-7dcfcf75c-9q8v8 agones-controller 	/go/src/agones.dev/agones/pkg/util/webhooks/webhooks.go:95 +0x68f
agones-controller-7dcfcf75c-9q8v8 agones-controller agones.dev/agones/pkg/util/webhooks.(*WebHook).AddHandler.func1(0x1e5fc40, 0xc0007d6d20, 0xc0013d3300)
agones-controller-7dcfcf75c-9q8v8 agones-controller 	/go/src/agones.dev/agones/pkg/util/webhooks/webhooks.go:63 +0x7d
agones-controller-7dcfcf75c-9q8v8 agones-controller net/http.HandlerFunc.ServeHTTP(0xc0002bc140, 0x1e5fc40, 0xc0007d6d20, 0xc0013d3300)
agones-controller-7dcfcf75c-9q8v8 agones-controller 	/usr/local/go/src/net/http/server.go:2042 +0x44
agones-controller-7dcfcf75c-9q8v8 agones-controller net/http.(*ServeMux).ServeHTTP(0xc0007fa6c0, 0x1e5fc40, 0xc0007d6d20, 0xc0013d3300)
agones-controller-7dcfcf75c-9q8v8 agones-controller 	/usr/local/go/src/net/http/server.go:2417 +0x1ad
agones-controller-7dcfcf75c-9q8v8 agones-controller net/http.serverHandler.ServeHTTP(0xc0007d61c0, 0x1e5fc40, 0xc0007d6d20, 0xc0013d3300)
agones-controller-7dcfcf75c-9q8v8 agones-controller 	/usr/local/go/src/net/http/server.go:2843 +0xa3
agones-controller-7dcfcf75c-9q8v8 agones-controller net/http.(*conn).serve(0xc001266be0, 0x1e64640, 0xc000a3de80)
agones-controller-7dcfcf75c-9q8v8 agones-controller 	/usr/local/go/src/net/http/server.go:1925 +0x8ad
agones-controller-7dcfcf75c-9q8v8 agones-controller created by net/http.(*Server).Serve
agones-controller-7dcfcf75c-9q8v8 agones-controller 	/usr/local/go/src/net/http/server.go:2969 +0x36c
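
The trace above points at Validate reading through a pointer that is nil when sync is omitted. Below is a minimal sketch of that failure mode and a nil guard. The type names, the validateSync function, and the gateEnabled parameter are hypothetical simplifications for illustration, not the real Agones structs or the actual fix, and the rule that sync is rejected when the gate is disabled is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified types; not the real Agones API structs.
type FixedIntervalSync struct {
	Seconds int32
}

type FleetAutoscalerSync struct {
	FixedInterval FixedIntervalSync
}

type FleetAutoscalerSpec struct {
	// Sync is optional, so it is a pointer and may be nil.
	Sync *FleetAutoscalerSync
}

// validateSync checks the optional sync block without assuming it is set.
// gateEnabled stands in for the CustomFasSyncInterval feature gate.
func validateSync(spec FleetAutoscalerSpec, gateEnabled bool) error {
	if spec.Sync == nil {
		// Nothing to validate. Reading spec.Sync.FixedInterval here
		// would panic with a nil pointer dereference.
		return nil
	}
	if !gateEnabled {
		// Assumed rule for this sketch: sync requires the gate.
		return errors.New("sync is set but CustomFasSyncInterval is disabled")
	}
	if spec.Sync.FixedInterval.Seconds <= 0 {
		return errors.New("sync.fixedInterval.seconds must be positive")
	}
	return nil
}

func main() {
	// A spec without sync validates cleanly whether or not the gate is on.
	fmt.Println(validateSync(FleetAutoscalerSpec{}, false))
	fmt.Println(validateSync(FleetAutoscalerSpec{}, true))
}
```

The point of the sketch is only the ordering: the nil check comes before any field access, so a manifest without sync never reaches the dereference.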

The additional bug described in the comment linked below has also been fixed.
As with the first fix, I deployed the change on GKE, applied a FleetAutoscaler without sync set, and confirmed that no error occurred.

#2242 (comment)

Special notes for your reviewer:

I installed this modified version of Agones in a GKE environment and confirmed that, with CustomFasSyncInterval disabled, applying a FleetAutoscaler without sync set succeeds.

@google-cla

google-cla bot commented Aug 27, 2021

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

Comment on lines 118 to 120
	if !runtime.FeatureEnabled(runtime.FeatureCustomFasSyncInterval) {
		// Do not run test if FeatureCustomFasSyncInterval is not enabled
		return
Contributor Author

The FeatureGate setting in runtime is global.
I was concerned that setting the FeatureGate in this test could affect other tests running in parallel.
So here I decided to only check whether CustomFasSyncInterval is enabled in the FeatureGate.
I wish there were a better way to test this...
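
One possible way to handle the shared global state is to serialize tests that override gates behind a mutex and restore the previous value afterwards. The sketch below shows that idea under stated assumptions: gateMu, gates, and withFeature are hypothetical names for illustration and are not taken from the Agones runtime package.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical process-wide feature gate store, standing in for a
// global setting like the one described above.
var (
	gateMu sync.Mutex // held while a caller overrides a gate
	gates  = map[string]bool{}
)

// withFeature sets a gate, runs fn, and then restores the previous state.
// Callers that go through withFeature are serialized by gateMu.
func withFeature(name string, enabled bool, fn func()) {
	gateMu.Lock()
	defer gateMu.Unlock()

	prev, had := gates[name]
	gates[name] = enabled
	// Deferred calls run last-in first-out, so the gate is restored
	// before the mutex is released.
	defer func() {
		if had {
			gates[name] = prev
		} else {
			delete(gates, name)
		}
	}()

	fn()
}

func main() {
	withFeature("CustomFasSyncInterval", true, func() {
		fmt.Println("inside:", gates["CustomFasSyncInterval"])
	})
	fmt.Println("after:", gates["CustomFasSyncInterval"])
}
```

This only protects code that takes the same lock. A parallel test that reads the gate without going through the helper can still observe the overridden value, which is why checking the gate and skipping is the more conservative choice.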

Member

Rather than return, let's do a:

Suggested change
-	if !runtime.FeatureEnabled(runtime.FeatureCustomFasSyncInterval) {
-		// Do not run test if FeatureCustomFasSyncInterval is not enabled
-		return
+	if !runtime.FeatureEnabled(runtime.FeatureCustomFasSyncInterval) {
+		// Do not run test if FeatureCustomFasSyncInterval is not enabled
+		t.Skip()

So we can see that it was skipped 👍🏻
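
To illustrate why skipping is more visible than returning, here is a small sketch of a skip helper. The skipper interface, requireFeature, and recorder are hypothetical names for this example. The interface lists two methods that *testing.T provides (Helper and Skipf), so a real test could pass t directly. The recorder exists only so the sketch runs outside `go test`.

```go
package main

import "fmt"

// skipper is the subset of *testing.T used by the helper.
type skipper interface {
	Helper()
	Skipf(format string, args ...interface{})
}

// requireFeature skips the calling test when the named gate is disabled.
// enabled would come from the feature gate check in a real test.
func requireFeature(t skipper, name string, enabled bool) {
	t.Helper()
	if !enabled {
		t.Skipf("feature gate %s is not enabled", name)
	}
}

// recorder is a stand-in for *testing.T that records skip messages.
// Unlike the real Skipf, it does not stop the calling goroutine.
type recorder struct {
	skipped []string
}

func (r *recorder) Helper() {}

func (r *recorder) Skipf(format string, args ...interface{}) {
	r.skipped = append(r.skipped, fmt.Sprintf(format, args...))
}

func main() {
	r := &recorder{}
	requireFeature(r, "CustomFasSyncInterval", false)
	fmt.Println(r.skipped)
}
```

With a plain return the test is reported as passed, while a skip is reported as skipped along with its message (shown by `go test -v`), so a disabled gate does not look like coverage.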

Contributor Author

Thank you for your review.
I fixed it.

@yoshd
Contributor Author

yoshd commented Aug 27, 2021

/assign @markmandel

@agones-bot
Collaborator

Build Failed 😱

Build Id: 8136e661-d63e-4ede-b8dc-d616d39e0916

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 996d43d8-4362-42a5-99b2-67ed07e1da9c

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2242/head:pr_2242 && git checkout pr_2242
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.17.0-65bdd23

@yoshd
Contributor Author

yoshd commented Aug 27, 2021

@markmandel

Hi,

I have found an additional bug.
It reproduces when the CustomFasSyncInterval feature gate is enabled but sync is not set.
I think this happens because the mutation webhook does not apply the FleetAutoscaler defaults, so sync stays nil and the controller dereferences a nil pointer.
I'm working on a fix now.

agones-controller:

E0827 17:32:55.257955       1 runtime.go:78] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 882 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic(0x1995bc0, 0x29c20b0)
	/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:74 +0x95
k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
	/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:48 +0x89
panic(0x1995bc0, 0x29c20b0)
	/usr/local/go/src/runtime/panic.go:969 +0x1b9
agones.dev/agones/pkg/fleetautoscalers.(*Controller).createFasThread(0xc00023a000, 0x1e64640, 0xc00009f540, 0xc000a12000, 0x0, 0x0)
	/go/src/agones.dev/agones/pkg/fleetautoscalers/controller.go:348 +0x96
agones.dev/agones/pkg/fleetautoscalers.(*Controller).syncFleetAutoscalerWithCustomSyncInterval(0xc00023a000, 0x1e64640, 0xc00009f540, 0xc0009f21b0, 0x23, 0x0, 0x19b2d40)
	/go/src/agones.dev/agones/pkg/fleetautoscalers/controller.go:400 +0x3c7
agones.dev/agones/pkg/fleetautoscalers.(*Controller).syncFleetAutoscaler(0xc00023a000, 0x1e64640, 0xc00009f540, 0xc0009f21b0, 0x23, 0x0, 0xffffffffffffffff)
	/go/src/agones.dev/agones/pkg/fleetautoscalers/controller.go:203 +0x78
agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).processNextWorkItem(0xc0001057c0, 0x1e64640, 0xc00009f540, 0x0)
	/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:180 +0x292
agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).runWorker(0xc0001057c0, 0x1e64640, 0xc00009f540)
	/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:156 +0x3f
agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).run.func1()
	/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:215 +0x3c
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc00140cf88)
	/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155 +0x5f
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0017cbf88, 0x1e1eec0, 0xc000d1aa80, 0xc00009f501, 0xc00011f260)
	/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156 +0xad
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00140cf88, 0x3b9aca00, 0x0, 0x1, 0xc00011f260)
	/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x98
k8s.io/apimachinery/pkg/util/wait.Until(...)
	/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90
agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).run(0xc0001057c0, 0x1e64640, 0xc00009f540)
	/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:215 +0xd7
created by agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).Run
	/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:204 +0x1d2
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x14f2fb6]
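
The second panic is consistent with the controller reading sync before any default has been applied. Below is a minimal sketch of defaulting on admission combined with a defensive read in the consumer. All names here (applyDefaults, syncSeconds, the struct types) are hypothetical, and the 30 second default is an assumed value for illustration, not one confirmed by this PR.

```go
package main

import "fmt"

// Assumed default interval, for illustration only.
const defaultSyncSeconds int32 = 30

// Hypothetical, simplified types; not the real Agones API structs.
type FixedIntervalSync struct {
	Seconds int32
}

type FleetAutoscalerSync struct {
	Type          string
	FixedInterval FixedIntervalSync
}

type FleetAutoscalerSpec struct {
	Sync *FleetAutoscalerSync // optional, nil when omitted from the manifest
}

// applyDefaults fills in sync when the gate is enabled and the field is
// unset. This is the step a mutation webhook would run on CREATE/UPDATE.
func applyDefaults(spec *FleetAutoscalerSpec, gateEnabled bool) {
	if !gateEnabled || spec.Sync != nil {
		return
	}
	spec.Sync = &FleetAutoscalerSync{
		Type:          "FixedInterval",
		FixedInterval: FixedIntervalSync{Seconds: defaultSyncSeconds},
	}
}

// syncSeconds reads the interval without assuming defaults were applied.
// The boolean is false when sync is nil.
func syncSeconds(spec FleetAutoscalerSpec) (int32, bool) {
	if spec.Sync == nil {
		return 0, false
	}
	return spec.Sync.FixedInterval.Seconds, true
}

func main() {
	spec := FleetAutoscalerSpec{}
	_, ok := syncSeconds(spec)
	fmt.Println("before defaults, sync set:", ok)

	applyDefaults(&spec, true)
	secs, ok := syncSeconds(spec)
	fmt.Println("after defaults:", secs, ok)
}
```

Keeping the nil check in the reader as well means an object that bypassed the webhook, for example one created before the webhook was registered, still cannot crash the worker.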

@markmandel
Member

Thanks for jumping on it!

Comment on lines 13595 to 13603
  - apiGroups:
      - autoscaling.agones.dev
    resources:
      - "fleetautoscalers"
    apiVersions:
      - "v1"
    operations:
      - CREATE
      - UPDATE
Contributor Author

This file shows a lot of differences for some reason, but this is the only change.

@yoshd
Contributor Author

yoshd commented Aug 27, 2021

@markmandel
I also fixed the bug in #2242 (comment), and confirmed on GKE that no error occurs with the CustomFasSyncInterval feature gate either enabled or disabled.
It's ready for review!

@agones-bot
Collaborator

Build Failed 😱

Build Id: ea68baba-344a-486d-8d74-4758dbcc793d

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@markmandel
Member

Sorry for the delay - to fix this issue, please run make install, and it will regen the install.yaml (this needs to happen any time a change to the helm template happens).

But let me do a quick review!

@yoshd
Contributor Author

yoshd commented Aug 31, 2021

@markmandel
Thank you for your review.
I assumed you meant make gen-install. After I regenerated install.yaml with it, the unnecessary diffs disappeared.

All fixes are complete!

@agones-bot
Collaborator

Build Failed 😱

Build Id: ae457bcd-91b4-4a77-bc3a-e1de3f46125f

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@markmandel
Member

Sorry yes! make gen-install.

Looks like a flakey test. Rerunning!

@markmandel
Member

I was sure I wrote a review for some extra unit and e2e tests, but now I can't find it! 🤔 (I must have messed things up somewhere).

In https://github.com/googleforgames/agones/blob/main/test/e2e/fleetautoscaler_test.go#L72

Let's add a check there: if the feature flag for this is enabled, assert that the fas.spec.sync values are set to their defaults.

I'd also recommend a unit test for the mutationHandler in the controller similar to the one that is here:

func TestControllerCreationMutationHandler(t *testing.T) {

That way, we have coverage on automated tests to ensure that default states are handled.

@agones-bot
Collaborator

Build Failed 😱

Build Id: e9658c2b-4893-44bf-99cf-3f69eefa8055

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Collaborator

Build Failed 😱

Build Id: 1a82d51b-8030-48f7-9905-41d3cf42e9bf

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 9164de7d-197c-40e5-82a1-312877654e5f

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2242/head:pr_2242 && git checkout pr_2242
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.17.0-9efcf5e

@markmandel
Member

@cindy52 - I'd love to get this bug fix into the RC before we release stable. It's passing in its current form, but I would love a couple more automated tests (#2242 (comment)) - should we merge in its current form so we can get it in before release - or wait for the tests?

@yoshd
Contributor Author

yoshd commented Aug 31, 2021

@markmandel
Thank you for your patience.
Added e2e and mutation tests for CustomFasSyncInterval in FleetAutoscaler.

Member

@markmandel markmandel left a comment

Thanks for the quick fix! LGTM!

@cindy52 can we squeeze this bug fix into the stable release?

@google-oss-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: markmandel, yoshd

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cindy52
Contributor

cindy52 commented Aug 31, 2021

Thanks for the quick fix! LGTM!

@cindy52 can we squeeze this bug fix into the stable release?

Will do!

@agones-bot
Collaborator

Build Failed 😱

Build Id: 7115c350-b011-4975-857f-108bd1e248f9

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: da4993d6-31dd-4b95-a8f8-3d55d3cf7318

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2242/head:pr_2242 && git checkout pr_2242
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.17.0-697cfb2

@markmandel markmandel added the kind/bug These are bugs. label Aug 31, 2021
@markmandel markmandel merged commit 445d09d into googleforgames:main Aug 31, 2021
@markmandel markmandel added this to the 1.17.0 milestone Aug 31, 2021