Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect error logging caused by ResourceVersion conflict on SDK Patch call #3967

Closed
daniellee opened this issue Aug 27, 2024 · 2 comments · Fixed by #3973
Closed

Incorrect error logging caused by ResourceVersion conflict on SDK Patch call #3967

daniellee opened this issue Aug 27, 2024 · 2 comments · Fixed by #3973
Assignees
Labels
kind/bug These are bugs.

Comments

@daniellee
Copy link
Contributor

What happened:
After upgrading to 1.41, we started getting a couple of new error messages in the logs for the Agones sidecar:

{"message":"the server rejected our request due to an error in our request","severity":"error","time":"2024-08-27T05:11:38.171542173Z"} 

and:

{"error":"the server rejected our request due to an error in our request","gsKey":"updateLabel","message":"","queue":"agones.dev.gameservers.xxx","severity":"error","source":"*sdkserver.SDKServer","time":"2024-08-27T00:33:16.155807016Z"}

and:

{"message":"could not update GameServer gameservers/xxx to state Shutdown: the server rejected our request due to an error in our request","severity":"error","stack":["agones.dev/agones/pkg/sdkserver.(*SDKServer).updateState\n\t/go/src/agones.dev/agones/pkg/sdkserver/sdkserver.go:412","agones.dev/agones/pkg/sdkserver.(*SDKServer).syncGameServer\n\t/go/src/agones.dev/agones/pkg/sdkserver/sdkserver.go:331","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).processNextWorkItem\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:182","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).runWorker\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:158","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).run.func1\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:217","k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226","k8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227","k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204","k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).run\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:217","runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"],"time":"2024-08-26T10:06:47.046294688Z"} 

It is due to the change in #3803 that adds a test Json Patch operation that checks the ResourceVersion and rejects patch updates if there is a conflict. If two calls are made to update a field in the gameserver spec (like setting a label/annotation or changing the state) in quick succession then a conflict can happen.

After the failed patch update, it gets retried and then succeeds - so I don't think this is actually an error and is just how Kubernetes works.

What you expected to happen:

For this error message to not be an error or to be debug logging.

How to reproduce it (as minimally and precisely as possible):

The error message is from Kubernetes and can be recreated with kubectl by setting an older/smaller value for the /metdata/resourceVersion field for a running gameserver:

kubectl patch -v=8 gameserver xxx-aaa-bbb --type json -p '[{"op":"test","path":"/metadata/resourceVersion","value":"37909512"},{"op":"replace","path":"/status/state","value":"RequestReady"}]'

To recreate with Agones, then two clients setting a label or annotation on the same gameserver at the same time will generate the error.

Anything else we need to know?:

Environment:

  • Agones version: 1.41
  • Kubernetes version (use kubectl version): v1.29.6-gke.1038001
  • Cloud provider or hardware configuration: GKE
  • Install method (yaml/helm): helm
  • Troubleshooting guide log(s):
  • Others:
@daniellee daniellee added the kind/bug These are bugs. label Aug 27, 2024
@markmandel
Copy link
Member

I know why this is -- the conflict doesn't reigster under IsConflict . Should be easy enough to fix, I'm just trying to work out how to force a conflict in the SDK Sidecar 🤦🏻 😁 so I can at least see it in action.

@markmandel
Copy link
Member

Got it! Fix coming shortly.

@markmandel markmandel self-assigned this Aug 27, 2024
markmandel added a commit to markmandel/agones that referenced this issue Aug 28, 2024
Missed this little of nuance in code review when moving to Patch with
the SDK Sidecar - in that a Patch operation doesn't return a Conflict
error type/status value if it fails, so every time it does, the Agones
sidecar will log it as an error on production installs.

So this cleaned up a few things:

* renamed "debugError" type to "traceError" type, since we only log it
under Trace, not Debug.
* if `patchGameServer` fails under the error `IsInvalid`, wrap it in a
TraceError so it doesn't show up in prod.
* Wrapped the error with `errors` context, such that if the patch does
fail, it's far easier now to work out where it failed.

Closes googleforgames#3967
markmandel added a commit to markmandel/agones that referenced this issue Aug 28, 2024
Missed this little of nuance in code review when moving to Patch with
the SDK Sidecar - in that a Patch operation doesn't return a Conflict
error type/status value if it fails, so every time it does, the Agones
sidecar will log it as an error on production installs.

So this cleaned up a few things:

* renamed "debugError" type to "traceError" type, since we only log it
under Trace, not Debug.
* if `patchGameServer` fails under the error `IsInvalid`, wrap it in a
TraceError so it doesn't show up in prod.
* Wrapped the error with `errors` context, such that if the patch does
fail, it's far easier now to work out where it failed.

Closes googleforgames#3967
markmandel added a commit to markmandel/agones that referenced this issue Aug 28, 2024
Missed this little of nuance in code review when moving to Patch with
the SDK Sidecar - in that a Patch operation doesn't return a Conflict
error type/status value if it fails, so every time it does, the Agones
sidecar will log it as an error on production installs.

So this cleaned up a few things:

* renamed "debugError" type to "traceError" type, since we only log it
under Trace, not Debug.
* if `patchGameServer` fails under the error `IsInvalid`, wrap it in a
TraceError so it doesn't show up in prod.
* Wrapped the error with `errors` context, such that if the patch does
fail, it's far easier now to work out where it failed.

Closes googleforgames#3967
markmandel added a commit to markmandel/agones that referenced this issue Sep 12, 2024
Missed this little of nuance in code review when moving to Patch with
the SDK Sidecar - in that a Patch operation doesn't return a Conflict
error type/status value if it fails, so every time it does, the Agones
sidecar will log it as an error on production installs.

So this cleaned up a few things:

* renamed "debugError" type to "traceError" type, since we only log it
under Trace, not Debug.
* if `patchGameServer` fails under the error `IsInvalid`, wrap it in a
TraceError so it doesn't show up in prod.
* Wrapped the error with `errors` context, such that if the patch does
fail, it's far easier now to work out where it failed.

Closes googleforgames#3967
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug These are bugs.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants