Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop logging Pods without Nodes yet as an Error #2014

Merged

Conversation

markmandel
Copy link
Member

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespace from that line:

/kind breaking
/kind bug

/kind cleanup

/kind documentation
/kind feature
/kind hotfix

What this PR does / Why we need it:

A common debugging red herring I see is people seeing an error that reads something akin to:

"message":"error retrieving node  for Pod game-servercbtld: node \"\" not found","severity":"error"

And (not unsurprisingly, it's an error) wondering if it causing whatever issue they are having with Agones.

Except these errors are totally expected, and normal, as Pods often aren't populated with the Node they live on straight away, so I wanted to have a way to be able to return an error from a workerqueue Handler function to retry, but not have it be logged at an Error level.

So that's what this does - provide a wrapper Error type that only gets logged at the Debug level, and the corresponding changes to make that error message above be logged at the Debug level and not cause so much noise.

Which issue(s) this PR fixes:

N/A

Special notes for your reviewer:

If anyone has a better name than "DebugError" please suggest. I wasn't 100% happy, but couldn't come up with anything better.

@markmandel markmandel added kind/cleanup Refactoring code, fixing up documentation, etc area/operations Installation, updating, metrics etc labels Mar 1, 2021
@google-cla google-cla bot added the cla: yes label Mar 1, 2021
@markmandel markmandel changed the title Stop logging Pod's without Node's yet as an Error Stop logging Pods without Nodes yet as an Error Mar 1, 2021
@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: 26ba2997-b2f6-4bd2-a6c1-108bafcb45e7

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2014/head:pr_2014 && git checkout pr_2014
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.13.0-b4e6533

@markmandel markmandel requested review from pooneh-m and roberthbailey and removed request for cyriltovena and dzlier-gcp March 2, 2021 21:16
@@ -37,6 +37,29 @@ const (
workFx = time.Second
)

// DebugError is a marker type for errors that that should only be logged at a Debug level.
// Useful if you want a Handler to be retried, but not logged at an Error level.
type DebugError struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this type intended to be used outside of this package? Given that the constructor below returns error it seems like this should be package scoped and callers should use NewDebugError to create them.

pkg/util/workerqueue/workerqueue.go Show resolved Hide resolved
pkg/util/workerqueue/workerqueue_test.go Outdated Show resolved Hide resolved
@markmandel
Copy link
Member Author

Done! This should be good to go now. 👍

@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: 2d7115c2-6a16-4f0b-a4e7-100606d3bf94

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2014/head:pr_2014 && git checkout pr_2014
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.13.0-b26e212

// Error returns the error string
func (l *debugError) Error() string {
if l.err == nil {
return "nil"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be return ""?

https://play.golang.org/p/gbFHmpOtdtJ

Copy link
Member Author

@markmandel markmandel Mar 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth on that. In the rare occurrence in which it could be nil I thought it might be handy to be able to easily see that. Empty string so often disappears amongst other things.

Maybe it should be "<nil>" like the playground then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to put "", I guess I didn't paste the middle part in though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I go to edit your comment, I see you wrote <nil>, which gets escaped by Github! Cool. I'll make this change 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.....and done!

@@ -194,3 +195,14 @@ func TestWorkerQueueEnqueueAfter(t *testing.T) {
assert.Fail(t, "should have got a queue'd message by now")
}
}

func TestIsDebugError(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a nit but this doesn't just test IsDebugError any longer, so it should be renamed to TestDebugError instead.

pkg/util/workerqueue/workerqueue_test.go Show resolved Hide resolved
@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: 203cd8c4-a3ce-44a3-b570-7392332ab066

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: 9185cd2b-71e3-45c7-902b-ad9d2fa5f5aa

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: abd7ca2a-2893-4401-a932-ab9dc9daa0c1

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@markmandel
Copy link
Member Author

Hmnn. Lots of flakiness agian

2021-03-04 19:09:18.748 PSTStep #22 - "e2e-stable": --- FAIL: TestFleetRollingUpdate/Use_fleet_Patch_true_25% (363.90s)
Info
2021-03-04 19:09:18.748 PSTStep #22 - "e2e-stable": fleet_test.go:264: error waiting for fleet condition on fleet simple-fleet-w6szw

@markmandel
Copy link
Member Author

2021-03-04 18:41:49.994 PSTStep #21 - "e2e-feature-gates": --- FAIL: TestGameServerReadyAllocateReady (149.27s)
Info
2021-03-04 18:41:49.994 PSTStep #21 - "e2e-feature-gates": gameserver_test.go:511:
Info
2021-03-04 18:41:49.994 PSTStep #21 - "e2e-feature-gates": Error Trace: gameserver_test.go:511
Info
2021-03-04 18:41:49.994 PSTStep #21 - "e2e-feature-gates": Error: Received unexpected error:
Info
2021-03-04 18:41:49.994 PSTStep #21 - "e2e-feature-gates": timed out waiting for the condition
Info
2021-03-04 18:41:49.994 PSTStep #21 - "e2e-feature-gates": waiting for GameServer to be Ready 1614911819/game-serverwwmxz
Info
2021-03-04 18:41:49.994 PSTStep #21 - "e2e-feature-gates": agones.dev/agones/test/e2e/framework.(*Framework).WaitForGameServerStateWithLogger
Info
2021-03-04 18:41:49.994 PSTStep #21 - "e2e-feature-gates": /go/src/agones.dev/agones/test/e2e/framework/framework.go:260
Info
2021-03-04 18:41:49.994 PSTStep #21 - "e2e-feature-gates": agones.dev/agones/test/e2e.TestGameServerReadyAllocateReady

This one seems common 🤔

@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: 718ad84a-10f2-4ec8-bd03-fdcdc08b88d3

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: 79fa2378-004d-40e7-8e7d-c2eef83a5f7b

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2014/head:pr_2014 && git checkout pr_2014
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.13.0-009dac6

@markmandel
Copy link
Member Author

ERMAGHERD THAT WAS A ROUGH ONE

A common debugging red herring I see is people seeing an error that
reads something akin to:

```
"message":"error retrieving node  for Pod game-servercbtld: node \"\"
not found","severity":"error"
```

And (not unsurprisingly, it's an `error`) wondering if it causing
whatever issue they are having with Agones.

Except these errors are totally expected, and normal, as Pods often
aren't populated with the Node they live on straight away, so I wanted
to have a way to be able to return an error from a workerqueue Handler
function to retry, but not have it be logged at an Error level.

So that's what this does - provide a wrapper Error type that only gets
logged at the Debug level, and the corresponding changes to make that
error message above be logged at the Debug level and not cause so much
noise.
@google-oss-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: markmandel, roberthbailey

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [markmandel,roberthbailey]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: 4a6f2a4d-52bb-41e9-9378-07f137447cdc

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@google-oss-robot
Copy link

New changes are detected. LGTM label has been removed.

@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: 2f0ee545-4b21-418d-92cf-1ca501ede07d

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@markmandel
Copy link
Member Author

--- FAIL: TestGameServerAllocationDuringMultipleAllocationClients (84.88s)
    gameserverallocation_test.go:454: could not completed gsa1 allocation : an error on the server ("error updating allocated gameserver: Operation cannot be fulfilled on gameservers.agones.dev \"preferred-p5pwn-h7x4d-g2qqz\": the object has been modified; please apply your changes to the latest version and try again") has prevented the request from succeeding (post gameserverallocations.allocation.agones.dev)
    gameserverallocation_test.go:474: 
        	Error Trace:	gameserverallocation_test.go:474
        	Error:      	Not equal: 
        	            	expected: 100
        	            	actual  : 99
        	Test:       	TestGameServerAllocationDuringMultipleAllocationClients

That's a legit flake.

@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: 94797748-8580-4db1-9f73-7d7cf080ecc5

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/2014/head:pr_2014 && git checkout pr_2014
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.13.0-b229934

@markmandel markmandel added this to the 1.13.0 milestone Mar 9, 2021
@markmandel markmandel merged commit 3db9b97 into googleforgames:main Mar 9, 2021
@markmandel markmandel deleted the refactor/log-level-wq-err branch March 9, 2021 03:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved area/operations Installation, updating, metrics etc cla: yes kind/cleanup Refactoring code, fixing up documentation, etc size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants