-
Notifications
You must be signed in to change notification settings - Fork 799
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flaky: TestGameServerUnhealthyAfterDeletingPod #968
Flaky: TestGameServerUnhealthyAfterDeletingPod #968
Conversation
Build Failed 😱 Build Id: 25ab8109-adaa-4353-9ed4-e28891d88cbc To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
5b4591c
to
278e713
Compare
Is it easy to run this test say 100 times before and after this change to a) see if you can reproduce the error and b) verify that this fixes it? Also, I'm always a bit leery of changes to tests that just twiddle timeouts as they can mask actual issues that should be investigated. |
I can do that locally for sure! In this case, I figured that the gameserver being removed is an event that either happens, or it doesn't. Given that the default terminationGracePeriodSeconds is 30 seconds, and controller resync occurs every 30 seconds - I'm wondering if there is a chance that sometimes the pod takes ~ 30 seconds to actually be deleted, and then if the event gets missed by the controller for some reason, then it'll take another 30 seconds to resync the event and ping that it is missing - which hits the 60 second timeout on this test -- but otherwise it would still actually pass. So I figured a bump in timeout may be appropriate here. If my theory is off, the test will remain being flaky, in which case I'm totally wrong, and something else is afoot. |
Your theory makes sense. It would be awesome to have some supporting evidence (e.g. 2/100 tests failed before, now 0/100 fail). |
Love it! 👍 https://varunksaini.com/blog/run-go-test-multiple-times/ Makes it very easy to do. I will do this! (just not right now, I have to run off) |
Build Succeeded 👏 Build Id: 38ace15c-b585-44ec-b37a-10a26776b4a2 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
Since this only changes test code, I'm ok merging it between the 0.12.0 RC and 0.12.0 release. |
I have run it around 50 time so far, without an error - but hit the default go test timeout of 10 minutes 😕 will rerun it with more soon. |
1 minute may not be enough time. Let's find out.
Ran 100 times, with the 3m timeout, no errors. For reference (i.e. in case I need to find it again), here is the command I used: Running it 100 times with the 1m timeout, also passed... so now I have no idea. I figured this was a good first educated guess at a solution. |
278e713
to
51db834
Compare
Build Succeeded 👏 Build Id: 7e149715-d334-4920-9166-131b8e9d7acb The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
Build Failed 😱 Build Id: 01d7c27d-1476-474a-829b-f8805b6a2ab4 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Failed 😱 Build Id: 983aef01-343b-4864-a798-f7c8709d9c47 To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Failed 😱 Build Id: 36cd084e-4730-4051-b967-1b66015eae7b To get permission to view the Cloud Build view, join the agones-discuss Google Group. |
Build Succeeded 👏 Build Id: 4677894f-7a16-4159-a7a5-300e7c322700 The following development artifacts have been built, and will exist for the next 30 days:
A preview of the website (the last 30 builds are retained): To install this version:
|
1 minute may not be enough time. Let's find out.