Fix for Pod deletion during unavailable controller #1279
/cc @KamiMay just letting you know this is in the queue. Would like to merge it after the stable release, to give it some time to bake.
Build Failed 😱 Build Id: cb21efef-1805-474b-b047-a851eb46de71 To get permission to view the Cloud Build view, join the agones-discuss Google Group.
Pasting here, to keep track. Looks like I introduced a panic.
Force-pushed from 4232154 to e13c004.
Force-pushed from e13c004 to 9501a7c.
Force-pushed from 9501a7c to 46afcac.
Build Succeeded 👏 Build Id: 40d1a33f-fb9b-46d8-9a85-c376f5174350 Development artifacts have been built, and will exist for the next 30 days.
Force-pushed from 46afcac to d7ef191.
Oooh! I think I know how to e2e test this. Please still feel free to review, but I will add an e2e test shortly.
Force-pushed from d7ef191 to 1e81aca.
Added e2e test suite for controller crashes.
That's a great question! There is a check in here for dev gameservers, and a unit test as well. I didn't write an e2e test, as I figure the e2e dev gameserver test would suffice (it would at least fail sometimes). The only way I could think of to e2e test it is to wait 30 seconds, but do we want that slow an e2e test? WDYT?
@roberthbailey rebased!
That's great that Dev Gameservers are now also supported.
Force-pushed from 7eda264 to fda22a9.
pkg/gameservers/missing_test.go (Outdated)
	"k8s.io/client-go/tools/cache"
)

func TestIsBeforePodCreated(t *testing.T) {
I think this comment was lost in my last review, but this test seems like it belongs in pkg/gameservers/gameservers_test.go instead of here, since it's testing code in pkg/gameservers/gameservers.go
oooh! yeah. Moving!
Looks great, nothing to add to Robert's comments.
Force-pushed from fda22a9 to 118c7af.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: aLekSer, markmandel, roberthbailey
If a Pod gets deleted, especially while its GameServer is in the Ready or Allocated state, and the controller has crashed, is missing, or is unable to access the master, then when the controller comes back up the GameServer is left in a zombie state: it could be Allocated, but there is no Pod process to back it.
Ideally, scenarios like this shouldn't happen, but they are possible, depending on user interaction with Kubernetes, so we should cover them, as they otherwise require manual intervention to fix.
This PR implements a controller that periodically checks GameServers to ensure they have backing Pods. If a Pod is missing, the GameServer is marked as Unhealthy, so that a Fleet can eventually return to a healed, stable state without manual intervention.
Closes #1170
Closes #398 (especially combined with fix for #1245)
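To make the idea in the description concrete, here is a minimal, standard-library-only sketch of a periodic "does every GameServer still have a Pod?" check. It is not the code from this PR: the `GameServer` struct, the `Dev` flag, `expectsPod`, `markMissingPods`, `runEvery`, and the choice to check only Ready and Allocated are all hypothetical simplifications, and the real controller works against the Kubernetes API rather than an in-memory map.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// State is a simplified stand-in for a GameServer's lifecycle state.
type State string

const (
	Ready     State = "Ready"
	Allocated State = "Allocated"
	Unhealthy State = "Unhealthy"
)

// GameServer is a hypothetical, pared-down model used only for this sketch.
type GameServer struct {
	Name  string
	State State
	Dev   bool // assumption: development GameServers have no Pod by design
}

// expectsPod reports whether a GameServer should have a backing Pod.
// Assumption: only Ready and Allocated are checked here.
func expectsPod(gs *GameServer) bool {
	if gs.Dev {
		return false
	}
	return gs.State == Ready || gs.State == Allocated
}

// markMissingPods marks every GameServer that expects a Pod but has none as
// Unhealthy, and returns the names of the GameServers it changed.
func markMissingPods(servers []*GameServer, podExists func(name string) bool) []string {
	var changed []string
	for _, gs := range servers {
		if expectsPod(gs) && !podExists(gs.Name) {
			gs.State = Unhealthy
			changed = append(changed, gs.Name)
		}
	}
	return changed
}

// runEvery calls check on each tick until ctx is cancelled.
func runEvery(ctx context.Context, interval time.Duration, check func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			check()
		}
	}
}

func main() {
	servers := []*GameServer{
		{Name: "gs-a", State: Allocated},
		{Name: "gs-b", State: Ready},
		{Name: "gs-dev", State: Ready, Dev: true},
	}
	// Simulate the failure mode: the Pod for gs-a was deleted while the
	// controller was unavailable, so only gs-b still has a Pod.
	pods := map[string]bool{"gs-b": true}

	ctx, cancel := context.WithTimeout(context.Background(), 350*time.Millisecond)
	defer cancel()
	runEvery(ctx, 100*time.Millisecond, func() {
		exists := func(name string) bool { return pods[name] }
		for _, name := range markMissingPods(servers, exists) {
			fmt.Println("marked Unhealthy:", name)
		}
	})
}
```

In this sketch the check is idempotent: once a GameServer has been marked Unhealthy it no longer expects a Pod, so later ticks leave it alone, and the dev GameServer is never touched.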