Gameserver is not removed when node hosting gameserver pod is shutdown #1102
Comments
I'm getting this too. Running Kubernetes on GKE with Agones v0.12.0. I have to delete the gameservers manually every time I 'turn on' (scale from 0 to 1) my development cluster.
I've also found that if you (accidentally or purposefully) delete a pod backing a gameserver (e.g. Last time I did it, I asked @markmandel whether this was expected behavior and (if I'm remembering correctly) he said that if someone is manually tinkering with resources under the gameserver, it's ok to let people shoot themselves in the foot, and that the system should eventually self-heal (but it sounds like that may not be happening for you).
On the other hand, if scaling the cluster (rather than manually deleting resources) can also get you into this situation, then I think that might raise the priority for addressing it.
Another question: how long did you wait with the pod in the pending state? I would expect the pod going pending to cause the gameserver to go unhealthy, at which point, if the game server is running as part of a fleet, the fleet controller would replace it with a fresh game server. It's possible that the check for game servers that are no longer running (done in the controller, since the sidecar can no longer provide health checks) doesn't catch this edge case.
@roberthbailey In my case, the system was left in that state over the weekend, and come Monday it hadn't repaired itself. I always delete the gameservers shortly after scaling the cluster back up though, so maybe it only self-heals after scaling back up. It's worth noting that if I do
Just to be clear, that pending pod has nothing to do with agones or my game servers. It is just another k8s Deployment in the same namespace. It is 0/1 pending because the cluster is at size 0 (no nodes available to run the pod).
And to recover my game server state after a scale to 0 and back to 1, I just normally run
Also, I should mention that I get into this situation not because I am actually scaling my cluster to 0, but because I have a cheap dev cluster of size 1 using GKE with preemptible instances. Every now and then the instance is terminated, and afterwards I need to manually delete the non-existent game servers in order for the fleet to recover. While I think this specific case is unlikely to happen in production setups, I do worry that some other events could lead a production system into this condition.
@DJSel
After scaling to 3 nodes in
So gameservers in a simple-udp (
@djsell Daniel, Were you scaling all your nodes in a cluster to 0, which led to removing If I scale both So probably it is better to have an issue, After restarting the controller state of the fleets are not accurate either current does not set to 0:
I was thinking that we need an OwnerReference for the GameServer's backing pod; then I assume the Kubernetes garbage collector should automatically remove the GameServer CRD:
@aLekSer Thanks for checking up on this. No, I did not disable health check. Yes, I am using the same cluster for
I believe this is correct.
I will retest this. I think the recently added missingPodsController should handle this properly as well.
Testing on current GKE cluster configuration
Then I resized
Events for one of the gameservers:
@markmandel I think we can close this ticket now.
Awesome. Closing ticket!
What happened:
After a node hosting a game server shuts down, the game server is not removed from the game server list. A fleet will not replace the missing game server.
What you expected to happen:
The gameserver should be removed once the pod is no longer running, and the fleet should start a new gameserver up.
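The expected behavior amounts to a reconciliation check: any gameserver whose backing pod no longer exists should be flagged so it can be removed or marked unhealthy. A minimal sketch in Python, assuming each gameserver is backed by a pod of the same name and that both name lists have already been fetched (for example from `kubectl get gameservers` and `kubectl get pods`). `find_orphaned_gameservers` is a hypothetical helper for illustration, not part of Agones:

```python
from typing import Iterable, List


def find_orphaned_gameservers(
    gameserver_names: Iterable[str], pod_names: Iterable[str]
) -> List[str]:
    """Return the gameserver names that have no pod with the same name.

    Assumption (not confirmed by this thread): a gameserver and its
    backing pod share the same name within a namespace.
    """
    existing_pods = set(pod_names)
    # Pods that belong to other workloads (e.g. an unrelated Deployment)
    # are simply ignored, because only gameserver names are looked up.
    return sorted(name for name in gameserver_names if name not in existing_pods)
```

With the cluster resized to 0, every listed gameserver would come back from this check, which matches the manual cleanup described in the comments.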
How to reproduce it (as minimally and precisely as possible):
I'm using GKE.
Create a cluster with node size 1
Create a fleet that is running a game server (I'm using a fleet autoscaler with bufferSize: 1, minReplicas: 1, maxReplicas: 5)
Resize the cluster to 0
GS will still be listed, even though no pod is listed
Resize the cluster to 1
Old GS will still be listed, no pod is running, no new pod is started
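One possible reading of the last step: if the stale GS is still counted as a replica, a buffer-style autoscaler sees nothing to do. The sketch below is a rough illustration under that assumption. It also assumes the target size is `allocated + bufferSize` clamped to `[minReplicas, maxReplicas]`, which is a simplification and not taken from the Agones source; both function names are hypothetical:

```python
def desired_replicas(
    allocated: int, buffer_size: int, min_replicas: int, max_replicas: int
) -> int:
    """Target fleet size under an assumed absolute-buffer policy."""
    target = allocated + buffer_size
    # Clamp the target into the configured [min, max] range.
    return max(min_replicas, min(target, max_replicas))


def new_gameservers_needed(
    current_replicas: int,
    allocated: int,
    buffer_size: int,
    min_replicas: int,
    max_replicas: int,
) -> int:
    """How many gameservers would have to be created to reach the target."""
    target = desired_replicas(allocated, buffer_size, min_replicas, max_replicas)
    return max(0, target - current_replicas)


# Settings from the report: bufferSize 1, minReplicas 1, maxReplicas 5.
# The stale GS counted as a replica: nothing new is created.
stale_counted = new_gameservers_needed(1, 0, 1, 1, 5)
# The stale GS removed first: one replacement is created.
stale_removed = new_gameservers_needed(0, 0, 1, 1, 5)
```

Under these assumptions `stale_counted` is 0 and `stale_removed` is 1, which is consistent with the fleet only recovering after the old GS is deleted by hand.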
Anything else we need to know?:
Environment:
kubectl version:
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.1", GitCommit:"4485c6f18cee9a5d3c3b4e523bd27972b1b53892", GitTreeState:"clean", BuildDate:"2019-07-18T14:25:20Z", GoVersion:"go1.12.7", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.3-gke.11", GitCommit:"cde86d2e1416a0c6c4bb964e1a13e8fa0a83a616", GitTreeState:"clean", BuildDate:"2019-08-12T20:57:47Z", GoVersion:"go1.12.5b4", Compiler:"gc", Platform:"linux/amd64"}
Cluster size 1:
Resize cluster to 0:
Resize cluster to 1: