
Presence of >1 inactive GSS can cause a rolling update to become stuck until an allocation ends #2574

Closed
ivotimev opened this issue May 11, 2022 · 6 comments
Labels: kind/bug

@ivotimev

ivotimev commented May 11, 2022

What happened:
Setup:

fleetautoscaler.yaml
---
...
spec:
  fleetName: ...
  policy:
    type: Buffer
    buffer:
      bufferSize: 10
      maxReplicas: 100
      minReplicas: 10
...
fleet.yaml
---
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
Steps:

1. Created 5 allocations (v0)
2. Updated Fleet spec (v1)
3. Waited until all non-allocated GameServers with the old spec were terminated
4. Created 1 allocation (v1)
5. Updated Fleet spec (v2)
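
The steps above correspond roughly to the commands below. This is only a sketch: the fleet name example-fleet and the gameserverallocation.yaml file are hypothetical placeholders for the abbreviated specs above.

$ kubectl apply -f fleetautoscaler.yaml -f fleet.yaml

# gameserverallocation.yaml (hypothetical), targeting the fleet by its label:
#   apiVersion: allocation.agones.dev/v1
#   kind: GameServerAllocation
#   spec:
#     required:
#       matchLabels:
#         agones.dev/fleet: example-fleet

# 1. Create 5 allocations against the v0 spec
$ for i in $(seq 5); do kubectl create -f gameserverallocation.yaml; done

# 2. Change the GameServer template in fleet.yaml (v1) and re-apply
$ kubectl apply -f fleet.yaml

# 3. Watch until all non-allocated v0 GameServers are terminated
$ kubectl get gss --all-namespaces -w

# 4. Create 1 allocation against the v1 spec
$ kubectl create -f gameserverallocation.yaml

# 5. Change the template again (v2) and re-apply
$ kubectl apply -f fleet.yaml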

As a result, the second rolling update became stuck in the following state:

$ kubectl get gss --all-namespaces
... SCHEDULING   DESIRED   CURRENT   ALLOCATED   READY   AGE
v2  Packed       10        10        0           10      9m51s
v1  Packed       2         2         1           1       10m
v0  Packed       0         5         5           0       13m

There is still a v1 GameServer in the Ready state 10 minutes after v2 was created, and it will remain until one of the v1 or v0 allocations ends. While this is unlikely to cause a problem in an environment with a lot of allocation churn, it can lead to a poor development experience in test environments where updates happen often and GameServers stay allocated for a long time.

What you expected to happen:
The v1 GameServerSet to be scaled down to 0 Desired replicas, and all Ready v1 replicas to be terminated as soon as the buffer size was satisfied.
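
Concretely, for the setup above, I would expect the fleet to settle into roughly the following state once v2 satisfies the buffer (ages elided), with v1 keeping only its single Allocated server, just as v0 already does:

... SCHEDULING   DESIRED   CURRENT   ALLOCATED   READY
v2  Packed       10        10        0           10
v1  Packed       0         1         1           0
v0  Packed       0         5         5           0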

How to reproduce it (as minimally and precisely as possible):
Minimal reproduction as unit test: #2575

Anything else we need to know?:

Environment:

  • Agones version: 1.22.0
  • Kubernetes version (use kubectl version): v1.21.5
  • Cloud provider or hardware configuration: rke2r2
  • Install method (yaml/helm): helm
@ivotimev ivotimev added the kind/bug label May 11, 2022
@ivotimev ivotimev changed the title Presence of >2 GSS can cause a rolling update to become stuck until an allocation ends [WIP] Presence of >2 GSS can cause a rolling update to become stuck until an allocation ends May 11, 2022
@ivotimev ivotimev reopened this May 11, 2022
@roberthbailey
Member

Is this a similar issue to #2432?

@ivotimev
Author

Hey @roberthbailey, I think it is similar to #2432, but that one would have been resolved by #2420.

More generally, I think #2420 fixed the cases involving a single inactive GSS with allocated replicas; however, the issue may still occur if there is more than one.

@ivotimev ivotimev changed the title [WIP] Presence of >2 GSS can cause a rolling update to become stuck until an allocation ends Presence of >2 GSS can cause a rolling update to become stuck until an allocation ends May 11, 2022
@ivotimev ivotimev changed the title Presence of >2 GSS can cause a rolling update to become stuck until an allocation ends Presence of >1 inactive GSS can cause a rolling update to become stuck until an allocation ends May 11, 2022
@roberthbailey
Member

I think this may have been fixed by #2623 - we should re-test.

@markmandel
Member

The RC comes out tomorrow, so it would be a good opportunity 😄

@markmandel
Member

Just touching base again to see if this issue has been resolved for you?

@ivotimev
Author

ivotimev commented Aug 2, 2022

Sorry for the delayed reply. I can confirm that this has been resolved as of 1.24.0, testing with the reproduction above.
