Rolling updates should wait for batches to become healthy before iterating #1625

Closed
jonbuckley33 opened this issue Jun 12, 2020 · 5 comments · Fixed by #1626

jonbuckley33 commented Jun 12, 2020

Is your feature request related to a problem? Please describe.
Our game servers take a little while to boot up (a few min), so when we perform a rolling update with the default settings (25% maxUnavailable/maxSurge), the whole fleet is taken down before any of the updated game servers can become "Ready."

Describe the solution you'd like
We would like for Agones to wait for each batch of updated game servers to become Ready before updating another batch of game servers. (One potential extension of this is that Agones could abort an update if the game servers don't become Ready within a certain period.)

Describe alternatives you've considered
Reducing the maxUnavailable/maxSurge percentage may mitigate the issue.
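
For example (the percentages below are purely illustrative, not something we have tuned), shrinking the batch size in the Fleet's strategy slows the rollout down, but as far as I can tell it still does not make Agones wait for the new GameServers to report Ready:

  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 10%
      maxUnavailable: 10%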

@jonbuckley33 jonbuckley33 added the kind/feature New features for Agones label Jun 12, 2020

aLekSer commented Jun 15, 2020

Thanks for posting this issue. There is a Note section in the documentation:
https://agones.dev/site/docs/guides/fleet-updates/#rolling-update-strategy
The issue is indeed related to boot-up time.

When Fleet update contains only changes to the replicas parameter, then new GameServers will be created/deleted straight away, which means in that case maxSurge and maxUnavailable parameters for a RollingUpdate will not be used. The RollingUpdate strategy takes place when you update spec parameters other than replicas.
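
As a rough illustration (the commands are only a sketch, and the fleet name matches the example in my next comment), a replicas-only change skips the RollingUpdate batching, while a template change goes through it:

# replicas-only change: GameServers are created/deleted straight away,
# maxSurge/maxUnavailable are not used
kubectl scale fleet simple-udp --replicas=30

# template change (anything under spec.template): handled by the RollingUpdate strategy
kubectl edit fleet simple-udp   # e.g. bump the container image tag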

Do you use FleetAutoScaler with a Fleet?

I was able to reproduce your issue.

@aLekSer aLekSer self-assigned this Jun 15, 2020
@aLekSer aLekSer added kind/bug These are bugs. and removed kind/feature New features for Agones labels Jun 15, 2020

aLekSer commented Jun 15, 2020

Steps to reproduce are below. @jonbuckley33 please add more comments if your steps were a bit different.

  1. kubectl apply -f the fleet:
fleet.yaml
apiVersion: "agones.dev/v1"
kind: Fleet
metadata:
  name: simple-udp
spec:
  replicas: 20
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
  template:
    spec:
      ports:
      - name: default
        containerPort: 7654
      template:
        spec:
          containers:
          - name: simple-udp
            image: gcr.io/agones-images/udp-server:0.21
            resources:
              requests:
                memory: "64Mi"
                cpu: "20m"
              limits:
                memory: "64Mi"
                cpu: "20m"
            args:
              - "-ready=true"
  2. Wait for all 20 GameServers to become Ready.

  3. Edit fleet.yaml and change the container args to:
    args:
    - "-ready=false"
    With this flag, GameServers only become Ready after a client connects and marks them ready.

  4. Apply the fleet with -ready=false.

  5. See that the new GameServers, created in batches of 25%, all end up in the Scheduled state (commands to observe this are sketched below).
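
For reference, here is the relevant part of the edit in step 3 and a rough way to watch the result (a sketch only; exact output will differ):

# fleet.yaml: only the container args change
            args:
              - "-ready=false"

# watch GameServer states during the rollout; the new ones stay Scheduled
kubectl get gameservers -w

# the Fleet's GameServerSets (old and new) during the update
kubectl get gss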


aLekSer commented Jun 15, 2020

A Kubernetes Deployment waits for new Pods to become ready before the next wave of scaling down old Pods: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy

For example, when this value [max unavailable] is set to 30%, the old ReplicaSet can be scaled down to 70% of desired Pods immediately when the rolling update starts. Once new Pods are ready, old ReplicaSet can be scaled down further,
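
For comparison, a minimal plain-Kubernetes sketch of the strategy block that quote describes (names, image and numbers are illustrative only); the point is that readiness of the new Pods gates further scale-down of the old ReplicaSet, which is the behaviour Fleets should mirror:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-server              # illustrative name
spec:
  replicas: 20
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 30%
      maxUnavailable: 30%           # old ReplicaSet drops to 70% immediately,
                                    # then further only as new Pods become Ready
  selector:
    matchLabels:
      app: example-server
  template:
    metadata:
      labels:
        app: example-server
    spec:
      containers:
      - name: example-server
        image: example/image:1.0    # placeholder image
        readinessProbe:             # readiness gates the next wave of scale-down
          tcpSocket:
            port: 7654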

jonbuckley33 (Author)

Thanks for posting concise repro steps -- that's exactly the behavior I am seeing.

Do you use FleetAutoScaler with a Fleet?

No, I do not.

Excited to see that you have a PR out that may fix this. I appreciate the quick turnaround! :)


aLekSer commented Sep 13, 2020

I can add steps 6 and 7 to verify:
6. Edit the fleet and change the containerPort.
   k get gss would now return 3 GameServerSets.
7. Fix the containerPort and set -ready=true again.
   Now k get gss should return only one GameServerSet.
On master, after all these steps, we actually end up with one GameServerSet. Fixing my PR to address this new regression.
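
Roughly, the check looks like this (the counts are what I saw on my cluster and the field path is from the fleet above; yours may differ while the rollouts overlap):

# step 6: change the GameServer containerPort, which starts another rollout
kubectl edit fleet simple-udp    # e.g. spec.template.spec.ports[0].containerPort

kubectl get gss                  # 3 GameServerSets while the rollouts overlap

# step 7: restore the containerPort and set the arg back to -ready=true
kubectl edit fleet simple-udp

kubectl get gss                  # should settle back to a single GameServerSet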

@aLekSer aLekSer pinned this issue Sep 18, 2020
@markmandel markmandel added this to the 1.9.0 milestone Sep 22, 2020
@markmandel markmandel unpinned this issue Sep 22, 2020