Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ShutdownReplicas count #810

Merged
merged 1 commit into from
Jun 6, 2019

Conversation

aLekSer
Copy link
Collaborator

@aLekSer aLekSer commented Jun 5, 2019

Perform graceful scale down of inactive GameServerSet on RollingUpdate.
Introduce ShutdownReplicas count for GameServerSet Status, to better control life cycle of the GameServerSet, used by a fleet.
Closes #799.

@aLekSer
Copy link
Collaborator Author

aLekSer commented Jun 5, 2019

Some make test-go could be broken.

@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: 29f0811f-ee80-49bb-a746-38590c15c037

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: 5ef7427d-229d-443a-8e04-b649b1bd1b3d

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: 09044425-6954-45a8-9ae5-643da2047326

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@aLekSer
Copy link
Collaborator Author

aLekSer commented Jun 6, 2019

E2E test succeed locally, trying to write different waiting scenario, the test which fails for current master and working as expected without overshooting Gameservers. And check that old GSSet does not put all GS into Shutdown all at once.

@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: 11f62e4d-2dfd-4a62-beec-8bcdaca58527

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: 08777b7b-108f-4897-bf2e-1070dc3f75ef

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

Perform graceful scale down of inactive GameServerSet on RollingUpdate.
@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: bc183c38-ab1c-44b2-ab2c-ffbb9bf84cc5

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/810/head:pr_810 && git checkout pr_810
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=0.11.0-81e0e73

@aLekSer aLekSer marked this pull request as ready for review June 6, 2019 17:07
@markmandel
Copy link
Member

Going to run this one more time, just to make sure there is nothing flaky in it, but giving it a review now 👍

@markmandel markmandel added kind/bug These are bugs. area/user-experience Pertaining to developers trying to use Agones, e.g. SDK, installation, etc labels Jun 6, 2019
@agones-bot
Copy link
Collaborator

Build Failed 😱

Build Id: 6de39c10-736e-4a2a-a294-b66663e0304c

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@markmandel
Copy link
Member

That wasn't you - that was a flaky unit test. 🔴 🐟

@aLekSer
Copy link
Collaborator Author

aLekSer commented Jun 6, 2019

Failed on pkg/gameservers not fleets, seems not related to this change.

@agones-bot
Copy link
Collaborator

Build Succeeded 👏

Build Id: 7c83c1c1-e311-4470-aaea-c4c983346ae9

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/810/head:pr_810 && git checkout pr_810
  • helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=0.11.0-81e0e73

@markmandel
Copy link
Member

So I just took this for a spin locally. One thing seems weird to me.

This is my fleet:

Name:         simple-udp
Namespace:    default
API Version:  stable.agones.dev/v1alpha1
Kind:         Fleet
Metadata:
  Creation Timestamp:  2019-06-06T18:34:04Z
  Generation:          4
  Resource Version:    7917748
  Self Link:           /apis/stable.agones.dev/v1alpha1/namespaces/default/fleets/simple-udp
  UID:                 a99ee48b-8889-11e9-b3ed-42010aa80085
Spec:
  Replicas:    20
  Scheduling:  Packed
  Strategy:
    Rolling Update:
      Max Surge:        25%
      Max Unavailable:  25%
    Type:               RollingUpdate
  Template:
    Metadata:
      Creation Timestamp:  <nil>
    Spec:
      Health:
      Ports:
        Container Port:  9000
        Name:            default
      Template:
        Metadata:
          Creation Timestamp:  <nil>
        Spec:
          Containers:
            Image:  gcr.io/agones-images/udp-server:0.9
            Name:   simple-udp

What I noticed when a RollingUpdate occurred (I change the container port), I have 30 gameserver in flight at a given moment:

NAME                     STATE      ADDRESS         PORT      NODE                                     AGE
simple-udp-slfd9-5lqhg   Ready      35.235.114.81   7728      gke-test-cluster-default-f11755a7-57br   2m
simple-udp-slfd9-8h6bs   Ready      35.235.114.81   7131      gke-test-cluster-default-f11755a7-57br   2m
simple-udp-slfd9-9gdx8   Shutdown   35.235.114.81   7560      gke-test-cluster-default-f11755a7-57br   2m
simple-udp-slfd9-cnvk4   Shutdown   35.235.114.81   7558      gke-test-cluster-default-f11755a7-57br   2m
simple-udp-slfd9-lz2jp   Ready      35.235.114.81   7113      gke-test-cluster-default-f11755a7-57br   2m
simple-udp-slfd9-m7x44   Ready      35.235.114.81   7047      gke-test-cluster-default-f11755a7-57br   2m
simple-udp-slfd9-nvhpv   Shutdown   35.235.114.81   7177      gke-test-cluster-default-f11755a7-57br   2m
simple-udp-slfd9-pfbxp   Ready      35.235.114.81   7354      gke-test-cluster-default-f11755a7-57br   2m
simple-udp-slfd9-qqg4x   Shutdown   35.235.114.81   7072      gke-test-cluster-default-f11755a7-57br   2m
simple-udp-slfd9-vvtll   Shutdown   35.235.114.81   7049      gke-test-cluster-default-f11755a7-57br   2m
simple-udp-szlxw-2gx62   Ready      35.235.114.81   7136      gke-test-cluster-default-f11755a7-57br   22s
simple-udp-szlxw-2hgr4   Ready      35.235.114.81   7571      gke-test-cluster-default-f11755a7-57br   7s
simple-udp-szlxw-5hq4z   Ready      35.235.114.81   7676      gke-test-cluster-default-f11755a7-57br   22s
simple-udp-szlxw-5wqnb   Ready      35.235.114.81   7160      gke-test-cluster-default-f11755a7-57br   7s
simple-udp-szlxw-87vvz   Ready      35.235.114.81   7375      gke-test-cluster-default-f11755a7-57br   38s
simple-udp-szlxw-b86vw   Ready      35.235.114.81   7308      gke-test-cluster-default-f11755a7-57br   38s
simple-udp-szlxw-bjdhm   Ready      35.235.114.81   7321      gke-test-cluster-default-f11755a7-57br   22s
simple-udp-szlxw-fldxp   Ready      35.235.114.81   7821      gke-test-cluster-default-f11755a7-57br   22s
simple-udp-szlxw-gd7nz   Ready      35.235.114.81   7720      gke-test-cluster-default-f11755a7-57br   38s
simple-udp-szlxw-hgnzg   Ready      35.235.114.81   7087      gke-test-cluster-default-f11755a7-57br   38s
simple-udp-szlxw-jzwxw   Ready      35.235.114.81   7079      gke-test-cluster-default-f11755a7-57br   38s
simple-udp-szlxw-p4x5t   Ready      35.235.114.81   7656      gke-test-cluster-default-f11755a7-57br   38s
simple-udp-szlxw-p7xkl   Ready      35.235.114.81   7316      gke-test-cluster-default-f11755a7-57br   7s
simple-udp-szlxw-qwmzq   Ready      35.235.114.81   7387      gke-test-cluster-default-f11755a7-57br   38s
simple-udp-szlxw-sm2tc   Ready      35.235.114.81   7185      gke-test-cluster-default-f11755a7-57br   38s
simple-udp-szlxw-vc597   Ready      35.235.114.81   7432      gke-test-cluster-default-f11755a7-57br   38s
simple-udp-szlxw-vjk4h   Ready      35.235.114.81   7403      gke-test-cluster-default-f11755a7-57br   22s
simple-udp-szlxw-vjsqz   Ready      35.235.114.81   7348      gke-test-cluster-default-f11755a7-57br   7s
simple-udp-szlxw-vtncv   Ready      35.235.114.81   7400      gke-test-cluster-default-f11755a7-57br   7s
simple-udp-szlxw-w46vc   Ready      35.235.114.81   7315      gke-test-cluster-default-f11755a7-57br   38s

Shouldn't it be 25 in flight? I have a 25% max surge, and 1.15 * 20 = 25 - unless I missing something?

I think the math is off somewhere.

@markmandel
Copy link
Member

Oh wait - is this because we have 5 actively being shut down, while we spin up 5 new ones? Hence we end up at 30? If so, that makes sense 👍

@aLekSer
Copy link
Collaborator Author

aLekSer commented Jun 6, 2019

Yes, so 5 Gameservers got added as MaxSurge to Active GSS and MaxUnavailable got subtracted from rest GSSs.
As you can see I added this statement in a test case:

target+math.Ceil(target*float64(maxSurge)/100.)+math.Ceil(target*float64(maxUnavailable)/100.)

For 10 Replicas we would have max value of 16. If we have 10% it would be 12.
It seems that we should document these and some other requirements somewhere. To know how the process should look like in most common use cases.

Copy link
Member

@markmandel markmandel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Nice work.

@markmandel markmandel merged commit 51358ba into googleforgames:master Jun 6, 2019
@markmandel markmandel added this to the 0.11.0 milestone Jun 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/user-experience Pertaining to developers trying to use Agones, e.g. SDK, installation, etc kind/bug These are bugs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fleet Rolling Update doesn't seem to be rolling
3 participants