
Agones fails to start the pod after update cpu limits to 1000m #1184

Closed
topochan opened this issue Nov 13, 2019 · 18 comments · Fixed by #1188
Labels
area/user-experience Pertaining to developers trying to use Agones, e.g. SDK, installation, etc kind/bug These are bugs.
Milestone

Comments

@topochan
Contributor

topochan commented Nov 13, 2019

What happened:

After updating the deployment we saw that the pods were not able to start. The update was a change of the CPU limit from 1500m to 1000m.

What you expected to happen:

Nothing crashes.

How to reproduce it (as minimally and precisely as possible):

Deploy a fleet with GameServer CPU limits set to 1000m; it will work without any issue. Then update the fleet with a different image tag, label, or any other change and kubectl apply -f the manifest again; you will see that none of the pods are able to start.

Anything else we need to know?:

It looks like the conversion from 1000m to 1 in the CPU limit happens when the fleet is created, but not when the fleet is updated.
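The behaviour described above matches how Kubernetes canonicalizes resource quantities: 1000m and 1 are the same value, but the API server stores whole cores without the "m" suffix. A stdlib-only Go sketch of that canonicalization (parseMilliCPU and canonical are hypothetical helpers modelling what resource.Quantity in k8s.io/apimachinery does; they are not the real implementation):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMilliCPU converts a CPU quantity string ("1", "1000m", "999m")
// into millicores. Hypothetical helper; the real parser is
// resource.MustParse in k8s.io/apimachinery.
func parseMilliCPU(s string) int64 {
	if strings.HasSuffix(s, "m") {
		v, _ := strconv.ParseInt(strings.TrimSuffix(s, "m"), 10, 64)
		return v
	}
	v, _ := strconv.ParseInt(s, 10, 64)
	return v * 1000
}

// canonical mimics the serializer: whole cores are printed without
// the "m" suffix, so "1000m" round-trips as "1".
func canonical(milli int64) string {
	if milli%1000 == 0 {
		return strconv.FormatInt(milli/1000, 10)
	}
	return strconv.FormatInt(milli, 10) + "m"
}

func main() {
	a, b := parseMilliCPU("1000m"), parseMilliCPU("1")
	fmt.Println(a == b)                  // true: numerically identical
	fmt.Println(canonical(a))            // "1": what the API server stores
	fmt.Println("1000m" == canonical(a)) // false: a string comparison diverges
}
```

So a spec submitted with "1000m" no longer compares equal, as a string or as a struct, to the stored object once the server has rewritten it to "1".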

Environment:

  • Agones version: 1.1.0
  • Kubernetes version (use kubectl version):
    Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.10", GitCommit:"e3c134023df5dea457638b614ee17ef234dc34a6", GitTreeState:"clean", BuildDate:"2019-07-08T03:40:54Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
  • Cloud provider or hardware configuration: AWS/kops
  • Install method (yaml/helm): yaml
@topochan topochan added the kind/bug These are bugs. label Nov 13, 2019
@markmandel
Member

Can you explain what "crashing of the pods" means?

If you could provide a kubectl describe pod of one of the pods in question, that would be useful.

Are you able to reproduce with our simple-udp example? Having a repro case would also be useful

@topochan
Contributor Author

It should happen with simple-udp as well; I can try to reproduce it in minikube later and create a full example.

I rephrased "crashing of the pods" to "pods are not able to start". Basically, the behaviour is that the new GameServer pods never start; they are immediately terminated.

I will try to create a simple-udp example to help debug the issue.

@aLekSer
Collaborator

aLekSer commented Nov 14, 2019

@topochan I was able to reproduce the issue: GameServers get stuck in the Creating state after changing the CPU limits to 1000m (was 20m) and updating image: gcr.io/agones-images/udp-server:0.17
to image: gcr.io/agones-images/udp-server:0.16.
Now I see 7 GameServers in the fleet, but the config specified 2 replicas.
The fleet events can be found here:
https://gist.github.com/aLekSer/07d667c4d239538b3e615b3f16802c5c

@aLekSer
Collaborator

aLekSer commented Nov 14, 2019

Errors in the logs of the Agones controller:

{"error":"error updating GameServer  to Starting state: Operation cannot be fulfilled on gameservers.agones.dev \"simple-udp-w8cgb-4v9s9\": the object has been modified; please apply your changes to the latest version and try again","gsKey":"default/simple-udp-w8cgb-4v9s9","message":"","queue":"agones.dev.GameServerControllerCreation","severity":"error","source":"*gameservers.Controller","subqueue":"creation","time":"2019-11-14T12:33:18.685533286Z"}                                                                            
{"message":"error updating GameServer  to Starting state: Operation cannot be fulfilled on gameservers.agones.dev \"simple-udp-w8cgb-4v9s9\": the object has been modified; please apply your changes to the latest version and try again","severity":"error","stack":["agones.dev/agones/pkg/gameservers.(*Controller).syncGameServerCreatingState\n\t/go/src/agones.dev/agones/pkg/gameservers/controller.go:489","agones.dev/agones/pkg/gameservers.(*Controller).syncGameServer\n\t/go/src/agones.dev/agones/pkg/gameservers/controller.go:373","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).processNextWorkItem\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:152","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).runWorker\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:128","k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133","k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134","k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).run\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:180","runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1337"],"time":"2019-11-14T12:33:18.685645256Z"}
{"error":"error updating GameServer  to default values: Operation cannot be fulfilled on gameservers.agones.dev \"simple-udp-8cp2s-7tj56\": the object has been modified; please apply your changes to the latest version and try again","gsKey":"default/simple-udp-8cp2s-7tj56","message":"","queue":"agones.dev.GameServerControllerCreation","severity":"error","source":"*gameservers.Controller","subqueue":"creation","time":"2019-11-14T12:33:18.874096813Z"}                                                                            
{"message":"error updating GameServer  to default values: Operation cannot be fulfilled on gameservers.agones.dev \"simple-udp-8cp2s-7tj56\": the object has been modified; please apply your changes to the latest version and try again","severity":"error","stack":["agones.dev/agones/pkg/gameservers.(*Controller).syncGameServerPortAllocationState\n\t/go/src/agones.dev/agones/pkg/gameservers/controller.go:454","agones.dev/agones/pkg/gameservers.(*Controller).syncGameServer\n\t/go/src/agones.dev/agones/pkg/gameservers/controller.go:370","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).processNextWorkItem\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:152","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).runWorker\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:128","k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133","k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134","k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).run\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:180","runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1337"],"time":"2019-11-14T12:33:18.8742146Z"}
{"error":"error updating GameServer  to Starting state: Operation cannot be fulfilled on gameservers.agones.dev \"simple-udp-xrmxn-psrrf\": the object has been modified; please apply your changes to the latest version and try again","gsKey":"default/simple-udp-xrmxn-psrrf","message":"","queue":"agones.dev.GameServerControllerCreation","severity":"error","source":"*gameservers.Controller","subqueue":"creation","time":"2019-11-14T12:33:19.212432262Z"}                                                                            
{"message":"error updating GameServer  to Starting state: Operation cannot be fulfilled on gameservers.agones.dev \"simple-udp-xrmxn-psrrf\": the object has been modified; please apply your changes to the latest version and try again","severity":"error","stack":["agones.dev/agones/pkg/gameservers.(*Controller).syncGameServerCreatingState\n\t/go/src/agones.dev/agones/pkg/gameservers/controller.go:489","agones.dev/agones/pkg/gameservers.(*Controller).syncGameServer\n\t/go/src/agones.dev/agones/pkg/gameservers/controller.go:373","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).processNextWorkItem\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:152","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).runWorker\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:128","k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133","k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134","k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).run\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:180","runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1337"],"time":"2019-11-14T12:33:19.212540214Z"}

Some additional logs which might be useful:

{"gs":{"metadata":{"name":"simplest-udp3-k7sph-fbw7j","generateName":"simplest-udp3-k7sph-","namespace":"default","selfLink":"/apis/agones.dev/v1/namespaces/default/gameservers/simplest-udp3-k7sph-fbw7j","uid":"7deed1e6-06de-11ea-a918-42010a8a00cb","resourceVersion":"155084","generation":1,"creationTimestamp":"2019-11-14T12:58:45Z","labels":{"agones.dev/fleet":"simplest-udp3","agones.dev/gameserverset":"simplest-udp3-k7sph"},"annotations":{"agones.dev/sdk-version":"1.2.0-eefc7ad"},"ownerReferences":[{"apiVersion":"agones.dev/v1","kind":"GameServerSet","name":"simplest-udp3-k7sph","uid":"7dcc723e-06de-11ea-a918-42010a8a00cb","controller":true,"blockOwnerDeletion":true}],"finalizers":["agones.dev"},"spec":{"container":"simple-udp","ports":[{"name":"default","portPolicy":"Dynamic","containerPort":7654,"hostPort":7396,"protocol":"UDP"}],"health":{"periodSeconds":5,"failureThreshold":3,"initialDelaySeconds":5},"scheduling":"Packed","sdkServer":{"logLevel":"Info","grpcPort":59357,"httpPort":59358},"template":{"metadata":{"creationTimestamp":null},"spec":{"containers":[{"name":"simple-udp","image":"gcr.io/agones-images/udp-server:0.16","resources":{"limits":{"cpu":"1","memory":"64Mi"},"requests":{"cpu":"1","memory":"64Mi"}}}]}}},"status":{"state":"Starting","ports":null,"address":"","nodeName":"","reservedUntil":null}},"gsKey":"default/simplest-udp3-k7sph-fbw7j","message":"Syncing Starting GameServerState","severity":"info","source":"*gameservers.Controller","time":"2019-11-14T12:58:45.445973068Z"}
{"error":"error getting external address for GameServer simplest-udp3-k7sph-fbw7j: error retrieving node  for Pod simplest-udp3-k7sph-fbw7j: node \"\" not found","gsKey":"default/simplest-udp3-k7sph-fbw7j","message":"","queue":"agones.dev.GameServerController","severity":"error","source":"*gameservers.Controller","time":"2019-11-14T12:58:45.44618961Z"}
{"message":"error getting external address for GameServer simplest-udp3-k7sph-fbw7j: error retrieving node  for Pod simplest-udp3-k7sph-fbw7j: node \"\" not found","severity":"error","stack":["agones.dev/agones/pkg/gameservers.(*Controller).applyGameServerAddressAndPort\n\t/go/src/agones.dev/agones/pkg/gameservers/controller.go:734","agones.dev/agones/pkg/gameservers.(*Controller).syncGameServerStartingState\n\t/go/src/agones.dev/agones/pkg/gameservers/controller.go:714","agones.dev/agones/pkg/gameservers.(*Controller).syncGameServer\n\t/go/src/agones.dev/agones/pkg/gameservers/controller.go:376","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).processNextWorkItem\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:152","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).runWorker\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:128","k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133","k8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134","k8s.io/apimachinery/pkg/util/wait.Until\n\t/go/src/agones.dev/agones/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88","agones.dev/agones/pkg/util/workerqueue.(*WorkerQueue).run\n\t/go/src/agones.dev/agones/pkg/util/workerqueue/workerqueue.go:180","runtime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1337"],"time":"2019-11-14T12:58:45.446330568Z"}
{"gsKey":"default/simplest-udp3-k7sph-fbw7j","message":"Processing","queue":"agones.dev.GameServerControllerCreation","severity":"info","source":"*gameservers.Controller","subqueue":"creation","time":"2019-11-14T12:58:45.446849273Z"}
{"gsKey":"default/simplest-udp3-k7sph-fbw7j","message":"Synchronising","severity":"info","source":"*gameservers.Controller","time":"2019-11-14T12:58:45.446914065Z"}

@aLekSer
Collaborator

aLekSer commented Nov 14, 2019

I assume that we end up overcommitting the nodes in the Kubernetes cluster; I used the standard make gcloud-test-cluster.

One more way to reproduce a similar situation is to edit simple-udp/fleet.yaml, change the fleet replicas to 12, and change the limits and requests to 1000m. kubectl describe fleet output:
https://gist.github.com/aLekSer/4b63633008b310763959430247af9e51

which results in only 10 Ready replicas and the nodes being overcommitted:

$ kubectl describe nodes 

ProviderID:                  gce://agones-[...]/us-west1-c/gke-test-cluster-e2e-default-78f2b1f5-8l1d
Non-terminated Pods:         (10 in total)
  Namespace                  Name                                                     CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                  ----                                                     ------------  ----------  ---------------  -------------  ---
  default                    simplest-udp4-hdtr7-5wlrm                                1030m (26%)   1 (25%)     64Mi (0%)        64Mi (0%)      7m29s
  default                    simplest-udp4-hdtr7-szl26                                1030m (26%)   1 (25%)     64Mi (0%)        64Mi (0%)      7m29s
  default                    simplest-udp4-hdtr7-wlrvp                                1030m (26%)   1 (25%)     64Mi (0%)        64Mi (0%)      7m29s
  kube-system                event-exporter-v0.2.4-5f7d5d7dd4-g9d2x                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h23m
  kube-system                fluentd-gcp-scaler-5b5ff6f8bd-8gjqw                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         3h23m
  kube-system                fluentd-gcp-v3.2.0-94fqr                                 100m (2%)     1 (25%)     200Mi (1%)       500Mi (4%)     3h22m
  kube-system                kube-dns-67947d6c68-999cb                                260m (6%)     0 (0%)      110Mi (0%)       170Mi (1%)     3h23m
  kube-system                kube-dns-autoscaler-76fcd5f658-hzt24                     20m (0%)      0 (0%)      10Mi (0%)        0 (0%)         3h23m
  kube-system                kube-proxy-gke-test-cluster-e2e-default-78f2b1f5-8l1d    100m (2%)     0 (0%)      0 (0%)           0 (0%)         3h23m
  kube-system                prometheus-to-sd-d9h24                                   1m (0%)       3m (0%)     20Mi (0%)        20Mi (0%)      3h23m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                   Requests     Limits
  --------                   --------     ------
  cpu                        3571m (91%)  4003m (102%)
  memory                     532Mi (4%)   882Mi (7%)
  ephemeral-storage          0 (0%)       0 (0%)
  attachable-volumes-gce-pd  0            0
Events:                      <none>

@aLekSer
Collaborator

aLekSer commented Nov 14, 2019

From the GCloud console, these 2 pods (out of 12) are unschedulable:

0/6 nodes are available: 2 node(s) had taints that the pod didn't tolerate, 4 Insufficient cpu.

In the similar case where we configure the Fleet by updating the image tag (the subject of this issue), we have a bigger problem: pods get terminated, new ones are started, and the memory consumption of the Agones controller starts to rise linearly.

@markmandel
Member

What is the resultant pod config that is trying and failing to start?

@aLekSer
Collaborator

aLekSer commented Nov 15, 2019

@markmandel There are the two cases I mentioned above:

  1. The fleet.yaml to reproduce the issue as in the description:
apiVersion: "agones.dev/v1"
kind: Fleet
metadata:
  name: simple-udp
spec:
  replicas: 2
  template:
    spec:
      ports:
      - name: default
        containerPort: 7654
      template:
        spec:
          containers:
          - name: simple-udp
            image: gcr.io/agones-images/udp-server:0.17
            resources:
              requests:
                memory: "64Mi"
                cpu: "1000m"
              limits:
                memory: "64Mi"
                cpu: "1000m"

Then updating the following line in a Ready fleet (2 of 2 GameServers Ready) to udp-server:0.16

            image: gcr.io/agones-images/udp-server:0.16

produces an infinite loop of GameServerSet creations and terminations:
https://gist.github.com/aLekSer/07d667c4d239538b3e615b3f16802c5c

  Normal  CreatingGameServerSet  7m50s (x4 over 7m51s)     fleet-controller  (combined from similar events): Created GameServerSet simple-udp-cpsb2
  Normal  DeletingGameServerSet  3m44s (x1063 over 7m50s)  fleet-controller  (combined from similar events): Deleting inactive GameServerSet simple-udp-p7qhm
  2. If we create the same fleet as above but with replicas: 12, only 10 GameServers are created because the nodes are overcommitted.

@aLekSer
Collaborator

aLekSer commented Nov 15, 2019

My conclusion on the root cause of the issue is that we have Packed scheduling, which is why the new GSSet tries to be created on the same nodes.
If we divide the 10 GameServers across the 4 nodes, only about 3 pods with a 1 CPU (1000m) request and limit fit per node without overcommitting the nodes.
I will add more details soon.
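The per-node arithmetic above can be sketched with the figures from the kubectl describe nodes output earlier in the thread. The allocatable millicores per node is an assumed figure for this cluster's node type, not a universal constant; the 1030m per game server (1000m container plus the Agones SDK sidecar) and the kube-system requests appear in the node listing:

```go
package main

import "fmt"

func main() {
	// Approximate figures from the node described above.
	allocatable := int64(3920)   // assumed allocatable millicores per node
	system := int64(481)         // kube-system requests: 100+260+20+100+1 m
	perGameServer := int64(1030) // 1000m container + 30m SDK sidecar

	fit := (allocatable - system) / perGameServer
	fmt.Println(fit) // about 3 game servers fit per node before overcommit
}
```

With roughly 3 game servers per schedulable node, a 12-replica fleet cannot fully fit, which matches the 10 Ready replicas observed.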

@aLekSer
Collaborator

aLekSer commented Nov 15, 2019

Changing the scheduling parameter from Packed to Distributed does not help.
Strangely enough, the fleet controller also scales the inactive GSS from 2 to 1:

  Normal  CreatingGameServerSet  103s               fleet-controller  Created GameServerSet simple-udp-7nf4v
  Normal  ScalingGameServerSet   81s                fleet-controller  Scaling inactive GameServerSet simple-udp-7nf4v from 2 to 1
  Normal  CreatingGameServerSet  81s                fleet-controller  Created GameServerSet simple-udp-tb6qf
  Normal  DeletingGameServerSet  81s                fleet-controller  Deleting inactive GameServerSet simple-udp-tb6qf
  Normal  CreatingGameServerSet  81s                fleet-controller  Created GameServerSet simple-udp-znb62
  Normal  DeletingGameServerSet  81s                fleet-controller  Deleting inactive GameServerSet simple-udp-znb62

@aLekSer
Collaborator

aLekSer commented Nov 15, 2019

One more detail about the problem: we are actually creating more GameServerSets than there should be (there should be 2 at most). I assume we should create one, but according to the GameServer name prefixes we have 6 at a time:

$ kubectl get gs
NAME                     STATE      ADDRESS   PORT   NODE   AGE
simple-udp-2nvd8-dkqqk   Creating                           5s
simple-udp-78mxt-rpcgb   Creating                           4s
simple-udp-bxjx7-8q2sb   Creating                           6s
simple-udp-vrvlm-gxlgf   Creating                           3s
simple-udp-x46vs-c6lmf   Creating                           7s
simple-udp-zw9xb-8dk4t   Creating                           8s

@aLekSer
Collaborator

aLekSer commented Nov 15, 2019

The bug reproduces with 1 replica in a Fleet and 1000m CPU requests and limits.
It does not reproduce with 1 replica in a Fleet and 999m.

@aLekSer
Collaborator

aLekSer commented Nov 15, 2019

It might be related to kubernetes/kubernetes#66450.
More precisely, this comment contains the values causing the issue:
kubernetes/kubernetes#66450 (comment)
I have noticed that when we use 1000m and apply the manifest a second time without changing anything, we get:

$ kubectl apply -f ../examples/simple-udp/fleet.yaml
fleet.agones.dev/simple-udp created
$ kubectl apply -f ../examples/simple-udp/fleet.yaml
fleet.agones.dev/simple-udp configured

and we end up in the situation with infinite GSSet creation.
Changing the CPU limits to the equivalent cpu: "1" resolves the issue.

The equivalent full containers spec without the issue:

          containers:
          - name: simple-udp
            image: gcr.io/agones-images/udp-server:0.16
            resources:
              requests:
                memory: "64Mi"
                cpu: "1"
              limits:
                memory: "64Mi"
                cpu: "1"

@topochan
Contributor Author

> It might be related to kubernetes/kubernetes#66450 [...] Changing the CPU limits to the equivalent cpu: "1" resolves the issue.

This is exactly the issue. Funnily enough, "1000m" becomes "1" CPU when the fleet is created, but not when we update it. Checking kubernetes/kubernetes#66450 and the comment you linked, it looks like the same behaviour. The question is whether this weird pod state is caused by Kubernetes or by Agones (probably Kubernetes, since the limits are in the template section for pods).

@markmandel
Member

I was going to ask this next; glad to see it works. That is a weird one.

We've started testing on 1.13 and are planning to move to it in the next release. Do we know if this also happens on 1.13?

(I'm also assuming we are all using the 1.12 version of kubectl as well?)

@aLekSer
Collaborator

aLekSer commented Nov 15, 2019

I was testing on the most recent master today, which uses 1.13:

kubectl version
Client Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.11-dispatcher", GitCommit:"2e298c7e992f83f47af60cf4830b11c7370f6668", GitTreeState:"clean", BuildDate:"2019-09-19T22:20:12Z", GoVersion:"go1.11.13", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.12-gke.8", GitCommit:"39fe6cf6b77a3a0f620bd89db92f5133be67aa91", GitTreeState:"clean", BuildDate:"2019-11-07T19:14:31Z", GoVersion:"go1.12.11b4", Compiler:"gc", Platform:"linux/amd64"}

@aLekSer
Collaborator

aLekSer commented Nov 16, 2019

First: the issue itself is about Kubernetes changing the scale of the requests. I added extra debugging and found that in the fleet controller we receive different strings but equal values in filterGameServerSetByActive():

gsSet -> Limits:map[cpu:{i:{value:1 scale:0} d:{Dec:<nil>} s:1 Format:DecimalSI} 
fleet -> Limits:map[cpu:{i:{value:1000 scale:-3} d:{Dec:<nil>} s: Format:DecimalSI}

So we need to use resource Cmp() function:
https://godoc.org/k8s.io/apimachinery/pkg/api/resource#Quantity.Cmp
I started a draft version of this bug fix here:
https://github.com/aLekSer/agones/tree/fix-1000m-cpu-limit

@aLekSer
Collaborator

aLekSer commented Nov 17, 2019

I tested my proposed solution: changing the DeepEqual check to a more accurate equality function works.
We can set 1000m and the Fleet works as it should, without the infinite loop and without creating 5 GameServerSets per Fleet.

@markmandel markmandel added this to the 1.2.0 milestone Dec 4, 2019
@markmandel markmandel added the area/user-experience Pertaining to developers trying to use Agones, e.g. SDK, installation, etc label Dec 4, 2019