Specify CPU Request for the SDK Server Sidecar #390

markmandel · 2018-10-18T00:15:24Z

This provides the mechanism (and defaults) for being able to set both the CPU request, and CPU limits for the SDK Server GameServer sidecar.

I've only set the request level, as it seems that the major issue is not CPU usage, but how the scheduler allots space for the sidecar (by default, 100m/0.1 vCPU is allotted to each container).

I've set the default request level to be 5m/0.005 vCPU -- while this is above what load tests have shown, I wanted to be conservative. Also, the controls exist to tweak this value yourself via the Helm chart.

I've not set a CPU limit, as I found when setting a low (<= 20m) CPU limit on the sidecar it mostly stopped working. But if people want to experiment with this, it is also configurable via the Helm chart.

Closes #344

markmandel · 2018-10-18T00:15:55Z

/cc @KamiShepard

agones-bot · 2018-10-18T00:28:00Z

Build Succeeded 👏

Build Id: 345e8292-9184-47ea-92eb-707caf13f672

The following development artifacts have been built, and will exist for the next 30 days:

image: gcr.io/agones-images/agones-controller:0.6.0-0b46fb4
image: gcr.io/agones-images/agones-sdk:0.6.0-0b46fb4
Linux C++ SDK (build: dev): agonessdk-0.6.0-0b46fb4-dev-linux-arch_64.tar.gz
Linux C++ SDK (build: runtime): agonessdk-0.6.0-0b46fb4-runtime-linux-arch_64.tar.gz
Linux C++ SDK (source): agonessdk-0.6.0-0b46fb4-src.zip
SDK Server: agonessdk-server-0.6.0-0b46fb4.zip

(experimental) To install this version:

git fetch https://github.com/GoogleCloudPlatform/agones.git pull/390/head:pr_390 && git checkout pr_390
helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=0.6.0-0b46fb4

victor-prodan · 2018-10-18T04:25:03Z

If it stops working when you set the limit to 20m, it means that at runtime it has CPU usage spikes that exceed that value.

As scheduling is done by requested usage and not by limit, my concern is that with bin packing the servers will be packed tighter than they should be, and these sidecar CPU spikes will impact the performance/functionality of game servers.
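
To make the bin-packing concern concrete, here is a minimal sketch of how the sidecar's CPU request changes how many game server pods fit on a node when only CPU requests are considered. The node size (4000m allocatable) and the game server container request (100m) are hypothetical numbers chosen for illustration, and the function name is made up; a real scheduler also accounts for memory, system pods and other constraints.

```python
def pods_per_node(allocatable_m: int, game_server_request_m: int,
                  sidecar_request_m: int) -> int:
    """Number of pods that fit on a node, counting CPU requests only.

    All values are in millicores. Limits play no part here: the scheduler
    places pods by request, which is the point made in the comment above.
    """
    per_pod_m = game_server_request_m + sidecar_request_m
    return allocatable_m // per_pod_m


# Hypothetical node and game server sizes, for illustration only.
NODE_ALLOCATABLE_M = 4000
GAME_SERVER_REQUEST_M = 100

for sidecar_m in (5, 30, 100):
    count = pods_per_node(NODE_ALLOCATABLE_M, GAME_SERVER_REQUEST_M, sidecar_m)
    print(f"sidecar request {sidecar_m}m -> {count} pods per node")
```

Under these assumptions, a smaller sidecar request packs more pods onto the same node, so any sidecar CPU spike competes with more neighbours.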

markmandel · 2018-10-18T05:13:18Z

Looking at the graphs, it never gets anywhere close to 20m/0.02 (the max is ~ 0.003) 🤷‍♂️ but maybe it's being sampled?

Looking at the limits description:

The spec.containers[].resources.limits.cpu is converted to its millicore value and multiplied by 100. The resulting value is the total amount of CPU time that a container can use every 100ms. A container cannot use more than its share of CPU time during this interval.

Let's see if I can get this math right.

20m = 0.02
0.02 * 100 = 2
So, the container gets 2ms every 100ms, which is only 20ms per 1s.
Maybe this is actually too little to get things done? (But you likely have a better idea of lower level cpu cycles than I do)
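
The arithmetic above can be checked with a short sketch. It follows the quoted description (millicore value multiplied by 100, per 100ms period); the function and constant names are made up for illustration and are not part of Kubernetes or Agones.

```python
CFS_PERIOD_US = 100_000  # the 100ms period from the quoted docs, in microseconds


def cfs_quota_us(limit_millicores: int) -> int:
    """CPU time in microseconds a container may use per 100ms period."""
    # "converted to its millicore value and multiplied by 100"
    return limit_millicores * 100


def cpu_ms_per_second(limit_millicores: int) -> float:
    """Total CPU time in milliseconds available per second of wall-clock time."""
    periods_per_second = 1_000_000 // CFS_PERIOD_US
    return cfs_quota_us(limit_millicores) * periods_per_second / 1000


for limit in (20, 30, 100):
    per_period_ms = cfs_quota_us(limit) / 1000
    print(f"{limit}m -> {per_period_ms:g}ms per 100ms, "
          f"{cpu_ms_per_second(limit):g}ms per second")
```

For a 20m limit this gives 2ms per 100ms period. A burst that needs more than 2ms of CPU inside a single period would be held back until the next one, which may be why a low limit hurts even when average usage is tiny.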

For requests:

The spec.containers[].resources.requests.cpu is converted to its core value, which is potentially fractional, and multiplied by 1024. The greater of this number or 2 is used as the value of the --cpu-shares flag in the docker run command.

Looking at cpu-shares, this is a CPU limiter, but:

The proportion will only apply when CPU-intensive processes are running. When tasks in one container are idle, other containers can use the left-over CPU time. The actual amount of CPU time will vary depending on the number of containers running on the system.
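
As a rough sketch of the request-to-shares conversion quoted above: the function names are hypothetical, the truncation to an integer is an assumption about rounding, and the 1000m game server neighbour is an invented example.

```python
def cpu_shares(request_millicores: int) -> int:
    """Approximate --cpu-shares value for a CPU request given in millicores."""
    # Core value * 1024 is the same as millicores * 1024 / 1000.
    # Truncating to an integer is an assumption made for this sketch.
    shares = request_millicores * 1024 // 1000
    # "The greater of this number or 2 is used"
    return max(shares, 2)


def contended_fraction(own_shares: int, other_shares: list[int]) -> float:
    """Fraction of CPU a container gets when every container is busy."""
    return own_shares / (own_shares + sum(other_shares))


for request in (5, 30, 100):
    print(f"{request}m request -> {cpu_shares(request)} shares")

# Hypothetical neighbour: a game server container requesting 1000m.
sidecar = cpu_shares(30)
game_server = cpu_shares(1000)
print(f"sidecar fraction under contention: "
      f"{contended_fraction(sidecar, [game_server]):.3f}")
```

The shares only matter when containers compete for CPU. When the neighbours are idle, the sidecar can use the spare time regardless of its request.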

That being said, this is also why I allowed both to be configured: in case this is all totally wrong, it can be adjusted as necessary (and in fact, I'd definitely like this to be configured with real, proper data and evidence to be sure).

20m mostly worked -- 30m was more consistently good -- at least on the CPUs I was running in my cluster.

WDYT?

victor-prodan · 2018-10-18T06:25:06Z

The sidecar is normally idle; it only does work at discrete events. This is why looking at the average CPU usage is misleading.

I don't think we should set a default limit though: if we're wrong (not all cloud CPUs are equal), it will break the sidecar.

But I think that the default value for any parameter of a service should err on the safe side. It's better to have a robust service that can be optimized.

Therefore I suggest going with a default closer to the observed consistent limit, 20-30m.

markmandel · 2018-10-18T16:47:02Z

That sounds quite reasonable to me. Agreed that we should err on the side of robustness.

Also, in the long term, if we find our numbers are too big, I don't think people will complain about shrinking them down. Pushing them up may cause some concerns.

So to confirm:

We all agree, no default hard limit.
We set a request level of 30m.

Question: I'm thinking about adding an "Advanced" doc on CPU (and memory?) limits and requests, both for this and for general game servers. Should this be part of this PR?

@KamiShepard - any thoughts on the above?

This provides the mechanism (and defaults) for being able to set both the CPU request and CPU limit for the SDK Server `GameServer` sidecar.

I've only set the request level, as it seems that the major issue is not CPU usage, but how the scheduler allots space for the sidecar (by default, 100m/0.1 vCPU is allotted to each container).

After discussion, the CPU request has been set to 30m, but it is also configurable via the Helm chart.

I've not set a CPU limit, as I found when setting a low (<= 20m) CPU limit on the sidecar it mostly stopped working. But if people want to experiment with this, it is also configurable via the Helm chart.

Closes googleforgames#344

markmandel · 2018-10-19T20:49:00Z

Not hearing any objections - so updated to 30m on the cpu request, and also wrote an advanced doc on cpu and memory limiting.

agones-bot · 2018-10-19T20:59:00Z

Build Succeeded 👏

Build Id: f00eddfe-0ab6-496f-ab81-28c5c3bfa461

The following development artifacts have been built, and will exist for the next 30 days:

image: gcr.io/agones-images/agones-controller:0.6.0-c6f714a
image: gcr.io/agones-images/agones-sdk:0.6.0-c6f714a
Linux C++ SDK (build: dev): agonessdk-0.6.0-c6f714a-dev-linux-arch_64.tar.gz
Linux C++ SDK (build: runtime): agonessdk-0.6.0-c6f714a-runtime-linux-arch_64.tar.gz
Linux C++ SDK (source): agonessdk-0.6.0-c6f714a-src.zip
SDK Server: agonessdk-server-0.6.0-c6f714a.zip

(experimental) To install this version:

git fetch https://github.com/GoogleCloudPlatform/agones.git pull/390/head:pr_390 && git checkout pr_390
helm install install/helm/agones --namespace agones-system --name agones --set agones.image.tag=0.6.0-c6f714a

cyriltovena

LGTM

markmandel added the kind/feature and area/user-experience labels Oct 18, 2018

markmandel added this to the 0.6.0 milestone Oct 18, 2018

markmandel force-pushed the feature/sidecar-limit-cpu branch from 0b46fb4 to c6f714a October 19, 2018 20:46

cyriltovena approved these changes Oct 22, 2018

markmandel merged commit c30c70c into googleforgames:master Oct 22, 2018

markmandel deleted the feature/sidecar-limit-cpu branch October 22, 2018 17:19
