-
Notifications
You must be signed in to change notification settings - Fork 813
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specify CPU Request for the SDK Server Sidecar #390
Specify CPU Request for the SDK Server Sidecar #390
Conversation
/cc @KamiShepard |
Build Succeeded 👏 Build Id: 345e8292-9184-47ea-92eb-707caf13f672 The following development artifacts have been built, and will exist for the next 30 days:
(experimental) To install this version:
|
If it stopa working when you set the limit to 20ms it means that at runtime it has cpu usage spikes that top that value. As the scheduling is done by requested usage and not by limit, my concern is that with bin packing the servers will be packed tighter than they should and these sidecar cpu spikes will impact the performance/functionality of game servers |
Looking at the graphs, it never gets anywhere close to 20m/0.02 (the max is ~ 0.003) 🤷♂️ but maybe it's being sampled? Looking at the limits description:
Let's see if I can get this math right.
For requests:
Looking at
That being said - this is also why I allowed both to configured, in case this is all totally wrong, it can be adjusted as necessary (and in fact, I'd definitely like this to be configured with real, proper data and evidence to be sure) 20m mostly worked -- 30m was more consistently good -- at least on the CPUs I was running in my cluster. WDYT? |
The sidecar is normally Idle, it only does stuff at discrete events. This is why looking at the average cpu usage is misleading. I dont think we should put a default limit though, as if we're wrong (not all cloud cpus are equal) it will break it. But I think that the default value for any parameter of a service should err on the safe side. Its better to have a robust service that can be optimized. Therefore I suggest to go with a default closer to the observed consistent limit, 20-30 ms. |
That sounds quite reasonable to me. Agreed that we should err on the side of robustness. Also, in the long term, if we find our numbers are too big, I don't think people will complain about shrinking them down. Pushing them up may cause some concerns. So to confirm:
Question: I'm thinking about adding a "Advanced" doc on CPU (and memory?) limits and requests - both for this, and for general gameservers. Should this be part of this PR? @KamiShepard - any thoughts on the above? |
This provides the mechanism (and defaults) for being able to set both the CPU request, and CPU limits for the SDK Server `GameServer` sidecar. I've only set the Request level, as it seems that the major issue is not CPU usage, but actually how the scheduler allots space for the sidecar (by default 100/0.1 vCPU is alloted to each container. After discussion, the CPU request has been set to 30m, but is also configurable via the helm chart. I've not set a CPU limit, as I found when setting a low (<= 20m) CPU limit on the sidecar it mostly stopped working. But if people want to experiment with this, it is also configurable via the Helm chart. Closes googleforgames#344
0b46fb4
to
c6f714a
Compare
Not hearing any objections - so updated to 30m on the cpu request, and also wrote an advanced doc on cpu and memory limiting. |
Build Succeeded 👏 Build Id: f00eddfe-0ab6-496f-ab81-28c5c3bfa461 The following development artifacts have been built, and will exist for the next 30 days:
(experimental) To install this version:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This provides the mechanism (and defaults) for being able to set both the CPU request, and CPU limits for the SDK Server
GameServer
sidecar.I've only set the Request level, as it seems that the major issue is not CPU usage, but actually how the scheduler allots space for the sidecar (by default 100/0.1 vCPU is alloted to each container.
I've set the default request level to be 5m/0.005 vCPU -- while this is above what load tests have shown, I wanted to be conservative. Also, the controls exist to tweak this value yourself via the Helm chart.
I've not set a CPU limit, as I found when setting a low (<= 20m) CPU limit on the sidecar it mostly stopped working. But if people want to experiment with this, it is also configurable via the Helm chart.
Closes #344