Assess impact of watch cache on large production cluster (5k+ namespaces) #16112

Closed
smarterclayton opened this issue Sep 1, 2017 · 9 comments

Comments

@smarterclayton (Contributor) commented Sep 1, 2017

On a large, dense cluster, memory use of the API server is a primary scaling concern because the watch cache keeps copies of all objects in memory. We conducted a test on such a cluster to assess the impact of disabling the watch cache.

Scenario:

  • HA cluster, masters running OpenShift 3.6 (Kube 1.6 + OpenShift patches), ~170 nodes, 5k namespaces
  • Disabled watch cache on the API server of the master that the active controller was talking to
  • Measured results via prometheus on that master
  • Restart was at 17:00 UTC

[graph] CPU over time for that master (red is apiserver, blue is etcd)

[graph] RSS for that master (purple is apiserver, brown is controller)

[graph] Internal Go heap stats for all masters

No observed change to tail latency. Still measuring the impact on CPU vs. the number of small watches.

@smarterclayton (Contributor, Author) commented:

The watch cache benefits the API server by avoiding decodes from etcd (CPU + garbage) and, to a lesser extent, by collapsing common watches. It functions best when N x C x W is high (N = number of resources queried before filtering, C = number of clients, W = write rate).

Endpoints and pods are high write, high N, and high C. Nodes are high write, (mostly) small N, and small C. Most other resources are either low write or low client, where the memory use is a wash.

I think the recommendation should be that on dense clusters only resources with a high N x C x W should have the watch cache on. On small clusters the same settings can be used safely. On very large node-count clusters, any node-accessed resource may want the watch cache enabled.
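A minimal sketch of the N x C x W heuristic above as a scoring function; the resource profiles, numbers, and the threshold are hypothetical illustrations, not values measured in this test:

```go
package main

import "fmt"

// resourceProfile is a hypothetical summary of a resource's watch traffic:
// N = objects returned per query before filtering, C = watching clients,
// W = writes per second.
type resourceProfile struct {
	Name string
	N    int     // objects per query, pre-filtering
	C    int     // number of watching clients
	W    float64 // writes per second
}

// wantsWatchCache applies the N x C x W heuristic: enable the cache only
// when the product clears a threshold. The threshold is a placeholder.
func wantsWatchCache(p resourceProfile, threshold float64) bool {
	return float64(p.N)*float64(p.C)*p.W >= threshold
}

func main() {
	profiles := []resourceProfile{
		{Name: "endpoints", N: 5000, C: 200, W: 22},   // high N, high C, high W
		{Name: "pods", N: 8000, C: 300, W: 15},        // high N, high C, high W
		{Name: "nodes", N: 170, C: 10, W: 20},         // high W, small N and C
		{Name: "rolebindings", N: 5000, C: 5, W: 0.1}, // low write, low client
	}
	const threshold = 1e5 // hypothetical cutoff
	for _, p := range profiles {
		fmt.Printf("%-14s watch cache: %v\n", p.Name, wantsWatchCache(p, threshold))
	}
}
```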

@smarterclayton (Contributor, Author) commented:

Ran 1k simultaneous endpoint watches (ramping up over 2 minutes), with endpoints changing at a rate of 22 writes a second. Observed a memory increase, but it did not stay consistently high. No CPU impact was observed.

A 40-60% reduction in master memory, with only a small CPU increase, from disabling the watch cache for the majority of resources seems like a desirable tradeoff.
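For reference, a minimal sketch of the kind of watch load generator described above, assuming a modern client-go; the kubeconfig path, namespace, watcher count, and ramp duration are placeholders, not the exact harness used here:

```go
package main

import (
	"context"
	"log"
	"sync"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	const (
		watchers  = 1000            // simultaneous endpoint watches
		rampOver  = 2 * time.Minute // spread watch creation over this window
		namespace = "default"       // placeholder namespace
	)

	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	var wg sync.WaitGroup
	for i := 0; i < watchers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			w, err := client.CoreV1().Endpoints(namespace).Watch(context.Background(), metav1.ListOptions{})
			if err != nil {
				log.Printf("watch failed: %v", err)
				return
			}
			defer w.Stop()
			for range w.ResultChan() {
				// drain events until the watch closes
			}
		}()
		time.Sleep(rampOver / watchers) // ramp up gradually
	}
	wg.Wait()
}
```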

@smarterclayton (Contributor, Author) commented:

@openshift/sig-scalability

@jeremyeder (Contributor) commented:

What about other resources? Disk I/O, network, and sockets? How might this affect performance in a partitioned scenario?

@smarterclayton (Contributor, Author) commented:

Network was the same, sockets were the same, and disk I/O still needs to be measured. We use HTTP/2 to the etcd backend, so it's all multiplexed.

Define partitioned.
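For context on the multiplexing point above, a minimal sketch assuming the current etcd clientv3 library: one client holds one gRPC (HTTP/2) connection, and every watch is a stream multiplexed over it, so the socket count stays flat regardless of watch count. The endpoints and key prefixes below are placeholders.

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// One client == one gRPC (HTTP/2) connection; all watches below are
	// multiplexed as streams over it.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://etcd-0:2379"}, // placeholder endpoint
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx := context.Background()
	pods := cli.Watch(ctx, "/registry/pods/", clientv3.WithPrefix())
	endpoints := cli.Watch(ctx, "/registry/services/endpoints/", clientv3.WithPrefix())

	for {
		select {
		case resp := <-pods:
			log.Printf("pods: %d events", len(resp.Events))
		case resp := <-endpoints:
			log.Printf("endpoints: %d events", len(resp.Events))
		}
	}
}
```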

@jeremyeder (Contributor) commented:

By that I meant ... any implications that differ in a single master vs multi-master scenario.

I see your resource graphs but ... what were the latencies all along the histogram (not just tail)?

I ask because you are saying that the watch cache for (most) resources buys us nothing until we have a certain scale, and you're optimizing for reduced memory usage in the common case. As long as there is supporting data, this makes sense to me. It would be good to have the data supporting this included in the BZ here.

The side effect of this proposed change is needing to document how to identify when a cluster would benefit from more watch caches, how to enable them (and which to enable), and how to calculate memory requirements for enabling watch caches down the line.
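As a starting point for that last item, a minimal back-of-envelope sketch: the resident cost of a watch cache is roughly object count times average decoded object size, plus the rolling window of recent events kept for watch resumption. All counts and sizes below are hypothetical placeholders, not measurements from this cluster.

```go
package main

import "fmt"

// cacheEstimate is a rough, hypothetical model of watch-cache memory:
// the decoded objects currently held, plus the event history window
// (each cached event holds a copy of an object).
func cacheEstimate(objectCount, avgObjectBytes, windowSize int) int {
	live := objectCount * avgObjectBytes
	history := windowSize * avgObjectBytes
	return live + history
}

func main() {
	// Placeholder numbers: 5k namespaces x ~10 endpoints each, ~4 KiB
	// decoded per object, and a 1000-event cache window.
	bytes := cacheEstimate(50000, 4*1024, 1000)
	fmt.Printf("estimated endpoints watch cache: ~%d MiB\n", bytes/(1024*1024))
}
```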

@smarterclayton (Contributor, Author) commented:

There was no change in latency tails. We're going to try the whole cluster tomorrow and will gather that. I don't expect an IO change but will look.

@smarterclayton (Contributor, Author) commented:

[graph] Whole cluster converted: RSS for masters on core services

[graph] Whole cluster converted: in-use heap on API servers

  • initial state - one master (blue) has watch cache disabled
  • 14:40 - restarted masters to clear previously loaded objects from the heap
  • 15:02 - restarted second master (non-active-controller) to disable watch cache
  • 15:10 - restarted third master (active-controller) to disable watch cache

[graph] Master CPU over a longer period (core services)

No change to etcd write volume.

@smarterclayton (Contributor, Author) commented:

99th percentile latency did not change significantly.

pweil- assigned mfojtik and jeremyeder and unassigned jeremyeder Sep 11, 2017
openshift-merge-robot added a commit that referenced this issue Sep 26, 2017
Automatic merge from submit-queue (batch tested with PRs 16546, 16398, 16157)

Backport upstream changes to watch cache enablement

Disables the watch cache for most resources by default, except those accessed by many clients. This has been shown to have only a minor impact on the production workload.

Fixes #16112