Probe using 70% CPU #812
I restarted the probes with profiling enabled.
From a cursory analysis, the probes are spending most of their time on copying, garbage collection, and reading /proc.
I suspect fixing the copying will also fix the GC, but I can't think of anything we can do to 'fix' the /proc CPU usage, as this has already been heavily optimised (we only read the tcp files for net namespaces we haven't seen before). I guess we can decrease the polling frequency / proc spy interval, and maybe even come up with some rate-limiting scheme (i.e. only read 10/s, which would give us ~constant CPU usage and, at larger scales, would mean the UI updates less frequently); see the sketch below.
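For illustration, a minimal sketch of that rate-limiting idea, assuming a hypothetical readTCPFiles helper and a budget of 10 namespace reads per second (names and numbers are illustrative, not Scope's actual code):

```go
package main

import (
	"fmt"
	"time"
)

// readTCPFiles is a stand-in for reading /proc/<pid>/net/tcp{,6} for one
// network namespace; this is the expensive part being discussed.
func readTCPFiles(netnsID string) {
	fmt.Println("reading tcp files for namespace", netnsID)
}

// rateLimitedRead walks the unseen namespaces but caps the number of
// /proc reads per second, trading UI freshness for ~constant CPU usage.
func rateLimitedRead(unseen []string, readsPerSecond int) {
	ticker := time.NewTicker(time.Second / time.Duration(readsPerSecond))
	defer ticker.Stop()
	for _, ns := range unseen {
		<-ticker.C // wait for the next read slot
		readTCPFiles(ns)
	}
}

func main() {
	rateLimitedRead([]string{"netns-a", "netns-b", "netns-c"}, 10)
}
```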
@2opremio tried that suggestion.
The situation improves considerably after #898. Now the probes (2ed3968) are using 30-40% CPU with peaks of 50%. Here are two probe profiles obtained from two different probes: pprof.localhost_4041.samples.cpu.001.pb.gz pprof.localhost_4041.samples.cpu.002.pb.gz
@paulbellamy Any ideas on how to improve this? SelectorFromSet seems to be prohibitively expensive, mostly due to the regular expression matching (maybe we should fix that).
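One possible mitigation, purely as a sketch (not Scope's code), would be to memoize the selector per label set so the expensive conversion only runs once per distinct set. This assumes the standard labels.Set/SelectorFromSet API and today's k8s.io/apimachinery import path (the package lived under k8s.io/kubernetes at the time):

```go
package main

import (
	"fmt"
	"sync"

	"k8s.io/apimachinery/pkg/labels"
)

var (
	mu    sync.Mutex
	cache = map[string]labels.Selector{}
)

// cachedSelectorFromSet memoizes labels.SelectorFromSet so each distinct
// label set is only converted (and validated) once instead of per report.
func cachedSelectorFromSet(set labels.Set) labels.Selector {
	key := set.String()
	mu.Lock()
	defer mu.Unlock()
	if sel, ok := cache[key]; ok {
		return sel
	}
	sel := labels.SelectorFromSet(set)
	cache[key] = sel
	return sel
}

func main() {
	sel := cachedSelectorFromSet(labels.Set{"app": "frontend"})
	fmt.Println(sel.Matches(labels.Set{"app": "frontend", "tier": "web"})) // true
}
```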
We made the gob encoding once-per-report a while back. How many probes are there? How many apps? I suspect we might just be paying the (apps * probes) encoding cost. The other things you highlight also look good.
Thanks for working on this, fons!
Thanks, I didn't think about the (apps * probes) scenario. This use-case involves multiple probes (5) reporting to a single app (which should be the most common one ... if we were not doing (apps * probes) in standalone mode). However, take into account that, with websockets, the type-sending part of the encoding (apparently the most expensive one) will only be done once per app. With HTTP, it's done once per report. My intuition says that with gob, up to a certain number (N) of reports, it will be cheaper to encode a report multiple times with an existing encoder than to encode it only once with a new encoder each time. Either way, the fact that we need to make clear-cut performance choices for the gob codec between ...
... is another sign that gob is a bad fit due to its stateful nature. We should probably be using a codec which lets you optimize globally instead of per-stream.
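To make the trade-off concrete, here is a small self-contained comparison (not Scope code) of reusing one gob encoder per stream versus creating a new encoder per report; the type description is only transmitted by the first Encode on each encoder, which is exactly the per-stream state being discussed:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

type Report struct {
	Nodes map[string]string
}

func main() {
	r := Report{Nodes: map[string]string{"a": "b"}}

	// One encoder per stream (the websocket case): type info is sent once,
	// so the second report on the same stream is much smaller.
	var stream bytes.Buffer
	enc := gob.NewEncoder(&stream)
	enc.Encode(r)
	first := stream.Len()
	enc.Encode(r)
	second := stream.Len() - first
	fmt.Println("same encoder:", first, "bytes, then", second, "bytes")

	// New encoder per report (the plain-HTTP case): type info is resent
	// every time, so every report pays the full cost.
	var oneShot bytes.Buffer
	gob.NewEncoder(&oneShot).Encode(r)
	fmt.Println("fresh encoder:", oneShot.Len(), "bytes every time")
}
```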
5 probes, 1 app. Yep, splitting the app would help performance-wise, but it would make Scope unusable on Kubernetes since the user normally doesn't have a say in where the apps end up. Using a tree structure with intermediate report-merging nodes would also do, but it may prove tricky to deploy. Either way, I would say 5 probes for a single app should be more than reasonable.
No problem, thanks for the feedback. The rate limiting from #912 helps quite a bit. Now the profile is dominated by garbage collection and the Kubernetes problem.
After cherry-picking @paul's https://github.com/weaveworks/scope/tree/k8s-efficiency on top of the rate limiting, the CPU consumption is between 25% and 35%. Now the profile is completely dominated by encoding, copying and garbage collection. I am going to work on the generated JSON encoding, clean up and get #912 in, and then see how we can improve the garbage-collection dominance if it's still there.
After the codec improvements, the probes (a38db2e) still consume ~20% CPU in the service: pprof.localhost:4041.samples.cpu.001.pb.gz The two main places to improve are the copying and the garbage collection.
@tomwilkie @paulbellamy @peterbourgon Any ideas on how to reduce or optimize the copying?
We can save a ton of allocs and CPU time by leveraging sync.Pool. For sure we can use pools of []byte or *bytes.Buffer when we serialize reports (and deserialize on the other end). That's easy and may get us some quick wins. Beyond that, it'd be great to use pools for the reports themselves, i.e. don't allocate a new report and fill it, but fetch one from the pool, reset it, and write into it. But we'd need to implement a Reset method, and I'm not sure how simple that would be given all of our nested maps. This also assumes that resetting a map will maintain its allocated capacity; I think that's true, but I haven't verified it. Other thoughts?
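Here is a minimal sketch of the easy first step (a sync.Pool of *bytes.Buffer around report serialization); the gob encoding and the map payload are placeholders rather than Scope's actual report type:

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"sync"
)

var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// encodeReport serializes v into a pooled buffer. The buffer is reset and
// returned to the pool, so repeated encodes reuse the same backing array
// instead of allocating a fresh one each time.
func encodeReport(v interface{}) ([]byte, error) {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset()
		bufPool.Put(buf)
	}()
	if err := gob.NewEncoder(buf).Encode(v); err != nil {
		return nil, err
	}
	// Copy out, since the buffer's memory goes back to the pool.
	out := make([]byte, buf.Len())
	copy(out, buf.Bytes())
	return out, nil
}

func main() {
	b, err := encodeReport(map[string]string{"host": "node-1"})
	fmt.Println(len(b), err)
}
```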
It's a bit unclear to me why copying is taking that much CPU / making that much garbage. Copy should be almost a no-op. Unless there is stuff left which is being copied? It looks a bit from the graph like it's the Metrics? Alternatively, we could ditch the whole immutability/copying thing and rethink how the pipeline works. Go doesn't exactly make working with immutable data structures easy/elegant anyway.
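For context on why a "copy" may not be a no-op: any field still held in a plain Go map has to be rebuilt entry by entry, which is linear work and linear garbage on every copy. A toy illustration (hypothetical copyMetrics helper, not Scope's actual code):

```go
package main

import "fmt"

// copyMetrics deep-copies a map of metric samples. Each call allocates a new
// map, rehashes every key and copies every slice, so it is O(n) CPU and O(n)
// garbage, which is how plain-map copies end up dominating profiles.
func copyMetrics(in map[string][]float64) map[string][]float64 {
	out := make(map[string][]float64, len(in))
	for k, v := range in {
		out[k] = append([]float64(nil), v...) // copy the slice too
	}
	return out
}

func main() {
	m := map[string][]float64{"cpu": {0.1, 0.2}, "mem": {100, 120}}
	fmt.Println(len(copyMetrics(m)))
}
```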
👍 to this. The current report structure and pipeline architecture was designed for ease of use and expediency. If it's no longer easy to use (i.e. maintain and mutate with changing product requirements) and it's a performance bottleneck with no obvious fix, it's absolutely appropriate to revisit the overall design.
AFAIK the new codec library already uses pools. However, I believe that the fromIntermediate/toIntermediate methods used for custom serialization (e.g. the ones needed for mndrix/ps-based types like LatestMap) may be expensive.
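For reference, this is the shape of the conversion being described: every encode walks the persistent map into a flat intermediate form, and every decode rebuilds it one Set at a time. A hypothetical sketch assuming the github.com/mndrix/ps Map API (NewMap/Set/ForEach), with a string-valued map standing in for the real LatestMap:

```go
package main

import (
	"fmt"

	"github.com/mndrix/ps"
)

// toIntermediate flattens a persistent map into a plain map so the codec can
// serialize it; this walk happens on every encode.
func toIntermediate(m ps.Map) map[string]string {
	out := make(map[string]string, m.Size())
	m.ForEach(func(key string, val ps.Any) {
		out[key] = val.(string)
	})
	return out
}

// fromIntermediate rebuilds the persistent map on decode, one Set per key,
// allocating new tree nodes each time.
func fromIntermediate(in map[string]string) ps.Map {
	m := ps.NewMap()
	for k, v := range in {
		m = m.Set(k, v)
	}
	return m
}

func main() {
	m := fromIntermediate(map[string]string{"pid": "42"})
	fmt.Println(toIntermediate(m))
}
```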
This sounds more like it!
Unfortunately, after a bit of research and discussion with the community, I don't think the current report structure is going to lend itself to pooling very well. In the end it's a collection of maps, and maps only 'reset' in linear time, i.e.

```go
for _, topo := range report.Topologies() {
	for k := range topo {
		delete(topo, k)
	}
}
```

It's worth a profile, of course, but the cost of reset increases as the map gets bigger, making it less and less appealing vs. GC. My intuition is that if we want to pool reports, we need to get rid of most if not all of the maps, in favor of slices or something else; IMO worth considering together with @paulbellamy's bag-of-nodes approach to the pipelines.
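For contrast with the map reset above, a slice-backed structure resets in constant time, which is what would make pooling attractive if the maps were replaced. A rough sketch of that idea (hypothetical Report/Node types, not a proposed Scope API):

```go
package main

import (
	"fmt"
	"sync"
)

// Node is a stand-in for a flattened report entry.
type Node struct {
	ID     string
	Values []float64
}

// Report is a hypothetical slice-backed report: Reset keeps the backing
// array, so a pooled Report can be reused with zero per-entry work.
type Report struct {
	Nodes []Node
}

func (r *Report) Reset() { r.Nodes = r.Nodes[:0] }

var reportPool = sync.Pool{New: func() interface{} { return &Report{} }}

func main() {
	r := reportPool.Get().(*Report)
	r.Nodes = append(r.Nodes, Node{ID: "host-1"})
	fmt.Println(len(r.Nodes), cap(r.Nodes))
	r.Reset()
	reportPool.Put(r) // back to the pool with its capacity intact
}
```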
Actually, after having a closer look at the profile, one of the nodes (17.76%) spent in the codec doesn't come from Encoding but Decoding, so it can't originate from the reports being sent to the app. 10% of that time is spent reading HTTP bodies, so the decoding itself is only 7%. Docker/Kubernetes info? It's hard to tell because apparently it's being done in a separate goroutine. I'll investigate, but we may be polling something way too often. If needed, to accelerate the decoding of that information we can always generate decoders for it as well.
Also, to get a better sense of where we can reduce garbage generation, I just found out that pprof lets you obtain heap profiles as well.
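For anyone reproducing this: assuming the probe exposes net/http/pprof on localhost:4041 (which the profile filenames above suggest), heap and allocation profiles can be pulled with the standard go tool pprof flags shown in the comments below:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers, including /heap
)

func main() {
	// With this listener running, heap profiles can be fetched with e.g.:
	//   go tool pprof -alloc_objects http://localhost:4041/debug/pprof/heap
	//   go tool pprof -inuse_space  http://localhost:4041/debug/pprof/heap
	// -alloc_objects shows where garbage is being generated;
	// -inuse_space shows what is currently live.
	log.Fatal(http.ListenAndServe("localhost:4041", nil))
}
```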
Yep, it's the Docker stats being gathered. And I don't think it was done in a separate goroutine; pprof just omitted the parents in the tree.
go-dockerclient (and the Docker API) has a streaming stats endpoint that might reduce this.
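For reference, a sketch of what the streaming variant looks like, assuming the fsouza/go-dockerclient Stats API (StatsOptions with a stats channel and Stream: true); the container ID is a placeholder and this is an illustration, not Scope's code:

```go
package main

import (
	"fmt"
	"log"

	docker "github.com/fsouza/go-dockerclient"
)

func main() {
	client, err := docker.NewClientFromEnv()
	if err != nil {
		log.Fatal(err)
	}

	statsCh := make(chan *docker.Stats)
	done := make(chan bool)

	// Stats blocks while streaming, so run it in its own goroutine; Docker
	// pushes a sample roughly every second instead of us polling for it.
	go func() {
		err := client.Stats(docker.StatsOptions{
			ID:     "some-container-id", // hypothetical container ID
			Stats:  statsCh,
			Stream: true,
			Done:   done,
		})
		if err != nil {
			log.Println("stats stream ended:", err)
		}
	}()

	for s := range statsCh {
		fmt.Println("cpu total usage:", s.CPUStats.CPUUsage.TotalUsage)
	}
}
```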
Nah, their code is basically identical.
After #1000, the probes are running stably under 20% (whilst kubelet 1.1 is running at >=70% on all nodes) and the garbage generated is tolerable. However, here is the latest CPU profile: pprof.localhost:4041.samples.cpu.002.pb.gz, and the object allocations profile:
After making another measurement and looking at the syscalls:
@2opremio reports the probe is using 70% CPU on the service hosts.