
App using lots of CPU #854

Closed
tomwilkie opened this issue Jan 24, 2016 · 13 comments

@tomwilkie (Contributor)

No description provided.

@tomwilkie (Contributor, Author)

I think this can be closed now; it's the probe using the CPU.

@2opremio (Contributor)

While testing the 0.12 candidate (d0d6cac) with the ECS demo, I see that the app on one of the three hosts is consuming ~50% CPU with peaks of ~75%. The other apps are running at less than 10% CPU.

The app showing the problem is the only one whose UI I had open in my browser.

It doesn't happen systematically. I managed to reproduce it once yesterday and spent the rest of the day retrying without luck, until I saw it happen again today.

Here are two CPU profiles:

http://filebin.ca/2V8B4tkAgwWP/pprof.ec2-52-28-149-93.eu-central-1.compute.amazonaws.com4040.samples.cpu.001.pb.gz
http://filebin.ca/2V8Bmv9bJ5Xf/pprof.ec2-52-28-149-93.eu-central-1.compute.amazonaws.com4040.samples.cpu.002.pb.gz

And here's the png from one of them:

[CPU profile graph]

This suggests it's the GC.

Here's the memory consumption profile: http://filebin.ca/2V8F07Fc8QZr/pprof.ec2-52-28-149-93.eu-central-1.compute.amazonaws.com4040.inuse_objects.inuse_space.001.pb.gz

[memory profile graph]

Note: this is happening on a node whose Scope probe is experiencing a bad memory leak (#881). At the time I took the measurements the probe was consuming ~176MB, which may be related.

@2opremio (Contributor) commented Feb 4, 2016

This is the profile of the App (2ed3968) obtained at 70% CPU while monitoring the 5 probes in the scope.weave.works service: http://filebin.ca/2VkNcTTMcyob/pprof.localhost_4040.samples.cpu.001.pb.gz

[CPU profile graph]

$ go tool pprof ~/home-vagrant/scope-tested pprof.localhost_4040.samples.cpu.001.pb.gz 
Entering interactive mode (type "help" for commands)
(pprof) list app.RegisterReportPostHandler.func1
Total: 18.57s
ROUTINE ======================== github.com/weaveworks/scope/app.RegisterReportPostHandler.func1 in /go/src/github.com/weaveworks/scope/app/router.go
         0     16.01s (flat, cum) 86.21% of Total
         .          .    122:                   rpt    report.Report
         .          .    123:                   reader = r.Body
         .          .    124:                   err    error
         .          .    125:           )
         .          .    126:           if strings.Contains(r.Header.Get("Content-Encoding"), "gzip") {
         .       40ms    127:                   reader, err = gzip.NewReader(r.Body)
         .          .    128:                   if err != nil {
         .          .    129:                           http.Error(w, err.Error(), http.StatusBadRequest)
         .          .    130:                           return
         .          .    131:                   }
         .          .    132:           }
         .          .    133:
         .          .    134:           decoder := gob.NewDecoder(reader).Decode
         .          .    135:           if strings.HasPrefix(r.Header.Get("Content-Type"), "application/json") {
         .          .    136:                   decoder = json.NewDecoder(reader).Decode
         .          .    137:           }
         .     15.97s    138:           if err := decoder(&rpt); err != nil {
         .          .    139:                   http.Error(w, err.Error(), http.StatusBadRequest)
         .          .    140:                   return
         .          .    141:           }
         .          .    142:           a.Add(rpt)
         .          .    143:           if len(rpt.Pod.Nodes) > 0 {
(pprof) 

Same behavior as above. After a closer look at the profile, the app spends most of its time:

  1. Decoding gobs, allocating memory and garbage collecting.
  2. Compiling gob decoders!

I am not super familiar with the app; I guess we moved to gobs for efficiency reasons.

Also, I am totally new to the gob encoding (so the following may be stupid and wrong), but I think it's optimized for continuous streamed communication and not for individual REST calls. From http://blog.golang.org/gobs-of-data:

The first time you send a given type, the gob package includes in the data stream a description of that type. In fact, what happens is that the encoder is used to encode, in the standard gob encoding format, an internal struct that describes the type and gives it a unique number.
After the type is described, it can be referenced by its type number.

Here's what I think is happening: the type information is included in every report, causing the gob decoder (and maybe the encoder too, I still have to check the probe profile) to be created, and its codec compiled, every single time an HTTP request arrives.
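
To make the cost concrete, here's a minimal standalone sketch (not Scope code; the Report type is a placeholder) comparing one long-lived gob stream with independent per-request bodies. In the streamed case the type descriptor is sent, and the codec compiled, only once; with one-shot bodies each one repeats the descriptor and each gob.NewDecoder compiles its codec from scratch:

package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

type Report struct{ Nodes map[string]string }

func main() {
	// One encoder over a single stream: the type is described only once,
	// on the first value.
	var stream bytes.Buffer
	enc := gob.NewEncoder(&stream)
	for i := 0; i < 3; i++ {
		_ = enc.Encode(Report{Nodes: map[string]string{"n": fmt.Sprint(i)}})
	}
	fmt.Println("3 values, one stream:  ", stream.Len(), "bytes")

	// A fresh encoder/decoder per value, as with independent HTTP bodies:
	// every body repeats the type description, and every gob.NewDecoder
	// has to compile its codec from scratch.
	total := 0
	for i := 0; i < 3; i++ {
		var body bytes.Buffer
		_ = gob.NewEncoder(&body).Encode(Report{Nodes: map[string]string{"n": fmt.Sprint(i)}})
		total += body.Len()
		var r Report
		_ = gob.NewDecoder(&body).Decode(&r)
	}
	fmt.Println("3 values, three bodies:", total, "bytes")
}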

So I would suggest one of the following:

  1. Move the report endpoint from HTTP to a websocket.
  2. Move away from gob to an encoding better suited to HTTP.
  3. Somehow cache the decoder compilation, but (understandably) encoding/gob doesn't seem to offer a way to do this. Maybe we can do some trick with GobEncoder/GobDecoder.

If we take (2), maybe we could use a decoder that takes Reports from a pool, which would also help with the GC problem. A rough sketch of the idea is below.
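
To illustrate the pooling idea, a rough sketch under assumptions: the Report type is a placeholder, the /api/report path is hypothetical, and it uses plain encoding/json rather than Scope's actual handler:

package main

import (
	"encoding/json"
	"net/http"
	"sync"
)

// Report stands in for report.Report here; the real type is much bigger.
type Report struct {
	Nodes map[string]string
}

var reportPool = sync.Pool{
	New: func() interface{} { return new(Report) },
}

func handleReport(w http.ResponseWriter, r *http.Request) {
	rpt := reportPool.Get().(*Report)
	defer func() {
		*rpt = Report{} // reset before handing it back to the pool
		reportPool.Put(rpt)
	}()

	if err := json.NewDecoder(r.Body).Decode(rpt); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// The collector must copy whatever it keeps out of *rpt, because the
	// value is reused for the next request.
}

func main() {
	http.HandleFunc("/api/report", handleReport)
	_ = http.ListenAndServe(":4040", nil)
}

The catch is that whatever consumes the report has to copy what it keeps out of the pooled value, which the current Add path may not be set up for.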

@2opremio (Contributor) commented Feb 4, 2016

Maybe there's even a 4th, quick and hacky option:

(4) Create a reader that concatenates all the HTTP bodies and feeds them to a single, cached decoder. But mixing gob streams from different probes may be problematic, or even make this useless, because of the decoder's internal state. A sketch of the idea (and its fragility) is below.
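
Something like the following, a very rough sketch with a placeholder Report type and a hypothetical /api/report path; the comments spell out why the shared decoder state makes it fragile:

package main

import (
	"encoding/gob"
	"io"
	"net/http"
)

type Report struct{ Nodes map[string]string }

func main() {
	pr, pw := io.Pipe()

	// A single long-lived decoder: its codec is compiled once, but it also
	// carries gob stream state, which is exactly what gets confused when
	// bodies from different probes are interleaved on the same pipe.
	go func() {
		dec := gob.NewDecoder(pr)
		for {
			var rpt Report
			if err := dec.Decode(&rpt); err != nil {
				return
			}
			// ... process rpt
		}
	}()

	http.HandleFunc("/api/report", func(w http.ResponseWriter, r *http.Request) {
		// Append this request's body to the shared stream.
		if _, err := io.Copy(pw, r.Body); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
		}
	})
	_ = http.ListenAndServe(":4040", nil)
}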

@2opremio (Contributor) commented Feb 4, 2016

This seems to confirm my theory (I should probably have started there :)):

https://golang.org/pkg/encoding/gob/

The implementation compiles a custom codec for each data type in the stream and is most efficient when a single Encoder is used to transmit a stream of values, amortizing the cost of compilation.

@tomwilkie (Contributor, Author)

I'd suggest trying JSON first, then websockets second. You're right, gob is a bad fit for this.

@2opremio (Contributor) commented Feb 4, 2016

Using JSON (#916) improves the app's situation considerably without impacting the probes. The app is now at ~30% CPU with peaks of ~40%. However, decoding completely dominates the execution time:

pprof.localhost:4040.samples.cpu.001.pb.gz

[CPU profile graph]

Plus (and this is an important point I didn't mention before), this all happens without even opening the UI in a browser. When the UI is open, it peaks at ~70% CPU.

I could try websockets+gob next, but I think we should aim for a better-suited codec instead of compromising on REST.

@2opremio (Contributor) commented Feb 4, 2016

Using ffjson without code generation (see #916) helps a bit. The app is now down to ~25% CPU with peaks of ~35%.

[CPU profile graph]

And it also seems to help with the UI connected, when it reaches ~60% CPU.

I guess this already starts to make Scope usable with the service, but I think we still want a better-performing decoder. I will try code generation with ffjson, or a different codec, next.

@2opremio (Contributor)

Apart from optimizing the codecs, @bboreham made a fair point that we need to reduce the report sizes/transfer rates to the apps. I measured the size of the (uncompressed) gob reports and they are around 1MB each:

<app> DEBU: 2016/02/11 11:09:34.664072 Decoded report with uncompressed size: 883188 bytes
<app> DEBU: 2016/02/11 11:09:38.115087 Decoded report with uncompressed size: 1025674 bytes
<app> DEBU: 2016/02/11 11:09:38.342998 Decoded report with uncompressed size: 935714 bytes
<app> DEBU: 2016/02/11 11:09:38.426127 Decoded report with uncompressed size: 1031151 bytes
<app> DEBU: 2016/02/11 11:09:38.815101 Decoded report with uncompressed size: 888347 bytes
<app> DEBU: 2016/02/11 11:09:41.632514 Decoded report with uncompressed size: 1039347 bytes

Probes send regular reports every 3 seconds (and we also get asynchronous reports every 10 seconds from the Kubernetes and Docker reporters), so the data-processing requirements of the app quickly get out of hand.
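
Back-of-the-envelope numbers (illustrative only; the ~1MB figure comes from the log above, and the probe count matches the scope.weave.works setup earlier):

package main

import "fmt"

func main() {
	const (
		reportBytes  = 1 << 20 // ~1MB uncompressed, per the log above
		intervalSecs = 3.0     // regular report interval
		probes       = 5       // e.g. the scope.weave.works service earlier
	)
	perProbe := float64(reportBytes) / intervalSecs
	fmt.Printf("per probe: %.2f MB/s to decode and merge\n", perProbe/(1<<20))
	fmt.Printf("%d probes: %.2f MB/s, before the 10s async reports\n",
		probes, probes*perProbe/(1<<20))
}

So even five probes already mean well over 1.5MB/s of uncompressed report data to decode, allocate and merge, before counting the async reports.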

@2opremio (Contributor)

After the codec improvements, here's the CPU profile of the app running in the service, with two probes attached and with its UI connected to a browser:

pprof.localhost:4040.samples.cpu.001.pb.gz

[CPU profile graph]

The main bottlenecks are similar to the ones in the probe (#812 (comment)).

@2opremio (Contributor)

Here's the --alloc_objects heap profile for the app (d744278) running in the service, connected to 4 probes and with a UI attached:

pprof.localhost:4040.alloc_objects.alloc_space.001.pb.gz

[object allocation profile graph]

@2opremio (Contributor)

After #1000 the codec generates almost no garbage compared to the immutable data structures, copies and merge operations. The GC is clearly the bottleneck.

Note how the GC (runtime.scanobject + runtime.mallocgc) accounts for almost 50% of the CPU time, and how ps.Cons + ps.Set generate almost 30% of the garbage.

Also worth noting: ps.Foreach takes 10% of the CPU due to merges.

CPU profile:

pprof.localhost:4043.samples.cpu.004.pb.gz

[CPU profile graph]

Object allocation profile:

pprof.localhost:4043.alloc_objects.alloc_space.001.pb.gz

[object allocation profile graph]

Note how cons generates 20% of the garbage.
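
For context on why the persistent structures hurt, here's a generic illustration of the allocation pattern (a crude stand-in, not the ps package Scope actually uses): every Set/Cons-style update allocates new cells and every merge rebuilds versions, so short-lived versions pile up for the GC to scan and reclaim:

package main

import "fmt"

// node is one cell of a persistent (immutable) association list. This is a
// deliberately simplified stand-in, just to show the allocation pattern.
type node struct {
	key, val string
	next     *node
}

// set returns a new version with key bound to val; the old version is left
// untouched. Every call allocates, and once the old version is dropped its
// unshared cells become garbage.
func set(l *node, key, val string) *node {
	return &node{key: key, val: val, next: l}
}

// merge builds a combined version by walking one input and re-adding its
// entries on top of the other, allocating yet more cells (a crude stand-in
// for merging an incoming report into the latest one).
func merge(a, b *node) *node {
	out := b
	for n := a; n != nil; n = n.next {
		out = set(out, n.key, n.val)
	}
	return out
}

func main() {
	var latest *node
	for i := 0; i < 1000; i++ {
		// Each "report" is built up immutably, then merged into the latest
		// version; every intermediate version becomes garbage almost
		// immediately.
		var rpt *node
		for j := 0; j < 100; j++ {
			rpt = set(rpt, fmt.Sprintf("node-%d", j), "metadata")
		}
		latest = merge(rpt, latest)
	}
	fmt.Println("done: every dropped version had to be scanned and reclaimed by the GC")
}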

@2opremio (Contributor)

Closing in favor of #1010
