
App using lots of CPU #854

Closed
tomwilkie opened this issue Jan 24, 2016 · 13 comments

@tomwilkie (Contributor)

No description provided.

@tomwilkie (Contributor, Author)

I think this can be closed now; it's the probe using the CPU.

@2opremio (Contributor)

While testing the 0.12 candidate (d0d6cac) with the ECS demo, I see that the app on one of the three hosts is consuming ~50% CPU with peaks of ~75%. The other apps are running at less than 10% CPU.

The app showing the problem is the only one whose UI I had open in my browser.

It doesn't happen systematically. I managed to reproduce it once yesterday and spent the rest of the day retrying without luck, until I saw it happen again today.

Here are two CPU profiles:

http://filebin.ca/2V8B4tkAgwWP/pprof.ec2-52-28-149-93.eu-central-1.compute.amazonaws.com4040.samples.cpu.001.pb.gz
http://filebin.ca/2V8Bmv9bJ5Xf/pprof.ec2-52-28-149-93.eu-central-1.compute.amazonaws.com4040.samples.cpu.002.pb.gz

And here's the png from one of them:

[CPU profile graph]

This suggests it's the GC.

Here's the memory consumption profile: http://filebin.ca/2V8F07Fc8QZr/pprof.ec2-52-28-149-93.eu-central-1.compute.amazonaws.com4040.inuse_objects.inuse_space.001.pb.gz

[memory profile graph]

Note: this is happening on a node whose Scope probe is experiencing a bad memory leak (#881). At the time I took the measurements the probe was consuming ~176MB, which may be related.

@2opremio (Contributor) commented Feb 4, 2016

This is the profile of the App (2ed3968) obtained at 70% CPU while monitoring the 5 probes in the scope.weave.works service: http://filebin.ca/2VkNcTTMcyob/pprof.localhost_4040.samples.cpu.001.pb.gz

[CPU profile graph]

$ go tool pprof ~/home-vagrant/scope-tested pprof.localhost_4040.samples.cpu.001.pb.gz 
Entering interactive mode (type "help" for commands)
(pprof) list app.RegisterReportPostHandler.func1
Total: 18.57s
ROUTINE ======================== github.com/weaveworks/scope/app.RegisterReportPostHandler.func1 in /go/src/github.com/weaveworks/scope/app/router.go
         0     16.01s (flat, cum) 86.21% of Total
         .          .    122:                   rpt    report.Report
         .          .    123:                   reader = r.Body
         .          .    124:                   err    error
         .          .    125:           )
         .          .    126:           if strings.Contains(r.Header.Get("Content-Encoding"), "gzip") {
         .       40ms    127:                   reader, err = gzip.NewReader(r.Body)
         .          .    128:                   if err != nil {
         .          .    129:                           http.Error(w, err.Error(), http.StatusBadRequest)
         .          .    130:                           return
         .          .    131:                   }
         .          .    132:           }
         .          .    133:
         .          .    134:           decoder := gob.NewDecoder(reader).Decode
         .          .    135:           if strings.HasPrefix(r.Header.Get("Content-Type"), "application/json") {
         .          .    136:                   decoder = json.NewDecoder(reader).Decode
         .          .    137:           }
         .     15.97s    138:           if err := decoder(&rpt); err != nil {
         .          .    139:                   http.Error(w, err.Error(), http.StatusBadRequest)
         .          .    140:                   return
         .          .    141:           }
         .          .    142:           a.Add(rpt)
         .          .    143:           if len(rpt.Pod.Nodes) > 0 {
(pprof) 

Same behavior as above. After a closer look at the profile, the app spends most of its time:

  1. Decoding gobs, allocating memory and garbage collecting.
  2. Compiling gob decoders!

I am not super familiar with the app; I guess we moved to gobs for efficiency reasons.

Also, I am totally new to the gob encoding (so the following may be stupid and wrong), but I think it's optimized for continuous streamed communication and not for individual REST calls. From http://blog.golang.org/gobs-of-data:

The first time you send a given type, the gob package includes in the data stream a description of that type. In fact, what happens is that the encoder is used to encode, in the standard gob encoding format, an internal struct that describes the type and gives it a unique number.
After the type is described, it can be referenced by its type number.

Here's what I think is happening: the type information is included in every report, causing the gob decoder (and maybe the encoder too, I still have to check the probe profile) to be created, and its codec compiled, every single time an HTTP request arrives.
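
To make the cost concrete, here's a minimal standalone sketch (not Scope code; the Report type is a placeholder) comparing one long-lived gob stream with independent per-request bodies. In the streamed case the type descriptor is sent, and the codec compiled, only once; with one-shot bodies each one repeats the descriptor and each gob.NewDecoder compiles its codec from scratch:

package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

type Report struct{ Nodes map[string]string }

func main() {
	// One encoder over a single stream: the type is described only once,
	// on the first value.
	var stream bytes.Buffer
	enc := gob.NewEncoder(&stream)
	for i := 0; i < 3; i++ {
		_ = enc.Encode(Report{Nodes: map[string]string{"n": fmt.Sprint(i)}})
	}
	fmt.Println("3 values, one stream:  ", stream.Len(), "bytes")

	// A fresh encoder/decoder per value, as with independent HTTP bodies:
	// every body repeats the type description, and every gob.NewDecoder
	// has to compile its codec from scratch.
	total := 0
	for i := 0; i < 3; i++ {
		var body bytes.Buffer
		_ = gob.NewEncoder(&body).Encode(Report{Nodes: map[string]string{"n": fmt.Sprint(i)}})
		total += body.Len()
		var r Report
		_ = gob.NewDecoder(&body).Decode(&r)
	}
	fmt.Println("3 values, three bodies:", total, "bytes")
}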

So I would suggest one of the following:

  1. Move the report endpoint from HTTP to a websocket.
  2. Move away from gob to an encoding better suited to HTTP.
  3. Somehow cache the decoder compilation, but (understandably) encoding/gob doesn't seem to offer a way to do this. Maybe we can do some trick with GobEncoder/GobDecoder.

If we take (2), maybe we could use a decoder that takes Reports from a pool, which would also help with the GC problem. A rough sketch of the idea is below.
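
To illustrate the pooling idea, a rough sketch under assumptions: the Report type is a placeholder, the /api/report path is hypothetical, and it uses plain encoding/json rather than Scope's actual handler:

package main

import (
	"encoding/json"
	"net/http"
	"sync"
)

// Report stands in for report.Report here; the real type is much bigger.
type Report struct {
	Nodes map[string]string
}

var reportPool = sync.Pool{
	New: func() interface{} { return new(Report) },
}

func handleReport(w http.ResponseWriter, r *http.Request) {
	rpt := reportPool.Get().(*Report)
	defer func() {
		*rpt = Report{} // reset before handing it back to the pool
		reportPool.Put(rpt)
	}()

	if err := json.NewDecoder(r.Body).Decode(rpt); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// The collector must copy whatever it keeps out of *rpt, because the
	// value is reused for the next request.
}

func main() {
	http.HandleFunc("/api/report", handleReport)
	_ = http.ListenAndServe(":4040", nil)
}

The catch is that whatever consumes the report has to copy what it keeps out of the pooled value, which the current Add path may not be set up for.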

@2opremio (Contributor) commented Feb 4, 2016

Maybe there's even a 4th, quick and hacky option:

(4) Create a reader that concatenates all the HTTP bodies and feeds them to a single, cached decoder. But mixing gob streams from different probes may be problematic, or even make this useless, because of the decoder's internal state. A sketch of the idea (and its fragility) is below.
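
Something like the following, a very rough sketch with a placeholder Report type and a hypothetical /api/report path; the comments spell out why the shared decoder state makes it fragile:

package main

import (
	"encoding/gob"
	"io"
	"net/http"
)

type Report struct{ Nodes map[string]string }

func main() {
	pr, pw := io.Pipe()

	// A single long-lived decoder: its codec is compiled once, but it also
	// carries gob stream state, which is exactly what gets confused when
	// bodies from different probes are interleaved on the same pipe.
	go func() {
		dec := gob.NewDecoder(pr)
		for {
			var rpt Report
			if err := dec.Decode(&rpt); err != nil {
				return
			}
			// ... process rpt
		}
	}()

	http.HandleFunc("/api/report", func(w http.ResponseWriter, r *http.Request) {
		// Append this request's body to the shared stream.
		if _, err := io.Copy(pw, r.Body); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
		}
	})
	_ = http.ListenAndServe(":4040", nil)
}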

@2opremio (Contributor) commented Feb 4, 2016

This seems to confirm my theory (I should probably have started there :)):

https://golang.org/pkg/encoding/gob/

The implementation compiles a custom codec for each data type in the stream and is most efficient when a single Encoder is used to transmit a stream of values, amortizing the cost of compilation.

@tomwilkie (Contributor, Author)

I'd suggest trying JSON first, then websockets second. You're right, gob is a bad fit for this.

@2opremio (Contributor) commented Feb 4, 2016

Using JSON (#916) improves the app's situation considerably without impacting the probes. The app is now at ~30% CPU with peaks of ~40%. However, decoding completely dominates the execution time:

pprof.localhost:4040.samples.cpu.001.pb.gz

[CPU profile graph]

Plus (and this is an important point I didn't mention before), this all happens without even opening the UI in a browser. When the UI is open, it peaks at ~70% CPU.

I could try websockets+gob next, but I think we should aim for a better-suited codec instead of compromising on REST.

@2opremio (Contributor) commented Feb 4, 2016

Using ffjson without code generation (see #916) helps a bit. The app is now down to ~25% CPU with peaks of ~35%.

[CPU profile graph]

And it also seems to help with the UI connected, when it reaches ~60% CPU.

I guess this already starts to make Scope usable with the service, but I think we still want a better-performing decoder. I will try code generation with ffjson, or a different codec, next.

@2opremio (Contributor)

Apart from optimizing the codecs, @bboreham made a fair point that we need to reduce the report sizes/transfer rates to the apps. I measured the size of the (uncompressed) gob reports and they are around 1MB each:

<app> DEBU: 2016/02/11 11:09:34.664072 Decoded report with uncompressed size: 883188 bytes
<app> DEBU: 2016/02/11 11:09:38.115087 Decoded report with uncompressed size: 1025674 bytes
<app> DEBU: 2016/02/11 11:09:38.342998 Decoded report with uncompressed size: 935714 bytes
<app> DEBU: 2016/02/11 11:09:38.426127 Decoded report with uncompressed size: 1031151 bytes
<app> DEBU: 2016/02/11 11:09:38.815101 Decoded report with uncompressed size: 888347 bytes
<app> DEBU: 2016/02/11 11:09:41.632514 Decoded report with uncompressed size: 1039347 bytes

Probes send regular reports every 3 seconds (and we also get asynchronous reports every 10 seconds from the Kubernetes and Docker reporters), so the data-processing requirements of the app quickly get out of hand.
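
Back-of-the-envelope numbers (illustrative only; the ~1MB figure comes from the log above, and the probe count matches the scope.weave.works setup earlier):

package main

import "fmt"

func main() {
	const (
		reportBytes  = 1 << 20 // ~1MB uncompressed, per the log above
		intervalSecs = 3.0     // regular report interval
		probes       = 5       // e.g. the scope.weave.works service earlier
	)
	perProbe := float64(reportBytes) / intervalSecs
	fmt.Printf("per probe: %.2f MB/s to decode and merge\n", perProbe/(1<<20))
	fmt.Printf("%d probes: %.2f MB/s, before the 10s async reports\n",
		probes, probes*perProbe/(1<<20))
}

So even five probes already mean well over 1.5MB/s of uncompressed report data to decode, allocate and merge, before counting the async reports.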

@2opremio (Contributor)

After the codec improvements, here's the CPU profile of the app running in the service, with two probes attached and with its UI connected to a browser:

pprof.localhost:4040.samples.cpu.001.pb.gz

[CPU profile graph]

The main bottlenecks are similar to the ones in the probe (#812 (comment)).

@2opremio (Contributor)

Here's the --alloc_objects heap profile for the app (d744278) running in the service, connected to 4 probes and with a UI attached:

pprof.localhost:4040.alloc_objects.alloc_space.001.pb.gz

[object allocation profile graph]

@2opremio (Contributor)

After #1000 the codec generates almost no garbage compared to the immutable data structures, copies and merge operations. The GC is clearly the bottleneck.

Note how the GC (runtime.scanobject + runtime.mallocgc) accounts for almost 50% of the CPU time, and how ps.Cons + ps.Set generate almost 30% of the garbage.

Also worth noting: ps.Foreach takes 10% of the CPU due to merges.

CPU profile:

pprof.localhost:4043.samples.cpu.004.pb.gz

[CPU profile graph]

Object allocation profile:

pprof.localhost:4043.alloc_objects.alloc_space.001.pb.gz

[object allocation profile graph]

Note how cons generates 20% of the garbage.
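
For context on why the persistent structures hurt, here's a generic illustration of the allocation pattern (a crude stand-in, not the ps package Scope actually uses): every Set/Cons-style update allocates new cells and every merge rebuilds versions, so short-lived versions pile up for the GC to scan and reclaim:

package main

import "fmt"

// node is one cell of a persistent (immutable) association list. This is a
// deliberately simplified stand-in, just to show the allocation pattern.
type node struct {
	key, val string
	next     *node
}

// set returns a new version with key bound to val; the old version is left
// untouched. Every call allocates, and once the old version is dropped its
// unshared cells become garbage.
func set(l *node, key, val string) *node {
	return &node{key: key, val: val, next: l}
}

// merge builds a combined version by walking one input and re-adding its
// entries on top of the other, allocating yet more cells (a crude stand-in
// for merging an incoming report into the latest one).
func merge(a, b *node) *node {
	out := b
	for n := a; n != nil; n = n.next {
		out = set(out, n.key, n.val)
	}
	return out
}

func main() {
	var latest *node
	for i := 0; i < 1000; i++ {
		// Each "report" is built up immutably, then merged into the latest
		// version; every intermediate version becomes garbage almost
		// immediately.
		var rpt *node
		for j := 0; j < 100; j++ {
			rpt = set(rpt, fmt.Sprintf("node-%d", j), "metadata")
		}
		latest = merge(rpt, latest)
	}
	fmt.Println("done: every dropped version had to be scanned and reclaimed by the GC")
}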

@2opremio (Contributor)

Closing in favor of #1010
