quantise reports #2305

Merged · 1 commit · Mar 8, 2017
Conversation

@rade (Member) commented on Mar 6, 2017

Merge all reports received within a specified interval, discarding the originals. This improves performance of Report() on repeated invocation since it ends up merging fewer reports.

For example, if reports are received every second (e.g. 3 probes reporting at the default probe.publishInterval of 3s), app.window is 15s (the default), and the report generation interval is 5s (e.g. one UI fetching every TOPOLOGY_INTERVAL, the default), then the original code would have to merge 15 reports per UI access, whereas this code only needs to merge 8 to 9 reports (the 3-4 merged reports from the previous invocation plus 5 recently received reports).

@rade (Member, Author) commented on Mar 6, 2017

Some benchmarks...

I fed 60s worth of Weave Cloud production reports, captured by @awh, into scope running on my laptop with

./prog/scope --mode=app --weave=false --app.collector=file:///home/matthias/tmp/reports

and then ran

for i in $(seq 1 100); do echo $i $(date); curl -o /dev/null -s http://localhost:4040/api/topology; done

I took the difference between the timestamp of the first report after 30s had elapsed (to counteract any startup artifacts) and the timestamp of the 21st report after that.

I.e. I was measuring how long it takes to produce 20 reports.

master: 94s (195s with GOMAXPROCS=1)
branch: 73s (140s with GOMAXPROCS=1)

So this is approximately a 25% improvement. Note however that

  • this test does not perform any report decoding; that's only done once on startup
  • the Weave Cloud cluster is larger than most users', so these reports are bigger than typical
  • the improvement is only achieved for the standalone app; to get similar gains in the multi-tenant app using this technique would require session/instance affinity on the query service

Here are a couple of master/branch flame graphs
[screenshots: flame graphs, master and branch]

So the report merging is now taking a lot less time, which promotes the rendering to the top spot. Note that none of the optimisation techniques we've discussed so far apply to that code :(

@rade requested a review from @2opremio on March 6, 2017
@2opremio (Contributor) commented on Mar 6, 2017

So this is approximately a 25% improvement. Note however that

this test does not perform any report decoding; that's only done once on startup

I am concerned that this optimization will be negligible if we add the decoding in. Would you mind running this with decoding to see the overall improvement?

@rade (Member, Author) commented on Mar 6, 2017

I am concerned that this optimization will be negligible if we add the decoding in.

It has very few downsides.

Would you mind running this with decoding to see the overall improvement?

I did, but then discovered that this doesn't make sense. The test is asking for reports as fast as it can, so the report reading & decoding is always contending with the merge & render for CPU. As a result, the reading & decoding falls further and further behind the schedule dictated by the report timestamps, which in turn means the report merging & rendering processes less than it should.

So atm I cannot think of a good benchmark that would include both report parsing and merge/render.

@2opremio (Contributor) commented on Mar 6, 2017

So atm I cannot think of a good benchmark that would include both report parsing and merge/render.

Deploying it in dev and profiling the standalone app (not the service) would be a good test.

@rade (Member, Author) commented on Mar 6, 2017

Deploying it in dev and profiling the standalone app (not the service) would be a good test.

That is a reasonable suggestion but goes beyond what I can achieve in a sensible amount of time, since the app isn't deployable with flux. Btw, instead of profiling, I would just look at the CPU usage collected by cortex, comparing before/after while keeping a browser window pointed at the app.

Note that dev deals with far less data than prod, so the benefits of this PR will be less pronounced.

@rade (Member, Author) commented on Mar 7, 2017

I've run it locally with

$ for i in $(seq 1 100); do echo $i $(date); time curl -o /dev/null -s http://localhost:4040/api/topology; sleep 5; done

and averaged the duration over 10 requests. This gave me 6.28s on master and 5.68s on the branch.

The 5-second sleep gives the app just enough time not to fall behind on processing reports. Note, however, that this results in a report merge & render interval more than double that of the UI, which in turn means the performance gain is underestimated.

Flame graphs for this don't really show much difference; unsurprisingly, report decoding and GC now take up a huge chunk.

before: [screenshot: flame graph]
after: [screenshot: flame graph]

@2opremio (Contributor) commented on Mar 7, 2017 via email

@rade merged commit 69c8082 into master on Mar 8, 2017
@rade (Member, Author) commented on Mar 8, 2017

This has made quite a big difference to dev:
[screenshot: CPU usage graph for dev]

The initial period is when the app was just sitting there, prior to the upgrade. The first large spike is when I used 'curl' in a loop to fetch reports (more on that below). The high plateau afterwards is when I was using the scope UI. Then comes the upgrade, while using the scope UI, followed by a short period when I wasn't using the UI.

So driving the UI now takes only 1.5 cores instead of 3.5. That is strange though because previously it took 2.5 cores just to sit there, w/o the UI, and the optimisations in this PR should make no difference to that. So there's probably something else going on, e.g. perhaps the app is getting slower over time. It's also using only half the memory now.

Another test I ran was to 'curl' for reports, after grabbing the cookie from my browser:

for i in $(seq 1 100); do echo $i $(date); time curl -o /dev/null -s --cookie "_weave_scope_session=<elided>" https://frontend.dev.weave.works/admin/scope/api/topology; done

I averaged the execution time over 10 of these. Before: 3.1s. After: 1.9s.

@rade (Member, Author) commented on Mar 8, 2017

Improvement in prod is from 4 cores down to 3, and average topology query latency from 6.5s to 5.2s.

@2opremio (Contributor) commented on Mar 8, 2017

That's fantastic, I wouldn't have expected this.

@rade deleted the quantise-reports branch on July 5, 2017