quantise reports #2305
Merge all reports received within a specified interval, discarding the originals. This improves the performance of Report() on repeated invocation, since it ends up merging fewer reports. For example, if reports are received every second (e.g. 3 probes reporting at the default probe.publishInterval of 3s), app.window is 15s (the default), and the report generation interval is 5s (e.g. one UI accessing every TOPOLOGY_INTERVAL, the default), then the original code would have to merge 15 reports per UI access, whereas this code only needs to merge 8 to 9 reports (3-4 merged reports from the previous invocation plus 5 recently received reports).
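To make the scheme concrete, here is a minimal sketch of the idea in Go. It is illustrative only: the names (collector, timedReport, Merge) are hypothetical stand-ins, not Scope's actual types or API. Reports are bucketed by a fixed interval; each closed bucket is merged into a single report and the originals are discarded, so every Report() call merges roughly window/interval bucket reports plus the few un-merged reports in the still-open bucket.

```go
package quantise

import (
	"sort"
	"time"
)

// Report stands in for Scope's report type; Merge combines two reports.
type Report struct{ /* ... */ }

func (r Report) Merge(other Report) Report { return r } // placeholder

type timedReport struct {
	ts time.Time
	r  Report
}

type collector struct {
	interval time.Duration // quantisation interval, e.g. 5s
	window   time.Duration // app.window, e.g. 15s
	reports  []timedReport // merged buckets plus un-merged recent reports
}

// Add buffers an incoming report; the timestamp is passed explicitly to
// keep the sketch deterministic and easy to test.
func (c *collector) Add(ts time.Time, r Report) {
	c.reports = append(c.reports, timedReport{ts, r})
}

// Report merges every closed interval's reports into one bucket report,
// discards anything older than the window, and merges what remains.
func (c *collector) Report(now time.Time) Report {
	open := now.Truncate(c.interval)  // start of the still-open bucket
	cutoff := now.Add(-c.window)      // reports older than this are dropped
	buckets := map[time.Time]Report{} // closed buckets, keyed by start time
	var kept []timedReport
	for _, tr := range c.reports {
		switch b := tr.ts.Truncate(c.interval); {
		case tr.ts.Before(cutoff):
			// fallen out of the window: discard
		case b.Before(open):
			if m, ok := buckets[b]; ok {
				buckets[b] = m.Merge(tr.r) // originals are discarded
			} else {
				buckets[b] = tr.r
			}
		default:
			kept = append(kept, tr) // open bucket: leave un-merged
		}
	}
	for b, r := range buckets {
		kept = append(kept, timedReport{b, r})
	}
	sort.Slice(kept, func(i, j int) bool { return kept[i].ts.Before(kept[j].ts) })
	c.reports = kept

	var result Report
	for _, tr := range c.reports {
		result = result.Merge(tr.r)
	}
	return result
}
```

With one report per second, a 15s window, and a 5s interval, the slice settles at around 3 merged bucket reports plus up to 5 open-bucket reports, i.e. the 8 to 9 merges per invocation mentioned above.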
Some benchmarks... I fed 60s worth of Weave Cloud production reports, captured by @awh, into scope running on my laptop with
and then ran
I took the difference between the timestamp of the first report after the run was 30s in (to counteract any startup artifacts) and the timestamp of the 21st report after that, i.e. I was measuring how long it takes to produce 20 reports. master: 94s (195s with GOMAXPROCS=1). So this is approximately a 25% improvement. Note however that
Here are a couple of master/branch flame graphs. The report merging is now taking a lot less time, which promotes the rendering to the top spot. Note that none of the optimisation techniques we've discussed so far apply to that code :(
I am concerned that this optimization will be negligible if we add the decoding in. Would you mind running this with decoding to see the overall improvement?
It's got very few downsides.
I did, but then discovered that this doesn't make sense. The test is asking for reports as fast as it can, so the report reading & decoding will always be contending with the merge & render for CPU. As a result the reading & decoding falls further and further behind the schedule dictated by the report timestamps, which in turn means that report merging & rendering is processing less than it should. So atm I cannot think of a good benchmark that would include both report parsing and merge/render.
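For what it's worth, the merge path on its own can be micro-benchmarked over pre-decoded, synthetic reports, which sidesteps the decode contention entirely. A hypothetical sketch (it belongs in a _test.go file and refers to the illustrative collector outlined earlier, not to anything in the repo):

```go
// Feeds one report per second across a 15s window, then times repeated
// Report() calls at a fixed quantisation interval of 5s. All names refer
// to the hypothetical sketch above, not Scope's real types.
func BenchmarkQuantisedReport(b *testing.B) {
	base := time.Unix(0, 0) // aligned to the interval, for deterministic buckets
	c := &collector{interval: 5 * time.Second, window: 15 * time.Second}
	for i := 0; i < 15; i++ {
		c.Add(base.Add(time.Duration(i)*time.Second), Report{})
	}
	now := base.Add(15 * time.Second)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		c.Report(now) // merges 3 quantised buckets instead of 15 originals
	}
}
```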
Deploying it in dev and profiling the standalone app (not the service) would be a good test.
That is a reasonable suggestion, but it goes beyond what I can achieve in a sensible amount of time, since the app isn't deployable with flux. Btw, instead of profiling, I would just look at the CPU usage collected by cortex, comparing before/after while having a browser window pointing at the app. Note that dev deals with far less data than prod, so the benefits of this PR will be less pronounced.
I've run it locally with
$ for i in $(seq 1 100); do echo $i $(date); time curl -o /dev/null -s http://localhost:4040/api/topology; sleep 5; done
and averaged the duration of 10 requests. This gave me 6.28s on master and 5.68s on the branch. The 5 second sleep in the above gives it just enough time to not fall behind on processing reports. But note that this results in a report merge & render interval (request time plus sleep, i.e. over 10s) of more than double the UI's 5s, which in turn means the performance gain is underestimated. Flamegraphs on this don't really show much difference - unsurprisingly, report decoding and gc are now taking up a huge chunk.
before: [flamegraph screenshot] https://cloud.githubusercontent.com/assets/109109/23673477/588da384-036a-11e7-8349-8a7967312e6e.png
after: [flamegraph screenshot] https://cloud.githubusercontent.com/assets/109109/23673481/5ed2b41e-036a-11e7-9f1e-388ad12320a5.png
Thanks for running the test. LGTM
Improvement in prod: CPU usage is down from 4 cores to 3, and average topology query latency is down from 6.5s to 5.2s.
That's fantastic, I wouldn't have expected this.