quantise reports #2305

Merged · 1 commit · Mar 8, 2017
Conversation

@rade (Member) commented on Mar 6, 2017

Merge all reports received within a specified interval, discarding the originals. This improves performance of Report() on repeated invocation since it ends up merging fewer reports.

For example, if reports are received every second (e.g. 3 probes reporting at the default probe.publishInterval of 3s), app.window is 15s (the default), and the report generation interval is 5s (e.g. one UI fetching every TOPOLOGY_INTERVAL, the default), then the original code would have to merge 15 reports per UI access, whereas this code only needs to merge 8 to 9 reports (the 3-4 merged reports from the previous invocation plus 5 recently received reports).

@rade (Member, Author) commented on Mar 6, 2017

Some benchmarks...

I fed 60s worth of Weave Cloud production reports, captured by @awh, into scope running on my laptop with

./prog/scope --mode=app --weave=false --app.collector=file:///home/matthias/tmp/reports

and then ran

for i in $(seq 1 100); do echo $i $(date); curl -o /dev/null -s http://localhost:4040/api/topology; done

I took the difference between the timestamp of the first report after 30s had elapsed (to counteract any startup artifacts) and the timestamp of the 21st report after that.

I.e. I was measuring how long it takes to produce 20 reports.

master: 94s (195s with GOMAXPROCS=1)
branch: 73s (140s with GOMAXPROCS=1)

So this is approximately a 25% improvement. Note however that

  • this test does not perform any report decoding; that's only done once on startup
  • the Weave Cloud cluster is larger than most users', so these reports are bigger than typical
  • the improvement is only achieved for the standalone app; to get similar gains in the multi-tenant app using this technique would require session/instance affinity on the query service

Here are a couple of master/branch flame graphs
[screenshots: flame graphs, master and branch]

So the report merging is now taking a lot less time, which promotes the rendering to the top spot. Note that none of the optimisation techniques we've discussed so far apply to that code :(

@rade requested a review from @2opremio on March 6, 2017
@2opremio (Contributor) commented on Mar 6, 2017

So this is approximately a 25% improvement. Note however that

this test does not perform any report decoding; that's only done once on startup

I am concerned that this optimization will be negligible if we add the decoding in. Would you mind running this with decoding to see the overall improvement?

@rade (Member, Author) commented on Mar 6, 2017

I am concerned that this optimization will be negligible if we add the decoding in.

It has very few downsides.

Would you mind running this with decoding to see the overall improvement?

I did, but then discovered that this doesn't make sense. The test is asking for reports as fast as it can, so the report reading & decoding is always contending with the merge & render for CPU. As a result, the reading & decoding falls further and further behind the schedule dictated by the report timestamps, which in turn means the report merging & rendering processes less than it should.

So atm I cannot think of a good benchmark that would include both report parsing and merge/render.

@2opremio (Contributor) commented on Mar 6, 2017

So atm I cannot think of a good benchmark that would include both report parsing and merge/render.

Deploying it in dev and profiling the standalone app (not the service) would be a good test.

@rade (Member, Author) commented on Mar 6, 2017

Deploying it in dev and profiling the standalone app (not the service) would be a good test.

That is a reasonable suggestion but goes beyond what I can achieve in a sensible amount of time, since the app isn't deployable with flux. Btw, instead of profiling, I would just look at the CPU usage collected by cortex, comparing before/after while keeping a browser window pointed at the app.

Note that dev deals with far less data than prod, so the benefits of this PR will be less pronounced.

@rade (Member, Author) commented on Mar 7, 2017

I've run it locally with

$ for i in $(seq 1 100); do echo $i $(date); time curl -o /dev/null -s http://localhost:4040/api/topology; sleep 5; done

and averaged the duration over 10 requests. This gave me 6.28s on master and 5.68s on the branch.

The 5-second sleep gives the app just enough time not to fall behind on processing reports. Note, however, that this results in a report merge & render interval more than double that of the UI, which in turn means the performance gain is underestimated.

Flame graphs for this don't really show much difference; unsurprisingly, report decoding and GC now take up a huge chunk.

before: [screenshot: flame graph]
after: [screenshot: flame graph]

@2opremio (Contributor) commented on Mar 7, 2017 via email

@rade merged commit 69c8082 into master on Mar 8, 2017
@rade (Member, Author) commented on Mar 8, 2017

This has made quite a big difference to dev:
[screenshot: CPU usage graph for dev]

The initial period is when the app was just sitting there, prior to the upgrade. The first large spike is when I used 'curl' in a loop to fetch reports (more on that below). The high plateau afterwards is when I was using the scope UI. Then comes the upgrade, while using the scope UI, followed by a short period when I wasn't using the UI.

So driving the UI now takes only 1.5 cores instead of 3.5. That is strange though because previously it took 2.5 cores just to sit there, w/o the UI, and the optimisations in this PR should make no difference to that. So there's probably something else going on, e.g. perhaps the app is getting slower over time. It's also using only half the memory now.

Another test I ran was to 'curl' for reports, after grabbing the cookie from my browser:

for i in $(seq 1 100); do echo $i $(date); time curl -o /dev/null -s --cookie "_weave_scope_session=<elided>" https://frontend.dev.weave.works/admin/scope/api/topology; done

I averaged the execution time over 10 of these. Before: 3.1s. After: 1.9s.

@rade (Member, Author) commented on Mar 8, 2017

Improvement in prod is from 4 cores down to 3, and average topology query latency from 6.5s to 5.2s.

@2opremio (Contributor) commented on Mar 8, 2017

That's fantastic, I wouldn't have expected this.

@rade deleted the quantise-reports branch on July 5, 2017