
Memoise & cache the result of renderers, so we don't recalculate views multiple times. #851

Merged
merged 7 commits into master from memoise-and-cache on Jan 27, 2016

Conversation

tomwilkie
Contributor

Fixes #854

The idea is that to render hosts, you have to render processes, containers, pods, etc., and all of these in turn render processes. Since the rendering pipeline is deterministic (rendering the same report will always give the same result), we can cache intermediate steps. I introduce a random report ID and use reflection to get the address of each renderer, then use these two together as the cache key.
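The scheme described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `Report`, `Renderer`, and the key format are simplified stand-ins for Scope's real `report.Report` and `render.Renderer` types.

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified stand-ins for Scope's report.Report and render.Renderer.
type Report struct {
	ID    string // random per-report ID introduced by this change
	Nodes map[string]string
}

type Renderer interface {
	Render(rpt Report) map[string]string
}

// memoise caches results keyed on the report ID plus the wrapped
// renderer's address, so re-rendering the same report is free.
type memoise struct {
	inner Renderer
	mu    sync.Mutex
	cache map[string]map[string]string
}

func Memoise(r Renderer) Renderer {
	return &memoise{inner: r, cache: map[string]map[string]string{}}
}

func (m *memoise) Render(rpt Report) map[string]string {
	// Report ID + renderer address, mirroring the reflection-based
	// key described above.
	key := fmt.Sprintf("%s-%p", rpt.ID, m.inner)
	m.mu.Lock()
	defer m.mu.Unlock()
	if result, ok := m.cache[key]; ok {
		return result
	}
	result := m.inner.Render(rpt)
	m.cache[key] = result
	return result
}

// countingRenderer records how many times it actually ran.
type countingRenderer struct{ calls int }

func (c *countingRenderer) Render(rpt Report) map[string]string {
	c.calls++
	return rpt.Nodes
}

func main() {
	c := &countingRenderer{}
	r := Memoise(c)
	rpt := Report{ID: "report-1", Nodes: map[string]string{"node": "host1"}}
	r.Render(rpt)
	r.Render(rpt) // second render hits the cache
	fmt.Println("underlying renderer ran", c.calls, "time(s)")
}
```

Because hosts, containers, and pods all share the process renderer, wrapping each shared renderer this way means the shared work runs once per report rather than once per topology.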

This results in a 750% improvement in the performance of the host renderer, although this benchmark might not be fair...

Before:

Toms-MacBook-Pro:render twilkie$ go test -bench . -- -bench-report-file report.json 
PASS
BenchmarkEndpointRender-8                      10000        133761 ns/op       52781 B/op        736 allocs/op
BenchmarkEndpointStats-8                    100000000           18.9 ns/op         0 B/op          0 allocs/op
BenchmarkProcessRender-8                        5000        275635 ns/op      114848 B/op       1427 allocs/op
BenchmarkProcessStats-8                     20000000            65.9 ns/op         0 B/op          0 allocs/op
BenchmarkProcessWithContainerNameRender-8       5000        324229 ns/op      131566 B/op       1612 allocs/op
BenchmarkProcessWithContainerNameStats-8    20000000            75.3 ns/op         0 B/op          0 allocs/op
BenchmarkProcessNameRender-8                    3000        367554 ns/op      146392 B/op       1726 allocs/op
BenchmarkProcessNameStats-8                 100000000           19.2 ns/op         0 B/op          0 allocs/op
BenchmarkContainerRender-8                      2000        521885 ns/op      213154 B/op       2540 allocs/op
BenchmarkContainerStats-8                       3000        433637 ns/op      176031 B/op       2255 allocs/op
BenchmarkContainerWithImageNameRender-8         2000        554605 ns/op      228969 B/op       2707 allocs/op
BenchmarkContainerWithImageNameStats-8          3000        432879 ns/op      176003 B/op       2255 allocs/op
BenchmarkContainerImageRender-8                 2000        686163 ns/op      291427 B/op       3150 allocs/op
BenchmarkContainerImageStats-8              100000000           18.8 ns/op         0 B/op          0 allocs/op
BenchmarkContainerHostnameRender-8              2000        532886 ns/op      220428 B/op       2634 allocs/op
BenchmarkContainerHostnameStats-8           100000000           18.8 ns/op         0 B/op          0 allocs/op
BenchmarkHostRender-8                           2000        879954 ns/op      373453 B/op       4044 allocs/op
BenchmarkHostStats-8                        10000000           126 ns/op           0 B/op          0 allocs/op
BenchmarkPodRender-8                            2000        646618 ns/op      268650 B/op       3008 allocs/op
BenchmarkPodStats-8                         100000000           18.7 ns/op         0 B/op          0 allocs/op
BenchmarkPodServiceRender-8                     2000        773566 ns/op      326606 B/op       3418 allocs/op
BenchmarkPodServiceStats-8                  100000000           18.7 ns/op         0 B/op          0 allocs/op
ok      github.com/weaveworks/scope/render  34.022s

After:

Toms-MacBook-Pro:render twilkie$ go test -bench . -- -bench-report-file report.json 
PASS
BenchmarkEndpointRender-8                      20000         74163 ns/op       29659 B/op        377 allocs/op
BenchmarkEndpointStats-8                    100000000           10.1 ns/op         0 B/op          0 allocs/op
BenchmarkProcessRender-8                       30000         44996 ns/op       22172 B/op        204 allocs/op
BenchmarkProcessStats-8                     30000000            39.1 ns/op         0 B/op          0 allocs/op
BenchmarkProcessWithContainerNameRender-8      20000         72938 ns/op       31491 B/op        319 allocs/op
BenchmarkProcessWithContainerNameStats-8    20000000            50.2 ns/op         0 B/op          0 allocs/op
BenchmarkProcessNameRender-8                  100000         16413 ns/op        7403 B/op         76 allocs/op
BenchmarkProcessNameStats-8                 100000000           10.2 ns/op         0 B/op          0 allocs/op
BenchmarkContainerRender-8                     30000         41609 ns/op       21403 B/op        135 allocs/op
BenchmarkContainerStats-8                     100000         10727 ns/op        2969 B/op         54 allocs/op
BenchmarkContainerWithImageNameRender-8        20000         70158 ns/op       32224 B/op        236 allocs/op
BenchmarkContainerWithImageNameStats-8        100000         10807 ns/op        2969 B/op         54 allocs/op
BenchmarkContainerImageRender-8               100000         12298 ns/op        5514 B/op         62 allocs/op
BenchmarkContainerImageStats-8              100000000           10.5 ns/op         0 B/op          0 allocs/op
BenchmarkContainerHostnameRender-8            200000          8771 ns/op        3657 B/op         50 allocs/op
BenchmarkContainerHostnameStats-8           100000000           10.2 ns/op         0 B/op          0 allocs/op
BenchmarkHostRender-8                          10000        116569 ns/op       59922 B/op        327 allocs/op
BenchmarkHostStats-8                        20000000            80.0 ns/op         0 B/op          0 allocs/op
BenchmarkPodRender-8                          100000         12679 ns/op        5514 B/op         62 allocs/op
BenchmarkPodStats-8                         100000000           10.0 ns/op         0 B/op          0 allocs/op
BenchmarkPodServiceRender-8                   100000         11310 ns/op        4634 B/op         59 allocs/op
BenchmarkPodServiceStats-8                  100000000           10.3 ns/op         0 B/op          0 allocs/op
ok      github.com/weaveworks/scope/render  31.782s

@tomwilkie tomwilkie changed the title Memoise & cache the result of renderers, so we don't recalculate view… Memoise & cache the result of renderers, so we don't recalculate views multiple times. Jan 22, 2016
@tomwilkie
Contributor Author

This is going to conflict with #838, as I copied the DeepEqual implementation from there. Put #838 in first; I don't mind rebasing this one.

@paulbellamy
Contributor

Would passing around a *report.Report allow us to use the pointer as the cache key and skip the report.Report.ID thing?

@paulbellamy
Contributor

Overall, this is a lot better (cleaner and less complicated) than I was expecting. I think this could work.

Would probably be good to add a render.PurgeCache() func, for the tests and benchmarks.
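Such a `render.PurgeCache()` might look something like the sketch below. The package-level cache variable and key format here are hypothetical, purely to show why purging matters for benchmarks: without it, every iteration after the first measures a cache hit.

```go
package main

import "fmt"

// Hypothetical package-level memoisation cache, keyed on
// "<report-ID>-<renderer-address>" as described in this PR.
var renderCache = map[string]map[string]string{}

// PurgeCache empties the cache so that tests and benchmarks measure a
// cold render rather than a memoised one.
func PurgeCache() {
	renderCache = map[string]map[string]string{}
}

func main() {
	renderCache["report-1-0xc0000140a0"] = map[string]string{"node": "host1"}
	fmt.Println("entries before purge:", len(renderCache))
	PurgeCache()
	fmt.Println("entries after purge:", len(renderCache))
}
```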

@tomwilkie
Contributor Author

Roger. We could do report pointers; I wasn't keen on the ID stuff myself. Will try it out.


@paulbellamy
Contributor

A couple of potential suggestions, but with a render.PurgeCache() called in the benchmarks/tests, this LGTM.

@paulbellamy paulbellamy assigned tomwilkie and unassigned paulbellamy Jan 25, 2016
@tomwilkie
Contributor Author

After calling PurgeCache between benchmark runs, it becomes obvious this isn't much of a performance increase for individual renderers (613703 -> 582766 ns/op, i.e. within the margin of error). On inspection, this is because very little work is actually done twice by the host renderer, or by any single renderer.

However, this does make a huge difference to the calls to /api/topology (from 7197010 to 1447692 ns/op, i.e. ~5x faster). As we hit this endpoint every second, this results in a small reduction in CPU usage for the scope container on my VM running the example app.

Still too high.

@paulbellamy
Contributor

> Still too high.

I disagree. 1-8 ms to generate all of the reports seems entirely reasonable. If you're saying that the CPU usage is too high, then I agree, but that is probably a different problem/solution.

@paulbellamy
Contributor

Actually, this should mean that we only have to run each renderer each time a probe report comes in, so we're at most running them 8ms/15seconds or 0.05% of the time. So, this should have a pretty strong impact on CPU usage, in spite of not showing it on the benchmarks.

@tomwilkie
Contributor Author

> If you're saying that the CPU usage is too high, then I agree

Yes, that's what I meant. I see the app mostly using 3% CPU, occasionally spiking to 20%.

> Actually, this should mean that we only have to run each renderer each time a probe report comes in, so we're at most running them 8ms/15seconds or 0.05% of the time.

This won't happen, as every HTTP request creates a new report (with a new ID), so it won't hit the cache; i.e. the cache is only used for things rendered multiple times within the same request. We could improve the collector such that the same report is reused across requests?
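The collector-side idea could look something like this. It's a hypothetical sketch, not Scope's actual collector: the merged report is kept until a new probe report invalidates it, so successive HTTP requests see the same report ID and therefore hit the render cache.

```go
package main

import (
	"fmt"
	"sync"
)

// Report is a stand-in for report.Report; only the ID matters here.
type Report struct {
	ID string
}

// Collector caches the merged report between probe updates.
type Collector struct {
	mu     sync.Mutex
	merged *Report
	nextID int
}

// Add invalidates the cached merged report whenever a probe reports in.
func (c *Collector) Add(probe Report) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.merged = nil
}

// Report returns the cached merged report, rebuilding it (with a new
// ID) only when a probe update has invalidated it.
func (c *Collector) Report() Report {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.merged == nil {
		c.nextID++
		c.merged = &Report{ID: fmt.Sprintf("report-%d", c.nextID)}
	}
	return *c.merged
}

func main() {
	c := &Collector{}
	first := c.Report()
	second := c.Report()
	fmt.Println("same ID across requests:", first.ID == second.ID)
	c.Add(Report{ID: "probe-update"})
	third := c.Report()
	fmt.Println("new ID after probe update:", third.ID != first.ID)
}
```

With probes reporting every 15s, most requests in between would then reuse both the merged report and the memoised render results.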

@paulbellamy
Contributor

> We could [have the collector cache the last report until a new one comes in]

👍

@tomwilkie
Contributor Author

Instead of caching the result (as the invalidation logic is complicated w.r.t. timing entries out whenever the oldest report is older than 15s), I made the ID of the generated report a hash of the IDs of the input reports, which should have the same effect. CPU usage for the scope-app, using the example app, was normally 0%, but spikes went from 3% to 12%.

So I don't think it had the effect I wanted, but at this level it's hard to measure (we need a better way of measuring CPU usage, maybe from a 30s profile?).

I'm going to try caching the report now, as that will save some merge cost, but I'm not convinced it's worth it: once you have more than 3 probes, you'll have to generate a new report each time.
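The deterministic-ID scheme can be sketched as below. The PR itself hashes with murmur3 (hence the `murmur3.New128` question further down); FNV-1a from the standard library is substituted here purely to keep the sketch dependency-free, and the function name is illustrative.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// mergedReportID derives the merged report's ID from the IDs of its
// input reports, so merging the same set of inputs yields the same ID
// and re-rendering hits the memoisation cache.
func mergedReportID(inputIDs []string) string {
	ids := append([]string(nil), inputIDs...)
	sort.Strings(ids) // input order must not change the resulting ID
	h := fnv.New64a()
	for _, id := range ids {
		h.Write([]byte(id))
	}
	return fmt.Sprintf("%016x", h.Sum64())
}

func main() {
	fmt.Println(mergedReportID([]string{"probe-a", "probe-b"}))
	fmt.Println(mergedReportID([]string{"probe-b", "probe-a"})) // same ID
}
```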

@tomwilkie
Contributor Author

Caching them also didn't make a huge difference, but the app now uses 0% CPU most of the time, occasionally spiking to around 10%. I think this is fine for the app now.

@paulbellamy
Contributor

Why not murmur3.New128?

paulbellamy added a commit that referenced this pull request Jan 27, 2016
Memoise & cache the result of renderers, so we don't recalculate views multiple times.
@paulbellamy paulbellamy merged commit 1943ad4 into master Jan 27, 2016
@paulbellamy paulbellamy deleted the memoise-and-cache branch January 27, 2016 11:10