
Memoise & cache the result of renderers, so we don't recalculate views multiple times. #851

Merged
merged 7 commits into master from memoise-and-cache on Jan 27, 2016

Conversation

tomwilkie
Contributor

Fixes #854

The idea is that to render hosts, you have to render processes, containers, pods, etc., and all of these in turn render processes. Since the rendering pipeline is deterministic (rendering the same report will always give the same result), we can cache intermediate steps. I introduce a random report ID and use reflection to get the address of each renderer, then use these two together as the cache key.
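The scheme described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `Report`, `Renderer`, and the key format are simplified stand-ins for Scope's real `report.Report` and `render.Renderer` types.

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified stand-ins for Scope's report.Report and render.Renderer.
type Report struct {
	ID    string // random per-report ID introduced by this change
	Nodes map[string]string
}

type Renderer interface {
	Render(rpt Report) map[string]string
}

// memoise caches results keyed on the report ID plus the wrapped
// renderer's address, so re-rendering the same report is free.
type memoise struct {
	inner Renderer
	mu    sync.Mutex
	cache map[string]map[string]string
}

func Memoise(r Renderer) Renderer {
	return &memoise{inner: r, cache: map[string]map[string]string{}}
}

func (m *memoise) Render(rpt Report) map[string]string {
	// Report ID + renderer address, mirroring the reflection-based
	// key described above.
	key := fmt.Sprintf("%s-%p", rpt.ID, m.inner)
	m.mu.Lock()
	defer m.mu.Unlock()
	if result, ok := m.cache[key]; ok {
		return result
	}
	result := m.inner.Render(rpt)
	m.cache[key] = result
	return result
}

// countingRenderer records how many times it actually ran.
type countingRenderer struct{ calls int }

func (c *countingRenderer) Render(rpt Report) map[string]string {
	c.calls++
	return rpt.Nodes
}

func main() {
	c := &countingRenderer{}
	r := Memoise(c)
	rpt := Report{ID: "report-1", Nodes: map[string]string{"node": "host1"}}
	r.Render(rpt)
	r.Render(rpt) // second render hits the cache
	fmt.Println("underlying renderer ran", c.calls, "time(s)")
}
```

Because hosts, containers, and pods all share the process renderer, wrapping each shared renderer this way means the shared work runs once per report rather than once per topology.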

This results in a 750% improvement in the performance of the host renderer, although this benchmark might not be fair...

Before:

Toms-MacBook-Pro:render twilkie$ go test -bench . -- -bench-report-file report.json 
PASS
BenchmarkEndpointRender-8                      10000        133761 ns/op       52781 B/op        736 allocs/op
BenchmarkEndpointStats-8                    100000000           18.9 ns/op         0 B/op          0 allocs/op
BenchmarkProcessRender-8                        5000        275635 ns/op      114848 B/op       1427 allocs/op
BenchmarkProcessStats-8                     20000000            65.9 ns/op         0 B/op          0 allocs/op
BenchmarkProcessWithContainerNameRender-8       5000        324229 ns/op      131566 B/op       1612 allocs/op
BenchmarkProcessWithContainerNameStats-8    20000000            75.3 ns/op         0 B/op          0 allocs/op
BenchmarkProcessNameRender-8                    3000        367554 ns/op      146392 B/op       1726 allocs/op
BenchmarkProcessNameStats-8                 100000000           19.2 ns/op         0 B/op          0 allocs/op
BenchmarkContainerRender-8                      2000        521885 ns/op      213154 B/op       2540 allocs/op
BenchmarkContainerStats-8                       3000        433637 ns/op      176031 B/op       2255 allocs/op
BenchmarkContainerWithImageNameRender-8         2000        554605 ns/op      228969 B/op       2707 allocs/op
BenchmarkContainerWithImageNameStats-8          3000        432879 ns/op      176003 B/op       2255 allocs/op
BenchmarkContainerImageRender-8                 2000        686163 ns/op      291427 B/op       3150 allocs/op
BenchmarkContainerImageStats-8              100000000           18.8 ns/op         0 B/op          0 allocs/op
BenchmarkContainerHostnameRender-8              2000        532886 ns/op      220428 B/op       2634 allocs/op
BenchmarkContainerHostnameStats-8           100000000           18.8 ns/op         0 B/op          0 allocs/op
BenchmarkHostRender-8                           2000        879954 ns/op      373453 B/op       4044 allocs/op
BenchmarkHostStats-8                        10000000           126 ns/op           0 B/op          0 allocs/op
BenchmarkPodRender-8                            2000        646618 ns/op      268650 B/op       3008 allocs/op
BenchmarkPodStats-8                         100000000           18.7 ns/op         0 B/op          0 allocs/op
BenchmarkPodServiceRender-8                     2000        773566 ns/op      326606 B/op       3418 allocs/op
BenchmarkPodServiceStats-8                  100000000           18.7 ns/op         0 B/op          0 allocs/op
ok      github.com/weaveworks/scope/render  34.022s

After:

Toms-MacBook-Pro:render twilkie$ go test -bench . -- -bench-report-file report.json 
PASS
BenchmarkEndpointRender-8                      20000         74163 ns/op       29659 B/op        377 allocs/op
BenchmarkEndpointStats-8                    100000000           10.1 ns/op         0 B/op          0 allocs/op
BenchmarkProcessRender-8                       30000         44996 ns/op       22172 B/op        204 allocs/op
BenchmarkProcessStats-8                     30000000            39.1 ns/op         0 B/op          0 allocs/op
BenchmarkProcessWithContainerNameRender-8      20000         72938 ns/op       31491 B/op        319 allocs/op
BenchmarkProcessWithContainerNameStats-8    20000000            50.2 ns/op         0 B/op          0 allocs/op
BenchmarkProcessNameRender-8                  100000         16413 ns/op        7403 B/op         76 allocs/op
BenchmarkProcessNameStats-8                 100000000           10.2 ns/op         0 B/op          0 allocs/op
BenchmarkContainerRender-8                     30000         41609 ns/op       21403 B/op        135 allocs/op
BenchmarkContainerStats-8                     100000         10727 ns/op        2969 B/op         54 allocs/op
BenchmarkContainerWithImageNameRender-8        20000         70158 ns/op       32224 B/op        236 allocs/op
BenchmarkContainerWithImageNameStats-8        100000         10807 ns/op        2969 B/op         54 allocs/op
BenchmarkContainerImageRender-8               100000         12298 ns/op        5514 B/op         62 allocs/op
BenchmarkContainerImageStats-8              100000000           10.5 ns/op         0 B/op          0 allocs/op
BenchmarkContainerHostnameRender-8            200000          8771 ns/op        3657 B/op         50 allocs/op
BenchmarkContainerHostnameStats-8           100000000           10.2 ns/op         0 B/op          0 allocs/op
BenchmarkHostRender-8                          10000        116569 ns/op       59922 B/op        327 allocs/op
BenchmarkHostStats-8                        20000000            80.0 ns/op         0 B/op          0 allocs/op
BenchmarkPodRender-8                          100000         12679 ns/op        5514 B/op         62 allocs/op
BenchmarkPodStats-8                         100000000           10.0 ns/op         0 B/op          0 allocs/op
BenchmarkPodServiceRender-8                   100000         11310 ns/op        4634 B/op         59 allocs/op
BenchmarkPodServiceStats-8                  100000000           10.3 ns/op         0 B/op          0 allocs/op
ok      github.com/weaveworks/scope/render  31.782s

@tomwilkie tomwilkie changed the title Memoise & cache the result of renderers, so we don't recalculate view… Memoise & cache the result of renderers, so we don't recalculate views multiple times. Jan 22, 2016
@tomwilkie
Contributor Author

This is going to conflict with #838, as I copied the DeepEqual implementation from there. Put #838 in first; I don't mind rebasing this one.

@paulbellamy
Contributor

Would passing around a *report.Report allow us to use the pointer as the cache key and skip the report.Report.ID thing?

@paulbellamy
Contributor

Overall, this is a lot better (cleaner and less complicated) than I was expecting. I think this could work.

Would probably be good to add a render.PurgeCache() func, for the tests and benchmarks.
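Such a `render.PurgeCache()` might look something like the sketch below. The package-level cache variable and key format here are hypothetical, purely to show why purging matters for benchmarks: without it, every iteration after the first measures a cache hit.

```go
package main

import "fmt"

// Hypothetical package-level memoisation cache, keyed on
// "<report-ID>-<renderer-address>" as described in this PR.
var renderCache = map[string]map[string]string{}

// PurgeCache empties the cache so that tests and benchmarks measure a
// cold render rather than a memoised one.
func PurgeCache() {
	renderCache = map[string]map[string]string{}
}

func main() {
	renderCache["report-1-0xc0000140a0"] = map[string]string{"node": "host1"}
	fmt.Println("entries before purge:", len(renderCache))
	PurgeCache()
	fmt.Println("entries after purge:", len(renderCache))
}
```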

@tomwilkie
Contributor Author

Roger. We could do report pointers; I wasn't keen on the ID stuff myself. Will try it out.


@paulbellamy
Contributor

A couple of potential suggestions, but with a render.PurgeCache() called in the benchmarks/tests, this LGTM.

@paulbellamy paulbellamy assigned tomwilkie and unassigned paulbellamy Jan 25, 2016
@tomwilkie
Contributor Author

After calling PurgeCache between benchmark runs, it becomes obvious this isn't much of a performance increase for individual renderers (613703 -> 582766 ns/op, i.e. within the margin of error). On inspection, this is because very little work is actually done twice by the host renderer, or by any single renderer.

However, this does make a huge difference to the calls to /api/topology (from 7197010 to 1447692 ns/op, i.e. ~5x faster). As we hit this endpoint every second, this results in a small reduction in CPU usage for the scope container on my VM running the example app.

Still too high.

@paulbellamy
Contributor

> Still too high.

I disagree. 1-8 ms to generate all of the reports seems entirely reasonable. If you're saying that the CPU usage is too high, then I agree, but that is probably a different problem/solution.

@paulbellamy
Contributor

Actually, this should mean that we only have to run each renderer each time a probe report comes in, so we're at most running them 8ms/15seconds or 0.05% of the time. So, this should have a pretty strong impact on CPU usage, in spite of not showing it on the benchmarks.

@tomwilkie
Contributor Author

> If you're saying that the CPU usage is too high, then I agree

Yes, that's what I meant. I see the app mostly using 3% CPU, occasionally spiking to 20%.

> Actually, this should mean that we only have to run each renderer each time a probe report comes in, so we're at most running them 8ms/15seconds or 0.05% of the time.

This won't happen, as every HTTP request creates a new report (with a new ID), so it won't hit the cache; i.e. the cache is only used for things rendered multiple times within the same request. We could improve the collector such that the same report is reused across requests?
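The collector-side idea could look something like this. It's a hypothetical sketch, not Scope's actual collector: the merged report is kept until a new probe report invalidates it, so successive HTTP requests see the same report ID and therefore hit the render cache.

```go
package main

import (
	"fmt"
	"sync"
)

// Report is a stand-in for report.Report; only the ID matters here.
type Report struct {
	ID string
}

// Collector caches the merged report between probe updates.
type Collector struct {
	mu     sync.Mutex
	merged *Report
	nextID int
}

// Add invalidates the cached merged report whenever a probe reports in.
func (c *Collector) Add(probe Report) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.merged = nil
}

// Report returns the cached merged report, rebuilding it (with a new
// ID) only when a probe update has invalidated it.
func (c *Collector) Report() Report {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.merged == nil {
		c.nextID++
		c.merged = &Report{ID: fmt.Sprintf("report-%d", c.nextID)}
	}
	return *c.merged
}

func main() {
	c := &Collector{}
	first := c.Report()
	second := c.Report()
	fmt.Println("same ID across requests:", first.ID == second.ID)
	c.Add(Report{ID: "probe-update"})
	third := c.Report()
	fmt.Println("new ID after probe update:", third.ID != first.ID)
}
```

With probes reporting every 15s, most requests in between would then reuse both the merged report and the memoised render results.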

@paulbellamy
Contributor

> We could [have the collector cache the last report until a new one comes in]

👍

@tomwilkie
Contributor Author

Instead of caching the result (as the invalidation logic is complicated w.r.t. timing entries out whenever the oldest report is older than 15s), I made the ID of the generated report a hash of the IDs of the input reports, which should have the same effect. CPU usage for the scope-app, using the example app, was normally 0%, but spikes went from 3% to 12%.

So I don't think it had the effect I wanted, but at this level it's hard to measure (we need a better way of measuring CPU usage, maybe from a 30s profile?).

I'm going to try caching the report now, as that will save some merge cost, but I'm not convinced it's worth it: once you have more than 3 probes, you'll have to generate a new report each time.
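The deterministic-ID scheme can be sketched as below. The PR itself hashes with murmur3 (hence the `murmur3.New128` question further down); FNV-1a from the standard library is substituted here purely to keep the sketch dependency-free, and the function name is illustrative.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// mergedReportID derives the merged report's ID from the IDs of its
// input reports, so merging the same set of inputs yields the same ID
// and re-rendering hits the memoisation cache.
func mergedReportID(inputIDs []string) string {
	ids := append([]string(nil), inputIDs...)
	sort.Strings(ids) // input order must not change the resulting ID
	h := fnv.New64a()
	for _, id := range ids {
		h.Write([]byte(id))
	}
	return fmt.Sprintf("%016x", h.Sum64())
}

func main() {
	fmt.Println(mergedReportID([]string{"probe-a", "probe-b"}))
	fmt.Println(mergedReportID([]string{"probe-b", "probe-a"})) // same ID
}
```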

@tomwilkie
Contributor Author

Caching them also didn't make a huge difference, but the app now uses 0% CPU most of the time, occasionally spiking to around 10%. I think this is fine for the app now.

@paulbellamy
Contributor

Why not murmur3.New128?

paulbellamy added a commit that referenced this pull request Jan 27, 2016
Memoise & cache the result of renderers, so we don't recalculate views multiple times.
@paulbellamy paulbellamy merged commit 1943ad4 into master Jan 27, 2016
@paulbellamy paulbellamy deleted the memoise-and-cache branch January 27, 2016 11:10