
Custom encoder for latest maps #1709

Merged
tomwilkie merged 3 commits into master from latest-encoder on Jul 21, 2016
Conversation

@tomwilkie (Contributor) commented Jul 21, 2016

Implement a custom encoder for latest maps, instead of round tripping through a temporary map.

In a very unscientific test, LatestMap decoding went from about 5% CPU to nothing (it was unmeasurable). See below for better numbers

As it stands, this change won't be backward compatible, but if we like this approach it can be made so with a little work. Update: it is now backwards compatible. The optimised encode has not been implemented yet.
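
(For illustration only, a minimal sketch of the idea, assuming nothing about the real Scope types: Scope uses the ugorji codec Selfer interface, but the same shape can be shown with encoding/json. Instead of decoding into a temporary map[string]LatestEntry and then copying each entry across, the custom decoder streams key/value pairs straight into the target. The LatestEntry fields below are made up for the example.)

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"time"
)

// Illustrative stand-ins, not the real report.LatestMap / LatestEntry.
type LatestEntry struct {
	Timestamp time.Time `json:"timestamp"`
	Value     string    `json:"value"`
}

type LatestMap struct {
	entries map[string]LatestEntry
}

// UnmarshalJSON walks the JSON object token by token and inserts each entry
// directly, rather than unmarshalling into a temporary map first and then
// copying the entries across.
func (m *LatestMap) UnmarshalJSON(b []byte) error {
	dec := json.NewDecoder(bytes.NewReader(b))
	if _, err := dec.Token(); err != nil { // consume the opening '{'
		return err
	}
	m.entries = map[string]LatestEntry{}
	for dec.More() {
		keyTok, err := dec.Token()
		if err != nil {
			return err
		}
		var entry LatestEntry
		if err := dec.Decode(&entry); err != nil {
			return err
		}
		m.entries[keyTok.(string)] = entry
	}
	_, err := dec.Token() // consume the closing '}'
	return err
}

func main() {
	in := `{"foo": {"timestamp": "2016-07-21T00:00:00Z", "value": "bar"}}`
	var m LatestMap
	if err := json.Unmarshal([]byte(in), &m); err != nil {
		panic(err)
	}
	fmt.Println(m.entries["foo"].Value) // bar
}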

@tomwilkie tomwilkie changed the title [DO NOT MERGE!] Customer encoder for latest maps [DO NOT MERGE!] Custom encoder for latest maps Jul 21, 2016
@2opremio (Contributor) commented Jul 21, 2016

In a very unscientific test, LatestMap decoding went from about 5% CPU to nothing (it was unmeasurable).

What did you use to test this? It may be worth experimenting with, but looking at profiles from the service, most of the CPU is consumed in Set()'s memory allocation (which we need to invoke regardless).

I saw two potential small improvements though:

  • Allocate the intermediate map at once when encoding (in := make(map[string]LatestEntry, m.Size()) instead of in := map[string]LatestEntry{}) — see the sketch after this list.
  • Use monomorphic string values for keys/values (instead of interfaces) in ps.Map. The casts to/from interfaces seem to be expensive (5% of the program's runtime).
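
(A minimal runnable sketch of the first suggestion, with made-up stand-in types; Size() and ForEach are placeholders for whatever the real map exposes. Pre-sizing the intermediate map means it is allocated once, rather than grown repeatedly while the entries are copied out.)

package main

import "fmt"

// Illustrative stand-ins, not the real Scope types.
type LatestEntry struct {
	Value string
}

type LatestMap struct {
	entries map[string]LatestEntry
}

func (m LatestMap) Size() int { return len(m.entries) }

func (m LatestMap) ForEach(f func(key string, value LatestEntry)) {
	for k, v := range m.entries {
		f(k, v)
	}
}

// toIntermediate builds the plain map the encoder round-trips through.
// make(..., m.Size()) allocates the map at its final size up front instead of
// letting it grow (and rehash) as entries are inserted.
func (m LatestMap) toIntermediate() map[string]LatestEntry {
	in := make(map[string]LatestEntry, m.Size())
	m.ForEach(func(key string, value LatestEntry) {
		in[key] = value
	})
	return in
}

func main() {
	m := LatestMap{entries: map[string]LatestEntry{"foo": {Value: "bar"}}}
	fmt.Println(len(m.toIntermediate())) // 1
}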


- Cuts out intermediate type
- Should be backwards compatible with existing reports.
@tomwilkie tomwilkie changed the title [DO NOT MERGE!] Custom encoder for latest maps Custom encoder for latest maps Jul 21, 2016
@tomwilkie (Contributor, Author)

I've stuck this on the service-scope-app instance on dev and gathered before and after profiles.

Before:

Toms-MacBook-Pro:scope twilkie$ go tool pprof prog/scope ~/Downloads/profile\ \(1\) 
Entering interactive mode (type "help" for commands)
(pprof) list CodecDecodeSelf
Total: 7.42s
...
ROUTINE ======================== github.com/weaveworks/scope/report.(*LatestMap).CodecDecodeSelf in /go/src/github.com/weaveworks/scope/report/latest_map.go
         0      540ms (flat, cum)  7.28% of Total
 ...

After:

Toms-MacBook-Pro:scope twilkie$ go tool pprof prog/scope ~/Downloads/profile\ \(2\) 
Entering interactive mode (type "help" for commands)
(pprof) list CodecDecodeSelf
Total: 6.70s
...
ROUTINE ======================== github.com/weaveworks/scope/report.(*LatestMap).CodecDecodeSelf in /go/src/github.com/weaveworks/scope/report/latest_map.go
      10ms      370ms (flat, cum)  5.52% of Total
...

A modest improvement then, not as much as I'd hoped for. ConstMap should still help, I hope.

@tomwilkie (Contributor, Author)

Interestingly the overall effect is larger than I thought it would be, almost 10%:

before:

(pprof) top 5
2850ms of 7420ms total (38.41%)
Dropped 187 nodes (cum <= 37.10ms)
Showing top 5 nodes out of 216 (cum >= 1250ms)
      flat  flat%   sum%        cum   cum%
    1300ms 17.52% 17.52%     2080ms 28.03%  runtime.scanobject
     460ms  6.20% 23.72%      460ms  6.20%  runtime.(*mspan).sweep.func1
     400ms  5.39% 29.11%      550ms  7.41%  runtime.greyobject
     350ms  4.72% 33.83%     1250ms 16.85%  runtime.mallocgc
     340ms  4.58% 38.41%     1250ms 16.85%  github.com/weaveworks/scope/vendor/github.com/mndrix/ps.(*tree).deleteLeftmost

after:

(pprof) top 5
2770ms of 6700ms total (41.34%)
Dropped 203 nodes (cum <= 33.50ms)
Showing top 5 nodes out of 171 (cum >= 350ms)
      flat  flat%   sum%        cum   cum%
    1260ms 18.81% 18.81%     2000ms 29.85%  runtime.scanobject
     400ms  5.97% 24.78%      400ms  5.97%  runtime.heapBitsForObject
     400ms  5.97% 30.75%      410ms  6.12%  runtime.heapBitsSetType
     360ms  5.37% 36.12%      360ms  5.37%  runtime.(*mspan).sweep.func1
     350ms  5.22% 41.34%      350ms  5.22%  runtime.memmove


@jml (Contributor) commented Jul 21, 2016

I notice that we already have benchmarks for this code in latest_map_internal_test.go. Are they run by our CI? Do we get failures when they become significantly slower? Should we add more to capture this performance improvement?
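
(For context, a hedged sketch of the kind of decode benchmark that could catch a regression like this in CI. It uses encoding/json and stand-in types rather than the actual contents of latest_map_internal_test.go, and would be run with go test -bench LatestMapDecode -benchmem.)

package latestmap_test

import (
	"encoding/json"
	"fmt"
	"testing"
	"time"
)

// latestEntry is an illustrative stand-in for the real entry type.
type latestEntry struct {
	Timestamp time.Time `json:"timestamp"`
	Value     string    `json:"value"`
}

func BenchmarkLatestMapDecode(b *testing.B) {
	// Build a representative payload once, outside the timed loop.
	in := map[string]latestEntry{}
	for i := 0; i < 100; i++ {
		in[fmt.Sprintf("key-%d", i)] = latestEntry{Timestamp: time.Now(), Value: "value"}
	}
	encoded, err := json.Marshal(in)
	if err != nil {
		b.Fatal(err)
	}

	b.ReportAllocs() // allocation counts are where a change like this should show up
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		out := map[string]latestEntry{}
		if err := json.Unmarshal(encoded, &out); err != nil {
			b.Fatal(err)
		}
	}
}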

@jml (Contributor) commented Jul 21, 2016

Very glad to see there are already tests for latest_map encode & decode.

Thanks for the patch. It's great to see quantifiable performance improvements. Please ping me when you're done responding to comments.

@tomwilkie (Contributor, Author)

I notice that we already have benchmarks for this code in latest_map_internal_test.go. Are they run by our CI?

I don't think so

Do we get failures when they become significantly slower?

No

Should we add more to capture this performance improvement?

TBH I've not found the benchmarks to be particularly useful.

@tomwilkie (Contributor, Author)

@jml PTAL

@jml (Contributor) commented Jul 21, 2016

TBH I've not found the benchmarks to be particularly useful.

If this is the common experience then perhaps we should delete them. Not for this PR though.

Thanks,
LGTM.

@tomwilkie tomwilkie merged commit 78199aa into master Jul 21, 2016
@rade (Member) commented Jul 21, 2016

related to #1457 and #1010.

@tomwilkie tomwilkie deleted the latest-encoder branch July 21, 2016 15:25
@2opremio (Contributor) commented Jul 25, 2016

Interestingly the overall effect is larger than I thought it would be, almost 10%:

@tomwilkie Did you make multiple measurements? Note that load balancing is not uniform right now, so it could simply be that the load balancing changed after re-deploying (I have experienced this when making my own measurements).

@tomwilkie (Contributor, Author)

Did you make multiple measurements?

I did not, so take the measurement with a pinch of salt. Although I suspect unequal load balancing would have made more than a 10% difference, and load balancing is better than we think (I think). Still, all performance measurements should be taken with a pinch of salt.
