Improve performance of immutable maps #1720

2opremio · 2016-07-25T12:50:19Z

Replace github.com/mndrix/ps with github.com/weaveworks/ps (see "Improve performance of maps", weaveworks/ps#1)
LatestMap codec performance improvements and cleanups
- Allocate all map entries of the intermediate representation at once
- Use UnsafeMutableMap to improve performance of LatestMap construction
- Remove gob encoder/decoder
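
To illustrate the first bullet, here is a minimal sketch of what "allocating all entries at once" can look like in Go. It is not the code from this PR: the `entry` type and both function names are hypothetical, chosen only to contrast one heap allocation per entry with a single backing array shared by all entries.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for one decoded map entry;
// it is not the type used by LatestMap.
type entry struct {
	key   string
	value string
}

// decodeOneByOne allocates every entry separately, so the garbage
// collector has to track len(keys) small objects.
func decodeOneByOne(keys, values []string) []*entry {
	out := make([]*entry, 0, len(keys))
	for i := range keys {
		out = append(out, &entry{key: keys[i], value: values[i]})
	}
	return out
}

// decodeBatched allocates one backing array for all entries and hands
// out pointers into it, so the entries cost a single allocation.
func decodeBatched(keys, values []string) []*entry {
	backing := make([]entry, len(keys))
	out := make([]*entry, len(keys))
	for i := range keys {
		backing[i] = entry{key: keys[i], value: values[i]}
		out[i] = &backing[i]
	}
	return out
}

func main() {
	keys := []string{"a", "b", "c"}
	values := []string{"1", "2", "3"}
	for _, e := range decodeBatched(keys, values) {
		fmt.Println(e.key, e.value)
	}
}
```

The trade-off of the batched form is that a pointer to any single entry keeps the whole backing array alive, which is acceptable when the entries share a lifetime, as a short-lived intermediate representation usually does.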

tomwilkie · 2016-07-25T13:25:23Z

I like that UnsafeMutableSet is on the ps.Tree, and not on the LatestMap.

What is the performance impact?

2opremio · 2016-07-25T13:51:21Z

My measurements against the query service in the dev-c4 cluster show a roughly 10% reduction in CPU consumption. (For @rade: I initially saw about 30%, but it stabilized at ~10% once I removed load balancing from the equation by using a single replica and took longer profiles of 90s instead of 30s.)

Without the changes in this PR:

$ go tool pprof --seconds 90 http://localhost:4040/debug/pprof/profile
Fetching profile from http://localhost:4040/debug/pprof/profile?seconds=90
Please wait... (1m30s)
Saved profile in /Users/fons/pprof/pprof.localhost:4040.samples.cpu.001.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) top10
21.96s of 41.59s total (52.80%)
Dropped 382 nodes (cum <= 0.21s)
Showing top 10 nodes out of 197 (cum >= 1.13s)
      flat  flat%   sum%        cum   cum%
     5.12s 12.31% 12.31%      8.14s 19.57%  runtime.scanobject
     3.32s  7.98% 20.29%      6.26s 15.05%  runtime.heapBitsSweepSpan
     2.94s  7.07% 27.36%      2.94s  7.07%  runtime.(*mspan).sweep.func1
     2.30s  5.53% 32.89%      8.21s 19.74%  runtime.mallocgc
     1.96s  4.71% 37.61%      1.98s  4.76%  runtime.heapBitsSetType
     1.57s  3.77% 41.38%      1.57s  3.77%  runtime.memmove
     1.24s  2.98% 44.36%      1.94s  4.66%  runtime.greyobject
     1.23s  2.96% 47.32%      2.38s  5.72%  github.com/weaveworks/scope/vendor/github.com/mndrix/ps.hashKey
     1.15s  2.77% 50.08%      1.15s  2.77%  runtime.stringiter2
     1.13s  2.72% 52.80%      1.13s  2.72%  runtime.heapBitsForObject
(pprof)

With the changes:

$ go tool pprof --seconds 90 http://localhost:4040/debug/pprof/profile
Fetching profile from http://localhost:4040/debug/pprof/profile?seconds=90
Please wait... (1m30s)
Saved profile in /Users/fons/pprof/pprof.localhost:4040.samples.cpu.002.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) top10
20.28s of 38.11s total (53.21%)
Dropped 409 nodes (cum <= 0.19s)
Showing top 10 nodes out of 200 (cum >= 1.01s)
      flat  flat%   sum%        cum   cum%
     4.91s 12.88% 12.88%      8.36s 21.94%  runtime.scanobject
     3.38s  8.87% 21.75%      5.54s 14.54%  runtime.heapBitsSweepSpan
     2.16s  5.67% 27.42%      2.16s  5.67%  runtime.(*mspan).sweep.func1
     1.93s  5.06% 32.48%      6.01s 15.77%  runtime.mallocgc
     1.53s  4.01% 36.50%      1.53s  4.01%  runtime.memmove
     1.44s  3.78% 40.28%      1.44s  3.78%  runtime.heapBitsSetType
     1.38s  3.62% 43.90%      2.15s  5.64%  runtime.greyobject
     1.33s  3.49% 47.39%      1.33s  3.49%  runtime.heapBitsForObject
     1.21s  3.18% 50.56%      2.22s  5.83%  github.com/weaveworks/scope/vendor/github.com/mndrix/ps.hashKey
     1.01s  2.65% 53.21%      1.01s  2.65%  runtime.stringiter2
(pprof)

2opremio · 2016-07-25T14:28:46Z

After fixing the cut-and-paste bug in the recursive call things look better: ~18% improvement.

$ go tool pprof --seconds 90 http://localhost:4040/debug/pprof/profile
Fetching profile from http://localhost:4040/debug/pprof/profile?seconds=90
Please wait... (1m30s)
Saved profile in /Users/fons/pprof/pprof.localhost:4040.samples.cpu.005.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) top10
17710ms of 34480ms total (51.36%)
Dropped 400 nodes (cum <= 172.40ms)
Showing top 10 nodes out of 198 (cum >= 970ms)
      flat  flat%   sum%        cum   cum%
    3500ms 10.15% 10.15%     6040ms 17.52%  runtime.scanobject
    2710ms  7.86% 18.01%     2710ms  7.86%  runtime.(*mspan).sweep.func1
    2520ms  7.31% 25.32%     5240ms 15.20%  runtime.heapBitsSweepSpan
    1620ms  4.70% 30.02%     6440ms 18.68%  runtime.mallocgc
    1550ms  4.50% 34.51%     1550ms  4.50%  runtime.memmove
    1340ms  3.89% 38.40%     2450ms  7.11%  github.com/weaveworks/scope/vendor/github.com/mndrix/ps.hashKey
    1320ms  3.83% 42.23%     1320ms  3.83%  runtime.heapBitsSetType
    1110ms  3.22% 45.45%     1110ms  3.22%  runtime.stringiter2
    1070ms  3.10% 48.55%     1620ms  4.70%  runtime.greyobject
     970ms  2.81% 51.36%      970ms  2.81%  runtime.heapBitsForObject
(pprof)

tomwilkie · 2016-07-25T14:37:27Z

That's a bit better!

2opremio · 2016-07-25T15:20:02Z

I've discovered that we are spending 10% of the app's time ... parsing Unicode while hashing:

func hashKey(key string) uint64 {
    hash := offset64
    // Ranging over a string decodes it rune by rune (UTF-8 parsing).
    for _, codepoint := range key {
        hash ^= uint64(codepoint)
        hash *= prime64
    }
    return hash
}

An unsafe cast of the string to a byte slice cuts CPU usage by another 10%:

$ go tool pprof --seconds 90 http://localhost:4040/debug/pprof/profile
Fetching profile from http://localhost:4040/debug/pprof/profile?seconds=90
Please wait... (1m30s)
Saved profile in /Users/fons/pprof/pprof.localhost:4040.samples.cpu.006.pb.gz
Entering interactive mode (type "help" for commands) 
(pprof) top5
11.88s of 31.75s total (37.42%)
Dropped 374 nodes (cum <= 0.16s)
Showing top 5 nodes out of 206 (cum >= 1.29s)
      flat  flat%   sum%        cum   cum%
     3.58s 11.28% 11.28%      6.09s 19.18%  runtime.scanobject
     3.06s  9.64% 20.91%      5.34s 16.82%  runtime.heapBitsSweepSpan
     2.28s  7.18% 28.09%      2.28s  7.18%  runtime.(*mspan).sweep.func1
     1.67s  5.26% 33.35%      6.12s 19.28%  runtime.mallocgc
     1.29s  4.06% 37.42%      1.29s  4.06%  runtime.memmove
(pprof)
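
For reference, a minimal sketch of the idea, under stated assumptions: it is not the change made in weaveworks/ps, and it avoids `unsafe` entirely by indexing the string, which reads raw bytes without copying or decoding. The function names are hypothetical, and the constants are assumed to be the standard 64-bit FNV-1a parameters, since the loop above has the FNV-1a shape (XOR, then multiply).

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Standard 64-bit FNV-1a parameters (assumed to match offset64 and
// prime64 in the snippet above).
const (
	offset64 uint64 = 14695981039346656037
	prime64  uint64 = 1099511628211
)

// hashKeyRunes mirrors the loop quoted above: ranging over a string
// decodes UTF-8 and yields one rune per iteration.
func hashKeyRunes(key string) uint64 {
	hash := offset64
	for _, codepoint := range key {
		hash ^= uint64(codepoint)
		hash *= prime64
	}
	return hash
}

// hashKeyBytes indexes the string directly, which yields raw bytes
// with no UTF-8 decoding and no copy.
func hashKeyBytes(key string) uint64 {
	hash := offset64
	for i := 0; i < len(key); i++ {
		hash ^= uint64(key[i])
		hash *= prime64
	}
	return hash
}

// stdlibFNV1a computes the same hash with hash/fnv, as a cross-check.
func stdlibFNV1a(key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	return h.Sum64()
}

func main() {
	for _, k := range []string{"host1;container", "héllo"} {
		fmt.Println(k,
			"runes==bytes:", hashKeyRunes(k) == hashKeyBytes(k),
			"bytes==stdlib:", hashKeyBytes(k) == stdlibFNV1a(k))
	}
}
```

The two variants agree on pure-ASCII keys, where every rune is a single byte, but will generally produce different hash values for keys containing multi-byte characters. That is harmless for an in-memory map as long as a single variant is used consistently.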

common/xfer/plugin_spec.go

 	"github.com/ugorji/go/codec"
+	"github.com/weaveworks/ps"


tomwilkie · 2016-07-26T09:30:07Z

Other than one comment, LGTM

report/latest_map.go

-	}
-	return LatestMap{out}
-}
-
 // CodecEncodeSelf implements codec.Selfer
 func (m *LatestMap) CodecEncodeSelf(encoder *codec.Encoder) {
 	if m.Map != nil {


paulbellamy · 2016-07-26T10:17:21Z

You probably meant to gvt fetch github.com/weaveworks/ps

2opremio · 2016-07-26T10:21:55Z

> You probably meant to gvt fetch github.com/weaveworks/ps

I forgot to commit it, thanks


paulbellamy · 2016-07-26T10:47:22Z

Once vendored, LGTM.

2opremio force-pushed the reduce-gc branch from 51360de to 82ed90d Compare July 25, 2016 13:02

2opremio changed the title ~~[WIP] Improve performance when decoding/setting maps~~ [WIP] Improve performance when decoding/setting immutable maps Jul 25, 2016

2opremio changed the title ~~[WIP] Improve performance when decoding/setting immutable maps~~ [WIP] Improve immutable maps performance Jul 25, 2016

2opremio mentioned this pull request Jul 25, 2016

Improve performance of maps weaveworks/ps#1

Merged

2opremio force-pushed the reduce-gc branch from 6684aac to 63328c5 Compare July 26, 2016 09:23

2opremio changed the title ~~[WIP] Improve immutable maps performance~~ Improve performance of immutable maps Jul 26, 2016

tomwilkie reviewed Jul 26, 2016

paulbellamy reviewed Jul 26, 2016

Alfonso Acosta added 2 commits July 26, 2016 10:35

Replace github.com/mndrix/ps by github.com/weaveworks/ps

ecc8a31

LatestMap codec performance improvements and cleanups

a80429d

- Allocate all map entries of the intermediate representation at once
- Use UnsafeMutableMap to improve performance of LatestMap construction
- Remove gob encoder/decoder

2opremio force-pushed the reduce-gc branch from 63328c5 to a80429d Compare July 26, 2016 10:36

Fix tests

b5c488f

2opremio merged commit 2132528 into master Jul 26, 2016

2opremio deleted the reduce-gc branch July 26, 2016 12:28

2opremio mentioned this pull request Jul 29, 2016

Reduce Garbage Collection pressure #1010

Closed
