Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] New ConstMap field #1724

Closed
wants to merge 10 commits into from
Closed

[WIP] New ConstMap field #1724

wants to merge 10 commits into from

Conversation

2opremio
Copy link
Contributor

@2opremio 2opremio commented Jul 26, 2016

ConstMap is similar to LatestMap, but instead of including a timestamp per key in only has a global timestamp (motivated from discussion on #1201 ).

This PR only migrates the LatestMap field to ConstMap in the Endpoint topology. If experiments are successful we might extend it to other topologies.

TODO:

  • Add ConstMap tests
  • Include custom encoder/decoder (investigating whether using since-epoch would provide considerably better performance than textual dates).
  • Backwards compatibility with old probes not producing ConstMap fields.

@2opremio
Copy link
Contributor Author

2opremio commented Jul 27, 2016

I have tested the change so far in dev-c4 (with the reports from the Scope probes monitoring production) and the results are a bit discouraging. Not only it doesn't perform better, but a bit worse.

It seems that adding an extra field to every node doesn't make up for the improvement on the Endpoint topology. We might see an improvement once I move more LatestMap data to ConstMap or create a custom encoder for the ConstMap, like we did for LatestMap.

Without the change: pprof.localhost:4040.samples.cpu.001.pb.gz

$ go tool pprof ~/pprof/pprof.localhost\:4040.samples.cpu.001.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) top10
8300ms of 16010ms total (51.84%)
Dropped 315 nodes (cum <= 80.05ms)
Showing top 10 nodes out of 186 (cum >= 940ms)
      flat  flat%   sum%        cum   cum%
    2000ms 12.49% 12.49%     3110ms 19.43%  runtime.scanobject
    1490ms  9.31% 21.80%     2560ms 15.99%  runtime.heapBitsSweepSpan
    1070ms  6.68% 28.48%     1070ms  6.68%  runtime.(*mspan).sweep.func1
     770ms  4.81% 33.29%     2550ms 15.93%  runtime.mallocgc
     680ms  4.25% 37.54%      680ms  4.25%  runtime.memmove
     620ms  3.87% 41.41%      630ms  3.94%  runtime.heapBitsSetType
     460ms  2.87% 44.28%      460ms  2.87%  runtime.duffcopy
     420ms  2.62% 46.91%      420ms  2.62%  runtime.heapBitsForObject
     400ms  2.50% 49.41%      700ms  4.37%  runtime.greyobject
     390ms  2.44% 51.84%      940ms  5.87%  runtime.makemap
(pprof) 

With the change: pprof.localhost:4040.samples.cpu.003.pb.gz

$ go tool pprof ~/pprof/pprof.localhost\:4040.samples.cpu.003.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) top10
9020ms of 17080ms total (52.81%)
Dropped 310 nodes (cum <= 85.40ms)
Showing top 10 nodes out of 195 (cum >= 980ms)
      flat  flat%   sum%        cum   cum%
    1940ms 11.36% 11.36%     3320ms 19.44%  runtime.scanobject
    1540ms  9.02% 20.37%     2970ms 17.39%  runtime.heapBitsSweepSpan
    1400ms  8.20% 28.57%     1400ms  8.20%  runtime.(*mspan).sweep.func1
     950ms  5.56% 34.13%     2830ms 16.57%  runtime.mallocgc
     740ms  4.33% 38.47%      740ms  4.33%  runtime.heapBitsSetType
     620ms  3.63% 42.10%      620ms  3.63%  runtime.memmove
     570ms  3.34% 45.43%      920ms  5.39%  runtime.greyobject
     500ms  2.93% 48.36%      500ms  2.93%  runtime.heapBitsForObject
     400ms  2.34% 50.70%      400ms  2.34%  runtime.duffcopy
     360ms  2.11% 52.81%      980ms  5.74%  runtime.makemap
(pprof) 

(To see the effect of the ConstMap for decoding download the svg and check the top left flames)

@2opremio
Copy link
Contributor Author

Here's the resulting report after the change: report-after.json.gz

I will upload the report before the change tomorrow.

@2opremio
Copy link
Contributor Author

Things look marginally better after creating a custom decoder: pprof.localhost:4040.samples.cpu.004.pb.gz

$ go tool pprof --seconds 90 http://localhost:4040/debug/pprof/profile
Fetching profile from http://localhost:4040/debug/pprof/profile?seconds=90
Please wait... (1m30s)
Saved profile in /Users/fons/pprof/pprof.localhost:4040.samples.cpu.004.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) top10
9060ms of 16830ms total (53.83%)
Dropped 359 nodes (cum <= 84.15ms)
Showing top 10 nodes out of 207 (cum >= 1110ms)
      flat  flat%   sum%        cum   cum%
    2120ms 12.60% 12.60%     3360ms 19.96%  runtime.scanobject
    1920ms 11.41% 24.00%     2960ms 17.59%  runtime.heapBitsSweepSpan
    1040ms  6.18% 30.18%     1040ms  6.18%  runtime.(*mspan).sweep.func1
     920ms  5.47% 35.65%     2790ms 16.58%  runtime.mallocgc
     610ms  3.62% 39.28%      610ms  3.62%  runtime.heapBitsForObject
     580ms  3.45% 42.72%      580ms  3.45%  runtime.memmove
     570ms  3.39% 46.11%      570ms  3.39%  runtime.heapBitsSetType
     440ms  2.61% 48.72%      660ms  3.92%  runtime.greyobject
     430ms  2.55% 51.28%      430ms  2.55%  runtime.duffcopy
     430ms  2.55% 53.83%     1110ms  6.60%  runtime.makemap
(pprof) 

@rade
Copy link
Member

rade commented Jul 28, 2016

If experiments are successful we might extend it to other topologies.

The relative gain for endpoints will be much smaller than for, say, container env vars, labels and other immutable container info, since endpoints only have three fields whereas the latter has dozens.

@2opremio
Copy link
Contributor Author

The relative gain for endpoints will be much smaller than for, say, container env vars, labels and other immutable container info, since endpoints only have three fields whereas the latter has dozens.

Yep, but the Endpoint topology has way more nodes in the experiments I've done.

$ jq 'to_entries | map(select(type=="object" and (.value | type=="object"))) |  map(select(.value | has("nodes"))) | from_entries | map_values(.nodes | length)' < report.json
{
  "Endpoint": 3287,
  "Process": 0,
  "Container": 552,
  "Pod": 51,
  "Service": 50,
  "Deployment": 0,
  "ReplicaSet": 26,
  "ContainerImage": 38,
  "Host": 5,
  "Overlay": 5
}

Anyways, I am going to migrate env variables and labels next.

@2opremio
Copy link
Contributor Author

After migrating the tables (namely docker labels and envs) to ConstMap the CPU consumption is similar to how it was with LatestMap: pprof.localhost:4040.samples.cpu.005.pb.gz

$ go tool pprof --seconds 90 http://localhost:4040/debug/pprof/profile
Fetching profile from http://localhost:4040/debug/pprof/profile?seconds=90
Please wait... (1m30s)
Saved profile in /Users/fons/pprof/pprof.localhost:4040.samples.cpu.005.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) top10
8650ms of 16380ms total (52.81%)
Dropped 286 nodes (cum <= 81.90ms)
Showing top 10 nodes out of 194 (cum >= 900ms)
      flat  flat%   sum%        cum   cum%
    1970ms 12.03% 12.03%     3270ms 19.96%  runtime.scanobject
    1570ms  9.58% 21.61%     2680ms 16.36%  runtime.heapBitsSweepSpan
    1110ms  6.78% 28.39%     1110ms  6.78%  runtime.(*mspan).sweep.func1
     730ms  4.46% 32.84%     2670ms 16.30%  runtime.mallocgc
     680ms  4.15% 37.00%      690ms  4.21%  runtime.heapBitsSetType
     620ms  3.79% 40.78%      620ms  3.79%  runtime.memmove
     550ms  3.36% 44.14%      550ms  3.36%  runtime.duffcopy
     520ms  3.17% 47.31%      810ms  4.95%  runtime.greyobject
     490ms  2.99% 50.31%      490ms  2.99%  runtime.heapBitsForObject
     410ms  2.50% 52.81%      900ms  5.49%  runtime.makemap
(pprof) 

It could also be that the load in the probes in production has increased in the past day, I will remove my changes and compare again.

BTW, this doesn't include #1728 yet.

@rade
Copy link
Member

rade commented Jul 29, 2016

I suggest a) re-basing on master, and b) doing a back-to-back comparison against current prod traffic. And please attach the before/after reports.

@2opremio
Copy link
Contributor Author

2opremio commented Jul 29, 2016

After rebasing and re-running the tests it seems that ConstMaps are not making any difference. The query service keeps spending roughly 30% time decoding although the final uncompressed report is considerably smaller (in JSON at least).

Here is what I get without the ConstMap changes (master-0525a2a):

Profile: pprof.localhost:4040.samples.cpu.001.pb.gz
Report: report-master.json.gz (1.8 MB uncompressed JSON)

$ go tool pprof --seconds 90 http://localhost:4040/debug/pprof/profile
Fetching profile from http://localhost:4040/debug/pprof/profile?seconds=90
Please wait... (1m30s)
Saved profile in /Users/fons/pprof/pprof.localhost:4040.samples.cpu.001.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) top10
7500ms of 15410ms total (48.67%)
Dropped 305 nodes (cum <= 77.05ms)
Showing top 10 nodes out of 205 (cum >= 280ms)
      flat  flat%   sum%        cum   cum%
    1630ms 10.58% 10.58%     2640ms 17.13%  runtime.scanobject
    1600ms 10.38% 20.96%     2590ms 16.81%  runtime.heapBitsSweepSpan
     990ms  6.42% 27.38%      990ms  6.42%  runtime.(*mspan).sweep.func1
     660ms  4.28% 31.67%      660ms  4.28%  runtime.memmove
     630ms  4.09% 35.76%      630ms  4.09%  runtime.heapBitsSetType
     630ms  4.09% 39.84%     2470ms 16.03%  runtime.mallocgc
     400ms  2.60% 42.44%      660ms  4.28%  runtime.greyobject
     380ms  2.47% 44.91%      380ms  2.47%  runtime.heapBitsForObject
     300ms  1.95% 46.85%      300ms  1.95%  github.com/weaveworks/scope/vendor/github.com/weaveworks/ps.hashKey
     280ms  1.82% 48.67%      280ms  1.82%  hash/crc32.update
(pprof)

And here is what I get with the CostMap changes (const-map-a410a4b)

Profile: pprof.localhost:4040.samples.cpu.002.pb.gz
Report: report-after.json.gz (1.4 MB uncompressed JSON)

$ go tool pprof --seconds 90 http://localhost:4040/debug/pprof/profile
Fetching profile from http://localhost:4040/debug/pprof/profile?seconds=90
Please wait... (1m30s)
Saved profile in /Users/fons/pprof/pprof.localhost:4040.samples.cpu.002.pb.gz
Entering interactive mode (type "help" for commands)
(pprof) top10
7620ms of 15680ms total (48.60%)
Dropped 326 nodes (cum <= 78.40ms)
Showing top 10 nodes out of 214 (cum >= 300ms)
      flat  flat%   sum%        cum   cum%
    1630ms 10.40% 10.40%     2770ms 17.67%  runtime.scanobject
    1490ms  9.50% 19.90%     2540ms 16.20%  runtime.heapBitsSweepSpan
    1050ms  6.70% 26.59%     1050ms  6.70%  runtime.(*mspan).sweep.func1
     850ms  5.42% 32.02%     3080ms 19.64%  runtime.mallocgc
     580ms  3.70% 35.71%      580ms  3.70%  runtime.memmove
     510ms  3.25% 38.97%      510ms  3.25%  runtime.heapBitsSetType
     470ms  3.00% 41.96%      470ms  3.00%  runtime.heapBitsForObject
     400ms  2.55% 44.52%      700ms  4.46%  runtime.greyobject
     340ms  2.17% 46.68%      340ms  2.17%  runtime.duffcopy
     300ms  1.91% 48.60%      300ms  1.91%  runtime.aeshashbody
(pprof) 

@2opremio
Copy link
Contributor Author

Before the change, decoding LatestMaps accounted for 15.25% of the overall CPU time and after it, decoding LatestMaps+ConstMaps accounts for roughly 13.5% (6.57% + 6.89%).

Interestingly LatestMap entries accounted for 7.27% and now (for the remaining LatestMap entries) it still takes 4.15% .

@2opremio
Copy link
Contributor Author

My theory is that decoding is dominated by string decoding (which hasn't changed) and we are adding an extra field to all nodes. Here's the line-by-line profiling for both decoders:

(pprof) list Map.*DecodeSelf  
Total: 15.68s
ROUTINE ======================== github.com/weaveworks/scope/report.(*ConstMap).CodecDecodeSelf in /go/src/github.com/weaveworks/scope/report/const_map.go
      10ms      1.03s (flat, cum)  6.57% of Total
         .          .    226:       encoder.Encode(struct{}{})
         .          .    227:   }
         .          .    228:}
         .          .    229:
         .          .    230:// CodecDecodeSelf implements codec.Selfer
      10ms       10ms    231:func (m *ConstMap) CodecDecodeSelf(decoder *codec.Decoder) {
         .       10ms    232:   var in intermediateConstMap
         .      1.01s    233:   if err := decoder.Decode(&in); err != nil {
         .          .    234:       return
         .          .    235:   }
         .          .    236:   if in.Map.Map == nil {
         .          .    237:       *m = EmptyConstMap
         .          .    238:   } else {
ROUTINE ======================== github.com/weaveworks/scope/report.(*LatestMap).CodecDecodeSelf in /go/src/github.com/weaveworks/scope/report/latest_map.go
         0      1.08s (flat, cum)  6.89% of Total
         .          .    206:   if r.TryDecodeAsNil() {
         .          .    207:       *m = LatestMap{}
         .          .    208:       return
         .          .    209:   }
         .          .    210:
         .       10ms    211:   length := r.ReadMapStart()
         .          .    212:   out := ps.NewMap()
         .          .    213:   for i := 0; length < 0 || i < length; i++ {
         .          .    214:       if length < 0 && r.CheckBreak() {
         .          .    215:           break
         .          .    216:       }
         .          .    217:
         .          .    218:       var key string
         .          .    219:       z.DecSendContainerState(containerMapKey)
         .       30ms    220:       if !r.TryDecodeAsNil() {
         .       70ms    221:           key = r.DecodeString()
         .          .    222:       }
         .          .    223:
         .       20ms    224:       var value LatestEntry
         .       10ms    225:       z.DecSendContainerState(containerMapValue)
         .       50ms    226:       if !r.TryDecodeAsNil() {
         .      710ms    227:           decoder.Decode(&value)
         .          .    228:       }
         .          .    229:
         .      180ms    230:       out = out.UnsafeMutableSet(key, value)
         .          .    231:   }
         .          .    232:   z.DecSendContainerState(containerMapEnd)
         .          .    233:   *m = LatestMap{out}
         .          .    234:}
         .          .    235:
ROUTINE ======================== github.com/weaveworks/scope/report.(*intermediateConstMap).CodecDecodeSelf in /go/src/github.com/weaveworks/scope/report/report.codecgen.go
      10ms      990ms (flat, cum)  6.31% of Total
         .          .    154:   if false {
         .          .    155:   } else if z.HasExtensions() && z.DecExt(x) {
         .          .    156:   } else {
         .          .    157:       yyct2 := r.ContainerType()
         .          .    158:       if yyct2 == codecSelferValueTypeMap9138 {
      10ms       10ms    159:           yyl2 := r.ReadMapStart()
         .          .    160:           if yyl2 == 0 {
         .          .    161:               z.DecSendContainerState(codecSelfer_containerMapEnd9138)
         .          .    162:           } else {
         .      980ms    163:               x.codecDecodeSelfFromMap(yyl2, d)
         .          .    164:           }
         .          .    165:       } else if yyct2 == codecSelferValueTypeArray9138 {
         .          .    166:           yyl2 := r.ReadArrayStart()
         .          .    167:           if yyl2 == 0 {
         .          .    168:               z.DecSendContainerState(codecSelfer_containerArrayEnd9138)
ROUTINE ======================== github.com/weaveworks/scope/report.(*intermediateConstMap).codecDecodeSelfFromMap in /go/src/github.com/weaveworks/scope/report/report.codecgen.go
         0      980ms (flat, cum)  6.25% of Total
         .          .    191:           if r.CheckBreak() {
         .          .    192:               break
         .          .    193:           }
         .          .    194:       }
         .          .    195:       z.DecSendContainerState(codecSelfer_containerMapKey9138)
         .       10ms    196:       yys3Slc = r.DecodeBytes(yys3Slc, true, true)
         .          .    197:       yys3SlcHdr := codecSelferUnsafeString9138{uintptr(unsafe.Pointer(&yys3Slc[0])), len(yys3Slc)}
         .          .    198:       yys3 := *(*string)(unsafe.Pointer(&yys3SlcHdr))
         .          .    199:       z.DecSendContainerState(codecSelfer_containerMapValue9138)
         .          .    200:       switch yys3 {
         .          .    201:       case "map":
         .          .    202:           if r.TryDecodeAsNil() {
         .          .    203:               x.Map = intermediateMap{}
         .          .    204:           } else {
         .          .    205:               yyv4 := &x.Map
         .      940ms    206:               yyv4.CodecDecodeSelf(d)
         .          .    207:           }
         .          .    208:       case "timestamp":
         .          .    209:           if r.TryDecodeAsNil() {
         .          .    210:               x.Timestamp = 0
         .          .    211:           } else {
         .          .    212:               yyv5 := &x.Timestamp
         .          .    213:               yym6 := z.DecBinary()
         .          .    214:               _ = yym6
         .          .    215:               if false {
         .          .    216:               } else {
         .       20ms    217:                   *((*int64)(yyv5)) = int64(r.DecodeInt(64))
         .          .    218:               }
         .          .    219:           }
         .          .    220:       default:
         .          .    221:           z.DecStructFieldNotFound(-1, yys3)
         .          .    222:       } // end switch yys3
         .          .    223:   } // end for yyj3
         .       10ms    224:   z.DecSendContainerState(codecSelfer_containerMapEnd9138)
         .          .    225:}
         .          .    226:
         .          .    227:func (x *intermediateConstMap) codecDecodeSelfFromArray(l int, d *codec1978.Decoder) {
         .          .    228:   var h codecSelfer9138
         .          .    229:   z, r := codec1978.GenHelperDecoder(d)
ROUTINE ======================== github.com/weaveworks/scope/report.(*intermediateMap).CodecDecodeSelf in /go/src/github.com/weaveworks/scope/report/const_map.go
         0      940ms (flat, cum)  5.99% of Total
         .          .    172:           break
         .          .    173:       }
         .          .    174:
         .          .    175:       var key string
         .          .    176:       z.DecSendContainerState(containerMapKey)
         .       50ms    177:       if !r.TryDecodeAsNil() {
         .      310ms    178:           key = r.DecodeString()
         .          .    179:       }
         .          .    180:
         .          .    181:       var value string
         .          .    182:       z.DecSendContainerState(containerMapValue)
         .       30ms    183:       if !r.TryDecodeAsNil() {
         .      200ms    184:           value = r.DecodeString()
         .          .    185:       }
         .          .    186:
         .      350ms    187:       out = out.UnsafeMutableSet(key, value)
         .          .    188:   }
         .          .    189:   z.DecSendContainerState(containerMapEnd)
         .          .    190:   m.Map = out
         .          .    191:}
         .          .    192:
(pprof) 

@2opremio
Copy link
Contributor Author

2opremio commented Jul 29, 2016

Decoding strings is expensive because you need to copy the underlying bytes. I tried optimizing DecodeString() as suggested at ugorji/go#143 but, as @ugorji predicted, everything exploded.

A way to optimize this would be embedding fixed-sized byte arrays instead of strings in places were we can limit the size (e.g. array keys).

@2opremio
Copy link
Contributor Author

2opremio commented Aug 13, 2016

I rebased the PR and did some object allocation analysis: pprof.localhost:4040.alloc_objects.alloc_space.001.pb.gz

const-map-alloc

$ go tool pprof  --alloc_objects ~/home-vagrant/scope-const-map /Users/fons/pprof/pprof.localhost:4040.alloc_objects.alloc_space.001.pb.gz 
Entering interactive mode (type "help" for commands)
(pprof) list ConstMap.*Decode
Total: 137172540
ROUTINE ======================== github.com/weaveworks/scope/report.(*ConstMap).CodecDecodeSelf in /go/src/github.com/weaveworks/scope/report/const_map.go
   2900034   18850339 (flat, cum) 13.74% of Total
         .          .    227:   }
         .          .    228:}
         .          .    229:
         .          .    230:// CodecDecodeSelf implements codec.Selfer
         .          .    231:func (m *ConstMap) CodecDecodeSelf(decoder *codec.Decoder) {
   1458220    1458220    232:   var in intermediateConstMap
         .   15950305    233:   if err := decoder.Decode(&in); err != nil {
         .          .    234:       return
         .          .    235:   }
         .          .    236:   if in.Map.Map == nil {
         .          .    237:       *m = EmptyConstMap
         .          .    238:   } else {
   1441814    1441814    239:       m.Map = in.Map
         .          .    240:       m.Timestamp = time.Unix(int64(0), in.Timestamp)
         .          .    241:   }
         .          .    242:}
         .          .    243:
         .          .    244:// MarshalJSON shouldn't be used, use CodecEncodeSelf instead
ROUTINE ======================== github.com/weaveworks/scope/report.(*intermediateConstMap).CodecDecodeSelf in /go/src/github.com/weaveworks/scope/report/report.codecgen.go
         0   15950305 (flat, cum) 11.63% of Total
         .          .    158:       if yyct2 == codecSelferValueTypeMap1075 {
         .          .    159:           yyl2 := r.ReadMapStart()
         .          .    160:           if yyl2 == 0 {
         .          .    161:               z.DecSendContainerState(codecSelfer_containerMapEnd1075)
         .          .    162:           } else {
         .   15950305    163:               x.codecDecodeSelfFromMap(yyl2, d)
         .          .    164:           }
         .          .    165:       } else if yyct2 == codecSelferValueTypeArray1075 {
         .          .    166:           yyl2 := r.ReadArrayStart()
         .          .    167:           if yyl2 == 0 {
         .          .    168:               z.DecSendContainerState(codecSelfer_containerArrayEnd1075)
ROUTINE ======================== github.com/weaveworks/scope/report.(*intermediateConstMap).codecDecodeSelfFromMap in /go/src/github.com/weaveworks/scope/report/report.codecgen.go
         0   15950305 (flat, cum) 11.63% of Total
         .          .    201:       case "map":
         .          .    202:           if r.TryDecodeAsNil() {
         .          .    203:               x.Map = intermediateMap{}
         .          .    204:           } else {
         .          .    205:               yyv4 := &x.Map
         .   15950305    206:               yyv4.CodecDecodeSelf(d)
         .          .    207:           }
         .          .    208:       case "timestamp":
         .          .    209:           if r.TryDecodeAsNil() {
         .          .    210:               x.Timestamp = 0
         .          .    211:           } else {
(pprof) list intermediateMap
Total: 137172540
ROUTINE ======================== github.com/weaveworks/scope/report.(*intermediateMap).CodecDecodeSelf in /go/src/github.com/weaveworks/scope/report/const_map.go
   4227136   15950305 (flat, cum) 11.63% of Total
         .          .    173:       }
         .          .    174:
         .          .    175:       var key string
         .          .    176:       z.DecSendContainerState(containerMapKey)
         .          .    177:       if !r.TryDecodeAsNil() {
         .    4529315    178:           key = r.DecodeString()
         .          .    179:       }
         .          .    180:
         .          .    181:       var value string
         .          .    182:       z.DecSendContainerState(containerMapValue)
         .          .    183:       if !r.TryDecodeAsNil() {
         .    2493489    184:           value = r.DecodeString()
         .          .    185:       }
         .          .    186:
   4227136    8927501    187:       out = out.UnsafeMutableSet(key, value)
         .          .    188:   }
         .          .    189:   z.DecSendContainerState(containerMapEnd)
         .          .    190:   m.Map = out
         .          .    191:}
         .          .    192:
(pprof) list LatestMap.*Decode
Total: 137172540
ROUTINE ======================== github.com/weaveworks/scope/report.(*LatestMap).CodecDecodeSelf in /go/src/github.com/weaveworks/scope/report/latest_map.go
   8989766   21469341 (flat, cum) 15.65% of Total
         .          .    216:       }
         .          .    217:
         .          .    218:       var key string
         .          .    219:       z.DecSendContainerState(containerMapKey)
         .          .    220:       if !r.TryDecodeAsNil() {
         .    4380685    221:           key = r.DecodeString()
         .          .    222:       }
         .          .    223:
   4456652    4456652    224:       var value LatestEntry
         .          .    225:       z.DecSendContainerState(containerMapValue)
         .          .    226:       if !r.TryDecodeAsNil() {
         .    3421933    227:           decoder.Decode(&value)
         .          .    228:       }
         .          .    229:
   4533114    9210071    230:       out = out.UnsafeMutableSet(key, value)
         .          .    231:   }
         .          .    232:   z.DecSendContainerState(containerMapEnd)
         .          .    233:   *m = LatestMap{out}
         .          .    234:}
         .          .    235:
(pprof) list UnsafeMutableSet
Total: 137172540
ROUTINE ======================== github.com/weaveworks/scope/vendor/github.com/weaveworks/ps.(*tree).UnsafeMutableSet in /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/ps/map.go
         0    9377322 (flat, cum)  6.84% of Total
         .          .    184:
         .          .    185:// UnsafeMutableSet is the in-place mutable version of Set. Only use if
         .          .    186:// you are the only reference-holder of the Map.
         .          .    187:func (self *tree) UnsafeMutableSet(key string, value interface{}) Map {
         .          .    188:   hash := hashKey(key)
         .    9377322    189:   return mutableSetLowLevel(self, hash, hash, key, value)
         .          .    190:}
         .          .    191:
         .          .    192:func mutableSetLowLevel(self *tree, partialHash, hash uint64, key string, value interface{}) *tree {
         .          .    193:   if self.IsNil() { // an empty tree is easy
         .          .    194:       m := self.clone()
(pprof) list mutableSetLowLevel
Total: 137172540
ROUTINE ======================== github.com/weaveworks/scope/vendor/github.com/weaveworks/ps.mutableSetLowLevel in /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/ps/map.go
   9377322   16849218 (flat, cum) 12.28% of Total
         .          .    189:   return mutableSetLowLevel(self, hash, hash, key, value)
         .          .    190:}
         .          .    191:
         .          .    192:func mutableSetLowLevel(self *tree, partialHash, hash uint64, key string, value interface{}) *tree {
         .          .    193:   if self.IsNil() { // an empty tree is easy
   9377322    9377322    194:       m := self.clone()
         .          .    195:       m.count = 1
         .          .    196:       m.hash = hash
         .          .    197:       m.key = key
         .          .    198:       m.value = value
         .          .    199:       return m
         .          .    200:   }
         .          .    201:
         .          .    202:   if hash != self.hash {
         .          .    203:       i := partialHash % childCount
         .          .    204:       oldChildCount := self.children[i].count
         .    7471896    205:       self.children[i] = mutableSetLowLevel(self.children[i], partialHash>>shiftSize, hash, key, value)
         .          .    206:       // update count if we added a new object
         .          .    207:       if oldChildCount < self.children[i].count {
         .          .    208:           self.count++
         .          .    209:       }
         .          .    210:       return self
(pprof) 
$ go tool pprof  -focus ConstMap --alloc_objects ~/home-vagrant/scope-const-map /Users/fons/pprof/pprof.localhost:4040.alloc_objects.alloc_space.001.pb.gz 
Entering interactive mode (type "help" for commands)
(pprof) list DecodeString
Total: 137172540
ROUTINE ======================== github.com/weaveworks/scope/vendor/github.com/ugorji/go/codec.(*msgpackDecDriver).DecodeString in /go/src/github.com/weaveworks/scope/vendor/github.com/ugorji/go/codec/msgpack.go
   5685396    7022804 (flat, cum)  5.12% of Total
         .          .    551:   }
         .          .    552:   return decByteSlice(d.r, clen, bs)
         .          .    553:}
         .          .    554:
         .          .    555:func (d *msgpackDecDriver) DecodeString() (s string) {
   5685396    7022804    556:   return string(d.DecodeBytes(d.b[:], true, true))
         .          .    557:}
         .          .    558:
         .          .    559:func (d *msgpackDecDriver) readNextBd() {
         .          .    560:   d.bd = d.r.readn1()
         .          .    561:   d.bdRead = true
(pprof)
$ go tool pprof  -focus LatestMap --alloc_objects ~/home-vagrant/scope-const-map /Users/fons/pprof/pprof.localhost:4040.alloc_objects.alloc_space.001.pb.gz 
Entering interactive mode (type "help" for commands)
(pprof) list DecodeString
Total: 137172540
ROUTINE ======================== github.com/weaveworks/scope/vendor/github.com/ugorji/go/codec.(*msgpackDecDriver).DecodeString in /go/src/github.com/weaveworks/scope/vendor/github.com/ugorji/go/codec/msgpack.go
   7032830    7744817 (flat, cum)  5.65% of Total
         .          .    551:   }
         .          .    552:   return decByteSlice(d.r, clen, bs)
         .          .    553:}
         .          .    554:
         .          .    555:func (d *msgpackDecDriver) DecodeString() (s string) {
   7032830    7744817    556:   return string(d.DecodeBytes(d.b[:], true, true))
         .          .    557:}
         .          .    558:
         .          .    559:func (d *msgpackDecDriver) readNextBd() {
         .          .    560:   d.bd = d.r.readn1()
         .          .    561:   d.bdRead = true
(pprof) 
$ go tool pprof --alloc_objects ~/home-vagrant/scope-const-map /Users/fons/pprof/pprof.localhost:4040.alloc_objects.alloc_space.001.pb.gz 
Entering interactive mode (type "help" for commands)
(pprof) list setLowLevel
Total: 137172540
ROUTINE ======================== github.com/weaveworks/scope/vendor/github.com/weaveworks/ps.setLowLevel in /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/ps/map.go
  14109594   19942468 (flat, cum) 14.54% of Total
         .          .    149:   return setLowLevel(self, hash, hash, key, value)
         .          .    150:}
         .          .    151:
         .          .    152:func setLowLevel(self *tree, partialHash, hash uint64, key string, value interface{}) *tree {
         .          .    153:   if self.IsNil() { // an empty tree is easy
   4615959    4615959    154:           m := self.clone()
         .          .    155:           m.count = 1
         .          .    156:           m.hash = hash
         .          .    157:           m.key = key
         .          .    158:           m.value = value
         .          .    159:           return m
         .          .    160:   }
         .          .    161:
         .          .    162:   if hash != self.hash {
   5566093    5566093    163:           m := self.clone()
         .          .    164:           i := partialHash % childCount
         .    5832874    165:           m.children[i] = setLowLevel(self.children[i], partialHash>>shiftSize, hash, key, value)
         .          .    166:           // update count if we added a new object
         .          .    167:           if m.children[i].count > self.children[i].count {
         .          .    168:                   m.count++
         .          .    169:           }
         .          .    170:           return m
         .          .    171:   }
         .          .    172:
         .          .    173:   // did we find a hash collision?
         .          .    174:   if key != self.key {
         .          .    175:           oops := fmt.Sprintf("Hash collision between: '%s' and '%s'.  Please report to https://github.com/mndrix/ps/issues/new", self.key, key)
         .          .    176:           panic(oops)
         .          .    177:   }
         .          .    178:
         .          .    179:   // replacing a key's previous value
   3927542    3927542    180:   m := self.clone()
         .          .    181:   m.value = value
         .          .    182:   return m
         .          .    183:}
         .          .    184:
         .          .    185:// UnsafeMutableSet is the in-place mutable version of Set. Only use if
(pprof) 

The listings above show that:

  1. Decoding strings takes roughly 6% of the total allocations both for ConstMap and LatestMap (~12% in total). Which may explain the unimpressive win from [WIP] Cache LatestMap keys when decoding #1781 (caching the keys should lead to a maximum gain of ~6% allocations since it accounts for half of the strings, in practice it will be less than that since decoding from byteslices still causes allocations as shown in [WIP] Cache LatestMap keys when decoding #1781 (comment)).

  2. The mutable set operation takes ~15% of the decoding allocations. I am thinking that maybe we can use an allocation pool for the non-root nodes (which are never exposed to DS user).

  3. There are some allocations I don't really understand. For instance:

    1441814    1441814    239:       m.Map = in.Map
    

    Due to the underlying interface pointers?

    1458220    1458220    232:   var in intermediateConstMap
         .   15950305    233:   if err := decoder.Decode(&in); err != nil {
    

    Due to faulty escape analysis?

My hypothesis is that applying all the following optimizations together should start to make a difference:

  • ConstMap instead of LatestMap
  • Key caching ([WIP] Cache LatestMap keys when decoding #1781 )
  • Removing interfaces from ps.Map (i.e. making it a struct and creating a data-structure specific for string values)
  • Using a ps.Map node-pool for non-root nodes.

Question is, Is it really worth the effort or should we just work on delta reports (#985 ) instead?

@2opremio
Copy link
Contributor Author

Closing for now

@2opremio 2opremio closed this Jan 16, 2017
@2opremio 2opremio deleted the const-map branch February 13, 2017 12:17
@2opremio 2opremio restored the const-map branch February 13, 2017 12:17
@2opremio 2opremio deleted the const-map branch January 13, 2020 14:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants