-
Notifications
You must be signed in to change notification settings - Fork 712
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] New ConstMap field #1724
[WIP] New ConstMap field #1724
Conversation
I have tested the change so far in dev-c4 (with the reports from the Scope probes monitoring production) and the results are a bit discouraging. Not only it doesn't perform better, but a bit worse. It seems that adding an extra field to every node doesn't make up for the improvement on the Endpoint topology. We might see an improvement once I move more LatestMap data to ConstMap or create a custom encoder for the ConstMap, like we did for LatestMap. Without the change: pprof.localhost:4040.samples.cpu.001.pb.gz
With the change: pprof.localhost:4040.samples.cpu.003.pb.gz
(To see the effect of the ConstMap for decoding download the svg and check the top left flames) |
Here's the resulting report after the change: report-after.json.gz I will upload the report before the change tomorrow. |
Things look marginally better after creating a custom decoder: pprof.localhost:4040.samples.cpu.004.pb.gz
|
The relative gain for endpoints will be much smaller than for, say, container env vars, labels and other immutable container info, since endpoints only have three fields whereas the latter has dozens. |
Yep, but the Endpoint topology has way more nodes in the experiments I've done.
Anyways, I am going to migrate env variables and labels next. |
After migrating the tables (namely docker labels and envs) to ConstMap the CPU consumption is similar to how it was with LatestMap: pprof.localhost:4040.samples.cpu.005.pb.gz
It could also be that the load in the probes in production has increased in the past day, I will remove my changes and compare again. BTW, this doesn't include #1728 yet. |
I suggest a) re-basing on master, and b) doing a back-to-back comparison against current prod traffic. And please attach the before/after reports. |
After rebasing and re-running the tests it seems that ConstMaps are not making any difference. The query service keeps spending roughly 30% time decoding although the final uncompressed report is considerably smaller (in JSON at least). Here is what I get without the ConstMap changes (master-0525a2a): Profile: pprof.localhost:4040.samples.cpu.001.pb.gz
And here is what I get with the CostMap changes (const-map-a410a4b) Profile: pprof.localhost:4040.samples.cpu.002.pb.gz
|
Before the change, decoding LatestMaps accounted for 15.25% of the overall CPU time and after it, decoding LatestMaps+ConstMaps accounts for roughly 13.5% (6.57% + 6.89%). Interestingly LatestMap entries accounted for 7.27% and now (for the remaining LatestMap entries) it still takes 4.15% . |
My theory is that decoding is dominated by string decoding (which hasn't changed) and we are adding an extra field to all nodes. Here's the line-by-line profiling for both decoders:
|
Decoding strings is expensive because you need to copy the underlying bytes. I tried optimizing DecodeString() as suggested at ugorji/go#143 but, as @ugorji predicted, everything exploded. A way to optimize this would be embedding fixed-sized byte arrays instead of strings in places were we can limit the size (e.g. array keys). |
I rebased the PR and did some object allocation analysis: pprof.localhost:4040.alloc_objects.alloc_space.001.pb.gz
The listings above show that:
My hypothesis is that applying all the following optimizations together should start to make a difference:
Question is, Is it really worth the effort or should we just work on delta reports (#985 ) instead? |
Closing for now |
ConstMap is similar to LatestMap, but instead of including a timestamp per key in only has a global timestamp (motivated from discussion on #1201 ).
This PR only migrates the LatestMap field to ConstMap in the Endpoint topology. If experiments are successful we might extend it to other topologies.
TODO: