Performance optimizations for the inmem sinks key flattening #161

Open
wants to merge 1 commit into
base: master

@mkeeler (Member) commented Mar 5, 2024

Description

While profiling Consul, we noticed that key flattening in the inmem sink was consuming significant CPU, so I set out to optimize it and reduce that overhead.

The TL;DR is that the changes in this PR reduce CPU usage by 54-77% and memory allocations by 70-83% for key flattening in the inmem sink.

Details

I did these optimizations in 3 parts:

Eliminate `fmt.Sprintf` in `flattenKeyLabels`:

This drastically reduces allocations when there are many labels and cut CPU usage by 15-30% whenever labels were used. The more labels used, the larger the reduction.
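To illustrate the idea behind this step, here is a minimal sketch that contrasts formatting each label with `fmt.Sprintf` against writing the pieces directly. It is not the code from this PR: the `Label` type and both function names are hypothetical stand-ins, and the `;name=value` layout is assumed for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Label is a hypothetical stand-in for a metric label (name/value pair).
type Label struct {
	Name  string
	Value string
}

// appendLabelsSprintf formats every label with fmt.Sprintf, which builds a
// temporary string per label before it is copied into the builder.
func appendLabelsSprintf(key string, labels []Label) string {
	var sb strings.Builder
	sb.WriteString(key)
	for _, l := range labels {
		sb.WriteString(fmt.Sprintf(";%s=%s", l.Name, l.Value))
	}
	return sb.String()
}

// appendLabelsDirect produces the same output by writing each piece straight
// into the builder, so no per-label temporary string is created.
func appendLabelsDirect(key string, labels []Label) string {
	var sb strings.Builder
	sb.WriteString(key)
	for _, l := range labels {
		sb.WriteByte(';')
		sb.WriteString(l.Name)
		sb.WriteByte('=')
		sb.WriteString(l.Value)
	}
	return sb.String()
}

func main() {
	labels := []Label{{Name: "dc", Value: "east"}, {Name: "node", Value: "a1"}}
	fmt.Println(appendLabelsSprintf("consul.rpc", labels))
	fmt.Println(appendLabelsDirect("consul.rpc", labels))
}
```

Both functions return the same string; the difference is only in how many intermediate values are created along the way, which is why the saving grows with the number of labels.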

Simplify `flattenKey`:

Here I removed the temporary buffer and the string replacer, which brought `flattenKey` to the form it has at the end of this commit. This reduced CPU utilization for `flattenKey` by 50% and allocations by 75%. The impact on `flattenKeyLabels` was less pronounced because most of its CPU time is spent in label processing. Even so, the reductions there were 25-40% for CPU usage and 37-42% for allocations.
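As a rough sketch of what dropping the temporary buffer and replacer can look like, the following compares a buffered variant with a simpler one. Both functions are hypothetical illustrations, not the PR's actual code; the `.` separator and the space-to-underscore rule are assumptions made for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// spaceReplacer mirrors the kind of shared replacer the description mentions.
var spaceReplacer = strings.NewReplacer(" ", "_")

// flattenKeyBuffered joins the parts, then streams the result through the
// replacer into a temporary buffer before converting back to a string.
func flattenKeyBuffered(parts []string) string {
	buf := &bytes.Buffer{}
	joined := strings.Join(parts, ".")
	spaceReplacer.WriteString(buf, joined)
	return buf.String()
}

// flattenKeySimple joins the parts and replaces spaces in one pass over the
// joined string. strings.ReplaceAll returns its input unchanged when there is
// nothing to replace, so keys without spaces need no second copy.
func flattenKeySimple(parts []string) string {
	return strings.ReplaceAll(strings.Join(parts, "."), " ", "_")
}

func main() {
	parts := []string{"consul", "raft", "apply time"}
	fmt.Println(flattenKeyBuffered(parts)) // consul.raft.apply_time
	fmt.Println(flattenKeySimple(parts))   // consul.raft.apply_time
}
```

The simpler form avoids the buffer and the intermediate copy it implies, which is consistent with the drop in allocations described above, although the exact savings depend on the real implementation.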

Eliminate the space replacer and call `strings.Replace` just once:

Within the label processing loop we were using the space replacer to write modified bytes out to the buffer. I swapped this for direct buffer writes followed by a single call to `strings.Replace` at the end. This resulted in another 33-47% CPU reduction for the function and 20-50% fewer allocations.
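The difference between replacing spaces per label and replacing them once at the end can be sketched as follows. This is an illustration under assumptions, not the PR's code: the `Label` type, the function names, and the `;name=value` layout are hypothetical. Note that the single-replace variant also touches the key portion, so the two are only equivalent when the key has already been flattened.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// Label is a hypothetical stand-in for a metric label (name/value pair).
type Label struct {
	Name  string
	Value string
}

var spaceReplacer = strings.NewReplacer(" ", "_")

// flattenPerLabel runs the replacer once per label while writing to the buffer.
func flattenPerLabel(key string, labels []Label) string {
	buf := bytes.NewBufferString(key)
	for _, l := range labels {
		spaceReplacer.WriteString(buf, ";"+l.Name+"="+l.Value)
	}
	return buf.String()
}

// flattenSingleReplace writes the raw label bytes directly and replaces
// spaces once over the finished string.
func flattenSingleReplace(key string, labels []Label) string {
	var sb strings.Builder
	sb.WriteString(key)
	for _, l := range labels {
		sb.WriteByte(';')
		sb.WriteString(l.Name)
		sb.WriteByte('=')
		sb.WriteString(l.Value)
	}
	// A count of -1 replaces every occurrence.
	return strings.Replace(sb.String(), " ", "_", -1)
}

func main() {
	labels := []Label{{Name: "data center", Value: "us east"}}
	fmt.Println(flattenPerLabel("consul.rpc", labels))
	fmt.Println(flattenSingleReplace("consul.rpc", labels))
}
```

When no label contains a space, the final `strings.Replace` call returns the string as-is, so the common case pays for only one scan of the output.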

Overall benchmark comparison:

goos: darwin
goarch: amd64
pkg: github.com/hashicorp/go-metrics
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
                                               │ /Users/mkeeler/go-metrics.old.txt │ /Users/mkeeler/go-metrics.no-space-replacer.txt │
                                               │              sec/op               │         sec/op           vs base                │
FlattenKey/three-segments-16                                          189.00n ± 8%               62.97n ± 4%  -66.68% (p=0.000 n=10)
FlattenKey/five-segments-16                                           198.10n ± 6%               75.89n ± 3%  -61.69% (p=0.000 n=10)
FlattenKey/ten-segments-16                                             270.6n ± 4%               123.7n ± 5%  -54.30% (p=0.000 n=10)
FlattenKeyLabels/three-segments-no-labels-16                           277.3n ± 6%               102.0n ± 4%  -63.23% (p=0.000 n=10)
FlattenKeyLabels/three-segments-one-label-16                           512.4n ± 3%               128.9n ± 4%  -74.84% (p=0.000 n=10)
FlattenKeyLabels/five-segments-three-labels-16                        1033.0n ± 6%               238.3n ± 1%  -76.94% (p=0.000 n=10)
FlattenKeyLabels/ten-segments-five-labels-16                          1465.0n ± 6%               360.9n ± 2%  -75.36% (p=0.000 n=10)
_GlobalMetrics_Direct/direct-16                                        20.87n ± 5%               20.30n ± 2%   -2.68% (p=0.015 n=10)
_GlobalMetrics_Direct/atomic.Value-16                                  21.73n ± 7%               24.00n ± 9%  +10.42% (p=0.023 n=10)
geomean                                                                215.2n                    88.28n       -58.97%

                                               │ /Users/mkeeler/go-metrics.old.txt │ /Users/mkeeler/go-metrics.no-space-replacer.txt │
                                               │               B/op                │         B/op           vs base                  │
FlattenKey/three-segments-16                                         144.00 ± 0%                16.00 ± 0%  -88.89% (p=0.000 n=10)
FlattenKey/five-segments-16                                          160.00 ± 0%                24.00 ± 0%  -85.00% (p=0.000 n=10)
FlattenKey/ten-segments-16                                           240.00 ± 0%                64.00 ± 0%  -73.33% (p=0.000 n=10)
FlattenKeyLabels/three-segments-no-labels-16                         224.00 ± 0%                32.00 ± 0%  -85.71% (p=0.000 n=10)
FlattenKeyLabels/three-segments-one-label-16                         312.00 ± 0%                40.00 ± 0%  -87.18% (p=0.000 n=10)
FlattenKeyLabels/five-segments-three-labels-16                        584.0 ± 0%                152.0 ± 0%  -73.97% (p=0.000 n=10)
FlattenKeyLabels/ten-segments-five-labels-16                          832.0 ± 0%                368.0 ± 0%  -55.77% (p=0.000 n=10)
_GlobalMetrics_Direct/direct-16                                       0.000 ± 0%                0.000 ± 0%        ~ (p=1.000 n=10) ¹
_GlobalMetrics_Direct/atomic.Value-16                                 0.000 ± 0%                0.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean                                                                          ²                          -72.37%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                                               │ /Users/mkeeler/go-metrics.old.txt │ /Users/mkeeler/go-metrics.no-space-replacer.txt │
                                               │             allocs/op             │       allocs/op        vs base                  │
FlattenKey/three-segments-16                                          4.000 ± 0%                1.000 ± 0%  -75.00% (p=0.000 n=10)
FlattenKey/five-segments-16                                           4.000 ± 0%                1.000 ± 0%  -75.00% (p=0.000 n=10)
FlattenKey/ten-segments-16                                            4.000 ± 0%                1.000 ± 0%  -75.00% (p=0.000 n=10)
FlattenKeyLabels/three-segments-no-labels-16                          7.000 ± 0%                2.000 ± 0%  -71.43% (p=0.000 n=10)
FlattenKeyLabels/three-segments-one-label-16                         11.000 ± 0%                2.000 ± 0%  -81.82% (p=0.000 n=10)
FlattenKeyLabels/five-segments-three-labels-16                       18.000 ± 0%                3.000 ± 0%  -83.33% (p=0.000 n=10)
FlattenKeyLabels/ten-segments-five-labels-16                         23.000 ± 0%                4.000 ± 0%  -82.61% (p=0.000 n=10)
_GlobalMetrics_Direct/direct-16                                       0.000 ± 0%                0.000 ± 0%        ~ (p=1.000 n=10) ¹
_GlobalMetrics_Direct/atomic.Value-16                                 0.000 ± 0%                0.000 ± 0%        ~ (p=1.000 n=10) ¹
geomean                                                                          ²                          -69.40%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

@jrasell requested a review from a team on December 18, 2024.