Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of WithLabelValues(...) #1360

Merged

Conversation

colega
Copy link
Contributor

@colega colega commented Oct 4, 2023

The slice with variadic arguments passed to MetricVec.WithLabelValues() was escaping to heap. This change fixes that by performing a copy of the slice before passing it to fmt.Errorf(), which is where the slice was escaping. This keeps the hot path without that allocation.

Meaningful benchmark results (skipping ~0 CPU and 0 alloc ones):

                                                               │    old.txt    │               new.txt                │
                                                               │    sec/op     │    sec/op     vs base                │
Counter/With_Label_Values-16                                     108.00n ±  6%   58.06n ±  1%  -46.24% (p=0.000 n=10)
Counter/With_Label_Values_and_Constraint-16                       174.5n ± 15%   136.8n ±  6%  -21.63% (p=0.000 n=10)
Counter/With_triple_Label_Values-16                               309.3n ± 12%   172.9n ±  1%  -44.08% (p=0.000 n=10)
Counter/With_triple_Label_Values_and_Constraint-16                591.5n ± 11%   418.9n ±  3%  -29.17% (p=0.000 n=10)
Counter/With_repeated_Label_Values-16                             212.9n ± 10%   116.8n ± 23%  -45.16% (p=0.000 n=10)
Counter/With_repeated_Label_Values_and_Constraint-16              406.2n ± 14%   275.1n ±  4%  -32.30% (p=0.000 n=10)
CounterWithLabelValuesConcurrent-16                               85.45n ±  2%   89.09n ±  2%   +4.26% (p=0.003 n=10)

                                                               │    old.txt     │                  new.txt                  │
                                                               │      B/op      │     B/op      vs base                     │
Counter/With_Label_Values-16                                       48.00 ± 0%        0.00 ± 0%  -100.00% (p=0.000 n=10)
Counter/With_Label_Values_and_Constraint-16                        96.00 ± 0%       48.00 ± 0%   -50.00% (p=0.000 n=10)
Counter/With_triple_Label_Values-16                                144.0 ± 0%         0.0 ± 0%  -100.00% (p=0.000 n=10)
Counter/With_triple_Label_Values_and_Constraint-16                 288.0 ± 0%       144.0 ± 0%   -50.00% (p=0.000 n=10)
Counter/With_repeated_Label_Values-16                              96.00 ± 0%        0.00 ± 0%  -100.00% (p=0.000 n=10)
Counter/With_repeated_Label_Values_and_Constraint-16              192.00 ± 0%       96.00 ± 0%   -50.00% (p=0.000 n=10)
CounterWithLabelValuesConcurrent-16                                48.00 ± 0%        0.00 ± 0%  -100.00% (p=0.000 n=10)

                                                               │   old.txt    │                 new.txt                 │
                                                               │  allocs/op   │ allocs/op   vs base                     │
Counter/With_Label_Values-16                                     1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Counter/With_Label_Values_and_Constraint-16                      2.000 ± 0%     1.000 ± 0%   -50.00% (p=0.000 n=10)
Counter/With_triple_Label_Values-16                              3.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Counter/With_triple_Label_Values_and_Constraint-16               6.000 ± 0%     3.000 ± 0%   -50.00% (p=0.000 n=10)
Counter/With_repeated_Label_Values-16                            2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Counter/With_repeated_Label_Values_and_Constraint-16             4.000 ± 0%     2.000 ± 0%   -50.00% (p=0.000 n=10)
CounterWithLabelValuesConcurrent-16                              1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)

I was surprised by the fact that BenchmarkCounterWithLabelValuesConcurrent didn't see a CPU improvement, regardless the fact that it removes an allocation. I ran it with -count=25 with the same result 🤷:

goos: linux
goarch: amd64
pkg: github.com/prometheus/client_golang/prometheus
cpu: 11th Gen Intel(R) Core(TM) i7-11700K @ 3.60GHz
                                    │ concurrent.old.txt │         concurrent.new.txt         │
                                    │       sec/op       │   sec/op     vs base               │
CounterWithLabelValuesConcurrent-16          84.46n ± 0%   88.59n ± 1%  +4.89% (p=0.000 n=25)

                                    │ concurrent.old.txt │         concurrent.new.txt         │
                                    │        B/op        │   B/op     vs base                 │
CounterWithLabelValuesConcurrent-16           48.00 ± 0%   0.00 ± 0%  -100.00% (p=0.000 n=25)

                                    │ concurrent.old.txt │         concurrent.new.txt          │
                                    │     allocs/op      │ allocs/op   vs base                 │
CounterWithLabelValuesConcurrent-16           1.000 ± 0%   0.000 ± 0%  -100.00% (p=0.000 n=25)

It's worth noting that some of the runs in this benchmark take ~50ns, while others take ~85ns.

Thanks @bboreham on the pointer on how to remove the allocation here.

The slice with variadic arguments passed to MetricVec.WithLabelValues()
was escaping to heap. This change fixes that by performing a copy of the
slice before passing it to fmt.Errorf(), which is where the slice was
escaping. This keeps the hot path without that allocation.

Meaningful benchmark results (skipping ~0 CPU and 0 alloc ones):

                                                               │    old.txt    │               new.txt                │
                                                               │    sec/op     │    sec/op     vs base                │
Counter/With_Label_Values-16                                     108.00n ±  6%   58.06n ±  1%  -46.24% (p=0.000 n=10)
Counter/With_Label_Values_and_Constraint-16                       174.5n ± 15%   136.8n ±  6%  -21.63% (p=0.000 n=10)
Counter/With_triple_Label_Values-16                               309.3n ± 12%   172.9n ±  1%  -44.08% (p=0.000 n=10)
Counter/With_triple_Label_Values_and_Constraint-16                591.5n ± 11%   418.9n ±  3%  -29.17% (p=0.000 n=10)
Counter/With_repeated_Label_Values-16                             212.9n ± 10%   116.8n ± 23%  -45.16% (p=0.000 n=10)
Counter/With_repeated_Label_Values_and_Constraint-16              406.2n ± 14%   275.1n ±  4%  -32.30% (p=0.000 n=10)
CounterWithLabelValuesConcurrent-16                               85.45n ±  2%   89.09n ±  2%   +4.26% (p=0.003 n=10)

                                                               │    old.txt     │                  new.txt                  │
                                                               │      B/op      │     B/op      vs base                     │
Counter/With_Label_Values-16                                       48.00 ± 0%        0.00 ± 0%  -100.00% (p=0.000 n=10)
Counter/With_Label_Values_and_Constraint-16                        96.00 ± 0%       48.00 ± 0%   -50.00% (p=0.000 n=10)
Counter/With_triple_Label_Values-16                                144.0 ± 0%         0.0 ± 0%  -100.00% (p=0.000 n=10)
Counter/With_triple_Label_Values_and_Constraint-16                 288.0 ± 0%       144.0 ± 0%   -50.00% (p=0.000 n=10)
Counter/With_repeated_Label_Values-16                              96.00 ± 0%        0.00 ± 0%  -100.00% (p=0.000 n=10)
Counter/With_repeated_Label_Values_and_Constraint-16              192.00 ± 0%       96.00 ± 0%   -50.00% (p=0.000 n=10)
CounterWithLabelValuesConcurrent-16                                48.00 ± 0%        0.00 ± 0%  -100.00% (p=0.000 n=10)

                                                               │   old.txt    │                 new.txt                 │
                                                               │  allocs/op   │ allocs/op   vs base                     │
Counter/With_Label_Values-16                                     1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Counter/With_Label_Values_and_Constraint-16                      2.000 ± 0%     1.000 ± 0%   -50.00% (p=0.000 n=10)
Counter/With_triple_Label_Values-16                              3.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Counter/With_triple_Label_Values_and_Constraint-16               6.000 ± 0%     3.000 ± 0%   -50.00% (p=0.000 n=10)
Counter/With_repeated_Label_Values-16                            2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
Counter/With_repeated_Label_Values_and_Constraint-16             4.000 ± 0%     2.000 ± 0%   -50.00% (p=0.000 n=10)
CounterWithLabelValuesConcurrent-16                              1.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Copy link
Member

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent!

Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit inexperienced when it comes to go performance

Could you explain where exactly vals escapes? 🙂 and why if I'm not asking for too much :P

@colega
Copy link
Contributor Author

colega commented Oct 5, 2023

There's a fmt.Errorf(msg string, args ...interface{}) call on the next line, and we're passing vals as one of those args. Since a slice can't fit into the "value" part of the interface tuple (type, value), it escapes to the heap and a pointer to that slice is stored instead.

A longer explanation after some googling about this topic: https://stackoverflow.com/questions/39492539/go-implicit-conversion-to-interface-does-memory-allocation

@bboreham
Copy link
Member

bboreham commented Oct 5, 2023

In escape analysis, the compiler tries to prove that a pointer to data does not get stored somewhere such that it lives on past the return of the function it was created in.

Interfaces defeat this analysis, since the code that implements a method could come from anywhere in the program.
Escape analysis is conservative - if there is any chance of an escape, that's enough.
So, by and large, any time you see interface the value is going to escape.

Oleg's change makes a deliberate copy before the slice is passed as an interface{}, so now the compiler can prove that the original value does not escape.

@ArthurSens
Copy link
Member

Thank you both very much for the explanation and contribution ❤️

@ArthurSens ArthurSens merged commit 51714a5 into prometheus:main Oct 5, 2023
8 checks passed
@@ -165,6 +165,8 @@ func validateValuesInLabels(labels Labels, expectedNumberOfValues int) error {

func validateLabelValues(vals []string, expectedNumberOfValues int) error {
if len(vals) != expectedNumberOfValues {
// The call below makes vals escape, copy them to avoid that.
vals := append([]string(nil), vals...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

		vals := vals

Would this have the same effect?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I don't know. Try it!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would work, as the slice still references the array that we've allocated on stack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we already receive a copy of vals when this function is called, so this doesn't change anything.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, thanks again!

@colega colega deleted the improve-performance-of-with-label-values branch December 4, 2023 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants