Consider run time performance #40
Comments
As another datapoint, there are some benchmarks in https://github.com/wojas/genericr and the results there don't look too bad:
Note that this benchmarks the logr interface and genericr without actual logging to a backend, so it measures just the overhead introduced by the abstraction. These do not include more complicated objects that could escape to the heap. I will add those when I have time.
I am sure that some of the overhead can be mitigated by good intra-impl
optimization and some will be endemic to the API style. Concretely, high
V() level calls which are almost always skipped still build the slice of
variadic args.
Russ Cox suggested that maybe making the return of V() be a concrete type
could elide that.
E.g. totally not compiled or tried, something like:
`V()` returns a `logr.VLogger`, which is a struct carrying `enabled bool`
and a `logr.Logger`, and implements logr.Logger itself. All of the
standard methods first check `enabled`. That doesn't seem like a win
(because the variadic args are still passed) but he suggested the inliner
can do better and probably not do the variadics until after the `enabled`
check. I have not yet made time to prove this out. Of course that WOULD
be a breaking API change for implementations (but not for callers).
Russ had some other ideas which I will condense here when I get some time.
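To make the concrete-type idea above easier to picture, here is a minimal sketch. It is not logr's API and not the code from any PR: `Sink`, `VLogger`, `Logger` and `countingSink` are hypothetical names made up for illustration. It only shows the shape (a struct carrying `enabled` that every method checks first). Whether the inliner really defers building the variadic slice until after that check is exactly the part that is still unproven.

```go
package main

import "fmt"

// Sink is a hypothetical stand-in for a logging backend (an assumption for
// this sketch, not logr's interface).
type Sink interface {
	Enabled(level int) bool
	Info(msg string, kv ...interface{})
}

// VLogger is a concrete struct rather than an interface, so calls to its
// methods are direct calls that the compiler is free to inline.
type VLogger struct {
	enabled bool
	sink    Sink
}

// Info checks enabled before touching the sink. The hope described above is
// that, once inlined, the variadic slice is only built on the enabled path.
func (l VLogger) Info(msg string, kv ...interface{}) {
	if !l.enabled {
		return
	}
	l.sink.Info(msg, kv...)
}

// Logger is the hypothetical top-level logger handed to callers.
type Logger struct {
	sink Sink
}

// V decides once whether the level is enabled and records it in the result.
func (l Logger) V(level int) VLogger {
	return VLogger{enabled: l.sink.Enabled(level), sink: l.sink}
}

// countingSink counts how many messages actually reach the backend.
type countingSink struct {
	maxLevel int
	calls    int
}

func (s *countingSink) Enabled(level int) bool             { return level <= s.maxLevel }
func (s *countingSink) Info(msg string, kv ...interface{}) { s.calls++ }

func main() {
	sink := &countingSink{maxLevel: 1}
	log := Logger{sink: sink}
	log.V(0).Info("taken", "key", 1)
	log.V(5).Info("skipped", "key", 2)
	fmt.Println("calls reaching the sink:", sink.calls)
}
```

Callers keep writing `log.V(n).Info(...)`, which is why this would break implementations but not callers.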
…On Thu, Apr 1, 2021 at 3:36 AM wojas ***@***.***> wrote:
As another datapoint, there are some benchmarks in
https://github.com/wojas/genericr and the results there don't look too
bad:
goos: darwin
goarch: amd64
pkg: github.com/wojas/genericr
cpu: Intel(R) Xeon(R) W-2150B CPU @ 3.00GHz
BenchmarkLogger_basic-20 37195687 31.29 ns/op 0 B/op 0 allocs/op
BenchmarkLogger_basic_with_caller-20 1640784 716.2 ns/op 216 B/op 2 allocs/op
BenchmarkLogger_2vars-20 19593140 61.64 ns/op 64 B/op 1 allocs/op
BenchmarkLogger_clone-20 25827868 43.97 ns/op 0 B/op 0 allocs/op
BenchmarkLogger_complicated-20 3593434 331.9 ns/op 432 B/op 6 allocs/op
BenchmarkLogger_complicated_precalculated-20 8682727 139.2 ns/op 160 B/op 2 allocs/op
BenchmarkLogger_2vars_tostring-20 975183 1276 ns/op 600 B/op 13 allocs/op
PASS
ok github.com/wojas/genericr 10.014s
Note that this benchmarks the logr interface and genericr without actual
logging to a backend, so just the overhead introduced by the abstraction.
These do not include more complicated objects that could escape to the
heap. I will add those when I have time.
I spent some time playing with it tonight. Part of the problem is glogr making copies. I implemented it as Russ suggested and the performance does seem better: all of the remaining overhead is in the variadic args. The problem is that it is a significant change in the API. I made a big mess, so I will have to clean it up before I can share it properly :)
So just to wrap my head around this, the main goal here is to make log calls as cheap as possible if the actual logging at that level is disabled. I think that's a goal worthy of some breakage before a 1.0. The concrete struct in PR #42 is a bigger breaking change, completely replacing the interface with a concrete struct, but it may have other performance benefits. From a first look it is similar to what genericr is doing under the hood.
Yes. Other wins are possible, but this is a big one.
I agree :)
Without #42:
With #42:
So, weirdly, everything got slower except calls through V(). I guess that has to do with the extra variadic-slice pack/unpack. Changing Info/Error to pass the slice (instead of the … I'll have to make time to disassemble it all and see what I can find.
Switching benchmark to Discard():
Before:
After:
Hmm, I think a big part of this is the benchmark benefitting from optimizations that probably are not realistic in the wild. I'll look into it more later.
Forcing noinline in the benchmark makes it more representative.
Before:
After:
Notably I will make a new push with this change.
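For anyone following along, "forcing noinline" could look roughly like the sketch below. This is an assumption about the general shape, not the benchmark code from the PR: `Logger`, `discard`, `counter` and `doInfo` are hypothetical names. The real parts are the `//go:noinline` compiler directive and `testing.Benchmark`. The directive stops the compiler from inlining the measured call into the benchmark loop and then optimizing it in ways a real call site would not see.

```go
package main

import (
	"fmt"
	"testing"
)

// Logger is a hypothetical minimal interface used only for this sketch.
type Logger interface {
	Info(msg string, kv ...interface{})
}

// discard drops every message, similar in spirit to a Discard() logger.
type discard struct{}

func (discard) Info(string, ...interface{}) {}

// counter counts calls; it exists only to sanity-check the helper below.
type counter struct{ n int }

func (c *counter) Info(string, ...interface{}) { c.n++ }

// doInfo is the work being measured. The directive below keeps it out of
// line, so the benchmark pays for a real call with real argument passing.
//
//go:noinline
func doInfo(log Logger, n int) {
	for i := 0; i < n; i++ {
		log.Info("this is", "a", "string")
	}
}

func main() {
	c := &counter{}
	doInfo(c, 3)
	fmt.Println("sanity check, calls:", c.n)

	// testing.Benchmark lets the sketch run outside of `go test`.
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		doInfo(discard{}, b.N)
	})
	fmt.Println(res.String(), res.MemString())
}
```

The numbers it prints depend on the machine and Go version, so treat them only as a relative signal between two variants.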
If we want to push this to 1.0 (#38), we really need to make some intentional decisions about performance. The API as it stands was designed largely without considering performance, and (surprise!) it shows.
glogr_benchmark_test.go:
Running this:
So it's notably slower. All of the variadic args escape to the heap, including the string keys (which regular glog does not suffer). But that doesn't account for enough of the difference.
`V()` calls that are not taken also expand all their variadic args.
Some of this is attributable to glogr being very dumb and wasteful (clone() on every call) but it's not clear how much.
Before we call it 1.0 we need to do some homework.
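A small sketch of the `V()` cost described above, under stated assumptions: `levelLogger`, `expensive`, `unguarded` and `guarded` are hypothetical names for illustration, not logr's API. It shows that arguments to a skipped `V()` call are still evaluated (and packed) before the callee can decide to drop them, and that a caller-side `Enabled()` check avoids that at the price of noisier call sites.

```go
package main

import "fmt"

// levelLogger is a hypothetical minimal logger for this sketch.
type levelLogger struct {
	level int // verbosity of this logger
	max   int // highest verbosity that is actually emitted
}

// V returns a logger at the requested verbosity.
func (l levelLogger) V(level int) levelLogger {
	return levelLogger{level: level, max: l.max}
}

// Enabled reports whether messages at this verbosity are emitted.
func (l levelLogger) Enabled() bool { return l.level <= l.max }

// Info drops the message when disabled, but by then the caller has already
// evaluated every argument and built the kv slice.
func (l levelLogger) Info(msg string, kv ...interface{}) {
	if !l.Enabled() {
		return
	}
	fmt.Println(append([]interface{}{msg}, kv...)...)
}

// evaluations counts how often the costly argument is computed.
var evaluations int

func expensive() string {
	evaluations++
	return "state"
}

// unguarded always pays for expensive(), even though V(5) is disabled.
func unguarded(log levelLogger) {
	log.V(5).Info("details", "state", expensive())
}

// guarded checks Enabled() first, so the arguments are never built.
func guarded(log levelLogger) {
	if v := log.V(5); v.Enabled() {
		v.Info("details", "state", expensive())
	}
}

func main() {
	log := levelLogger{max: 1}
	unguarded(log)
	fmt.Println("evaluations after unguarded:", evaluations)
	guarded(log)
	fmt.Println("evaluations after guarded:", evaluations)
}
```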