-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Any stack usage #1301
Improve Any stack usage #1301
Conversation
By doing this unsafe approach we are able to save one call to morestack.
We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. Before: ``` ❯ go test -bench BenchmarkAny -benchmem -run Any goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAnyInGoroutine/any-12 3977013 316.1 ns/op 64 B/op 1 allocs/op BenchmarkAnyInGoroutine/int-12 4172178 289.6 ns/op 64 B/op 1 allocs/op BenchmarkAnyInGoroutine/goroutine-12 5018606 253.1 ns/op 0 B/op 0 allocs/op BenchmarkAnyInGoroutine/int-in-go-12 2167634 561.5 ns/op 88 B/op 2 allocs/op BenchmarkAnyInGoroutine/any-in-go-12 1875784 637.6 ns/op 88 B/op 2 allocs/op BenchmarkAnyInGoroutine/int-in-go-with-stack-12 2500544 446.7 ns/op 88 B/op 2 allocs/op BenchmarkAnyInGoroutine/any-in-go-with-stack-12 1769258 653.0 ns/op 88 B/op 2 allocs/op PASS ``` After: ``` ❯ go test -bench BenchmarkAny -benchmem -run Any goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAnyInGoroutine/any-12 3967261 288.2 ns/op 64 B/op 1 allocs/op BenchmarkAnyInGoroutine/int-12 4571604 260.6 ns/op 64 B/op 1 allocs/op BenchmarkAnyInGoroutine/goroutine-12 5481904 216.4 ns/op 0 B/op 0 allocs/op BenchmarkAnyInGoroutine/int-in-go-12 2097002 583.1 ns/op 88 B/op 2 allocs/op BenchmarkAnyInGoroutine/any-in-go-12 2164167 551.6 ns/op 88 B/op 2 allocs/op BenchmarkAnyInGoroutine/int-in-go-with-stack-12 2989617 378.2 ns/op 88 B/op 2 allocs/op BenchmarkAnyInGoroutine/any-in-go-with-stack-12 3123987 381.8 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 11.486s ``` 10 runs. ``` ❯ benchstat before.txt after.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ before.txt │ after.txt │ │ sec/op │ sec/op vs base │ AnyInGoroutine/any-12 305.2n ± 3% 290.0n ± 1% -5.00% (p=0.000 n=10) AnyInGoroutine/int-12 288.0n ± 0% 265.5n ± 1% -7.85% (p=0.000 n=10) AnyInGoroutine/goroutine-12 218.3n ± 6% 216.9n ± 2% ~ (p=0.469 n=10) AnyInGoroutine/int-in-go-12 592.7n ± 2% 578.5n ± 0% -2.40% (p=0.001 n=10) AnyInGoroutine/any-in-go-12 666.5n ± 4% 557.8n ± 1% -16.30% (p=0.000 n=10) AnyInGoroutine/int-in-go-with-stack-12 474.4n ± 4% 470.3n ± 6% ~ (p=0.631 n=10) AnyInGoroutine/any-in-go-with-stack-12 617.6n ± 4% 475.9n ± 3% -22.95% (p=0.000 n=10) geomean 417.8n 382.8n -8.36% │ before.txt │ after.txt │ │ B/op │ B/op vs base │ AnyInGoroutine/any-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/goroutine-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) AnyInGoroutine/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ before.txt │ after.txt │ │ allocs/op │ allocs/op vs base │ AnyInGoroutine/any-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/goroutine-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean ```
We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. 10 runs. ``` ❯ benchstat ~/before.txt ~/after-after.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ sec/op │ sec/op vs base │ AnyInGoroutine/any-12 305.2n ± 3% 297.0n ± 0% ~ (p=0.085 n=10) AnyInGoroutine/int-12 288.0n ± 0% 270.1n ± 1% -6.25% (p=0.000 n=10) AnyInGoroutine/goroutine-12 218.3n ± 6% 209.5n ± 5% -4.05% (p=0.015 n=10) AnyInGoroutine/int-in-go-12 592.7n ± 2% 573.9n ± 2% -3.17% (p=0.000 n=10) AnyInGoroutine/any-in-go-12 666.5n ± 4% 552.3n ± 1% -17.13% (p=0.000 n=10) AnyInGoroutine/int-in-go-with-stack-12 474.4n ± 4% 459.4n ± 6% ~ (p=0.447 n=10+9) AnyInGoroutine/any-in-go-with-stack-12 617.6n ± 4% 468.8n ± 4% -24.09% (p=0.000 n=10) geomean 417.8n 380.1n -9.01% │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ B/op │ B/op vs base │ AnyInGoroutine/any-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/goroutine-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) AnyInGoroutine/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) AnyInGoroutine/int-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10+9) ¹ AnyInGoroutine/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ allocs/op │ allocs/op vs base │ AnyInGoroutine/any-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/goroutine-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10+9) ¹ AnyInGoroutine/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean ```
Codecov Report
@@ Coverage Diff @@
## master #1301 +/- ##
==========================================
+ Coverage 98.08% 98.15% +0.07%
==========================================
Files 50 50
Lines 3242 3369 +127
==========================================
+ Hits 3180 3307 +127
Misses 53 53
Partials 9 9
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
@sywhang Is it really true? Imagine you have 2 endpoints, one uses zap.Any the other doesn't. So one has a 2KB the other one 8KB stack. Imagine we have 75/25 traffic (25% to zap.Any). Then the average stack size in this case is going to be 4KB. So 25% of the traffic would need to allocate more stack. |
@cdvr1993 That's assuming each request is served by a fresh new goroutine though, right? |
Yes, the way some rpc frameworks work. For instance, gRPC: https://github.com/grpc/grpc-go/blob/master/server.go#L996
|
We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. 10 runs. ``` ❯ benchstat ~/before.txt ~/after-after.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ sec/op │ sec/op vs base │ AnyInGoroutine/any-12 305.2n ± 3% 297.0n ± 0% ~ (p=0.085 n=10) AnyInGoroutine/int-12 288.0n ± 0% 270.1n ± 1% -6.25% (p=0.000 n=10) AnyInGoroutine/goroutine-12 218.3n ± 6% 209.5n ± 5% -4.05% (p=0.015 n=10) AnyInGoroutine/int-in-go-12 592.7n ± 2% 573.9n ± 2% -3.17% (p=0.000 n=10) AnyInGoroutine/any-in-go-12 666.5n ± 4% 552.3n ± 1% -17.13% (p=0.000 n=10) AnyInGoroutine/int-in-go-with-stack-12 474.4n ± 4% 459.4n ± 6% ~ (p=0.447 n=10+9) AnyInGoroutine/any-in-go-with-stack-12 617.6n ± 4% 468.8n ± 4% -24.09% (p=0.000 n=10) geomean 417.8n 380.1n -9.01% │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ B/op │ B/op vs base │ AnyInGoroutine/any-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/goroutine-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) AnyInGoroutine/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) AnyInGoroutine/int-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10+9) ¹ AnyInGoroutine/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ allocs/op │ allocs/op vs base │ AnyInGoroutine/any-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/goroutine-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10+9) ¹ AnyInGoroutine/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean ```
We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. 10 runs. ``` ❯ benchstat ~/before.txt ~/after-after.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ sec/op │ sec/op vs base │ AnyInGoroutine/any-12 305.2n ± 3% 297.0n ± 0% ~ (p=0.085 n=10) AnyInGoroutine/int-12 288.0n ± 0% 270.1n ± 1% -6.25% (p=0.000 n=10) AnyInGoroutine/goroutine-12 218.3n ± 6% 209.5n ± 5% -4.05% (p=0.015 n=10) AnyInGoroutine/int-in-go-12 592.7n ± 2% 573.9n ± 2% -3.17% (p=0.000 n=10) AnyInGoroutine/any-in-go-12 666.5n ± 4% 552.3n ± 1% -17.13% (p=0.000 n=10) AnyInGoroutine/int-in-go-with-stack-12 474.4n ± 4% 459.4n ± 6% ~ (p=0.447 n=10+9) AnyInGoroutine/any-in-go-with-stack-12 617.6n ± 4% 468.8n ± 4% -24.09% (p=0.000 n=10) geomean 417.8n 380.1n -9.01% │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ B/op │ B/op vs base │ AnyInGoroutine/any-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/goroutine-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) AnyInGoroutine/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) AnyInGoroutine/int-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10+9) ¹ AnyInGoroutine/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ allocs/op │ allocs/op vs base │ AnyInGoroutine/any-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/goroutine-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10+9) ¹ AnyInGoroutine/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean ```
We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. 10 runs. ``` ❯ benchstat ~/before.txt ~/after-after.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ sec/op │ sec/op vs base │ AnyInGoroutine/any-12 305.2n ± 3% 297.0n ± 0% ~ (p=0.085 n=10) AnyInGoroutine/int-12 288.0n ± 0% 270.1n ± 1% -6.25% (p=0.000 n=10) AnyInGoroutine/goroutine-12 218.3n ± 6% 209.5n ± 5% -4.05% (p=0.015 n=10) AnyInGoroutine/int-in-go-12 592.7n ± 2% 573.9n ± 2% -3.17% (p=0.000 n=10) AnyInGoroutine/any-in-go-12 666.5n ± 4% 552.3n ± 1% -17.13% (p=0.000 n=10) AnyInGoroutine/int-in-go-with-stack-12 474.4n ± 4% 459.4n ± 6% ~ (p=0.447 n=10+9) AnyInGoroutine/any-in-go-with-stack-12 617.6n ± 4% 468.8n ± 4% -24.09% (p=0.000 n=10) geomean 417.8n 380.1n -9.01% │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ B/op │ B/op vs base │ AnyInGoroutine/any-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/goroutine-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) AnyInGoroutine/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) AnyInGoroutine/int-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10+9) ¹ AnyInGoroutine/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ /Users/pawel/before.txt │ /Users/pawel/after-after.txt │ │ allocs/op │ allocs/op vs base │ AnyInGoroutine/any-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/goroutine-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ AnyInGoroutine/int-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10+9) ¹ AnyInGoroutine/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean ```
We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. 10 runs. ``` ❯ benchstat ~/before2.txt ~/after2.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger-12 3.344n ± 1% 3.029n ± 1% -9.40% (p=0.000 n=10) Any/any-no-logger-12 13.80n ± 4% 18.67n ± 1% +35.29% (p=0.000 n=10) Any/str-with-logger-12 372.4n ± 3% 363.6n ± 1% -2.35% (p=0.001 n=10) Any/any-with-logger-12 369.2n ± 1% 363.6n ± 1% -1.52% (p=0.002 n=10) Any/str-in-go-12 587.2n ± 2% 587.0n ± 1% ~ (p=0.617 n=10) Any/any-in-go-12 666.5n ± 3% 567.6n ± 1% -14.85% (p=0.000 n=10) Any/str-in-go-with-stack-12 448.6n ± 18% 403.4n ± 13% ~ (p=0.280 n=10) Any/any-in-go-with-stack-12 564.9n ± 7% 443.2n ± 4% -21.55% (p=0.000 n=10) geomean 167.8n 160.7n -4.23% │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ B/op │ B/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ allocs/op │ allocs/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ```
I cleaned up a little bit the usage of go directives and unsafe usage: Stack is now 280 bytes vs 4856 bytes that we initially had. Unsafe usage only remains for the interface parameter because we need a common type for it. |
We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. 10 runs. ``` ❯ benchstat ~/before2.txt ~/after2.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger-12 3.344n ± 1% 3.029n ± 1% -9.40% (p=0.000 n=10) Any/any-no-logger-12 13.80n ± 4% 18.67n ± 1% +35.29% (p=0.000 n=10) Any/str-with-logger-12 372.4n ± 3% 363.6n ± 1% -2.35% (p=0.001 n=10) Any/any-with-logger-12 369.2n ± 1% 363.6n ± 1% -1.52% (p=0.002 n=10) Any/str-in-go-12 587.2n ± 2% 587.0n ± 1% ~ (p=0.617 n=10) Any/any-in-go-12 666.5n ± 3% 567.6n ± 1% -14.85% (p=0.000 n=10) Any/str-in-go-with-stack-12 448.6n ± 18% 403.4n ± 13% ~ (p=0.280 n=10) Any/any-in-go-with-stack-12 564.9n ± 7% 443.2n ± 4% -21.55% (p=0.000 n=10) geomean 167.8n 160.7n -4.23% │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ B/op │ B/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ allocs/op │ allocs/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ``` On absolute terms: Before: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3191725 382.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3159882 367.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 2998960 373.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3264657 361.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3168627 386.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3169394 364.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3271981 368.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3293463 362.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 793905 1388 ns/op 143 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1724048 748.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 2536380 444.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2177941 586.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 890155 1237 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1836302 719.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 3671503 322.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2257405 540.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 811408 1457 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1384990 729.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 3228151 381.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2678596 450.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 821092 1386 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1747638 662.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3747934 341.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2678191 463.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 53.238s ```
This is an alternative to #1301 and #1302. It's not as fast as these two options, but it still gives us half the stack reduction without the `unsafe` usage. Interestingly it seems that on both arm64 and amd64 the new code, with the closure, is faster than the plain old switch. We do see a ~5-10ns delay on `Any` creation if it's used without `logger`, but that's minimal and not realistic. Bunch of credit for this goes to @cdvr1993, we started independently, I was about to give up but the conversations pushed me forward. In the end he ended up going into a more advanced land where I dare not to enter. Longer version: We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. 10 runs. ``` ❯ benchstat ~/before2.txt ~/after2.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger-12 3.344n ± 1% 3.029n ± 1% -9.40% (p=0.000 n=10) Any/any-no-logger-12 13.80n ± 4% 18.67n ± 1% +35.29% (p=0.000 n=10) Any/str-with-logger-12 372.4n ± 3% 363.6n ± 1% -2.35% (p=0.001 n=10) Any/any-with-logger-12 369.2n ± 1% 363.6n ± 1% -1.52% (p=0.002 n=10) Any/str-in-go-12 587.2n ± 2% 587.0n ± 1% ~ (p=0.617 n=10) Any/any-in-go-12 666.5n ± 3% 567.6n ± 1% -14.85% (p=0.000 n=10) Any/str-in-go-with-stack-12 448.6n ± 18% 403.4n ± 13% ~ (p=0.280 n=10) Any/any-in-go-with-stack-12 564.9n ± 7% 443.2n ± 4% -21.55% (p=0.000 n=10) geomean 167.8n 160.7n -4.23% │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ B/op │ B/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ allocs/op │ allocs/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ``` On absolute terms: Before, on arm64: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3191725 382.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3159882 367.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 2998960 373.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3264657 361.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3168627 386.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3169394 364.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3271981 368.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3293463 362.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 793905 1388 ns/op 143 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1724048 748.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 2536380 444.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2177941 586.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 890155 1237 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1836302 719.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 3671503 322.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2257405 540.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 811408 1457 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1384990 729.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 3228151 381.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2678596 450.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 821092 1386 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1747638 662.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3747934 341.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2678191 463.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 53.238s ``` And amd64, before: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.58 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.52 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.56 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.50 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 39399811 30.35 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 39448304 30.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 39647024 30.32 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 39479619 30.46 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1798702 669.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1862551 647.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1848636 642.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1878465 656.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1776140 684.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1868102 668.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1869589 639.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1782540 648.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 458112 2594 ns/op 91 B/op 2 allocs/op BenchmarkAny/str-in-go-2 820398 1344 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1392148 969.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1790403 644.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 220327 4897 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 494391 2701 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 823672 1399 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1591206 746.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 384094 2820 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 809073 1530 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1464598 933.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1943251 578.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 233019 4967 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 356689 2848 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 791342 1385 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1662126 746.0 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.671s ``` After: ``` goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.65 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.64 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.65 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 27779637 44.20 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 27881986 42.96 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 27587953 43.39 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 26861058 43.43 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1749990 690.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1807341 660.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1821039 654.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1865083 650.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1677643 741.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1905400 689.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1843364 646.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1899883 645.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 453326 2479 ns/op 92 B/op 2 allocs/op BenchmarkAny/str-in-go-2 724555 1580 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1358790 953.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1805985 585.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 466447 2395 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 874053 1487 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1457768 834.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1795317 632.5 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 407620 2749 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 725614 1597 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1303908 863.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1957864 609.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 497640 2401 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 648355 1549 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1486416 869.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2116040 568.8 ns/op 88 B/op 2 allocs/op PASS ```
Big +1, this is a huge amount of unsafe. If we end up needing this sort of unsafe usage for performance reasons, I don't think it should be in the default build, it should be under a separate build tag like |
@prashantv @abhinav what do you think of creating a second Any function, like AnyUnsafe(), that way it is up to the users to use the unsafe version. |
First, let me say @cdvr1993, this is both, impressive debugging and nice workaround. I think we neglected to mention that in the initial reaction to the unsafe code. The idea of "UnsafeAny" is more digestible, if a bit distasteful. I would still prefer to avoid it. This is all a lot of debt to work around an (AFAIK) language issue. Please understand that part of the reason for my push back is that besides just the inherent risk of unsafe code, this makes it risky and difficult for anyone else to touch this code. The level of scrutiny necessary for every change here will have to be a lot higher, which is additional burden on maintainers. If the unsafe option was the only way (and that's not yet settled), there are some mitigations that, in my opinion, would make the unsafe code more digestible:
Again, this is just my opinion. Other maintainers, feel free to chime in. |
Ok, just for the sake of showing how this can be cleaned up a bit, I created: #1304 |
This is an alternative to #1301 and #1302. It's not as fast as these two options, but it still gives us half the stack reduction without the `unsafe` usage. Interestingly it seems that on both arm64 and amd64 the new code, with the closure, is faster than the plain old switch. We do see a ~5-10ns delay on `Any` creation if it's used without `logger`, but that's minimal and not realistic. Bunch of credit for this goes to @cdvr1993, we started independently, I was about to give up but the conversations pushed me forward. In the end he ended up going into a more advanced land where I dare not to enter. Longer version: We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. 10 runs. ``` ❯ benchstat ~/before2.txt ~/after2.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger-12 3.344n ± 1% 3.029n ± 1% -9.40% (p=0.000 n=10) Any/any-no-logger-12 13.80n ± 4% 18.67n ± 1% +35.29% (p=0.000 n=10) Any/str-with-logger-12 372.4n ± 3% 363.6n ± 1% -2.35% (p=0.001 n=10) Any/any-with-logger-12 369.2n ± 1% 363.6n ± 1% -1.52% (p=0.002 n=10) Any/str-in-go-12 587.2n ± 2% 587.0n ± 1% ~ (p=0.617 n=10) Any/any-in-go-12 666.5n ± 3% 567.6n ± 1% -14.85% (p=0.000 n=10) Any/str-in-go-with-stack-12 448.6n ± 18% 403.4n ± 13% ~ (p=0.280 n=10) Any/any-in-go-with-stack-12 564.9n ± 7% 443.2n ± 4% -21.55% (p=0.000 n=10) geomean 167.8n 160.7n -4.23% │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ B/op │ B/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ allocs/op │ allocs/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ``` On absolute terms: Before, on arm64: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3191725 382.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3159882 367.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 2998960 373.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3264657 361.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3168627 386.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3169394 364.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3271981 368.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3293463 362.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 793905 1388 ns/op 143 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1724048 748.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 2536380 444.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2177941 586.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 890155 1237 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1836302 719.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 3671503 322.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2257405 540.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 811408 1457 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1384990 729.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 3228151 381.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2678596 450.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 821092 1386 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1747638 662.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3747934 341.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2678191 463.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 53.238s ``` And amd64, before: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.58 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.52 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.56 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.50 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 39399811 30.35 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 39448304 30.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 39647024 30.32 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 39479619 30.46 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1798702 669.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1862551 647.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1848636 642.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1878465 656.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1776140 684.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1868102 668.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1869589 639.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1782540 648.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 458112 2594 ns/op 91 B/op 2 allocs/op BenchmarkAny/str-in-go-2 820398 1344 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1392148 969.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1790403 644.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 220327 4897 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 494391 2701 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 823672 1399 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1591206 746.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 384094 2820 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 809073 1530 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1464598 933.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1943251 578.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 233019 4967 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 356689 2848 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 791342 1385 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1662126 746.0 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.671s ``` After: ``` goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.65 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.64 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.65 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 27779637 44.20 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 27881986 42.96 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 27587953 43.39 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 26861058 43.43 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1749990 690.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1807341 660.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1821039 654.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1865083 650.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1677643 741.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1905400 689.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1843364 646.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1899883 645.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 453326 2479 ns/op 92 B/op 2 allocs/op BenchmarkAny/str-in-go-2 724555 1580 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1358790 953.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1805985 585.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 466447 2395 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 874053 1487 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1457768 834.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1795317 632.5 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 407620 2749 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 725614 1597 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1303908 863.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1957864 609.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 497640 2401 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 648355 1549 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1486416 869.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2116040 568.8 ns/op 88 B/op 2 allocs/op PASS ```
This is an alternative to #1301 and #1302. It's not as fast as these two options, but it still gives us half the stack reduction without the `unsafe` usage. Interestingly it seems that on both arm64 and amd64 the new code, with the closure, is faster than the plain old switch. We do see a ~5-10ns delay on `Any` creation if it's used without `logger`, but that's minimal and not realistic. Bunch of credit for this goes to @cdvr1993, we started independently, I was about to give up but the conversations pushed me forward. In the end he ended up going into a more advanced land where I dare not to enter. Longer version: We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. 10 runs. ``` ❯ benchstat ~/before2.txt ~/after2.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger-12 3.344n ± 1% 3.029n ± 1% -9.40% (p=0.000 n=10) Any/any-no-logger-12 13.80n ± 4% 18.67n ± 1% +35.29% (p=0.000 n=10) Any/str-with-logger-12 372.4n ± 3% 363.6n ± 1% -2.35% (p=0.001 n=10) Any/any-with-logger-12 369.2n ± 1% 363.6n ± 1% -1.52% (p=0.002 n=10) Any/str-in-go-12 587.2n ± 2% 587.0n ± 1% ~ (p=0.617 n=10) Any/any-in-go-12 666.5n ± 3% 567.6n ± 1% -14.85% (p=0.000 n=10) Any/str-in-go-with-stack-12 448.6n ± 18% 403.4n ± 13% ~ (p=0.280 n=10) Any/any-in-go-with-stack-12 564.9n ± 7% 443.2n ± 4% -21.55% (p=0.000 n=10) geomean 167.8n 160.7n -4.23% │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ B/op │ B/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ allocs/op │ allocs/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ``` On absolute terms: Before, on arm64: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3191725 382.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3159882 367.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 2998960 373.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3264657 361.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3168627 386.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3169394 364.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3271981 368.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3293463 362.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 793905 1388 ns/op 143 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1724048 748.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 2536380 444.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2177941 586.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 890155 1237 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1836302 719.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 3671503 322.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2257405 540.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 811408 1457 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1384990 729.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 3228151 381.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2678596 450.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 821092 1386 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1747638 662.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3747934 341.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2678191 463.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 53.238s ``` And amd64, before: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.58 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.52 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.56 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.50 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 39399811 30.35 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 39448304 30.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 39647024 30.32 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 39479619 30.46 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1798702 669.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1862551 647.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1848636 642.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1878465 656.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1776140 684.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1868102 668.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1869589 639.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1782540 648.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 458112 2594 ns/op 91 B/op 2 allocs/op BenchmarkAny/str-in-go-2 820398 1344 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1392148 969.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1790403 644.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 220327 4897 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 494391 2701 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 823672 1399 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1591206 746.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 384094 2820 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 809073 1530 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1464598 933.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1943251 578.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 233019 4967 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 356689 2848 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 791342 1385 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1662126 746.0 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.671s ``` After: ``` goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.65 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.64 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.65 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 27779637 44.20 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 27881986 42.96 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 27587953 43.39 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 26861058 43.43 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1749990 690.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1807341 660.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1821039 654.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1865083 650.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1677643 741.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1905400 689.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1843364 646.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1899883 645.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 453326 2479 ns/op 92 B/op 2 allocs/op BenchmarkAny/str-in-go-2 724555 1580 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1358790 953.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1805985 585.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 466447 2395 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 874053 1487 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1457768 834.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1795317 632.5 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 407620 2749 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 725614 1597 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1303908 863.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1957864 609.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 497640 2401 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 648355 1549 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1486416 869.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2116040 568.8 ns/op 88 B/op 2 allocs/op PASS ```
This is an alternative to: - #1301 and #1302 and #1304 - a series of PRs that are faster than this one. However, they rely on unsafe. - #1303 - my own PR that uses closures, to reduce the stack size by 60%. This PR reduces the stack size from: ``` field.go:420 0xd16c3 4881ecf8120000 SUBQ $0x12f8, SP // 4856 ``` to ``` field.go:420 0xcb603 4881ecb8000000 SUBQ $0xb8, SP // 184 ``` so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed methods, like `zap.String`, etc. The downside is the (slight) incrase in the code maitenance - we unroll as much as we can and rely on the compiler correctly re-using small variable sizes. While this is not pretty, it feels safe - the changes were purely mechanical. Future changes and extensions should be easy to review. Additionally, the new code is (slightly) faster in all cases since we remove 1-2 function calls from all paths. The "in new goroutine" path is most affected, as shown in benchmarks below. This was largely inspired by conversations with @cdvr1993. We started looking at this in parallel, but I would have given up if it wasn't for our conversations. This particular version was inspired by an earlier version of #1304 - where I realized that @cdvr1993 is doing a similar dispatching mechanism that zap is already doing via `zapcore` - a possible optimization. Longer version: We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). *Most crucially, `zap.Any` is now as fast as a direct dispatch like `zap.String`.* 10 runs. ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap │ before-final.txt │ after-final.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger 3.106n ± 2% 3.160n ± 1% +1.75% (p=0.025 n=10) Any/str-no-logger-2 3.171n ± 4% 3.142n ± 1% ~ (p=0.593 n=10) Any/str-no-logger-4 3.108n ± 3% 3.139n ± 2% +0.97% (p=0.004 n=10) Any/str-no-logger-8 3.099n ± 2% 3.143n ± 2% ~ (p=0.086 n=10) Any/any-no-logger 13.89n ± 2% 12.98n ± 2% -6.59% (p=0.000 n=10) Any/any-no-logger-2 13.97n ± 2% 12.96n ± 2% -7.27% (p=0.000 n=10) Any/any-no-logger-4 13.83n ± 2% 12.89n ± 2% -6.83% (p=0.000 n=10) Any/any-no-logger-8 13.77n ± 2% 12.88n ± 2% -6.43% (p=0.000 n=10) Any/str-with-logger 384.1n ± 2% 383.9n ± 6% ~ (p=0.810 n=10) Any/str-with-logger-2 367.8n ± 2% 368.5n ± 3% ~ (p=0.971 n=10) Any/str-with-logger-4 372.4n ± 2% 368.6n ± 4% ~ (p=0.912 n=10) Any/str-with-logger-8 369.8n ± 3% 368.3n ± 3% ~ (p=0.698 n=10) Any/any-with-logger 383.8n ± 3% 383.3n ± 6% ~ (p=0.838 n=10) Any/any-with-logger-2 370.0n ± 3% 367.6n ± 1% ~ (p=0.239 n=10) Any/any-with-logger-4 370.0n ± 3% 368.2n ± 4% ~ (p=0.631 n=10) Any/any-with-logger-8 367.6n ± 2% 369.7n ± 3% ~ (p=0.756 n=10) Any/str-in-go 1.334µ ± 3% 1.347µ ± 3% ~ (p=0.271 n=10) Any/str-in-go-2 754.5n ± 3% 744.8n ± 5% ~ (p=0.481 n=10) Any/str-in-go-4 420.2n ± 11% 367.7n ± 31% ~ (p=0.086 n=10) Any/str-in-go-8 557.6n ± 4% 547.1n ± 12% ~ (p=0.579 n=10) Any/any-in-go 2.562µ ± 4% 1.447µ ± 3% -43.53% (p=0.000 n=10) Any/any-in-go-2 1361.0n ± 4% 761.4n ± 7% -44.06% (p=0.000 n=10) Any/any-in-go-4 732.1n ± 9% 397.1n ± 11% -45.76% (p=0.000 n=10) Any/any-in-go-8 541.3n ± 13% 564.6n ± 5% +4.30% (p=0.041 n=10) Any/str-in-go-with-stack 1.420µ ± 1% 1.428µ ± 3% ~ (p=0.670 n=10) Any/str-in-go-with-stack-2 749.5n ± 4% 771.8n ± 4% ~ (p=0.123 n=10) Any/str-in-go-with-stack-4 433.2n ± 15% 400.7n ± 14% ~ (p=0.393 n=10) Any/str-in-go-with-stack-8 494.0n ± 7% 490.1n ± 10% ~ (p=0.853 n=10) Any/any-in-go-with-stack 2.586µ ± 3% 1.471µ ± 4% -43.14% (p=0.000 n=10) Any/any-in-go-with-stack-2 1343.0n ± 3% 773.7n ± 4% -42.39% (p=0.000 n=10) Any/any-in-go-with-stack-4 697.7n ± 8% 403.4n ± 9% -42.17% (p=0.000 n=10) Any/any-in-go-with-stack-8 490.8n ± 9% 492.8n ± 8% ~ (p=0.796 n=10) geomean 206.3n 182.9n -11.35% ``` On absolute terms: Before, on arm64: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` ❯ go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3202051 382.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3301683 371.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3186028 364.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3061030 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3203704 378.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3281462 372.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3252879 371.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3246148 373.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 804132 1404 ns/op 133 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1686093 758.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3075596 430.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2101650 543.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 845822 1424 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1531311 736.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 2618665 464.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2130280 536.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 818583 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1533379 739.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2507131 399.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2348804 453.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 807199 1526 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1590476 783.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3026263 383.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2615467 493.8 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.077s ``` And amd64, before: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.58 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.52 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.56 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.50 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 39399811 30.35 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 39448304 30.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 39647024 30.32 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 39479619 30.46 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1798702 669.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1862551 647.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1848636 642.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1878465 656.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1776140 684.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1868102 668.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1869589 639.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1782540 648.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 458112 2594 ns/op 91 B/op 2 allocs/op BenchmarkAny/str-in-go-2 820398 1344 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1392148 969.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1790403 644.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 220327 4897 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 494391 2701 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 823672 1399 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1591206 746.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 384094 2820 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 809073 1530 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1464598 933.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1943251 578.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 233019 4967 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 356689 2848 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 791342 1385 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1662126 746.0 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.671s ``` After: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.77 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.75 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.76 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.69 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 49795383 24.33 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 48821454 24.31 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 49452686 24.79 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 49359926 24.26 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1808188 700.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1894179 643.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1858263 649.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1879894 645.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1817276 663.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1906438 637.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1837354 641.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1909658 648.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 468484 2463 ns/op 96 B/op 2 allocs/op BenchmarkAny/str-in-go-2 726475 1465 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1285284 958.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1746547 573.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 426568 2715 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 611106 1703 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1000000 1017 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2220459 625.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 429721 2673 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 637306 1593 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1301713 902.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2012583 651.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 391810 2833 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 675589 1639 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1219318 970.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1825632 574.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.294s ```
This is an alternative to: - #1301 and #1302 and #1304 - a series of PRs that are faster than this one. However, they rely on unsafe. - #1303 - my own PR that uses closures, to reduce the stack size by 60%. This PR reduces the stack size from: ``` field.go:420 0xd16c3 4881ecf8120000 SUBQ $0x12f8, SP // 4856 ``` to ``` field.go:420 0xcb603 4881ecb8000000 SUBQ $0xb8, SP // 184 ``` so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed methods, like `zap.String`, etc. The downside is the (slight) incrase in the code maitenance - we unroll as much as we can and rely on the compiler correctly re-using small variable sizes. While this is not pretty, it feels safe - the changes were purely mechanical. Future changes and extensions should be easy to review. Additionally, the new code is (slightly) faster in all cases since we remove 1-2 function calls from all paths. The "in new goroutine" path is most affected, as shown in benchmarks below. This was largely inspired by conversations with @cdvr1993. We started looking at this in parallel, but I would have given up if it wasn't for our conversations. This particular version was inspired by an earlier version of #1304 - where I realized that @cdvr1993 is doing a similar dispatching mechanism that zap is already doing via `zapcore` - a possible optimization. Longer version: We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). *Most crucially, `zap.Any` is now as fast as a direct dispatch like `zap.String`.* 10 runs. ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap │ before-final.txt │ after-final.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger 3.106n ± 2% 3.160n ± 1% +1.75% (p=0.025 n=10) Any/str-no-logger-2 3.171n ± 4% 3.142n ± 1% ~ (p=0.593 n=10) Any/str-no-logger-4 3.108n ± 3% 3.139n ± 2% +0.97% (p=0.004 n=10) Any/str-no-logger-8 3.099n ± 2% 3.143n ± 2% ~ (p=0.086 n=10) Any/any-no-logger 13.89n ± 2% 12.98n ± 2% -6.59% (p=0.000 n=10) Any/any-no-logger-2 13.97n ± 2% 12.96n ± 2% -7.27% (p=0.000 n=10) Any/any-no-logger-4 13.83n ± 2% 12.89n ± 2% -6.83% (p=0.000 n=10) Any/any-no-logger-8 13.77n ± 2% 12.88n ± 2% -6.43% (p=0.000 n=10) Any/str-with-logger 384.1n ± 2% 383.9n ± 6% ~ (p=0.810 n=10) Any/str-with-logger-2 367.8n ± 2% 368.5n ± 3% ~ (p=0.971 n=10) Any/str-with-logger-4 372.4n ± 2% 368.6n ± 4% ~ (p=0.912 n=10) Any/str-with-logger-8 369.8n ± 3% 368.3n ± 3% ~ (p=0.698 n=10) Any/any-with-logger 383.8n ± 3% 383.3n ± 6% ~ (p=0.838 n=10) Any/any-with-logger-2 370.0n ± 3% 367.6n ± 1% ~ (p=0.239 n=10) Any/any-with-logger-4 370.0n ± 3% 368.2n ± 4% ~ (p=0.631 n=10) Any/any-with-logger-8 367.6n ± 2% 369.7n ± 3% ~ (p=0.756 n=10) Any/str-in-go 1.334µ ± 3% 1.347µ ± 3% ~ (p=0.271 n=10) Any/str-in-go-2 754.5n ± 3% 744.8n ± 5% ~ (p=0.481 n=10) Any/str-in-go-4 420.2n ± 11% 367.7n ± 31% ~ (p=0.086 n=10) Any/str-in-go-8 557.6n ± 4% 547.1n ± 12% ~ (p=0.579 n=10) Any/any-in-go 2.562µ ± 4% 1.447µ ± 3% -43.53% (p=0.000 n=10) Any/any-in-go-2 1361.0n ± 4% 761.4n ± 7% -44.06% (p=0.000 n=10) Any/any-in-go-4 732.1n ± 9% 397.1n ± 11% -45.76% (p=0.000 n=10) Any/any-in-go-8 541.3n ± 13% 564.6n ± 5% +4.30% (p=0.041 n=10) Any/str-in-go-with-stack 1.420µ ± 1% 1.428µ ± 3% ~ (p=0.670 n=10) Any/str-in-go-with-stack-2 749.5n ± 4% 771.8n ± 4% ~ (p=0.123 n=10) Any/str-in-go-with-stack-4 433.2n ± 15% 400.7n ± 14% ~ (p=0.393 n=10) Any/str-in-go-with-stack-8 494.0n ± 7% 490.1n ± 10% ~ (p=0.853 n=10) Any/any-in-go-with-stack 2.586µ ± 3% 1.471µ ± 4% -43.14% (p=0.000 n=10) Any/any-in-go-with-stack-2 1343.0n ± 3% 773.7n ± 4% -42.39% (p=0.000 n=10) Any/any-in-go-with-stack-4 697.7n ± 8% 403.4n ± 9% -42.17% (p=0.000 n=10) Any/any-in-go-with-stack-8 490.8n ± 9% 492.8n ± 8% ~ (p=0.796 n=10) geomean 206.3n 182.9n -11.35% ``` On absolute terms: Before, on arm64: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` ❯ go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3202051 382.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3301683 371.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3186028 364.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3061030 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3203704 378.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3281462 372.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3252879 371.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3246148 373.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 804132 1404 ns/op 133 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1686093 758.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3075596 430.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2101650 543.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 845822 1424 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1531311 736.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 2618665 464.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2130280 536.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 818583 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1533379 739.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2507131 399.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2348804 453.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 807199 1526 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1590476 783.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3026263 383.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2615467 493.8 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.077s ``` And amd64, before: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.58 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.52 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.56 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.50 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 39399811 30.35 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 39448304 30.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 39647024 30.32 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 39479619 30.46 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1798702 669.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1862551 647.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1848636 642.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1878465 656.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1776140 684.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1868102 668.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1869589 639.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1782540 648.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 458112 2594 ns/op 91 B/op 2 allocs/op BenchmarkAny/str-in-go-2 820398 1344 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1392148 969.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1790403 644.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 220327 4897 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 494391 2701 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 823672 1399 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1591206 746.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 384094 2820 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 809073 1530 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1464598 933.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1943251 578.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 233019 4967 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 356689 2848 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 791342 1385 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1662126 746.0 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.671s ``` After: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.77 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.75 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.76 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.69 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 49795383 24.33 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 48821454 24.31 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 49452686 24.79 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 49359926 24.26 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1808188 700.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1894179 643.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1858263 649.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1879894 645.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1817276 663.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1906438 637.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1837354 641.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1909658 648.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 468484 2463 ns/op 96 B/op 2 allocs/op BenchmarkAny/str-in-go-2 726475 1465 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1285284 958.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1746547 573.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 426568 2715 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 611106 1703 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1000000 1017 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2220459 625.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 429721 2673 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 637306 1593 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1301713 902.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2012583 651.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 391810 2833 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 675589 1639 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1219318 970.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1825632 574.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.294s ```
This is an alternative to: - #1301 and #1302 and #1304 - a series of PRs that are faster than this one. However, they rely on unsafe. - #1303 - my own PR that uses closures, to reduce the stack size by 60%. This PR reduces the stack size from: ``` field.go:420 0xd16c3 4881ecf8120000 SUBQ $0x12f8, SP // 4856 ``` to ``` field.go:420 0xcb603 4881ecb8000000 SUBQ $0xb8, SP // 184 ``` so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed methods, like `zap.String`, etc. The downside is the (slight) incrase in the code maitenance - we unroll as much as we can and rely on the compiler correctly re-using small variable sizes. While this is not pretty, it feels safe - the changes were purely mechanical. Future changes and extensions should be easy to review. Additionally, the new code is (slightly) faster in all cases since we remove 1-2 function calls from all paths. The "in new goroutine" path is most affected, as shown in benchmarks below. This was largely inspired by conversations with @cdvr1993. We started looking at this in parallel, but I would have given up if it wasn't for our conversations. This particular version was inspired by an earlier version of #1304 - where I realized that @cdvr1993 is doing a similar dispatching mechanism that zap is already doing via `zapcore` - a possible optimization. Longer version: We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). *Most crucially, `zap.Any` is now as fast as a direct dispatch like `zap.String`.* 10 runs. ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap │ before-final.txt │ after-final.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger 3.106n ± 2% 3.160n ± 1% +1.75% (p=0.025 n=10) Any/str-no-logger-2 3.171n ± 4% 3.142n ± 1% ~ (p=0.593 n=10) Any/str-no-logger-4 3.108n ± 3% 3.139n ± 2% +0.97% (p=0.004 n=10) Any/str-no-logger-8 3.099n ± 2% 3.143n ± 2% ~ (p=0.086 n=10) Any/any-no-logger 13.89n ± 2% 12.98n ± 2% -6.59% (p=0.000 n=10) Any/any-no-logger-2 13.97n ± 2% 12.96n ± 2% -7.27% (p=0.000 n=10) Any/any-no-logger-4 13.83n ± 2% 12.89n ± 2% -6.83% (p=0.000 n=10) Any/any-no-logger-8 13.77n ± 2% 12.88n ± 2% -6.43% (p=0.000 n=10) Any/str-with-logger 384.1n ± 2% 383.9n ± 6% ~ (p=0.810 n=10) Any/str-with-logger-2 367.8n ± 2% 368.5n ± 3% ~ (p=0.971 n=10) Any/str-with-logger-4 372.4n ± 2% 368.6n ± 4% ~ (p=0.912 n=10) Any/str-with-logger-8 369.8n ± 3% 368.3n ± 3% ~ (p=0.698 n=10) Any/any-with-logger 383.8n ± 3% 383.3n ± 6% ~ (p=0.838 n=10) Any/any-with-logger-2 370.0n ± 3% 367.6n ± 1% ~ (p=0.239 n=10) Any/any-with-logger-4 370.0n ± 3% 368.2n ± 4% ~ (p=0.631 n=10) Any/any-with-logger-8 367.6n ± 2% 369.7n ± 3% ~ (p=0.756 n=10) Any/str-in-go 1.334µ ± 3% 1.347µ ± 3% ~ (p=0.271 n=10) Any/str-in-go-2 754.5n ± 3% 744.8n ± 5% ~ (p=0.481 n=10) Any/str-in-go-4 420.2n ± 11% 367.7n ± 31% ~ (p=0.086 n=10) Any/str-in-go-8 557.6n ± 4% 547.1n ± 12% ~ (p=0.579 n=10) Any/any-in-go 2.562µ ± 4% 1.447µ ± 3% -43.53% (p=0.000 n=10) Any/any-in-go-2 1361.0n ± 4% 761.4n ± 7% -44.06% (p=0.000 n=10) Any/any-in-go-4 732.1n ± 9% 397.1n ± 11% -45.76% (p=0.000 n=10) Any/any-in-go-8 541.3n ± 13% 564.6n ± 5% +4.30% (p=0.041 n=10) Any/str-in-go-with-stack 1.420µ ± 1% 1.428µ ± 3% ~ (p=0.670 n=10) Any/str-in-go-with-stack-2 749.5n ± 4% 771.8n ± 4% ~ (p=0.123 n=10) Any/str-in-go-with-stack-4 433.2n ± 15% 400.7n ± 14% ~ (p=0.393 n=10) Any/str-in-go-with-stack-8 494.0n ± 7% 490.1n ± 10% ~ (p=0.853 n=10) Any/any-in-go-with-stack 2.586µ ± 3% 1.471µ ± 4% -43.14% (p=0.000 n=10) Any/any-in-go-with-stack-2 1343.0n ± 3% 773.7n ± 4% -42.39% (p=0.000 n=10) Any/any-in-go-with-stack-4 697.7n ± 8% 403.4n ± 9% -42.17% (p=0.000 n=10) Any/any-in-go-with-stack-8 490.8n ± 9% 492.8n ± 8% ~ (p=0.796 n=10) geomean 206.3n 182.9n -11.35% ``` On absolute terms: Before, on arm64: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` ❯ go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3202051 382.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3301683 371.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3186028 364.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3061030 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3203704 378.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3281462 372.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3252879 371.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3246148 373.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 804132 1404 ns/op 133 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1686093 758.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3075596 430.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2101650 543.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 845822 1424 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1531311 736.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 2618665 464.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2130280 536.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 818583 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1533379 739.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2507131 399.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2348804 453.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 807199 1526 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1590476 783.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3026263 383.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2615467 493.8 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.077s ``` And amd64, before: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.58 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.52 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.56 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.50 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 39399811 30.35 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 39448304 30.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 39647024 30.32 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 39479619 30.46 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1798702 669.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1862551 647.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1848636 642.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1878465 656.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1776140 684.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1868102 668.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1869589 639.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1782540 648.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 458112 2594 ns/op 91 B/op 2 allocs/op BenchmarkAny/str-in-go-2 820398 1344 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1392148 969.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1790403 644.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 220327 4897 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 494391 2701 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 823672 1399 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1591206 746.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 384094 2820 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 809073 1530 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1464598 933.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1943251 578.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 233019 4967 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 356689 2848 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 791342 1385 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1662126 746.0 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.671s ``` After: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.77 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.75 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.76 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.69 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 49795383 24.33 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 48821454 24.31 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 49452686 24.79 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 49359926 24.26 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1808188 700.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1894179 643.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1858263 649.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1879894 645.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1817276 663.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1906438 637.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1837354 641.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1909658 648.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 468484 2463 ns/op 96 B/op 2 allocs/op BenchmarkAny/str-in-go-2 726475 1465 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1285284 958.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1746547 573.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 426568 2715 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 611106 1703 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1000000 1017 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2220459 625.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 429721 2673 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 637306 1593 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1301713 902.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2012583 651.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 391810 2833 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 675589 1639 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1219318 970.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1825632 574.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.294s ```
This is an alternative to: - #1301 and #1302 and #1304 - a series of PRs that are faster than this one. However, they rely on unsafe. - #1303 - my own PR that uses closures, to reduce the stack size by 60%. This PR reduces the stack size from: ``` field.go:420 0xd16c3 4881ecf8120000 SUBQ $0x12f8, SP // 4856 ``` to ``` field.go:420 0xcb603 4881ecb8000000 SUBQ $0xb8, SP // 184 ``` so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed methods, like `zap.String`, etc. The downside is the (slight) incrase in the code maitenance - we unroll as much as we can and rely on the compiler correctly re-using small variable sizes. While this is not pretty, it feels safe - the changes were purely mechanical. Future changes and extensions should be easy to review. Additionally, the new code is (slightly) faster in all cases since we remove 1-2 function calls from all paths. The "in new goroutine" path is most affected, as shown in benchmarks below. This was largely inspired by conversations with @cdvr1993. We started looking at this in parallel, but I would have given up if it wasn't for our conversations. This particular version was inspired by an earlier version of #1304 - where I realized that @cdvr1993 is doing a similar dispatching mechanism that zap is already doing via `zapcore` - a possible optimization. Longer version: We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). *Most crucially, `zap.Any` is now as fast as a direct dispatch like `zap.String`.* 10 runs. ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap │ before-final.txt │ after-final.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger 3.106n ± 2% 3.160n ± 1% +1.75% (p=0.025 n=10) Any/str-no-logger-2 3.171n ± 4% 3.142n ± 1% ~ (p=0.593 n=10) Any/str-no-logger-4 3.108n ± 3% 3.139n ± 2% +0.97% (p=0.004 n=10) Any/str-no-logger-8 3.099n ± 2% 3.143n ± 2% ~ (p=0.086 n=10) Any/any-no-logger 13.89n ± 2% 12.98n ± 2% -6.59% (p=0.000 n=10) Any/any-no-logger-2 13.97n ± 2% 12.96n ± 2% -7.27% (p=0.000 n=10) Any/any-no-logger-4 13.83n ± 2% 12.89n ± 2% -6.83% (p=0.000 n=10) Any/any-no-logger-8 13.77n ± 2% 12.88n ± 2% -6.43% (p=0.000 n=10) Any/str-with-logger 384.1n ± 2% 383.9n ± 6% ~ (p=0.810 n=10) Any/str-with-logger-2 367.8n ± 2% 368.5n ± 3% ~ (p=0.971 n=10) Any/str-with-logger-4 372.4n ± 2% 368.6n ± 4% ~ (p=0.912 n=10) Any/str-with-logger-8 369.8n ± 3% 368.3n ± 3% ~ (p=0.698 n=10) Any/any-with-logger 383.8n ± 3% 383.3n ± 6% ~ (p=0.838 n=10) Any/any-with-logger-2 370.0n ± 3% 367.6n ± 1% ~ (p=0.239 n=10) Any/any-with-logger-4 370.0n ± 3% 368.2n ± 4% ~ (p=0.631 n=10) Any/any-with-logger-8 367.6n ± 2% 369.7n ± 3% ~ (p=0.756 n=10) Any/str-in-go 1.334µ ± 3% 1.347µ ± 3% ~ (p=0.271 n=10) Any/str-in-go-2 754.5n ± 3% 744.8n ± 5% ~ (p=0.481 n=10) Any/str-in-go-4 420.2n ± 11% 367.7n ± 31% ~ (p=0.086 n=10) Any/str-in-go-8 557.6n ± 4% 547.1n ± 12% ~ (p=0.579 n=10) Any/any-in-go 2.562µ ± 4% 1.447µ ± 3% -43.53% (p=0.000 n=10) Any/any-in-go-2 1361.0n ± 4% 761.4n ± 7% -44.06% (p=0.000 n=10) Any/any-in-go-4 732.1n ± 9% 397.1n ± 11% -45.76% (p=0.000 n=10) Any/any-in-go-8 541.3n ± 13% 564.6n ± 5% +4.30% (p=0.041 n=10) Any/str-in-go-with-stack 1.420µ ± 1% 1.428µ ± 3% ~ (p=0.670 n=10) Any/str-in-go-with-stack-2 749.5n ± 4% 771.8n ± 4% ~ (p=0.123 n=10) Any/str-in-go-with-stack-4 433.2n ± 15% 400.7n ± 14% ~ (p=0.393 n=10) Any/str-in-go-with-stack-8 494.0n ± 7% 490.1n ± 10% ~ (p=0.853 n=10) Any/any-in-go-with-stack 2.586µ ± 3% 1.471µ ± 4% -43.14% (p=0.000 n=10) Any/any-in-go-with-stack-2 1343.0n ± 3% 773.7n ± 4% -42.39% (p=0.000 n=10) Any/any-in-go-with-stack-4 697.7n ± 8% 403.4n ± 9% -42.17% (p=0.000 n=10) Any/any-in-go-with-stack-8 490.8n ± 9% 492.8n ± 8% ~ (p=0.796 n=10) geomean 206.3n 182.9n -11.35% ``` On absolute terms: Before, on arm64: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` ❯ go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3202051 382.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3301683 371.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3186028 364.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3061030 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3203704 378.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3281462 372.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3252879 371.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3246148 373.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 804132 1404 ns/op 133 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1686093 758.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3075596 430.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2101650 543.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 845822 1424 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1531311 736.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 2618665 464.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2130280 536.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 818583 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1533379 739.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2507131 399.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2348804 453.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 807199 1526 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1590476 783.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3026263 383.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2615467 493.8 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.077s ``` And amd64, before: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.58 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.52 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.56 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.50 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 39399811 30.35 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 39448304 30.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 39647024 30.32 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 39479619 30.46 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1798702 669.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1862551 647.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1848636 642.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1878465 656.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1776140 684.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1868102 668.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1869589 639.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1782540 648.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 458112 2594 ns/op 91 B/op 2 allocs/op BenchmarkAny/str-in-go-2 820398 1344 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1392148 969.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1790403 644.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 220327 4897 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 494391 2701 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 823672 1399 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1591206 746.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 384094 2820 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 809073 1530 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1464598 933.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1943251 578.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 233019 4967 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 356689 2848 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 791342 1385 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1662126 746.0 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.671s ``` After: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.77 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.75 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.76 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.69 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 49795383 24.33 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 48821454 24.31 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 49452686 24.79 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 49359926 24.26 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1808188 700.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1894179 643.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1858263 649.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1879894 645.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1817276 663.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1906438 637.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1837354 641.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1909658 648.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 468484 2463 ns/op 96 B/op 2 allocs/op BenchmarkAny/str-in-go-2 726475 1465 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1285284 958.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1746547 573.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 426568 2715 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 611106 1703 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1000000 1017 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2220459 625.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 429721 2673 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 637306 1593 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1301713 902.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2012583 651.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 391810 2833 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 675589 1639 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1219318 970.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1825632 574.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.294s ```
…ion) This is an alternative to: - #1301 and #1302 and #1304 - a series of PRs that are faster than this one. However, they rely on unsafe. - #1303 - my own PR that uses closures, to reduce the stack size by 60%. - #1305 - my own PR that inline bunch of loops - https://github.com/uber-go/zap/compare/pawel/any-int5 that does the same as above, but is slightly easier to parse - #1307 - a reflect.TypeOf lookup version THIS PR IS INCOMPLETE - it shows a possible approach, but I wanted to get reviewers thoughts before typing everything in. I originally thought we can use a `type props strucy` intermediary struct to store the data, but that hits the same problem: every `props` would get it's own slot on the stack. This avoids this by returning the raw data. Pros: - the implementation is shared between `Any` and strongly typed Fields - no reflect or unsafe - reduced the stack significantly - we should be able to get to the same ~180 bytes as ~1305. - no peft penalty for strongly typed versions, at least on ARM64 it's compiled away. Cons: - the code gets a bit harder to maintain. It's significantly better than #1305 I would say though.
…ion) This is an alternative to: - #1301 and #1302 and #1304 - a series of PRs that are faster than this one. However, they rely on unsafe. - #1303 - my own PR that uses closures, to reduce the stack size by 60%. - #1305 - my own PR that inline bunch of loops - https://github.com/uber-go/zap/compare/pawel/any-int5 that does the same as above, but is slightly easier to parse - #1307 - a reflect.TypeOf lookup version THIS PR IS INCOMPLETE - it shows a possible approach, but I wanted to get reviewers thoughts before typing everything in. I originally thought we can use a `type props strucy` intermediary struct to store the data, but that hits the same problem: every `props` would get it's own slot on the stack. This avoids this by returning the raw data. Pros: - the implementation is shared between `Any` and strongly typed Fields - no reflect or unsafe - reduced the stack significantly - we should be able to get to the same ~180 bytes as ~1305. - no peft penalty for strongly typed versions, at least on ARM64 it's compiled away. Cons: - the code gets a bit harder to maintain. It's significantly better than #1305 I would say though.
Yet another attempt at reducing the stack size of zap.Any, borrowing from uber-go#1301, uber-go#1303, uber-go#1304, uber-go#1305, uber-go#1307, and 1308. This approach defines a generic data type for field constructors of a specific type. This is similar to the lookup map in uber-go#1307, minus the map lookup, the interface match, or reflection. type anyFieldC[T any] func(string, T) Field The generic data type provides a non-generic method matching the interface: interface{ Any(string, any) Field } Stack size: The stack size of zap.Any following this change is 0xc0. % go build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any go.uber.org/zap.Any STEXT size=5861 args=0x20 locals=0xc0 funcid=0x0 align=0x0 This is just 8 bytes more than uber-go#1305, which is the smallest stack size of all other attempts. Allocations: Everything appears to get inlined with no heap escapes: go build -gcflags -m 2>&1 | grep field.go | perl -n -e 'next unless m{^./field.go:(\d+)}; print if ($1 >= 413)' | grep 'escapes' (Line 413 declares anyFieldC) Besides that, the output of `-m` for the relevant section of code consists of almost entirely: ./field.go:415:6: can inline anyFieldC[go.shape.bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.[]bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.complex128].Any [...] ./field.go:415:6: inlining call to anyFieldC[go.shape.complex128].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.[]bool].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.bool].Any Followed by: ./field.go:428:10: leaking param: key ./field.go:428:22: leaking param: value Maintainability: Unlike some of the other approaches, this variant is more maintainable. The `zap.Any` function looks roughly the same. Adding new branches there is obvious, and requires no duplication. Performance: This is a net improvement on all BenchmarkAny calls except "any-no-logger" which calls `zap.Any` and discards the result. ``` name old time/op new time/op delta Any/str-no-logger-2 8.77ns ± 0% 8.75ns ± 1% ~ (p=0.159 n=4+5) Any/any-no-logger-2 54.1ns ± 0% 81.6ns ± 0% +50.71% (p=0.016 n=5+4) Any/str-with-logger-2 1.38µs ± 3% 1.38µs ± 4% ~ (p=0.841 n=5+5) Any/any-with-logger-2 1.60µs ±22% 1.37µs ± 1% ~ (p=0.151 n=5+5) Any/str-in-go-2 3.41µs ± 1% 3.42µs ± 5% ~ (p=0.905 n=4+5) Any/any-in-go-2 5.98µs ± 1% 3.68µs ± 6% -38.44% (p=0.008 n=5+5) Any/str-in-go-with-stack-2 3.42µs ± 2% 3.46µs ± 3% ~ (p=0.421 n=5+5) Any/any-in-go-with-stack-2 5.98µs ± 3% 3.65µs ± 3% -38.95% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Any/str-no-logger-2 0.00B 0.00B ~ (all equal) Any/any-no-logger-2 0.00B 0.00B ~ (all equal) Any/str-with-logger-2 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/any-with-logger-2 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/str-in-go-2 88.5B ± 1% 88.0B ± 0% ~ (p=0.429 n=4+4) Any/any-in-go-2 88.0B ± 0% 88.0B ± 0% ~ (all equal) Any/str-in-go-with-stack-2 88.0B ± 0% 88.0B ± 0% ~ (all equal) Any/any-in-go-with-stack-2 88.0B ± 0% 88.0B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Any/str-no-logger-2 0.00 0.00 ~ (all equal) Any/any-no-logger-2 0.00 0.00 ~ (all equal) Any/str-with-logger-2 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/any-with-logger-2 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/str-in-go-2 2.00 ± 0% 2.00 ± 0% ~ (all equal) Any/any-in-go-2 2.00 ± 0% 2.00 ± 0% ~ (all equal) Any/str-in-go-with-stack-2 2.00 ± 0% 2.00 ± 0% ~ (all equal) Any/any-in-go-with-stack-2 2.00 ± 0% 2.00 ± 0% ~ (all equal) ``` I believe this is acceptable because that's not a real use case; we expect the result to be used with a logger.
This is a prefactor for #1301, #1302, #1304, #1305, #1307, #1308 and #1310. We're writing various approaches to reduce the stock size and it's painful to keep copy-pasting the tests between PRs. This was suggested in @prashantv in #1307. The tests are mostly based on tests in #1303, but made "more generic", as #1307 we might want to test across more than just a single type. It does make the tests a bit harder to setup. Some of the setup is inconvenient (we duplicate the value in both `typed` and `any` version of the tests) but hopefully okay to understand. A fully non-duplicated alternative would likely require something like #1310 itself. For #1307 in particular a test against interface type would likely be needed, so adding it here too. The tests compare two code paths, with the same arguments, one using a strongly typed method and second using `zap.Any`. We have: - a simple "create field" case for a baseline - a "create and log" case for a realistic case (we typically log the fields) - a "create and log in a goroutine" case for the pathological case we're trying to solve for. - a "create and long in goroutine in a pre-warmed system" that does the above, but before tries to affect the starting goroutine stack size to provide an realistic example. Without this, for any tests with 2+ goroutines, the cost of `zap.Any` is not visible, as we always end up expanding the stack even in the strongly typed methods. The test results are: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/string-typ-no-logger 166879518 6.988 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-no-logger-12 167398297 6.973 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger 87669631 13.97 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger-12 86760837 14.11 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-logger 3059485 395.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-12 3141176 379.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger 2995699 401.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger-12 3071046 391.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-go 784323 1351 ns/op 146 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-12 2000835 613.9 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go 477486 2479 ns/op 117 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-12 1830955 680.0 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack 841566 1328 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack-12 2625226 479.6 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack 486084 2493 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack-12 2658640 667.9 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-no-logger 147314238 8.034 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-no-logger-12 157857937 7.436 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger 58872349 20.19 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger-12 60532305 20.27 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-logger 3094204 411.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-12 3163489 383.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger 2981940 427.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger-12 2777792 394.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-go 911761 1335 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-12 2006440 605.2 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go 467934 2518 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-12 1786076 683.1 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack 855794 1316 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack-12 2598783 434.5 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack 473282 2474 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack-12 2020183 651.9 ns/op 112 B/op 2 allocs/op PASS ok go.uber.org/zap 53.516s ```
This is an alternative to: - #1301 and #1302 and #1304 - a series of PRs that are faster than this one. However, they rely on unsafe. - #1303 - my own PR that uses closures, to reduce the stack size by 60%. This PR reduces the stack size from: ``` field.go:420 0xd16c3 4881ecf8120000 SUBQ $0x12f8, SP // 4856 ``` to ``` field.go:420 0xcb603 4881ecb8000000 SUBQ $0xb8, SP // 184 ``` so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed methods, like `zap.String`, etc. The downside is the (slight) incrase in the code maitenance - we unroll as much as we can and rely on the compiler correctly re-using small variable sizes. While this is not pretty, it feels safe - the changes were purely mechanical. Future changes and extensions should be easy to review. Additionally, the new code is (slightly) faster in all cases since we remove 1-2 function calls from all paths. The "in new goroutine" path is most affected, as shown in benchmarks below. This was largely inspired by conversations with @cdvr1993. We started looking at this in parallel, but I would have given up if it wasn't for our conversations. This particular version was inspired by an earlier version of #1304 - where I realized that @cdvr1993 is doing a similar dispatching mechanism that zap is already doing via `zapcore` - a possible optimization. Longer version: We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). *Most crucially, `zap.Any` is now as fast as a direct dispatch like `zap.String`.* 10 runs. ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap │ before-final.txt │ after-final.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger 3.106n ± 2% 3.160n ± 1% +1.75% (p=0.025 n=10) Any/str-no-logger-2 3.171n ± 4% 3.142n ± 1% ~ (p=0.593 n=10) Any/str-no-logger-4 3.108n ± 3% 3.139n ± 2% +0.97% (p=0.004 n=10) Any/str-no-logger-8 3.099n ± 2% 3.143n ± 2% ~ (p=0.086 n=10) Any/any-no-logger 13.89n ± 2% 12.98n ± 2% -6.59% (p=0.000 n=10) Any/any-no-logger-2 13.97n ± 2% 12.96n ± 2% -7.27% (p=0.000 n=10) Any/any-no-logger-4 13.83n ± 2% 12.89n ± 2% -6.83% (p=0.000 n=10) Any/any-no-logger-8 13.77n ± 2% 12.88n ± 2% -6.43% (p=0.000 n=10) Any/str-with-logger 384.1n ± 2% 383.9n ± 6% ~ (p=0.810 n=10) Any/str-with-logger-2 367.8n ± 2% 368.5n ± 3% ~ (p=0.971 n=10) Any/str-with-logger-4 372.4n ± 2% 368.6n ± 4% ~ (p=0.912 n=10) Any/str-with-logger-8 369.8n ± 3% 368.3n ± 3% ~ (p=0.698 n=10) Any/any-with-logger 383.8n ± 3% 383.3n ± 6% ~ (p=0.838 n=10) Any/any-with-logger-2 370.0n ± 3% 367.6n ± 1% ~ (p=0.239 n=10) Any/any-with-logger-4 370.0n ± 3% 368.2n ± 4% ~ (p=0.631 n=10) Any/any-with-logger-8 367.6n ± 2% 369.7n ± 3% ~ (p=0.756 n=10) Any/str-in-go 1.334µ ± 3% 1.347µ ± 3% ~ (p=0.271 n=10) Any/str-in-go-2 754.5n ± 3% 744.8n ± 5% ~ (p=0.481 n=10) Any/str-in-go-4 420.2n ± 11% 367.7n ± 31% ~ (p=0.086 n=10) Any/str-in-go-8 557.6n ± 4% 547.1n ± 12% ~ (p=0.579 n=10) Any/any-in-go 2.562µ ± 4% 1.447µ ± 3% -43.53% (p=0.000 n=10) Any/any-in-go-2 1361.0n ± 4% 761.4n ± 7% -44.06% (p=0.000 n=10) Any/any-in-go-4 732.1n ± 9% 397.1n ± 11% -45.76% (p=0.000 n=10) Any/any-in-go-8 541.3n ± 13% 564.6n ± 5% +4.30% (p=0.041 n=10) Any/str-in-go-with-stack 1.420µ ± 1% 1.428µ ± 3% ~ (p=0.670 n=10) Any/str-in-go-with-stack-2 749.5n ± 4% 771.8n ± 4% ~ (p=0.123 n=10) Any/str-in-go-with-stack-4 433.2n ± 15% 400.7n ± 14% ~ (p=0.393 n=10) Any/str-in-go-with-stack-8 494.0n ± 7% 490.1n ± 10% ~ (p=0.853 n=10) Any/any-in-go-with-stack 2.586µ ± 3% 1.471µ ± 4% -43.14% (p=0.000 n=10) Any/any-in-go-with-stack-2 1343.0n ± 3% 773.7n ± 4% -42.39% (p=0.000 n=10) Any/any-in-go-with-stack-4 697.7n ± 8% 403.4n ± 9% -42.17% (p=0.000 n=10) Any/any-in-go-with-stack-8 490.8n ± 9% 492.8n ± 8% ~ (p=0.796 n=10) geomean 206.3n 182.9n -11.35% ``` On absolute terms: Before, on arm64: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` ❯ go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3202051 382.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3301683 371.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3186028 364.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3061030 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3203704 378.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3281462 372.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3252879 371.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3246148 373.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 804132 1404 ns/op 133 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1686093 758.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3075596 430.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2101650 543.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 845822 1424 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1531311 736.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 2618665 464.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2130280 536.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 818583 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1533379 739.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2507131 399.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2348804 453.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 807199 1526 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1590476 783.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3026263 383.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2615467 493.8 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.077s ``` And amd64, before: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.58 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.52 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.56 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.50 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 39399811 30.35 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 39448304 30.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 39647024 30.32 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 39479619 30.46 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1798702 669.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1862551 647.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1848636 642.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1878465 656.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1776140 684.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1868102 668.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1869589 639.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1782540 648.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 458112 2594 ns/op 91 B/op 2 allocs/op BenchmarkAny/str-in-go-2 820398 1344 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1392148 969.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1790403 644.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 220327 4897 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 494391 2701 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 823672 1399 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1591206 746.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 384094 2820 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 809073 1530 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1464598 933.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1943251 578.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 233019 4967 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 356689 2848 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 791342 1385 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1662126 746.0 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.671s ``` After: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.77 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.75 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.76 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.69 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 49795383 24.33 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 48821454 24.31 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 49452686 24.79 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 49359926 24.26 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1808188 700.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1894179 643.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1858263 649.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1879894 645.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1817276 663.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1906438 637.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1837354 641.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1909658 648.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 468484 2463 ns/op 96 B/op 2 allocs/op BenchmarkAny/str-in-go-2 726475 1465 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1285284 958.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1746547 573.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 426568 2715 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 611106 1703 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1000000 1017 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2220459 625.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 429721 2673 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 637306 1593 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1301713 902.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2012583 651.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 391810 2833 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 675589 1639 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1219318 970.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1825632 574.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.294s ```
This is a prefactor for #1301, #1302, #1304, #1305, #1307, #1308 and #1310. We're writing various approaches to reduce the stock size and it's painful to keep copy-pasting the tests between PRs. This was suggested in @prashantv in #1307. The tests are mostly based on tests in #1303, but made "more generic", as #1307 we might want to test across more than just a single type. It does make the tests a bit harder to setup. Some of the setup is inconvenient (we duplicate the value in both `typed` and `any` version of the tests) but hopefully okay to understand. A fully non-duplicated alternative would likely require something like #1310 itself. For #1307 in particular a test against interface type would likely be needed, so adding it here too. The tests compare two code paths, with the same arguments, one using a strongly typed method and second using `zap.Any`. We have: - a simple "create field" case for a baseline - a "create and log" case for a realistic case (we typically log the fields) - a "create and log in a goroutine" case for the pathological case we're trying to solve for. - -a "create and long in goroutine in a pre-warmed system" that does the above- we decided it's not worth the complication. The test results are: ``` ❯ go test -bench BenchmarkAny -benchmem -cpu 1 goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/string/field-only/typed 161981473 7.374 ns/op 0 B/op 0 allocs/op BenchmarkAny/string/field-only/any 82343354 14.67 ns/op 0 B/op 0 allocs/op BenchmarkAny/string/log/typed 2965648 416.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/string/log/any 2920292 418.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/string/log-go/typed 1000000 1158 ns/op 112 B/op 3 allocs/op BenchmarkAny/string/log-go/any 553144 2152 ns/op 128 B/op 3 allocs/op BenchmarkAny/stringer/field-only/typed 160509367 7.548 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer/field-only/any 51330402 23.45 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer/log/typed 3221404 377.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer/log/any 2726443 393.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer/log-go/typed 1000000 1129 ns/op 112 B/op 3 allocs/op BenchmarkAny/stringer/log-go/any 558602 2147 ns/op 128 B/op 3 allocs/op PASS ok go.uber.org/zap 19.426s ``` On gotip: ``` ❯ gotip test -bench BenchmarkAny -benchmem -cpu 1 goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/string/field-only/typed 155084869 7.603 ns/op 0 B/op 0 allocs/op BenchmarkAny/string/field-only/any 82740788 14.55 ns/op 0 B/op 0 allocs/op BenchmarkAny/string/log/typed 2800495 411.6 ns/op 64 B/op 1 allocs/op BenchmarkAny/string/log/any 2896258 411.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/string/log-go/typed 1000000 1155 ns/op 112 B/op 3 allocs/op BenchmarkAny/string/log-go/any 551599 2168 ns/op 128 B/op 3 allocs/op BenchmarkAny/stringer/field-only/typed 159505488 7.578 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer/field-only/any 51406354 23.78 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer/log/typed 3011210 388.6 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer/log/any 3010370 395.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer/log-go/typed 1000000 1161 ns/op 112 B/op 3 allocs/op BenchmarkAny/stringer/log-go/any 553860 2161 ns/op 128 B/op 3 allocs/op PASS ok go.uber.org/zap 19.391s ``` on amd64 (similar, 2x worse stack growth impact) ``` % go test -bench BenchmarkAny -benchmem -cpu 1 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/string/field-only/typed 47534053 25.23 ns/op 0 B/op 0 allocs/op BenchmarkAny/string/field-only/any 36913526 32.57 ns/op 0 B/op 0 allocs/op BenchmarkAny/string/log/typed 1693508 725.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/string/log/any 1576172 765.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/string/log-go/typed 516832 2343 ns/op 112 B/op 3 allocs/op BenchmarkAny/string/log-go/any 243692 4404 ns/op 128 B/op 3 allocs/op BenchmarkAny/stringer/field-only/typed 48735537 24.73 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer/field-only/any 26115684 47.24 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer/log/typed 1761630 677.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer/log/any 1646913 705.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer/log-go/typed 534187 2275 ns/op 112 B/op 3 allocs/op BenchmarkAny/stringer/log-go/any 273787 4348 ns/op 128 B/op 3 allocs/op PASS ok go.uber.org/zap 18.890s ```
This is an alternative to #1301 and #1302. It's not as fast as these two options, but it still gives us half the stack reduction without the `unsafe` usage. Interestingly it seems that on both arm64 and amd64 the new code, with the closure, is faster than the plain old switch. We do see a ~5-10ns delay on `Any` creation if it's used without `logger`, but that's minimal and not realistic. Bunch of credit for this goes to @cdvr1993, we started independently, I was about to give up but the conversations pushed me forward. In the end he ended up going into a more advanced land where I dare not to enter. Longer version: We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). This is an alternative to #1301, @cdvr and me were talking about this, and he inspired this idea with the closure. By using a function and a closure we're able to reduce the size and remove the degradation. At least on my laptop, this change result in a new performance gain, as all benchmarks show reduced time. 10 runs. ``` ❯ benchstat ~/before2.txt ~/after2.txt goos: darwin goarch: arm64 pkg: go.uber.org/zap │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger-12 3.344n ± 1% 3.029n ± 1% -9.40% (p=0.000 n=10) Any/any-no-logger-12 13.80n ± 4% 18.67n ± 1% +35.29% (p=0.000 n=10) Any/str-with-logger-12 372.4n ± 3% 363.6n ± 1% -2.35% (p=0.001 n=10) Any/any-with-logger-12 369.2n ± 1% 363.6n ± 1% -1.52% (p=0.002 n=10) Any/str-in-go-12 587.2n ± 2% 587.0n ± 1% ~ (p=0.617 n=10) Any/any-in-go-12 666.5n ± 3% 567.6n ± 1% -14.85% (p=0.000 n=10) Any/str-in-go-with-stack-12 448.6n ± 18% 403.4n ± 13% ~ (p=0.280 n=10) Any/any-in-go-with-stack-12 564.9n ± 7% 443.2n ± 4% -21.55% (p=0.000 n=10) geomean 167.8n 160.7n -4.23% │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ B/op │ B/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 64.00 ± 0% 64.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 88.00 ± 0% 88.00 ± 0% ~ (p=1.000 n=10) ¹ geomean ² +0.00% ² ¹ all samples are equal ² summaries must be >0 to compute geomean │ /Users/pawel/before2.txt │ /Users/pawel/after2.txt │ │ allocs/op │ allocs/op vs base │ Any/str-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-no-logger-12 0.000 ± 0% 0.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-with-logger-12 1.000 ± 0% 1.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/str-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ Any/any-in-go-with-stack-12 2.000 ± 0% 2.000 ± 0% ~ (p=1.000 n=10) ¹ geomean ``` On absolute terms: Before, on arm64: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3191725 382.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3159882 367.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 2998960 373.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3264657 361.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3168627 386.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3169394 364.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3271981 368.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3293463 362.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 793905 1388 ns/op 143 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1724048 748.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 2536380 444.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2177941 586.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 890155 1237 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1836302 719.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 3671503 322.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2257405 540.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 811408 1457 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1384990 729.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 3228151 381.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2678596 450.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 821092 1386 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1747638 662.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3747934 341.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2678191 463.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 53.238s ``` And amd64, before: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.58 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.52 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.56 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.50 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 39399811 30.35 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 39448304 30.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 39647024 30.32 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 39479619 30.46 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1798702 669.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1862551 647.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1848636 642.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1878465 656.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1776140 684.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1868102 668.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1869589 639.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1782540 648.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 458112 2594 ns/op 91 B/op 2 allocs/op BenchmarkAny/str-in-go-2 820398 1344 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1392148 969.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1790403 644.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 220327 4897 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 494391 2701 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 823672 1399 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1591206 746.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 384094 2820 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 809073 1530 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1464598 933.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1943251 578.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 233019 4967 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 356689 2848 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 791342 1385 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1662126 746.0 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.671s ``` After: ``` goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.65 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.64 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.65 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 27779637 44.20 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 27881986 42.96 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 27587953 43.39 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 26861058 43.43 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1749990 690.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1807341 660.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1821039 654.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1865083 650.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1677643 741.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1905400 689.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1843364 646.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1899883 645.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 453326 2479 ns/op 92 B/op 2 allocs/op BenchmarkAny/str-in-go-2 724555 1580 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1358790 953.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1805985 585.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 466447 2395 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 874053 1487 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1457768 834.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1795317 632.5 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 407620 2749 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 725614 1597 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1303908 863.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1957864 609.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 497640 2401 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 648355 1549 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1486416 869.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2116040 568.8 ns/op 88 B/op 2 allocs/op PASS ```
This is an alternative to: - #1301 and #1302 and #1304 - a series of PRs that are faster than this one. However, they rely on unsafe. - #1303 - my own PR that uses closures, to reduce the stack size by 60%. This PR reduces the stack size from: ``` field.go:420 0xd16c3 4881ecf8120000 SUBQ $0x12f8, SP // 4856 ``` to ``` field.go:420 0xcb603 4881ecb8000000 SUBQ $0xb8, SP // 184 ``` so by ~96%. More crucially, `zap.Any` is now as fast as correctly typed methods, like `zap.String`, etc. The downside is the (slight) incrase in the code maitenance - we unroll as much as we can and rely on the compiler correctly re-using small variable sizes. While this is not pretty, it feels safe - the changes were purely mechanical. Future changes and extensions should be easy to review. Additionally, the new code is (slightly) faster in all cases since we remove 1-2 function calls from all paths. The "in new goroutine" path is most affected, as shown in benchmarks below. This was largely inspired by conversations with @cdvr1993. We started looking at this in parallel, but I would have given up if it wasn't for our conversations. This particular version was inspired by an earlier version of #1304 - where I realized that @cdvr1993 is doing a similar dispatching mechanism that zap is already doing via `zapcore` - a possible optimization. Longer version: We have identified an issue where zap.Any can cause performance degradation due to stack size. This is apparently cased by the compiler assigning 4.8kb (a zap.Field per arm of the switch statement) for zap.Any on stack. This can result in an unnecessary runtime.newstack+runtime.copystack. A github issue against Go language is pending. This can be particularly bad if `zap.Any` was to be used in a new goroutine, since the default goroutine sizes can be as low as 2kb (it can vary depending on average stack size - see golang/go#18138). *Most crucially, `zap.Any` is now as fast as a direct dispatch like `zap.String`.* 10 runs. ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap │ before-final.txt │ after-final.txt │ │ sec/op │ sec/op vs base │ Any/str-no-logger 3.106n ± 2% 3.160n ± 1% +1.75% (p=0.025 n=10) Any/str-no-logger-2 3.171n ± 4% 3.142n ± 1% ~ (p=0.593 n=10) Any/str-no-logger-4 3.108n ± 3% 3.139n ± 2% +0.97% (p=0.004 n=10) Any/str-no-logger-8 3.099n ± 2% 3.143n ± 2% ~ (p=0.086 n=10) Any/any-no-logger 13.89n ± 2% 12.98n ± 2% -6.59% (p=0.000 n=10) Any/any-no-logger-2 13.97n ± 2% 12.96n ± 2% -7.27% (p=0.000 n=10) Any/any-no-logger-4 13.83n ± 2% 12.89n ± 2% -6.83% (p=0.000 n=10) Any/any-no-logger-8 13.77n ± 2% 12.88n ± 2% -6.43% (p=0.000 n=10) Any/str-with-logger 384.1n ± 2% 383.9n ± 6% ~ (p=0.810 n=10) Any/str-with-logger-2 367.8n ± 2% 368.5n ± 3% ~ (p=0.971 n=10) Any/str-with-logger-4 372.4n ± 2% 368.6n ± 4% ~ (p=0.912 n=10) Any/str-with-logger-8 369.8n ± 3% 368.3n ± 3% ~ (p=0.698 n=10) Any/any-with-logger 383.8n ± 3% 383.3n ± 6% ~ (p=0.838 n=10) Any/any-with-logger-2 370.0n ± 3% 367.6n ± 1% ~ (p=0.239 n=10) Any/any-with-logger-4 370.0n ± 3% 368.2n ± 4% ~ (p=0.631 n=10) Any/any-with-logger-8 367.6n ± 2% 369.7n ± 3% ~ (p=0.756 n=10) Any/str-in-go 1.334µ ± 3% 1.347µ ± 3% ~ (p=0.271 n=10) Any/str-in-go-2 754.5n ± 3% 744.8n ± 5% ~ (p=0.481 n=10) Any/str-in-go-4 420.2n ± 11% 367.7n ± 31% ~ (p=0.086 n=10) Any/str-in-go-8 557.6n ± 4% 547.1n ± 12% ~ (p=0.579 n=10) Any/any-in-go 2.562µ ± 4% 1.447µ ± 3% -43.53% (p=0.000 n=10) Any/any-in-go-2 1361.0n ± 4% 761.4n ± 7% -44.06% (p=0.000 n=10) Any/any-in-go-4 732.1n ± 9% 397.1n ± 11% -45.76% (p=0.000 n=10) Any/any-in-go-8 541.3n ± 13% 564.6n ± 5% +4.30% (p=0.041 n=10) Any/str-in-go-with-stack 1.420µ ± 1% 1.428µ ± 3% ~ (p=0.670 n=10) Any/str-in-go-with-stack-2 749.5n ± 4% 771.8n ± 4% ~ (p=0.123 n=10) Any/str-in-go-with-stack-4 433.2n ± 15% 400.7n ± 14% ~ (p=0.393 n=10) Any/str-in-go-with-stack-8 494.0n ± 7% 490.1n ± 10% ~ (p=0.853 n=10) Any/any-in-go-with-stack 2.586µ ± 3% 1.471µ ± 4% -43.14% (p=0.000 n=10) Any/any-in-go-with-stack-2 1343.0n ± 3% 773.7n ± 4% -42.39% (p=0.000 n=10) Any/any-in-go-with-stack-4 697.7n ± 8% 403.4n ± 9% -42.17% (p=0.000 n=10) Any/any-in-go-with-stack-8 490.8n ± 9% 492.8n ± 8% ~ (p=0.796 n=10) geomean 206.3n 182.9n -11.35% ``` On absolute terms: Before, on arm64: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3154850 387.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3239221 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3273285 363.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3251991 372.4 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 2944020 401.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 2984863 368.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3265248 363.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3301592 365.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 764239 1423 ns/op 140 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1510189 753.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3013986 369.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2128927 540.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 464083 2551 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 818104 1347 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1587925 698.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2452558 466.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 767626 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1534382 771.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2384058 433.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 3146942 450.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 434194 2524 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 851312 1304 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1570944 710.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2546115 604.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.238s ``` After: ``` ❯ go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/str-with-logger 3202051 382.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 3301683 371.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 3186028 364.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 3061030 371.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 3203704 378.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 3281462 372.8 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 3252879 371.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 3246148 373.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 804132 1404 ns/op 133 B/op 2 allocs/op BenchmarkAny/str-in-go-2 1686093 758.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 3075596 430.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 2101650 543.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 845822 1424 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 1531311 736.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 2618665 464.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2130280 536.2 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 818583 1440 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 1533379 739.4 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 2507131 399.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2348804 453.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 807199 1526 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 1590476 783.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 3026263 383.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 2615467 493.8 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.077s ``` And amd64, before: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.58 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.52 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.56 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.50 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 39399811 30.35 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 39448304 30.63 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 39647024 30.32 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 39479619 30.46 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1798702 669.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1862551 647.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1848636 642.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1878465 656.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1776140 684.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1868102 668.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1869589 639.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1782540 648.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 458112 2594 ns/op 91 B/op 2 allocs/op BenchmarkAny/str-in-go-2 820398 1344 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1392148 969.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1790403 644.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 220327 4897 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 494391 2701 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 823672 1399 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 1591206 746.8 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 384094 2820 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 809073 1530 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1464598 933.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 1943251 578.0 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 233019 4967 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 356689 2848 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 791342 1385 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1662126 746.0 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 51.671s ``` After: ``` % go test -bench BenchmarkAny -benchmem -run errs -cpu 1,2,4,8 goos: linux goarch: amd64 pkg: go.uber.org/zap cpu: AMD EPYC 7B13 BenchmarkAny/str-no-logger 100000000 11.77 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-2 100000000 11.75 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-4 100000000 11.76 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-no-logger-8 100000000 11.69 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger 49795383 24.33 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-2 48821454 24.31 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-4 49452686 24.79 ns/op 0 B/op 0 allocs/op BenchmarkAny/any-no-logger-8 49359926 24.26 ns/op 0 B/op 0 allocs/op BenchmarkAny/str-with-logger 1808188 700.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-2 1894179 643.9 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-4 1858263 649.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-with-logger-8 1879894 645.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger 1817276 663.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-2 1906438 637.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-4 1837354 641.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/any-with-logger-8 1909658 648.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/str-in-go 468484 2463 ns/op 96 B/op 2 allocs/op BenchmarkAny/str-in-go-2 726475 1465 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-4 1285284 958.9 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-8 1746547 573.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go 426568 2715 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-2 611106 1703 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-4 1000000 1017 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-8 2220459 625.7 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack 429721 2673 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-2 637306 1593 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-4 1301713 902.1 ns/op 88 B/op 2 allocs/op BenchmarkAny/str-in-go-with-stack-8 2012583 651.6 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack 391810 2833 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-2 675589 1639 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-4 1219318 970.3 ns/op 88 B/op 2 allocs/op BenchmarkAny/any-in-go-with-stack-8 1825632 574.6 ns/op 88 B/op 2 allocs/op PASS ok go.uber.org/zap 50.294s ```
…ion) This is an alternative to: - #1301 and #1302 and #1304 - a series of PRs that are faster than this one. However, they rely on unsafe. - #1303 - my own PR that uses closures, to reduce the stack size by 60%. - #1305 - my own PR that inline bunch of loops - https://github.com/uber-go/zap/compare/pawel/any-int5 that does the same as above, but is slightly easier to parse - #1307 - a reflect.TypeOf lookup version THIS PR IS INCOMPLETE - it shows a possible approach, but I wanted to get reviewers thoughts before typing everything in. I originally thought we can use a `type props strucy` intermediary struct to store the data, but that hits the same problem: every `props` would get it's own slot on the stack. This avoids this by returning the raw data. Pros: - the implementation is shared between `Any` and strongly typed Fields - no reflect or unsafe - reduced the stack significantly - we should be able to get to the same ~180 bytes as ~1305. - no peft penalty for strongly typed versions, at least on ARM64 it's compiled away. Cons: - the code gets a bit harder to maintain. It's significantly better than #1305 I would say though.
Yet another attempt at reducing the stack size of zap.Any, borrowing from uber-go#1301, uber-go#1303, uber-go#1304, uber-go#1305, uber-go#1307, and 1308. This approach defines a generic data type for field constructors of a specific type. This is similar to the lookup map in uber-go#1307, minus the map lookup, the interface match, or reflection. type anyFieldC[T any] func(string, T) Field The generic data type provides a non-generic method matching the interface: interface{ Any(string, any) Field } Stack size: The stack size of zap.Any following this change is 0xc0. % go build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any go.uber.org/zap.Any STEXT size=5861 args=0x20 locals=0xc0 funcid=0x0 align=0x0 This is just 8 bytes more than uber-go#1305, which is the smallest stack size of all other attempts. Allocations: Everything appears to get inlined with no heap escapes: go build -gcflags -m 2>&1 | grep field.go | perl -n -e 'next unless m{^./field.go:(\d+)}; print if ($1 >= 413)' | grep 'escapes' (Line 413 declares anyFieldC) Besides that, the output of `-m` for the relevant section of code consists of almost entirely: ./field.go:415:6: can inline anyFieldC[go.shape.bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.[]bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.complex128].Any [...] ./field.go:415:6: inlining call to anyFieldC[go.shape.complex128].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.[]bool].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.bool].Any Followed by: ./field.go:428:10: leaking param: key ./field.go:428:22: leaking param: value Maintainability: Unlike some of the other approaches, this variant is more maintainable. The `zap.Any` function looks roughly the same. Adding new branches there is obvious, and requires no duplication. Performance: This is a net improvement on all BenchmarkAny calls except "any-no-logger" which calls `zap.Any` and discards the result. ``` name old time/op new time/op delta Any/str-no-logger-2 8.77ns ± 0% 8.75ns ± 1% ~ (p=0.159 n=4+5) Any/any-no-logger-2 54.1ns ± 0% 81.6ns ± 0% +50.71% (p=0.016 n=5+4) Any/str-with-logger-2 1.38µs ± 3% 1.38µs ± 4% ~ (p=0.841 n=5+5) Any/any-with-logger-2 1.60µs ±22% 1.37µs ± 1% ~ (p=0.151 n=5+5) Any/str-in-go-2 3.41µs ± 1% 3.42µs ± 5% ~ (p=0.905 n=4+5) Any/any-in-go-2 5.98µs ± 1% 3.68µs ± 6% -38.44% (p=0.008 n=5+5) Any/str-in-go-with-stack-2 3.42µs ± 2% 3.46µs ± 3% ~ (p=0.421 n=5+5) Any/any-in-go-with-stack-2 5.98µs ± 3% 3.65µs ± 3% -38.95% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Any/str-no-logger-2 0.00B 0.00B ~ (all equal) Any/any-no-logger-2 0.00B 0.00B ~ (all equal) Any/str-with-logger-2 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/any-with-logger-2 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/str-in-go-2 88.5B ± 1% 88.0B ± 0% ~ (p=0.429 n=4+4) Any/any-in-go-2 88.0B ± 0% 88.0B ± 0% ~ (all equal) Any/str-in-go-with-stack-2 88.0B ± 0% 88.0B ± 0% ~ (all equal) Any/any-in-go-with-stack-2 88.0B ± 0% 88.0B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Any/str-no-logger-2 0.00 0.00 ~ (all equal) Any/any-no-logger-2 0.00 0.00 ~ (all equal) Any/str-with-logger-2 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/any-with-logger-2 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/str-in-go-2 2.00 ± 0% 2.00 ± 0% ~ (all equal) Any/any-in-go-2 2.00 ± 0% 2.00 ± 0% ~ (all equal) Any/str-in-go-with-stack-2 2.00 ± 0% 2.00 ± 0% ~ (all equal) Any/any-in-go-with-stack-2 2.00 ± 0% 2.00 ± 0% ~ (all equal) ``` I believe this is acceptable because that's not a real use case; we expect the result to be used with a logger.
Yet another attempt at reducing the stack size of zap.Any, borrowing from uber-go#1301, uber-go#1303, uber-go#1304, uber-go#1305, uber-go#1307, and 1308. This approach defines a generic data type for field constructors of a specific type. This is similar to the lookup map in uber-go#1307, minus the map lookup, the interface match, or reflection. type anyFieldC[T any] func(string, T) Field The generic data type provides a non-generic method matching the interface: interface{ Any(string, any) Field } Stack size: The stack size of zap.Any following this change is 0xc0. % go build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any go.uber.org/zap.Any STEXT size=5861 args=0x20 locals=0xc0 funcid=0x0 align=0x0 This is just 8 bytes more than uber-go#1305, which is the smallest stack size of all other attempts. Allocations: Everything appears to get inlined with no heap escapes: go build -gcflags -m 2>&1 | grep field.go | perl -n -e 'next unless m{^./field.go:(\d+)}; print if ($1 >= 413)' | grep 'escapes' (Line 413 declares anyFieldC) Besides that, the output of `-m` for the relevant section of code consists of almost entirely: ./field.go:415:6: can inline anyFieldC[go.shape.bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.[]bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.complex128].Any [...] ./field.go:415:6: inlining call to anyFieldC[go.shape.complex128].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.[]bool].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.bool].Any Followed by: ./field.go:428:10: leaking param: key ./field.go:428:22: leaking param: value Maintainability: Unlike some of the other approaches, this variant is more maintainable. The `zap.Any` function looks roughly the same. Adding new branches there is obvious, and requires no duplication. Performance: This is a net improvement on all BenchmarkAny calls except "any-no-logger" which calls `zap.Any` and discards the result. ``` name old time/op new time/op delta Any/str-no-logger-2 8.77ns ± 0% 8.75ns ± 1% ~ (p=0.159 n=4+5) Any/any-no-logger-2 54.1ns ± 0% 81.6ns ± 0% +50.71% (p=0.016 n=5+4) Any/str-with-logger-2 1.38µs ± 3% 1.38µs ± 4% ~ (p=0.841 n=5+5) Any/any-with-logger-2 1.60µs ±22% 1.37µs ± 1% ~ (p=0.151 n=5+5) Any/str-in-go-2 3.41µs ± 1% 3.42µs ± 5% ~ (p=0.905 n=4+5) Any/any-in-go-2 5.98µs ± 1% 3.68µs ± 6% -38.44% (p=0.008 n=5+5) Any/str-in-go-with-stack-2 3.42µs ± 2% 3.46µs ± 3% ~ (p=0.421 n=5+5) Any/any-in-go-with-stack-2 5.98µs ± 3% 3.65µs ± 3% -38.95% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Any/str-no-logger-2 0.00B 0.00B ~ (all equal) Any/any-no-logger-2 0.00B 0.00B ~ (all equal) Any/str-with-logger-2 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/any-with-logger-2 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/str-in-go-2 88.5B ± 1% 88.0B ± 0% ~ (p=0.429 n=4+4) Any/any-in-go-2 88.0B ± 0% 88.0B ± 0% ~ (all equal) Any/str-in-go-with-stack-2 88.0B ± 0% 88.0B ± 0% ~ (all equal) Any/any-in-go-with-stack-2 88.0B ± 0% 88.0B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Any/str-no-logger-2 0.00 0.00 ~ (all equal) Any/any-no-logger-2 0.00 0.00 ~ (all equal) Any/str-with-logger-2 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/any-with-logger-2 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/str-in-go-2 2.00 ± 0% 2.00 ± 0% ~ (all equal) Any/any-in-go-2 2.00 ± 0% 2.00 ± 0% ~ (all equal) Any/str-in-go-with-stack-2 2.00 ± 0% 2.00 ± 0% ~ (all equal) Any/any-in-go-with-stack-2 2.00 ± 0% 2.00 ± 0% ~ (all equal) ``` I believe this is acceptable because that's not a real use case; we expect the result to be used with a logger.
Yet another attempt at reducing the stack size of zap.Any, borrowing from #1301, #1303, #1304, #1305, #1307, and #1308. This approach defines a generic data type for field constructors of a specific type. This is similar to the lookup map in #1307, minus the map lookup, the interface match, or reflection. type anyFieldC[T any] func(string, T) Field The generic data type provides a non-generic method matching the interface: interface{ Any(string, any) Field } **Stack size**: The stack size of zap.Any following this change is 0xc0 (192 bytes). % go build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any go.uber.org/zap.Any STEXT size=5861 args=0x20 locals=0xc0 funcid=0x0 align=0x0 This is just 8 bytes more than #1305, which is the smallest stack size of all other attempts. **Allocations**: Everything appears to get inlined with no heap escapes: % go build -gcflags -m 2>&1 | grep field.go | perl -n -e 'next unless m{^./field.go:(\d+)}; print if ($1 >= 413)' | grep 'escapes' [no output] (Line 413 declares anyFieldC) Besides that, the output of `-m` for the relevant section of code consists of almost entirely: ./field.go:415:6: can inline anyFieldC[go.shape.bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.[]bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.complex128].Any [...] ./field.go:415:6: inlining call to anyFieldC[go.shape.complex128].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.[]bool].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.bool].Any Followed by: ./field.go:428:10: leaking param: key ./field.go:428:22: leaking param: value **Maintainability**: Unlike some of the other approaches, this variant is more maintainable. The `zap.Any` function looks roughly the same. Adding new branches there is obvious, and requires no duplication. **Performance**: This is a net improvement against master on BenchmarkAny's log-go checks that log inside a new goroutine. ``` name old time/op new time/op delta Any/string/field-only/typed 25.2ns ± 1% 25.6ns ± 2% ~ (p=0.460 n=5+5) Any/string/field-only/any 56.9ns ± 3% 79.4ns ± 0% +39.55% (p=0.008 n=5+5) Any/string/log/typed 1.47µs ± 0% 1.49µs ± 4% +1.58% (p=0.016 n=4+5) Any/string/log/any 1.53µs ± 2% 1.55µs ± 1% +1.37% (p=0.016 n=5+5) Any/string/log-go/typed 5.97µs ± 6% 5.99µs ± 1% ~ (p=0.151 n=5+5) Any/string/log-go/any 10.9µs ± 0% 6.2µs ± 0% -43.32% (p=0.008 n=5+5) Any/stringer/field-only/typed 25.3ns ± 1% 25.5ns ± 1% +1.09% (p=0.008 n=5+5) Any/stringer/field-only/any 85.5ns ± 1% 124.5ns ± 0% +45.66% (p=0.008 n=5+5) Any/stringer/log/typed 1.43µs ± 1% 1.42µs ± 2% ~ (p=0.175 n=4+5) Any/stringer/log/any 1.50µs ± 1% 1.56µs ± 6% +4.20% (p=0.008 n=5+5) Any/stringer/log-go/typed 5.94µs ± 0% 5.92µs ± 0% -0.40% (p=0.032 n=5+5) Any/stringer/log-go/any 11.1µs ± 2% 6.3µs ± 0% -42.93% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Any/string/field-only/typed 0.00B 0.00B ~ (all equal) Any/string/field-only/any 0.00B 0.00B ~ (all equal) Any/string/log/typed 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/string/log/any 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/string/log-go/typed 112B ± 0% 112B ± 0% ~ (all equal) Any/string/log-go/any 128B ± 0% 128B ± 0% ~ (all equal) Any/stringer/field-only/typed 0.00B 0.00B ~ (all equal) Any/stringer/field-only/any 0.00B 0.00B ~ (all equal) Any/stringer/log/typed 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/stringer/log/any 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/stringer/log-go/typed 112B ± 0% 112B ± 0% ~ (all equal) Any/stringer/log-go/any 128B ± 0% 128B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Any/string/field-only/typed 0.00 0.00 ~ (all equal) Any/string/field-only/any 0.00 0.00 ~ (all equal) Any/string/log/typed 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/string/log/any 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/string/log-go/typed 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/string/log-go/any 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/stringer/field-only/typed 0.00 0.00 ~ (all equal) Any/stringer/field-only/any 0.00 0.00 ~ (all equal) Any/stringer/log/typed 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/stringer/log/any 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/stringer/log-go/typed 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/stringer/log-go/any 3.00 ± 0% 3.00 ± 0% ~ (all equal) ``` It causes a regression in "field-only" which calls the field constructor and discards the result without using it in a logger. I believe this is acceptable because that's not a real use case; we expect the result to be used with a logger.
This is a prefactor for #1301, #1302, #1304, #1305, #1307, #1308 and #1310. We're writing various approaches to reduce the stack size and it's painful to keep copy-pasting the tests between PRs. This was suggested in @prashantv in #1307. The tests are mostly based on tests in #1303, but made "more generic", as #1307 we might want to test across more than just a single type. It does make the tests a bit harder to setup. Some of the setup is inconvenient (we duplicate the value in both `typed` and `any` version of the tests) but hopefully okay to understand. A fully non-duplicated alternative would likely require something like #1310 itself. For #1307 in particular a test against interface type would likely be needed, so adding it here too. The tests compare two code paths, with the same arguments, one using a strongly typed method and second using `zap.Any`. We have: - a simple "create field" case for a baseline - a "create and log" case for a realistic case (we typically log the fields) - a "create and log in a goroutine" case for the pathological case we're trying to solve for. - a "create and long in goroutine in a pre-warmed system" that does the above, but before tries to affect the starting goroutine stack size to provide an realistic example. Without this, for any tests with 2+ goroutines, the cost of `zap.Any` is not visible, as we always end up expanding the stack even in the strongly typed methods. The test results are: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/string-typ-no-logger 166879518 6.988 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-no-logger-12 167398297 6.973 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger 87669631 13.97 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger-12 86760837 14.11 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-logger 3059485 395.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-12 3141176 379.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger 2995699 401.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger-12 3071046 391.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-go 784323 1351 ns/op 146 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-12 2000835 613.9 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go 477486 2479 ns/op 117 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-12 1830955 680.0 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack 841566 1328 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack-12 2625226 479.6 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack 486084 2493 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack-12 2658640 667.9 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-no-logger 147314238 8.034 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-no-logger-12 157857937 7.436 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger 58872349 20.19 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger-12 60532305 20.27 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-logger 3094204 411.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-12 3163489 383.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger 2981940 427.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger-12 2777792 394.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-go 911761 1335 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-12 2006440 605.2 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go 467934 2518 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-12 1786076 683.1 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack 855794 1316 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack-12 2598783 434.5 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack 473282 2474 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack-12 2020183 651.9 ns/op 112 B/op 2 allocs/op PASS ok go.uber.org/zap 53.516s ```
Yet another attempt at reducing the stack size of zap.Any, borrowing from #1301, #1303, #1304, #1305, #1307, and #1308. This approach defines a generic data type for field constructors of a specific type. This is similar to the lookup map in #1307, minus the map lookup, the interface match, or reflection. type anyFieldC[T any] func(string, T) Field The generic data type provides a non-generic method matching the interface: interface{ Any(string, any) Field } **Stack size**: The stack size of zap.Any following this change is 0xc0 (192 bytes). % go build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any go.uber.org/zap.Any STEXT size=5861 args=0x20 locals=0xc0 funcid=0x0 align=0x0 This is just 8 bytes more than #1305, which is the smallest stack size of all other attempts. **Allocations**: Everything appears to get inlined with no heap escapes: % go build -gcflags -m 2>&1 | grep field.go | perl -n -e 'next unless m{^./field.go:(\d+)}; print if ($1 >= 413)' | grep 'escapes' [no output] (Line 413 declares anyFieldC) Besides that, the output of `-m` for the relevant section of code consists of almost entirely: ./field.go:415:6: can inline anyFieldC[go.shape.bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.[]bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.complex128].Any [...] ./field.go:415:6: inlining call to anyFieldC[go.shape.complex128].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.[]bool].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.bool].Any Followed by: ./field.go:428:10: leaking param: key ./field.go:428:22: leaking param: value **Maintainability**: Unlike some of the other approaches, this variant is more maintainable. The `zap.Any` function looks roughly the same. Adding new branches there is obvious, and requires no duplication. **Performance**: This is a net improvement against master on BenchmarkAny's log-go checks that log inside a new goroutine. ``` name old time/op new time/op delta Any/string/field-only/typed 25.2ns ± 1% 25.6ns ± 2% ~ (p=0.460 n=5+5) Any/string/field-only/any 56.9ns ± 3% 79.4ns ± 0% +39.55% (p=0.008 n=5+5) Any/string/log/typed 1.47µs ± 0% 1.49µs ± 4% +1.58% (p=0.016 n=4+5) Any/string/log/any 1.53µs ± 2% 1.55µs ± 1% +1.37% (p=0.016 n=5+5) Any/string/log-go/typed 5.97µs ± 6% 5.99µs ± 1% ~ (p=0.151 n=5+5) Any/string/log-go/any 10.9µs ± 0% 6.2µs ± 0% -43.32% (p=0.008 n=5+5) Any/stringer/field-only/typed 25.3ns ± 1% 25.5ns ± 1% +1.09% (p=0.008 n=5+5) Any/stringer/field-only/any 85.5ns ± 1% 124.5ns ± 0% +45.66% (p=0.008 n=5+5) Any/stringer/log/typed 1.43µs ± 1% 1.42µs ± 2% ~ (p=0.175 n=4+5) Any/stringer/log/any 1.50µs ± 1% 1.56µs ± 6% +4.20% (p=0.008 n=5+5) Any/stringer/log-go/typed 5.94µs ± 0% 5.92µs ± 0% -0.40% (p=0.032 n=5+5) Any/stringer/log-go/any 11.1µs ± 2% 6.3µs ± 0% -42.93% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Any/string/field-only/typed 0.00B 0.00B ~ (all equal) Any/string/field-only/any 0.00B 0.00B ~ (all equal) Any/string/log/typed 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/string/log/any 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/string/log-go/typed 112B ± 0% 112B ± 0% ~ (all equal) Any/string/log-go/any 128B ± 0% 128B ± 0% ~ (all equal) Any/stringer/field-only/typed 0.00B 0.00B ~ (all equal) Any/stringer/field-only/any 0.00B 0.00B ~ (all equal) Any/stringer/log/typed 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/stringer/log/any 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/stringer/log-go/typed 112B ± 0% 112B ± 0% ~ (all equal) Any/stringer/log-go/any 128B ± 0% 128B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Any/string/field-only/typed 0.00 0.00 ~ (all equal) Any/string/field-only/any 0.00 0.00 ~ (all equal) Any/string/log/typed 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/string/log/any 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/string/log-go/typed 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/string/log-go/any 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/stringer/field-only/typed 0.00 0.00 ~ (all equal) Any/stringer/field-only/any 0.00 0.00 ~ (all equal) Any/stringer/log/typed 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/stringer/log/any 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/stringer/log-go/typed 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/stringer/log-go/any 3.00 ± 0% 3.00 ± 0% ~ (all equal) ``` It causes a regression in "field-only" which calls the field constructor and discards the result without using it in a logger. I believe this is acceptable because that's not a real use case; we expect the result to be used with a logger.
This is a prefactor for #1301, #1302, #1304, #1305, #1307, #1308 and #1310. We're writing various approaches to reduce the stock size and it's painful to keep copy-pasting the tests between PRs. This was suggested in @prashantv in #1307. The tests are mostly based on tests in #1303, but made "more generic", as #1307 we might want to test across more than just a single type. It does make the tests a bit harder to setup. Some of the setup is inconvenient (we duplicate the value in both `typed` and `any` version of the tests) but hopefully okay to understand. A fully non-duplicated alternative would likely require something like #1310 itself. For #1307 in particular a test against interface type would likely be needed, so adding it here too. The tests compare two code paths, with the same arguments, one using a strongly typed method and second using `zap.Any`. We have: - a simple "create field" case for a baseline - a "create and log" case for a realistic case (we typically log the fields) - a "create and log in a goroutine" case for the pathological case we're trying to solve for. - a "create and long in goroutine in a pre-warmed system" that does the above, but before tries to affect the starting goroutine stack size to provide an realistic example. Without this, for any tests with 2+ goroutines, the cost of `zap.Any` is not visible, as we always end up expanding the stack even in the strongly typed methods. The test results are: ``` goos: darwin goarch: arm64 pkg: go.uber.org/zap BenchmarkAny/string-typ-no-logger 166879518 6.988 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-no-logger-12 167398297 6.973 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger 87669631 13.97 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-any-no-logger-12 86760837 14.11 ns/op 0 B/op 0 allocs/op BenchmarkAny/string-typ-logger 3059485 395.5 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-12 3141176 379.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger 2995699 401.3 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-any-logger-12 3071046 391.1 ns/op 64 B/op 1 allocs/op BenchmarkAny/string-typ-logger-go 784323 1351 ns/op 146 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-12 2000835 613.9 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go 477486 2479 ns/op 117 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-12 1830955 680.0 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack 841566 1328 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-typ-logger-go-stack-12 2625226 479.6 ns/op 96 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack 486084 2493 ns/op 112 B/op 2 allocs/op BenchmarkAny/string-any-logger-go-stack-12 2658640 667.9 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-no-logger 147314238 8.034 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-no-logger-12 157857937 7.436 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger 58872349 20.19 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-any-no-logger-12 60532305 20.27 ns/op 0 B/op 0 allocs/op BenchmarkAny/stringer-typ-logger 3094204 411.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-12 3163489 383.7 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger 2981940 427.2 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-any-logger-12 2777792 394.0 ns/op 64 B/op 1 allocs/op BenchmarkAny/stringer-typ-logger-go 911761 1335 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-12 2006440 605.2 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go 467934 2518 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-12 1786076 683.1 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack 855794 1316 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-typ-logger-go-stack-12 2598783 434.5 ns/op 96 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack 473282 2474 ns/op 112 B/op 2 allocs/op BenchmarkAny/stringer-any-logger-go-stack-12 2020183 651.9 ns/op 112 B/op 2 allocs/op PASS ok go.uber.org/zap 53.516s ``` reviews rename review farewell, my friend dummy
Yet another attempt at reducing the stack size of zap.Any, borrowing from #1301, #1303, #1304, #1305, #1307, and #1308. This approach defines a generic data type for field constructors of a specific type. This is similar to the lookup map in #1307, minus the map lookup, the interface match, or reflection. type anyFieldC[T any] func(string, T) Field The generic data type provides a non-generic method matching the interface: interface{ Any(string, any) Field } **Stack size**: The stack size of zap.Any following this change is 0xc0 (192 bytes). % go build -gcflags -S 2>&1 | grep ^go.uber.org/zap.Any go.uber.org/zap.Any STEXT size=5861 args=0x20 locals=0xc0 funcid=0x0 align=0x0 This is just 8 bytes more than #1305, which is the smallest stack size of all other attempts. **Allocations**: Everything appears to get inlined with no heap escapes: % go build -gcflags -m 2>&1 | grep field.go | perl -n -e 'next unless m{^./field.go:(\d+)}; print if ($1 >= 413)' | grep 'escapes' [no output] (Line 413 declares anyFieldC) Besides that, the output of `-m` for the relevant section of code consists of almost entirely: ./field.go:415:6: can inline anyFieldC[go.shape.bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.[]bool].Any ./field.go:415:6: can inline anyFieldC[go.shape.complex128].Any [...] ./field.go:415:6: inlining call to anyFieldC[go.shape.complex128].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.[]bool].Any ./field.go:415:6: inlining call to anyFieldC[go.shape.bool].Any Followed by: ./field.go:428:10: leaking param: key ./field.go:428:22: leaking param: value **Maintainability**: Unlike some of the other approaches, this variant is more maintainable. The `zap.Any` function looks roughly the same. Adding new branches there is obvious, and requires no duplication. **Performance**: This is a net improvement against master on BenchmarkAny's log-go checks that log inside a new goroutine. ``` name old time/op new time/op delta Any/string/field-only/typed 25.2ns ± 1% 25.6ns ± 2% ~ (p=0.460 n=5+5) Any/string/field-only/any 56.9ns ± 3% 79.4ns ± 0% +39.55% (p=0.008 n=5+5) Any/string/log/typed 1.47µs ± 0% 1.49µs ± 4% +1.58% (p=0.016 n=4+5) Any/string/log/any 1.53µs ± 2% 1.55µs ± 1% +1.37% (p=0.016 n=5+5) Any/string/log-go/typed 5.97µs ± 6% 5.99µs ± 1% ~ (p=0.151 n=5+5) Any/string/log-go/any 10.9µs ± 0% 6.2µs ± 0% -43.32% (p=0.008 n=5+5) Any/stringer/field-only/typed 25.3ns ± 1% 25.5ns ± 1% +1.09% (p=0.008 n=5+5) Any/stringer/field-only/any 85.5ns ± 1% 124.5ns ± 0% +45.66% (p=0.008 n=5+5) Any/stringer/log/typed 1.43µs ± 1% 1.42µs ± 2% ~ (p=0.175 n=4+5) Any/stringer/log/any 1.50µs ± 1% 1.56µs ± 6% +4.20% (p=0.008 n=5+5) Any/stringer/log-go/typed 5.94µs ± 0% 5.92µs ± 0% -0.40% (p=0.032 n=5+5) Any/stringer/log-go/any 11.1µs ± 2% 6.3µs ± 0% -42.93% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Any/string/field-only/typed 0.00B 0.00B ~ (all equal) Any/string/field-only/any 0.00B 0.00B ~ (all equal) Any/string/log/typed 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/string/log/any 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/string/log-go/typed 112B ± 0% 112B ± 0% ~ (all equal) Any/string/log-go/any 128B ± 0% 128B ± 0% ~ (all equal) Any/stringer/field-only/typed 0.00B 0.00B ~ (all equal) Any/stringer/field-only/any 0.00B 0.00B ~ (all equal) Any/stringer/log/typed 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/stringer/log/any 64.0B ± 0% 64.0B ± 0% ~ (all equal) Any/stringer/log-go/typed 112B ± 0% 112B ± 0% ~ (all equal) Any/stringer/log-go/any 128B ± 0% 128B ± 0% ~ (all equal) name old allocs/op new allocs/op delta Any/string/field-only/typed 0.00 0.00 ~ (all equal) Any/string/field-only/any 0.00 0.00 ~ (all equal) Any/string/log/typed 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/string/log/any 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/string/log-go/typed 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/string/log-go/any 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/stringer/field-only/typed 0.00 0.00 ~ (all equal) Any/stringer/field-only/any 0.00 0.00 ~ (all equal) Any/stringer/log/typed 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/stringer/log/any 1.00 ± 0% 1.00 ± 0% ~ (all equal) Any/stringer/log-go/typed 3.00 ± 0% 3.00 ± 0% ~ (all equal) Any/stringer/log-go/any 3.00 ± 0% 3.00 ± 0% ~ (all equal) ``` It causes a regression in "field-only" which calls the field constructor and discards the result without using it in a logger. I believe this is acceptable because that's not a real use case; we expect the result to be used with a logger.
By doing this unsafe approach we are able to save one call to morestack.
The benchmark results are as follows:
The assembly shows the following change in stack usage:
So this is ~5x reduction in stack usage.
Additionally I did a cpu profile, and there I found way more usage from the old approach on high core core (32 cores):
Not sure why the time is the same but the profile isn't, only guess is that the goroutine scheduler is adding that latency?
Alternatives
Just fyi. I tried storing each switch statement in a function variable to avoid the compiler to allocate space for each function call, but by doing that the escape analyzer decides that the variables need to escape to the heap and we got the following results:
As you can see:
Stack usage became:
This would be a reduction of >50x but with the cost of 2 more heap allocations.