Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proto: disable XXX_NoUnkeyedLiteral and XXX_sizecache fields #37706

Closed
nvanbenschoten opened this issue May 21, 2019 · 1 comment · Fixed by #38404
Closed

proto: disable XXX_NoUnkeyedLiteral and XXX_sizecache fields #37706

nvanbenschoten opened this issue May 21, 2019 · 1 comment · Fixed by #38404
Assignees
Labels
C-investigation Further steps needed to qualify. C-label will change. C-performance Perf of queries or internals. Solution not expected to change functional behavior.

Comments

@nvanbenschoten
Copy link
Member

For the reasons listed in the last two commits of the following branch, we would like to disable the autogenerated XXX_NoUnkeyedLiteral field and the XXX_sizecache field in Go protobufs:
https://github.com/nvanbenschoten/cockroach/commits/nvanbenschoten/sizeCache

However, this seems to cause a strange performance regression. To reproduce this, apply the following patch:

diff --git a/pkg/cmd/protoc-gen-gogoroach/main.go b/pkg/cmd/protoc-gen-gogoroach/main.go
index b47507f..88f6ec8 100644
--- a/pkg/cmd/protoc-gen-gogoroach/main.go
+++ b/pkg/cmd/protoc-gen-gogoroach/main.go
@@ -80,7 +80,14 @@ func main() {
                // Something something extensions; we don't use 'em currently.
                // vanity.TurnOffGoExtensionsMapAll,

-               vanity.TurnOffGoUnrecognizedAll,
+               // Disable generation of the following fields, which aren't worth
+               // their associated runtime cost:
+               // - XXX_unrecognized
+               // - XXX_NoUnkeyedLiteral
+               // - XXX_sizecache
+               vanity.TurnOffGoUnrecognizedAll,
+               vanity.TurnOffGoUnkeyedAll,
+               vanity.TurnOffGoSizecacheAll,

                // Adds unnecessary dependency on golang/protobuf.
                // vanity.TurnOffGogoImport,

Build cockroach using ./build/builder.sh mkrelease linux-gnu. Finally, run roachtest run 'kv95/enc=false/nodes=3$'. Without the patch, throughput should hover around 32k qps. With the patch, throughput drops to around 13k qps.

The goal here is to figure out why. There must be an obvious explanation for this, but I haven't been able to find it. Debugging may require a combination of debugging the performance of running clusters and exploring micro-benchmarks.

@nvanbenschoten nvanbenschoten added C-investigation Further steps needed to qualify. C-label will change. C-performance Perf of queries or internals. Solution not expected to change functional behavior. labels May 21, 2019
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Jun 17, 2019
`unsafe.Sizeof(hlc.Timestamp{})` was including XXX_sizecache, which was bloating
the struct size even though it didn't have an effect on the encoded size of the
timestamps, which we control in `encodeValue`.

Even once we remove `XXX_sizecache` (cockroachdb#37706), it looks like `unsafe.Sizeof` will
still include padding which we don't want to capture in the `encodedTsSize` constant,
so this seems like the best approach. See https://play.golang.org/p/5swJSSbP6J4.

Release note: None
craig bot pushed a commit that referenced this issue Jun 17, 2019
38197: cli: warn the user if the time zone database is unavailable r=knz a=knz

Fixes #23828.
(The part in that issue about the database not being up to date cannot be fixed using the Go standard library - this does not give access to the search path and the version number. A comprehensive fix needs to wait for #36864.)

Release note (cli change): CockroachDB will now print out an error
message and an informative hint if the time zone database is unusable.

38214: storage/tscache: save 4 bytes per encoded tscache value r=ajwerner a=nvanbenschoten

`unsafe.Sizeof(hlc.Timestamp{})` was including XXX_sizecache, which was bloating
the struct size even though it didn't have an effect on the encoded size of the
timestamps, which we control in `encodeValue`.

Even once we remove `XXX_sizecache` (#37706), it looks like `unsafe.Sizeof` will
still include padding which we don't want to capture in the `encodedTsSize` constant,
so this seems like the best approach. See https://play.golang.org/p/5swJSSbP6J4.

Release note: None

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@nvanbenschoten nvanbenschoten self-assigned this Jun 25, 2019
@nvanbenschoten
Copy link
Member Author

It looks like this is fixed by #37967. I'm surprised by this because I see this when compiling with Go 1.11.6 (before d0b7737). Still, the numbers don't lie:

### these two commits patched on the parent of a2d0d6b

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0        7697250        12828.7     14.0      2.9     62.9    104.9   2818.6  read
  600.0s        0         405461          675.8     18.4      9.4     67.1    113.2   2550.1  write

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
  600.0s        0        8102711        13504.5     14.2      3.5     62.9    104.9   2818.6


### these two commits patched on a2d0d6b

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
  600.0s        0       18124733        30207.7      5.7      4.1     17.8     28.3    121.6  read
  600.0s        0         955357         1592.3     13.0     11.5     30.4     44.0    159.4  write

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
  600.0s        0       19080090        31800.0      6.0      4.5     18.9     30.4    159.4

I'm going to conclude that the regression is addressed and that the hypothesis in #37967 (review) was mostly accurate:

It's possible that we're seeing a very similar issue where the difference in proto padding is causing stack growth underneath the HLC clock mutex (instead of somewhere else) and destroying performance.

craig bot pushed a commit that referenced this issue Jun 25, 2019
38404: proto: disable XXX_NoUnkeyedLiteral and XXX_sizecache fields r=nvanbenschoten a=nvanbenschoten

Fixes #37706.

The PR removes the `XXX_NoUnkeyedLiteral` and `XXX_sizecache` fields from generated protobufs.

`XXX_NoUnkeyedLiteral` claims to come with a small perf win for proto marshaling, but it looks like it's only used with `proto.Marshal` and not with the `Marshal` method (which CRDB code and gRPC code always use). Meanwhile, it has two moderate disadvantages:
1. it adds 32 bits to every proto struct. This isn't a big deal in most cases, but in code that allocates large slices of little proto structs (like the `spanlatch.Manager`, which is what tipped me off to this new field), it has a non-negligible cost.
2. it makes value comparison between proto structs dangerously unreliable. We compare small protos by value all over the place (see `hlc.Timestamp`) and switching to an `Equals` method would come with a cost to code complexity and runtime performance.

After removing `XXX_NoUnkeyedLiteral`, `XXX_sizecache` is now at the end of generated protos, it is now taking up space even though it is an empty struct: golang/go#9401. Since this is just a glorified lint which is already enforced by `go vet`'s “composites” check, it doesn't seem worth the cost.

### Benchmarks

```
name                    old reads/s   new reads/s   delta
kv95/enc=false/nodes=3    35.5k ± 1%    35.8k ± 1%  +0.86%  (p=0.214 n=5+5)

name                    old writes/s  new writes/s  delta
kv95/enc=false/nodes=3    1.87k ± 2%    1.89k ± 2%  +0.94%  (p=0.214 n=5+5)
```

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig craig bot closed this as completed in #38404 Jun 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-investigation Further steps needed to qualify. C-label will change. C-performance Perf of queries or internals. Solution not expected to change functional behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants