-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement capnproto replication #7659
Conversation
253f417
to
7042bea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this include a whole new RPC framework? I'm very interested in this. Glad to finally see #7071 tackled. How do you think this compared to dRPC? And is it worth eschewing gRPC altogether?
Really interested in this too! Our profiles show similar for unmarshaling |
Yes this comes with its own RPC framework which can be a double edge sword. It's one more thing to learn and troubleshoot if it goes wrong. I haven't looked into dRPC yet in practice, but maybe there is a way to abstract the RPC framework and not expose internals to end users, this way we can swap it out in future releases in a transparent manner. In this PR we expose the listen port as a parameter since we need separate connections for Cap'n Proto. |
DRPC allows serving both gRPC and DRPC on the same socket to allow for incrementally upgrading/migrating a fleet. We might consider doing something similar whether we go with capnproto or DRPC |
Mhm, we probably need to experiment more with the transparent approach before deciding what to do. For example, we could maybe deprecate My only suggestion would be:
Instead of having two addresses here, let's have |
Perhaps we can use something like https://github.com/soheilhy/cmux to multiplex multiple protocols on the same port. This way we don't need to pollute arguments with flags for each protocol. Let me see if this is easily doable, and also add some tests for the new RPC. |
d22809b
to
5cfae4c
Compare
3df360a
to
2fdeba7
Compare
b46ec5d
to
79a7aa2
Compare
2bea7c9
to
b340ea0
Compare
Adding e2e test: fpetkovski#7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lg
b.Run("unmarshal", func(b *testing.B) { | ||
for i := 0; i < b.N; i++ { | ||
msg, err := capnp.Unmarshal(bs) | ||
require.NoError(b, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gives me:
/home/giedriusstatkevicius/dev/thanos/pkg/receive/writecapnp/marshal_bench_test.go:101:
Error Trace: /home/giedriusstatkevicius/dev/thanos/pkg/receive/writecapnp/marshal_bench_test.go:101
/usr/local/go/src/testing/benchmark.go:193
/usr/local/go/src/testing/benchmark.go:215
/usr/local/go/src/runtime/asm_amd64.s:1700
Error: Received unexpected error:
unmarshal: short header section
Does this benchmark work for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we are unmarshaling proto bytes into a capnproto request. I've fixed the benchmark, all of them should be passing now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make docs
failing 😬
Our profiles from production show that a lot of CPU and memory in receivers is used for unmarshaling protobuf messages. Although it is not possible to change the remote-write format, we have the freedom to change the protocol used for replicating timeseries data. This commit introduces a new feature in receivers where replication can be done using Cap'n Proto instead of gRPC + Protobuf. The advantage of the former protocol is that deserialization is far cheaper and fields can be accessed directly from the received message (byte slice) without allocating intermediate objects. There is an additional cost for serialization because we have to convert from Protobuf to the Cap'n proto format, but in our setup this still results in a net reduction in resource usage. Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Co-authored-by: Pedro Tanaka <pedro.tanaka@shopify.com> Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
4b34a07
to
d337e0a
Compare
Rebased on main and regenerated docs. We should be good to go! We can keep iterating over time, I'd like to see how we can leverage interning in the future. |
Our profiles from production show that a lot of CPU and memory in receivers is used for unmarshaling protobuf messages. Although it is not possible to change the remote-write format, we have the freedom to change the protocol used for replicating timeseries data.
This commit introduces a new feature in receivers where replication can be done using Cap'n Proto instead of gRPC + Protobuf. The advantage of the former protocol is that deserialization is far cheaper and fields can be accessed directly from the received message (byte slice) without allocating intermediate objects. There is an additional cost for serialization because we have to convert from Protobuf to the Cap'n proto format, but in our setup this still results in a net reduction in resource usage.
We have a split router-receiver setup and this is our resource usage in staging after enabling the new replication method:
Router usage did go up as well, but we still got an overall net reduction:
We can also experiment with other formats by having a more generic flag:
receive.replication-format=...
Changes
Verification