Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider gRPC alternatives #7071

Open
GiedriusS opened this issue Jan 17, 2024 · 9 comments
Open

Consider gRPC alternatives #7071

GiedriusS opened this issue Jan 17, 2024 · 9 comments

Comments

@GiedriusS
Copy link
Member

GiedriusS commented Jan 17, 2024

I think there are a few problems with gRPC:

  • The Go implementation at least has a bunch of deprecated code, it has a HTTP/2 implementation inside, and a bunch of functionality that we don't really use. I think Thanos can be much leaner in that regard. This post outlines the problems well https://www.storj.io/blog/introducing-drpc-our-replacement-for-grpc. Even though, of course, the presentation about dRPC will talk negatively about gRPC, my experience reflects what's written there.
  • It will be a tabula rasa for starting from scratch and not using gogoproto Migrate away from gogoprotobuf to protobuf v2 API + vtprotobuf #4557 because gogoproto is now deeply entrenched in our codebase due to copy/paste of generated code. Thus it is tough to remove it.
  • It's hard/impossible to use pooling on both sender/receiver sides in gRPC because gRPC hides this stuff and doesn't allow controlling allocations. gRPC effectively "takes ownership" of whatever byte slice we pass to it.
  • It doesn't allow implementing string deduplication because the unmarshaling function always receives a fresh byte slice (Version 2 of string interning #7067)

I have researched this topic about 1yr ago but it seems like dRPC is the most viable candidate for replacing gRPC. It would be interesting to add dRPC to Thanos under a feature flag and see how well it works.

@MichaHoffmann
Copy link
Contributor

MichaHoffmann commented Jan 17, 2024

So my first google search for "drpc" returned the homepage of the "danville rifle and pistol club". It does not seem to be super popular at least. Thats just an observation not a technical argument against it of course. If we are considering alternatives (which we probably should) I would like to see something more mainstream though I think.

@GiedriusS
Copy link
Member Author

😄 it's https://github.com/storj/drpc. Fair point, I'm not arguing against that but I wonder what other libraries would fit the bill?

@MichaHoffmann
Copy link
Contributor

I have not tried yet but if we are going to look into replacements I think we should also check https://arrow.apache.org/docs/format/Flight.html; arrow seems to be pretty popular lately.

@GiedriusS
Copy link
Member Author

GiedriusS commented Jan 17, 2024

It is based on gRPC so I think that immediately disqualifies it because it has the same problems. Twirp also seems like it might be a valid option https://github.com/twitchtv/twirp. But it looks like it also allocates a new []byte for each request:
https://github.com/twitchtv/twirp/blob/main/example/service.twirp.go#L441-L450
So, we would be fighting with the RPC framework from the start again :/

@fpetkovski
Copy link
Contributor

fpetkovski commented Jan 17, 2024

In the profiles I've looked at, the major bottleneck was not gRPC, but rather the (de)serialization of protobuf messages. Arrow uses flatbuffers with some additional optimizations for zero-copy data transfer. We do have to check whether these optimizations are available in the Go library though, but zero-copy is one of the key selling points of the format so there's a good chance it's already done. I have a testing branch which adds an arrow API for the Series call, but I haven't had a chance to move it forward: main...fpetkovski:thanos:series-arrow.

I think we can also experiment with various approaches for the internal replication calls, since those are self contained and can be tested quickly.

@bwplotka
Copy link
Member

bwplotka commented Jan 17, 2024

Interesting points! Definitely it's healthy to revisit gRPC vs the alternatives from time to time!

The Go implementation at least has a bunch of deprecated code, it has a HTTP/2 implementation inside, and a bunch of functionality that we don't really use. I think Thanos can be much leaner in that regard. This post outlines the problems well https://www.storj.io/blog/introducing-drpc-our-replacement-for-grpc. Even though, of course, the presentation about dRPC will talk negatively about gRPC, my experience reflects what's written there.

dRPC looks pretty nice, especially for Go (the most supported language). It's essentially what we do in Prometheus, HTTP 1.1 + protobuf, but supporting rpc elements in proto file. However, it is a niche, plus does not support other languages well -- maybe that's fine.

Plus, I wonder if we really don't have any improvements from multiplexing on single connections with HTTP/2, we do stream a lot. Benchmarks would tell for our use case (I know dRPC has some benchmarks for their use case).

The Go implementation at least has a bunch of deprecated code

What exactly deprecated code? Do you mean using old protobuf API? This seems to be changing soon: grpc/grpc-go#6919 .. and the reason for this AND bunch of deprecated methods is for us to not have a breaking changes all the time, so it is a pretty weak point to me. They do deprecate stuff and sometimes use deprecated repos in some places to keep our (downstream users) work smooth. Why this is bad?

It will be a tabula rasa for starting from scratch and not using gogoproto #4557 because gogoproto is now deeply entrenched in our codebase due to copy/paste of generated code. Thus it is tough to remove it.

It's not though to remove it. It's just a lot of places to change, but it's generally easier than people think. And yes, you lose non-pointer fields, but as I shown in my experiments with new go generators the results on v2 + e.g. vtproto are still better vs gogo proto, despite this.

It's hard/impossible to use pooling on both sender/receiver sides in gRPC because gRPC hides this stuff and doesn't allow controlling allocations. gRPC effectively "takes ownership" of whatever byte slice we pass to it.

Are you really sure it's impossible? gRPC is doing a good job on identifying and pooling for us, plus my impression was that we can totally inject our own custom pooling to any place using custom codec have you checked that? e.g https://github.com/planetscale/vtprotobuf?tab=readme-ov-file#vtprotobuf-with-grpc (see pool option before that section).

It doesn't allow implementing string deduplication because the unmarshaling function always receives a fresh byte slice (#7067)

Same doubt as above, plus this particular problem is fixed with Remote Write 2.0 that includes interning..

@GiedriusS
Copy link
Member Author

vtprotobuf only provides the ability to use pooling on one side. I have noted that on the serialization side, it is completely unsafe to use pooling due to gRPC: planetscale/vtprotobuf#20. :)

@GiedriusS
Copy link
Member Author

GiedriusS commented Jan 17, 2024

There's some recent movement though apparently grpc/grpc-go#6619 so maybe this year this will be fixed in the Go gRPC implementation 😄

This service demonstrated the benefit of this approach with a very significant reduction in allocation volume (>90% fewer bytes allocated per second and a 7x reduction in allocations) and much lower CPU usage (>30% less). We offer an implementation of this proposal for context and as a conversation starter: grpc/grpc-go#6613

Nice potential gains (:

@GiedriusS
Copy link
Member Author

One more thing I want to do: build a dictionary from label names/values and use that for compression on an individual RPC level. Not possible with gRPC.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants