Pool buffers when sending write request #5195
Can you clarify how this number was picked?
I checked the average message size received by ingester Push handlers using this query:
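A hypothetical PromQL reconstruction of such a query is below; the `cortex_request_message_bytes` histogram and the Push route label are assumptions for illustration, not taken from this thread:

```promql
# Hypothetical: average inbound message size seen by the ingester Push handler.
sum by (namespace) (rate(cortex_request_message_bytes_sum{route="/cortex.Ingester/Push"}[5m]))
  /
sum by (namespace) (rate(cortex_request_message_bytes_count{route="/cortex.Ingester/Push"}[5m]))
```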
We see wildly different numbers across our environments. I picked a number that's big enough to handle large messages (for the environment where the average is 300 KB), but that also provides enough space to hold many smaller messages.
[nit] We typically have pools as global vars. Any pro of having it on the `Distributor`?
I don't see why it would need to be a global variable when we have a logical place to put it. I understand that's not always the case, but I think it fits into the distributor nicely.
Why do you need the fast-releasing slab pool if you create a new slab pool for each request? Couldn't you just use the "base" one and release all slabs in the cleanup function passed to `ring.DoBatch()`?
I mean, I can see that the fast-releasing slab pool is an optimization because slabs may be released sooner, but can you confirm it's not strictly required and the base one could be used as well? I just want to make sure I understand the rationale.
Your understanding is correct. It's an optimization to release slabs sooner.

I think this may actually be an interesting optimization in environments where the average size of write requests to ingesters is high and we're also sending requests to many ingesters at once.
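To make the two alternatives concrete, here is a minimal sketch of the "base" pattern the question above describes: one slab pool per request, fully released in the cleanup function passed to `ring.DoBatch()`. The `slabPool` type and its methods are illustrative names, not the actual API introduced by this PR:

```go
package distributor

import "sync"

// slabPool is a simplified per-request slab pool: Get carves buffers out of
// large slabs taken from a delegate sync.Pool, and Release returns every
// slab at once when the request is done.
type slabPool struct {
	delegate *sync.Pool
	slabSize int

	mtx   sync.Mutex
	slabs [][]byte // slabs acquired for this request
	free  []byte   // unused tail of the current slab
}

func newSlabPool(slabSize int) *slabPool {
	return &slabPool{
		delegate: &sync.Pool{New: func() interface{} { return make([]byte, slabSize) }},
		slabSize: slabSize,
	}
}

// Get returns a buffer of the given size, carving it out of the current slab
// when possible.
func (p *slabPool) Get(size int) []byte {
	p.mtx.Lock()
	defer p.mtx.Unlock()

	if size > p.slabSize {
		// Too big to fit in a slab: allocate separately, skip pooling.
		return make([]byte, size)
	}
	if size > len(p.free) {
		slab := p.delegate.Get().([]byte)
		p.slabs = append(p.slabs, slab)
		p.free = slab
	}
	buf := p.free[:size:size]
	p.free = p.free[size:]
	return buf
}

// Release must only be called once no buffer handed out by Get is in use,
// e.g. from the cleanup function passed to ring.DoBatch(), which runs after
// every per-ingester call has finished.
func (p *slabPool) Release() {
	p.mtx.Lock()
	defer p.mtx.Unlock()
	for _, slab := range p.slabs {
		p.delegate.Put(slab) // sketch: a real pool would avoid the per-Put allocation
	}
	p.slabs, p.free = nil, nil
}
```

The fast-releasing variant discussed here changes only the release granularity: slabs can go back to the underlying pool as soon as the buffers carved from them are no longer needed, instead of waiting for the request-level `Release()`, which matters most when a single request fans out to many ingesters that finish at different times.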
The hard question is: is this safe, or could the message buffer be retained by gRPC for longer? Can you share more details about the arguments for "this is safe to do"? I still struggle a lot with this, because each time one of us looks at it, we come to a slightly different conclusion.

My question is not just about gRPC itself (for which you have added a nice test) but about the chain of middlewares we have. Our middlewares shouldn't retain it, but logging may be async, so it's worth double-checking.
One thing to realize is that we're doing this buffering on the client side. The "gRPC" we talk about here is the gRPC connection to the ingester.

Re middlewares... We set up our gRPC connection with the following interceptors (see `(*Config).DialOption`):

- `otgrpc.OpenTracingClientInterceptor(opentracing.GlobalTracer())`
- `middleware.ClientUserHeaderInterceptor`
- `middleware.UnaryClientInstrumentInterceptor(requestDuration)`

Looking at the implementation of these interceptors:

- `OpenTracingClientInterceptor` has the ability to 1) pass the request to a span inclusion function, 2) log the payload (request), and 3) decorate the span using the request. Note that we don't configure `OpenTracingClientInterceptor` with any of these options.
- The `middleware.ClientUserHeaderInterceptor`, `middleware.UnaryClientInstrumentInterceptor`, `BackoffRetry` and `RateLimiter` interceptors don't use the request in any way, apart from passing it further down the chain (see the sketch below).
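As an illustration of that last point, a pass-through unary client interceptor has this shape (a generic sketch, not the actual code of the interceptors listed above): the request only flows through to `invoker`, so nothing keeps a reference to it once the call returns.

```go
package client

import (
	"context"

	"google.golang.org/grpc"
)

// passThroughInterceptor forwards the request untouched. Interceptors of
// this shape cannot extend the lifetime of req: they neither store it nor
// hand it to anything asynchronous.
func passThroughInterceptor(
	ctx context.Context,
	method string,
	req, reply interface{},
	cc *grpc.ClientConn,
	invoker grpc.UnaryInvoker,
	opts ...grpc.CallOption,
) error {
	// A real interceptor would add headers, metrics, retries, etc. here,
	// but would still pass req along unchanged.
	return invoker(ctx, method, req, reply, cc, opts...)
}
```

Such an interceptor would be installed with `grpc.WithChainUnaryInterceptor(passThroughInterceptor)` when dialing.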
More places where our buffer could be passed around:

Let's start with binlogs. This functionality is configured by the env variable `GRPC_BINARY_LOG_FILTER`. When configured, gRPC logs each RPC in the format defined by the GrpcLogEntry proto. The design for this feature is at https://github.com/grpc/proposal/blob/master/A16-binary-logging.md.

When used, the request is passed to the logging method inside a `binarylog.ClientMessage` struct. This struct has a `toProto()` method, which calls another `Marshal` on our message. Once `binarylog.ClientMessage` is converted to `binlogpb.GrpcLogEntry` (now it contains our buffer from the pool), it's forwarded to the "sink" for writing.

There are three `sink` implementations in grpc: `bufferedSink`, `noopSink` and `writerSink`. `writerSink` marshals the `*binlogpb.GrpcLogEntry` and writes it to some writer. `bufferedSink` simply wraps `writerSink`, but gives it a buffer to write output to, and also runs a goroutine to periodically flush that buffer. The important point is that it doesn't keep the `*binlogpb.GrpcLogEntry` around, but marshals it immediately.

My conclusion is: using the binary logging feature is safe when combined with our message buffering.
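To make that safety criterion concrete, here is a toy sketch of the distinction (gRPC's real sinks live in internal packages of `google.golang.org/grpc`; the types below are hypothetical stand-ins):

```go
package binlogsketch

// entry stands in for *binlogpb.GrpcLogEntry: it holds a reference to the
// pooled marshal buffer.
type entry struct{ payload []byte }

// copyingSink mirrors what writerSink/bufferedSink do: the entry is
// serialized (here: copied) synchronously, so the caller may safely reuse
// the pooled buffer once Write returns.
type copyingSink struct{ out []byte }

func (s *copyingSink) Write(e entry) {
	s.out = append(s.out, e.payload...) // data leaves the pooled buffer here
}

// retainingSink shows the pattern that would be unsafe: keeping the entry
// (and thus the pooled buffer) alive for later use. Per the analysis above,
// gRPC's binlog sinks do not do this.
type retainingSink struct{ pending []entry }

func (s *retainingSink) Write(e entry) {
	s.pending = append(s.pending, e) // still references the pooled buffer
}
```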
`statsHandlers` for the gRPC connection are configured via dial options. We don't use that feature in our code.

It's possible that I've missed something, but it seems to me that it's safe to reuse the buffer used to Marshal as soon as `(*grpc.ClientConn).Invoke` returns.
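As an illustration of what that conclusion permits, here is a hedged sketch of sending pre-marshaled, pooled bytes through `Invoke` via a custom codec. The `preMarshaled` and `pooledCodec` names and the `/cortex.Ingester/Push` method string are assumptions for the example, not code from this PR:

```go
package client

import (
	"context"

	"google.golang.org/grpc"
	"google.golang.org/grpc/encoding"
	protoenc "google.golang.org/grpc/encoding/proto" // registers the default "proto" codec
)

// preMarshaled wraps bytes that were already marshaled into a pooled buffer.
type preMarshaled struct{ data []byte }

// pooledCodec hands pre-marshaled bytes straight to gRPC and falls back to
// the default proto codec for everything else (e.g. the response).
type pooledCodec struct{ base encoding.Codec }

func (c pooledCodec) Marshal(v interface{}) ([]byte, error) {
	if m, ok := v.(*preMarshaled); ok {
		return m.data, nil
	}
	return c.base.Marshal(v)
}

func (c pooledCodec) Unmarshal(data []byte, v interface{}) error { return c.base.Unmarshal(data, v) }
func (c pooledCodec) Name() string                               { return c.base.Name() }

// push sends the pooled bytes. Per the analysis above, the caller may return
// data to its pool as soon as push returns -- on the happy path at least
// (see the follow-up comment below about errors and locally queued messages).
func push(ctx context.Context, conn *grpc.ClientConn, data []byte, reply interface{}) error {
	codec := pooledCodec{base: encoding.GetCodec(protoenc.Name)}
	return conn.Invoke(ctx, "/cortex.Ingester/Push", &preMarshaled{data: data}, reply,
		grpc.ForceCodec(codec))
}
```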
Thanks for this analysis, Peter. This is too valuable to get lost in a GitHub comment. WDYT about moving this to a `docs/contributing/...` page, where we can keep contributing to it the more we learn in the future?
I've copied the notes above into a doc: #5216
The above analysis considered the "happy path" only -- the message gets sent to the server, and the server replies. However, in case of errors, messages can still be queued locally and sent later. #5830 fixes this.