Pool buffers when sending write request #5195
Conversation
I think the flag should be renamed (with "requests") to correspond to the YAML parameter.
Generally LGTM, but please see questions/suggestions :)
Thanks Arve, I've fixed the problems raised in your comments.
Thank you @pstibrany, I will look again :)
LGTM, just leaving some suggestions for readability.
Good job! I left a few minor comments. My major comment is that I strongly recommend keeping this experimental feature disabled for now. Contrary to other lower-risk optimizations, I want to roll out this optimization more slowly and don't want to enable it by default just yet.
// Size of "slab" when using pooled buffers for marshaling write requests. When handling single Push request | ||
// buffers for multiple write requests sent to ingesters will be allocated from single "slab", if there is enough space. | ||
writeRequestSlabPoolSize = 512 * 1024 |
Can you clarify how this number was picked?
I checked average message size received by ingester Push handlers using this query:
sum by (namespace) (increase(cortex_request_message_bytes_sum{route="/cortex.Ingester/Push"}[$__rate_interval]))
/
sum by (namespace) (increase(cortex_request_message_bytes_count{route="/cortex.Ingester/Push"}[$__rate_interval]))
(Screenshot: average ingester Push message size per environment.)
We see wildly different numbers across our environments. I picked a number that's big enough to handle large messages (for the environment where the average is 300 KB), but also provides enough space to hold many smaller messages.
@@ -133,6 +136,9 @@ type Distributor struct {
	metadataValidationMetrics *validation.MetadataValidationMetrics

	PushWithMiddlewares push.Func

	// Pool of []byte used when marshalling write requests.
	writeRequestBytePool sync.Pool
[nit] We typically have pools as global vars. Any pro of having it on the Distributor?
I don't see why it would need to be a global variable when we have a logical place to put it. I understand that's not always the case, but I think it fits into the distributor nicely.
pkg/distributor/distributor.go
@@ -1121,6 +1131,11 @@ func (d *Distributor) push(ctx context.Context, pushReq *push.Request) (*mimirpb
	// so set this flag false and pass cleanup() to DoBatch.
	cleanupInDefer = false

	if d.cfg.WriteRequestsBufferPoolingEnabled {
		slabPool := pool2.NewFastReleasingSlabPool[byte](&d.writeRequestBytePool, writeRequestSlabPoolSize)
Why do you need the fast releasing slab pool if you create a new slab pool for each request? Couldn't we just use the "base" one and release all slabs in the cleanup function passed to ring.DoBatch()?
I mean, I can see the fast releasing slab pool is an optimization because slabs may be released sooner, but can you confirm it's not strictly required and the base one could be used as well? I just want to make sure I understand the rationale.
Your understanding is correct. It's an optimization to release slabs sooner.
I think this may actually be an interesting optimization in environments where the average size of write requests to ingesters is high, and we're also sending requests to many ingesters at once.
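To illustrate the pattern being discussed, here is a rough sketch (the SlabPool interface and function names are illustrative assumptions, not Mimir's actual API): each per-ingester request releases its slab reference as soon as its own RPC finishes, so with big requests and many ingesters a slab can go back to the pool while other RPCs from the same Push are still in flight.

```go
package main

import (
	"context"

	"golang.org/x/sync/errgroup"
)

// SlabPool is a stand-in for the fast-releasing slab pool discussed here;
// the method shapes are assumptions for the sake of the example.
type SlabPool interface {
	Get(size int) (buf []byte, slabID int) // buffer carved from a shared slab
	Release(slabID int)                    // drop one reference to that slab
}

// sendBatch sends one pre-encoded payload per ingester. Each goroutine returns
// its slab reference when its own RPC completes ("fast release"), instead of
// waiting for a single cleanup after the whole batch, as the base pool would.
func sendBatch(ctx context.Context, pool SlabPool, payloads [][]byte, send func(context.Context, []byte) error) error {
	g, ctx := errgroup.WithContext(ctx)
	for _, p := range payloads {
		p := p
		g.Go(func() error {
			buf, slabID := pool.Get(len(p))
			defer pool.Release(slabID) // per-request release, not per-batch

			copy(buf, p) // in the real code the protobuf message is marshalled into buf
			return send(ctx, buf)
		})
	}
	return g.Wait()
}
```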
	slabPool: p,
}
// We can return all buffers back to slabPool when this method finishes.
defer wr.ReturnBuffersToPool()
The hard question is: is this safe, or could the message buffer be retained by gRPC for longer? Can you share more details about the arguments for "this is safe to do"? I still struggle a lot with this, because each time one of us looks at it we come to a slightly different conclusion.
My question is not just about gRPC itself (for which you have added a nice test) but about the chain of middlewares we have. Our middlewares shouldn't retain it, but logging may be async, so it's worth double checking.
One thing to realize is that we're doing this buffering on the client side. The "gRPC" we're talking about here is the gRPC connection to the ingester.
Re middlewares... We set up our gRPC connection with the following interceptors:
- grpcclient.Instrument, which in turn adds:
  - otgrpc.OpenTracingClientInterceptor(opentracing.GlobalTracer())
  - middleware.ClientUserHeaderInterceptor
  - middleware.UnaryClientInstrumentInterceptor(requestDuration)
- Interceptors added by (*Config).DialOption:
Looking at the implementation of these interceptors:
- OpenTracingClientInterceptor has the ability to 1) pass the request to a span inclusion function, 2) log the payload (request), and 3) decorate the span using the request. Note that we don't configure OpenTracingClientInterceptor with any of these options.
- The middleware.ClientUserHeaderInterceptor, middleware.UnaryClientInstrumentInterceptor, BackoffRetry and RateLimiter interceptors don't use the request in any way, apart from passing it further down the chain (see the pass-through sketch below).
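For reference, a minimal sketch of what "passing the request further" means for a unary client interceptor (this is not code from the PR, just the general shape): the interceptor keeps no reference to req after the invoker returns, so it cannot extend the lifetime of the marshalled buffer.

```go
import (
	"context"

	"google.golang.org/grpc"
)

// passThroughInterceptor only forwards the request. It keeps no reference to
// req once invoker returns, so it does not prevent reusing the buffer the
// request was marshalled into.
func passThroughInterceptor(
	ctx context.Context,
	method string,
	req, reply interface{},
	cc *grpc.ClientConn,
	invoker grpc.UnaryInvoker,
	opts ...grpc.CallOption,
) error {
	// Metrics, headers or tracing could be added here without touching req.
	return invoker(ctx, method, req, reply, cc, opts...)
}
```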
More places where our buffer could be passed around:

Let's start with binlogs. This is functionality configured by the env variable GRPC_BINARY_LOG_FILTER. When configured, gRPC logs each RPC in the format defined by the GrpcLogEntry proto. The design for this feature is at https://github.com/grpc/proposal/blob/master/A16-binary-logging.md.

When used, the request is passed to the logging method inside a binarylog.ClientMessage struct. This struct has toProto(), which will call another Marshal on our message. Once binarylog.ClientMessage is converted to binlogpb.GrpcLogEntry (now it contains our buffer from the pool), it's forwarded to the "sink" for writing.

There are three sink implementations in grpc: bufferedSink, noopSink and writerSink. writerSink marshals *binlogpb.GrpcLogEntry and writes it to some writer. bufferedSink simply wraps writerSink but gives it a buffer to write output to, and also runs a goroutine to periodically flush the buffer. The important point is that it doesn't keep *binlogpb.GrpcLogEntry around, but marshals it immediately.

My conclusion is: using the binary logging feature is safe when using our message buffering.
statsHandlers for the grpc connection are configured via dial options. We don't use that feature in our code.

It's possible that I've missed something, but it seems to me that it's safe to reuse the buffer used to Marshal as soon as (*grpc.ClientConn).Invoke returns.
Thanks for this analysis Peter. This is too valuable to get lost in a GitHub comment. WDYT about moving this to a docs/contributing/... page, where we can keep contributing to it as we learn more in the future?
I've copied the notes above into a doc: #5216
The above analysis considered the "happy path" only: the message gets sent to the server and the server replies. However, in case of errors, messages can still be queued locally and sent later. #5830 fixes this.
What this PR does
When the distributor receives a write request, it splits it into multiple smaller requests and forwards each of them to a single ingester. In this step the distributor needs to marshal protobuf messages. This PR adds pooling of the buffers used for this marshalling.

The actual pooling logic is implemented in a wrapper around IngesterClient (the gRPC interface for communicating with ingesters). Our benchmark in a dev environment shows about a 7% reduction of CPU time with this change.
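A rough sketch of the buffer lifecycle (illustrative only; the names and the getBuf/putBuf/invoke indirection are assumptions, and the actual wiring into gRPC lives in the IngesterClient wrapper added by this PR): the request is marshalled into a buffer taken from the pool, and the buffer is handed back once the RPC call has returned.

```go
import "context"

// gogoMarshaler captures the shape of gogo/protobuf-generated messages,
// which expose Size and MarshalTo.
type gogoMarshaler interface {
	Size() int
	MarshalTo(buf []byte) (int, error)
}

// pushWithPooledBuffer marshals req into a pooled buffer, performs the call,
// and releases the buffer afterwards. getBuf/putBuf stand in for the slab
// pool, invoke for the actual gRPC call; all three are placeholders.
func pushWithPooledBuffer(
	ctx context.Context,
	req gogoMarshaler,
	getBuf func(size int) (buf []byte, slabID int),
	putBuf func(slabID int),
	invoke func(ctx context.Context, data []byte) error,
) error {
	size := req.Size()
	buf, slabID := getBuf(size)
	defer putBuf(slabID) // safe once invoke has returned (see the buffer-retention analysis in the conversation)

	n, err := req.MarshalTo(buf[:size])
	if err != nil {
		return err
	}
	return invoke(ctx, buf[:n])
}
```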
The benchmark in this PR shows a reduction of allocations per Push call (for the purpose of this benchmark, Push only calls the message's Marshal).

This PR also adds a "fast releasing slab pool". Compared to the "slab pool" implementation that already exists in the Mimir codebase, the "fast releasing" one can release individual slabs (pooled objects) as soon as they are no longer used, instead of waiting for the final Release call on the slab pool.
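To make the difference concrete, here is a minimal, illustrative (non-generic, byte-only) take on the idea; it is not the code added by this PR. Each slab counts how many buffers carved from it are still in use and goes back to the underlying sync.Pool as soon as that count drops to zero, rather than on a single final Release of the whole pool.

```go
package pool

import "sync"

type slab struct {
	buf  []byte
	used int // bytes handed out from this slab
	refs int // buffers handed out and not yet released
}

// FastReleasingSlabPool is an illustrative sketch of the concept.
type FastReleasingSlabPool struct {
	mtx      sync.Mutex
	delegate *sync.Pool // holds released []byte slabs
	slabSize int
	slabs    map[int]*slab
	nextID   int
	current  int // id of the slab new buffers are carved from; 0 = none
}

func NewFastReleasingSlabPool(delegate *sync.Pool, slabSize int) *FastReleasingSlabPool {
	return &FastReleasingSlabPool{delegate: delegate, slabSize: slabSize, slabs: map[int]*slab{}}
}

// Get returns a buffer of the given size and the id of the slab it was carved
// from. Requests bigger than the slab size bypass pooling and get slab id 0.
func (p *FastReleasingSlabPool) Get(size int) ([]byte, int) {
	if size > p.slabSize {
		return make([]byte, size), 0
	}

	p.mtx.Lock()
	defer p.mtx.Unlock()

	s := p.slabs[p.current]
	if s == nil || p.slabSize-s.used < size {
		// Retire the current slab; if nothing carved from it is still in use,
		// return it to the delegate pool right away.
		if s != nil && s.refs == 0 {
			delete(p.slabs, p.current)
			p.delegate.Put(s.buf)
		}
		buf, ok := p.delegate.Get().([]byte)
		if !ok || len(buf) < p.slabSize {
			buf = make([]byte, p.slabSize)
		}
		p.nextID++
		s = &slab{buf: buf}
		p.slabs[p.nextID] = s
		p.current = p.nextID
	}

	out := s.buf[s.used : s.used+size : s.used+size]
	s.used += size
	s.refs++
	return out, p.current
}

// Release drops one reference to the given slab. A retired slab returns to the
// delegate pool as soon as its last buffer is released, without waiting for
// the rest of the batch to finish.
func (p *FastReleasingSlabPool) Release(slabID int) {
	if slabID == 0 {
		return // the buffer was not pooled
	}

	p.mtx.Lock()
	defer p.mtx.Unlock()

	s := p.slabs[slabID]
	s.refs--
	if s.refs == 0 && slabID != p.current {
		delete(p.slabs, slabID)
		p.delegate.Put(s.buf)
	}
}
```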
Checklist

- CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]