
Add batching for traces #1554

Merged · 1 commit · Aug 29, 2022

Conversation

@niksajakovljevic (Contributor) commented Aug 10, 2022

Batching helps achieve better ingest performance, especially when traces are sent one by one.

A batch is flushed when one of two conditions is met: the batch reaches its maximum size or the batch timeout elapses (sketched below, after the flag list).

This PR also adds async-ack support for traces, meaning the client doesn't need to wait for the DB write. This increases ingest performance at a small risk of data loss.

Added 3 new CLI flags:

  • Two flags to control batching: tracing.max-batch-size and tracing.batch-timeout.
  • One flag for async writes: tracing.async-acks.
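
For illustration, a minimal sketch of the size-or-timeout flush logic described above; `batchLoop`, `flush`, and the string items standing in for spans are placeholders, not the PR's actual identifiers:

```go
package main

import (
	"fmt"
	"time"
)

// batchLoop flushes a batch when it reaches maxBatchSize or when
// batchTimeout elapses, whichever comes first.
func batchLoop(in <-chan string, maxBatchSize int, batchTimeout time.Duration, flush func([]string)) {
	batch := make([]string, 0, maxBatchSize)
	ticker := time.NewTicker(batchTimeout)
	defer ticker.Stop()
	for {
		select {
		case item, ok := <-in:
			if !ok { // input closed: flush the remainder and stop
				if len(batch) > 0 {
					flush(batch)
				}
				return
			}
			batch = append(batch, item)
			if len(batch) == maxBatchSize { // condition 1: batch is full
				flush(batch)
				batch = make([]string, 0, maxBatchSize)
				ticker.Reset(batchTimeout) // timeout now counts from this flush
			}
		case <-ticker.C: // condition 2: timeout reached
			if len(batch) > 0 {
				flush(batch)
				batch = make([]string, 0, maxBatchSize)
			}
		}
	}
}

func main() {
	in := make(chan string, 8)
	for i := 0; i < 5; i++ {
		in <- fmt.Sprintf("span-%d", i)
	}
	close(in)
	batchLoop(in, 2, 100*time.Millisecond, func(b []string) { fmt.Println("flushed", b) })
}
```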

@niksajakovljevic added the epic/jaeger-grpc-write (Jaeger gRPC based write integration) label on Aug 10, 2022
@niksajakovljevic requested a review from a team on August 10, 2022 11:44
@niksajakovljevic self-assigned this on Aug 10, 2022
@niksajakovljevic removed the request for review from a team on August 10, 2022 11:45
@niksajakovljevic force-pushed the niksa/batch-traces branch 5 times, most recently from ca3872f to 6853fd1 on August 10, 2022 13:45
@niksajakovljevic marked this pull request as ready for review on August 10, 2022 13:46
@niksajakovljevic requested review from a team as code owners on August 10, 2022 13:46
@niksajakovljevic force-pushed the niksa/batch-traces branch 2 times, most recently from a7546c1 to 149872d on August 10, 2022 16:53
@niksajakovljevic (Contributor, Author) commented:

The local benchmarks I've added indicate that batching should provide better performance; however, I still need to run a full-blown benchmark with more realistic data loads.

	defaultMaxBufferedBatches = 200 // arbitrarily picked; we want to avoid waiting on batches
)

var defaultBatchWorkers = runtime.NumCPU() / 2 // we only take half so the other half can be used for writers
Member:

Since this is primarily IO driven, is it possible to size this based on the number of DB connections instead of the number of CPUs?

@niksajakovljevic (Contributor, Author):

If you look closer, batching and writing batches are separated, so creating batches is mostly CPU driven: batches are only written to a channel and picked up by batch writers.
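
For illustration, a minimal sketch of that separation, with invented names (`span`, `bufferedBatches`, the writer pool): the batcher only hands finished batches to a buffered channel, and IO-bound writer goroutines drain it:

```go
package main

import (
	"fmt"
	"sync"
)

type span struct{ traceID string }

func main() {
	// The channel decouples CPU-bound batch assembly from IO-bound writes;
	// the buffer lets batchers keep going while writers are busy.
	bufferedBatches := make(chan []span, 4)

	var wg sync.WaitGroup
	for w := 0; w < 2; w++ { // writer pool: blocks on the database
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for batch := range bufferedBatches {
				fmt.Printf("writer %d wrote batch of %d spans\n", id, len(batch)) // stand-in for the DB write
			}
		}(w)
	}

	// Batcher side: assembling a batch is pure CPU work; the only wait
	// is the channel send, and only when all writers are behind.
	for i := 0; i < 3; i++ {
		bufferedBatches <- []span{{traceID: fmt.Sprint(i)}}
	}
	close(bufferedBatches)
	wg.Wait()
}
```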

@Harkishen-Singh (Member) left a comment:

Would be good to see the perf difference with and without batching in the case of Jaeger ingestion.

Comment on lines 154 to 175:

case <-ticker.C:
	batcherSpan.AddEvent("Batch timeout reached")
	if !batch.isEmpty() {
		batch = flushBatch(batch)
	}
@Harkishen-Singh (Member):

This won't work as expected. It will tick even if, the very previous moment, we took the item := <-b.in case and flushed the batch. This is because the ticker is created just once, so it doesn't take the most recent event into account.

We should use <-time.After(config.BatchTimeout) in this case.

@niksajakovljevic (Contributor, Author):

Not sure that's true. If an item is received and the batch is full, we flush it and reset the ticker. If the batch is not full, we can still reach the timeout and flush it, which is what we want. Does it make sense now?
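
For illustration, a tiny standard-library demo of the point being made: resetting the ticker after a size-based flush restarts the timeout, so a stale tick cannot fire immediately after a flush (timings here are arbitrary):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const timeout = 50 * time.Millisecond
	ticker := time.NewTicker(timeout)
	defer ticker.Stop()

	time.Sleep(40 * time.Millisecond) // pretend a size-based flush happens here
	ticker.Reset(timeout)             // without this, the ticker would fire ~10ms later

	select {
	case <-ticker.C:
		fmt.Println("premature tick") // happens if the Reset above is removed
	case <-time.After(45 * time.Millisecond):
		fmt.Println("no premature tick after reset") // this branch wins
	}
}
```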

@niksajakovljevic force-pushed the niksa/batch-traces branch 3 times, most recently from 2500299 to 07dbd59 on August 16, 2022 15:07
@niksajakovljevic (Contributor, Author) commented:

> Would be good to see the perf difference with and without batching in the case of Jaeger ingestion.

Yes, I will soon publish benchmark numbers in this PR.

@Harkishen-Singh (Member) left a comment:

Overall looks good. However, some changes are needed on the metrics side.

b.bufferedBatches <- batchCp
batcherSpan.AddEvent("Batch sent to buffer")
batcherSpan.AddEvent("New Batch")
return NewBatch(b.config.MaxBatchSize)
Member:

No strong suggestion. My aim was more towards readability. But let's ignore this.

@cevian (Contributor) left a comment:

Overall looks good but needs some changes.

)

const (
	defaultReqBufferSize = 100000 // buffer for incoming requests, especially important for async acks
@cevian (Contributor):

Can't these defaults be derived from other defaults?

@niksajakovljevic (Contributor, Author):

Hmm... not sure about this one. My idea was to have enough buffer for spikes. 100K might be too much; in the recent benchmark runs I did, I never saw this go above 1K. Maybe we can set it to 5K for now and tune as we go. Frankly, fine-tuning these requires time and a lot of benchmark runs... Please let me know if you have a better idea.

@niksajakovljevic (Contributor, Author):

I actually tweaked this to be MaxBatchSize * 3. I also split incoming requests into multiple queues (we have a separate queue for each batcher). I added a separate commit for easier review.
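
For illustration, a sketch of the sizing just described, with placeholder names and a stand-in request type instead of the real trace payload:

```go
package main

import "fmt"

type traceReq struct{} // stand-in for an incoming trace request

func main() {
	const (
		numBatchers  = 4
		maxBatchSize = 5000
	)
	// One input queue per batcher; capacity derived from the batch size
	// (3x, per the comment above) rather than one large fixed shared buffer.
	queues := make([]chan traceReq, numBatchers)
	for i := range queues {
		queues[i] = make(chan traceReq, 3*maxBatchSize)
	}
	fmt.Printf("%d queues, %d buffered requests each\n", len(queues), cap(queues[0]))
}
```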

@Harkishen-Singh (Member) left a comment:

One nit remaining, otherwise LGTM 👍🏻

// If it's only one span we shard by its TraceID so spans with the same TraceID end up in the same batcher.
// Otherwise we round-robin between batchers.
func (td *Dispatcher) getBatcherIdx(ctx context.Context, traces ptrace.Traces) (int, error) {
	numberOfBatchers := td.batcher.config.Batchers
@cevian (Contributor):

I don't love that the dispatcher needs to care about the number of batchers, but this may be unavoidable...
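
For illustration, a sketch of the dispatch rule in the snippet above, with invented names and TraceIDs modeled as raw 16-byte arrays (the real code works on `ptrace.Traces`):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync/atomic"
)

var rr uint64 // round-robin cursor for multi-span requests

// batcherIdx shards a request carrying a single span by its TraceID so all
// spans of that trace reach the same batcher; everything else is spread
// round-robin.
func batcherIdx(spanTraceIDs [][16]byte, numBatchers int) int {
	if len(spanTraceIDs) == 1 {
		h := fnv.New32a()
		h.Write(spanTraceIDs[0][:])
		return int(h.Sum32() % uint32(numBatchers))
	}
	return int(atomic.AddUint64(&rr, 1) % uint64(numBatchers))
}

func main() {
	id := [16]byte{0xab}
	fmt.Println(batcherIdx([][16]byte{id}, 4))      // deterministic per TraceID
	fmt.Println(batcherIdx(make([][16]byte, 2), 4)) // round-robin
	fmt.Println(batcherIdx(make([][16]byte, 2), 4)) // advances
}
```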

The merged commit message:

Batching helps achieve better ingest performance, especially when
traces are sent one by one (which is the case for the Jaeger collector).

A batch is flushed either on timeout or when full.

Adds async support for traces, meaning the client doesn't need to wait
for the DB write. This increases ingest performance at a small risk of
data loss. New CLI flag `tracing.async-acks` added.

Flags to control batching: `tracing.max-batch-size` and `tracing.batch-timeout`.
Flag to control batch workers: `tracing.batch-workers`.
Labels: epic/jaeger-grpc-write (Jaeger gRPC based write integration)
5 participants