
storage: remove kafka producer limits in sinks #24784

Merged
petrosagg merged 2 commits into MaterializeInc:main from kafka-sink-limits on Jan 31, 2024

Conversation

petrosagg
Contributor

Motivation

The Kafka sink operator has nothing better to do with incoming data than buffer it, which provides no additional value over sending it to librdkafka and letting librdkafka buffer it instead.

The operator is already set up to never buffer messages that are ready to send, but the current limits are too conservative. If a large snapshot arrives fast enough, it is possible to reach the 10M-message limit and cause the sink to restart. I have observed this happening locally during benchmarking.

For this reason, this PR disables the rdkafka message count and size limits.
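As a rough illustration (not code from this PR), here is roughly what disabling both limits looks like with the rdkafka Rust crate; the broker address, function name, and producer type are assumptions for the sketch:

use rdkafka::config::ClientConfig;
use rdkafka::producer::BaseProducer;

fn build_unbounded_producer() -> BaseProducer {
    ClientConfig::new()
        // Hypothetical broker address, for illustration only.
        .set("bootstrap.servers", "localhost:9092")
        // 0 disables the message-count limit. This requires a librdkafka version
        // that accepts 0 here (the fix for confluentinc/librdkafka#4018); older
        // versions reject it as out of range.
        .set("queue.buffering.max.messages", "0")
        // Raise the size limit to its maximum (2147483647 kbytes), effectively
        // disabling it.
        .set("queue.buffering.max.kbytes", "2147483647")
        .create()
        .expect("failed to create producer")
}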

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered.
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:

@petrosagg petrosagg requested review from bkirwi and a team January 29, 2024 14:22
options.insert("queue.buffering.max.kbytes", "2147483647".into());
// Disable the default buffer limit of 100k messages. We don't want to impose any limit
// here as the operator has nothing better to do with the data than to buffer them.
options.insert("queue.buffering.max.messages", "0".into());
Contributor
Do we not have the fix for confluentinc/librdkafka#4018 in our fork? That might explain the test failures.

petrosagg (Contributor, Author)

We don't :(

@guswynn (Contributor) left a comment

This looks fine as long as CI passes! We might want to come back and add LD flags for this.

@petrosagg petrosagg merged commit 832c0c9 into MaterializeInc:main Jan 31, 2024
66 of 67 checks passed
@petrosagg petrosagg deleted the kafka-sink-limits branch January 31, 2024 18:21
benesch added a commit to benesch/materialize that referenced this pull request Jul 1, 2024
Now that we're on librdkafka v2.4.0, we don't need to catch and retry
QueueFull errors, but can instead disable the queue limit.

This commit is a combination of:

  * Reverting the QueueFull workaround from MaterializeInc#24871
  * Reapplying Petros's original implementation in MaterializeInc#24784

Co-authored-by: Petros Angelatos <petrosagg@gmail.com>
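For context, the QueueFull workaround that this reverts followed the usual rdkafka pattern of polling and retrying when the producer queue is full. A minimal sketch of that pattern (an assumed shape, not the actual code from MaterializeInc#24871):

use std::time::Duration;

use rdkafka::error::KafkaError;
use rdkafka::producer::{BaseProducer, BaseRecord};
use rdkafka::types::RDKafkaErrorCode;

fn send_with_retry(producer: &BaseProducer, topic: &str, payload: &[u8]) {
    let mut record = BaseRecord::<(), [u8]>::to(topic).payload(payload);
    loop {
        match producer.send(record) {
            Ok(()) => return,
            // The queue is full: give librdkafka time to drain it, then retry
            // with the record it handed back to us.
            Err((KafkaError::MessageProduction(RDKafkaErrorCode::QueueFull), rec)) => {
                producer.poll(Duration::from_millis(100));
                record = rec;
            }
            Err((e, _)) => panic!("unrecoverable produce error: {e}"),
        }
    }
}

With librdkafka v2.4.0, setting queue.buffering.max.messages to 0 removes the limit entirely, so a retry loop like this is no longer needed.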
umanwizard pushed a commit to umanwizard/materialize-1 that referenced this pull request Jul 3, 2024