release-23.1: changefeedccl: pubsub sink refactor to batching sink #100930
Backport 1/1 commits from #100188 on behalf of @samiskin.
/cc @cockroachdb/release
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-13237
This change is a follow-up to #99086 and moves the Pubsub sink to the batching sink framework.
The changes involve:
- `SinkClient` interface, moving to using the lower-level v1 pubsub API that lets us publish batches manually
- `pstest` package for validating results in unit tests

At default settings, this resulted in a peak throughput of 90k messages per second on a single node at 27.6% CPU usage, putting it at a similar level to Kafka.
Running pubsub v2 across all of TPCC (nodes ran out of ranges at different speeds):
Running pubsub v1 (barely visible, 2k messages per second) followed by v2 on tpcc.order_line (in v2 only 2 nodes ended up having ranges assigned to them):
In the following graphs from the cloud console, where v1 was run followed by v2, you can see that the main reason v1 was slow was that it wasn't able to batch different keys together.
Publish requests remained the same despite far more messages in v2.
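To illustrate why batching across keys matters, here is a minimal toy model in Go. It is not the sink's actual code: `message`, `publishRequests`, `byKey`, `byTopic`, and `makeMessages` are hypothetical names, and the batch size of 100 is an arbitrary assumption. It only counts how many publish requests would be issued when messages are grouped per key versus per topic.

```go
package main

import "fmt"

// message is a hypothetical stand-in for one emitted changefeed row.
type message struct {
	topic string
	key   string
}

// publishRequests counts the publish requests needed when messages are
// grouped by groupKey and a group is sent once it holds maxBatch messages.
// Partial groups left at the end each cost one more request.
func publishRequests(msgs []message, maxBatch int, groupKey func(message) string) int {
	pending := make(map[string]int)
	requests := 0
	for _, m := range msgs {
		g := groupKey(m)
		pending[g]++
		if pending[g] >= maxBatch {
			requests++ // one request carries the whole batch
			delete(pending, g)
		}
	}
	return requests + len(pending)
}

// byKey models batching that cannot combine different keys.
func byKey(m message) string { return m.topic + "/" + m.key }

// byTopic models batching that combines all keys bound for one topic.
func byTopic(m message) string { return m.topic }

// makeMessages builds n messages with distinct keys on a single topic.
func makeMessages(n int) []message {
	msgs := make([]message, 0, n)
	for i := 0; i < n; i++ {
		msgs = append(msgs, message{topic: "order_line", key: fmt.Sprintf("k%d", i)})
	}
	return msgs
}

func main() {
	msgs := makeMessages(1000)
	fmt.Println("per-key requests:  ", publishRequests(msgs, 100, byKey))
	fmt.Println("per-topic requests:", publishRequests(msgs, 100, byTopic))
}
```

In this model, 1000 messages with distinct keys need 1000 requests when grouped per key but only 10 when grouped per topic, which is the kind of effect that lets the message rate grow without a matching growth in publish requests.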
Release note (performance improvement): pubsub sink changefeeds can now support higher throughputs by enabling the changefeed.new_pubsub_sink_enabled cluster setting.
Release justification: Feature flagged (default off) and high impact to the pubsub sink throughput