Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-23.1: changefeedccl: pubsub sink refactor to batching sink #100930

Merged
merged 1 commit into from
Apr 7, 2023

Conversation

blathers-crl[bot]
Copy link

@blathers-crl blathers-crl bot commented Apr 7, 2023

Backport 1/1 commits from #100188 on behalf of @samiskin.

/cc @cockroachdb/release


Epic: https://cockroachlabs.atlassian.net/browse/CRDB-13237

This change is a followup to #99086 which moves the Pubsub sink to the batching sink framework.

The changes involve:

  1. Moves the Pubsub code to match the SinkClient interface, moving to using the lower level v1 pubsub API that lets us publish batches manually
  2. Removing the extra call to json.Marshal
  3. Moving to using the pstest package for validating results in unit tests
  4. Adding topic handling to the batching sink, where batches are created per-topic
  5. Added a pubsub_sink_config since it can now handle Retry and Flush config settings
  6. Added metrics even to the old pubsub for the purpose of comparing the two versions

At default settings, this resulted in a peak of 90k messages per second on a single node with throughput at 27.6% cpu usage, putting it at a similar level to kafka.

Running pubsub v2 across all of TPCC (nodes ran out of ranges at different speeds):
Screenshot 2023-03-30 at 3 38 25 PM

Running pubsub v1 (barely visible, 2k messages per second) followed by v2 on tpcc.order_line (in v2 only 2 nodes ended up having ranges assigned to them):
Screenshot 2023-04-04 at 12 53 45 PM

In the following graphs from the cloud console, where v1 was ran followed by v2, you can see how the main reason v1 was slow was that it wasn't able to batch different keys together.
Screenshot 2023-04-04 at 12 59 51 PM

Publish requests remained the same despite way more messages in v2
Screenshot 2023-04-04 at 1 46 51 PM

Release note (performance improvement): pubsub sink changefeeds can now support higher throughputs by enabling the changefeed.new_pubsub_sink_enabled cluster setting.


Release justification: Feature flagged (default off) and high impact to the pubsub sink throughput

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-13237

This change is a followup to
#99086 which moves the Pubsub sink
to the batching sink framework.

The changes involve:
1. Moves the Pubsub code to match the `SinkClient` interface, moving to using
the lower level v1 pubsub API that lets us publish batches manually
3. Removing the extra call to json.Marshal
4. Moving to using the `pstest` package for validating results in unit tests
5. Adding topic handling to the batching sink, where batches are created
per-topic
6. Added a pubsub_sink_config since it can now handle Retry and Flush config
settings

Release note (performance improvement): pubsub sink changefeeds can now support
higher throughputs by enabling the changefeed.new_pubsub_sink_enabled cluster
setting.
@blathers-crl blathers-crl bot requested review from a team as code owners April 7, 2023 17:47
@blathers-crl blathers-crl bot force-pushed the blathers/backport-release-23.1-100188 branch from 7a8b969 to ca5ed74 Compare April 7, 2023 17:47
@blathers-crl blathers-crl bot requested review from herkolategan and renatolabs and removed request for a team April 7, 2023 17:47
@blathers-crl blathers-crl bot force-pushed the blathers/backport-release-23.1-100188 branch from b3e91ea to bd7cfe6 Compare April 7, 2023 17:47
@blathers-crl blathers-crl bot requested review from miretskiy, samiskin, smg260 and shermanCRL and removed request for a team April 7, 2023 17:47
@blathers-crl
Copy link
Author

blathers-crl bot commented Apr 7, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

@blathers-crl blathers-crl bot added blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot. labels Apr 7, 2023
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blathers-backport This is a backport that Blathers created automatically. O-robot Originated from a bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants