introduce emission strategy into CockroachDB Watch API #2120
Merged
vroldanbet force-pushed the crdb-emission-strategy branch 4 times, most recently from 24c72be to 1862861 (November 6, 2024 13:50)
github-actions bot added the area/tooling label (Affects the dev or user toolchain, e.g. tests, ci, build tools) on Nov 6, 2024
vroldanbet force-pushed the crdb-emission-strategy branch 4 times, most recently from 72c6109 to de69cc2 (November 6, 2024 14:22)
vroldanbet force-pushed the crdb-emission-strategy branch from de69cc2 to 384c649 (November 6, 2024 16:58)
vroldanbet force-pushed the crdb-emission-strategy branch 2 times, most recently from 611ba5a to d9300de (November 6, 2024 19:38)
github-actions bot removed the area/tooling label (Affects the dev or user toolchain, e.g. tests, ci, build tools) on Nov 6, 2024
josephschorr reviewed on Nov 6, 2024
vroldanbet force-pushed the crdb-emission-strategy branch from d9300de to e51d42d (November 6, 2024 20:00)
josephschorr previously approved these changes on Nov 6, 2024
LGTM
vroldanbet force-pushed the crdb-emission-strategy branch from e51d42d to cbac7b9 (November 7, 2024 09:57)
…ockroachDB changefeeds.

In CRDB, checkpointing for a span only occurs once its catch-up scan completes. As a result, resolved timestamps for the changefeed won't be emitted until all spans have been checkpointed and moved past the initial high-water mark. Unfortunately, no workarounds exist to make the changefeed emit resolved timestamps during a catch-up scan. This is a known issue on CRL's radar.

This matters because, without checkpoints, changes accumulate in memory, which, depending on the use case, can lead to increased memory pressure and out-of-memory errors.

This commit introduces a new change emission strategy option into WatchOptions:
- `EmitWhenCheckpointedStrategy` is the original strategy, which accumulates changes until a checkpoint is emitted.
- `EmitImmediatelyStrategy` is the new strategy, which streams changes right away. Note that this makes the client responsible for:
  - deduplicating changes
  - ordering revisions
  - accumulating changes until a checkpoint is emitted

Essentially, the client is responsible for doing all the work that used to be done with the `Change` struct helper.

For every other datastore, calling the Watch API with `EmitImmediatelyStrategy` will return an error. We don't see an evident need for it in other datastores at the moment. Once we've proven this strategy is viable in production workloads, we will study implementing a Watch API strategy that buffers to disk to address out-of-memory issues, and potentially deprecate the emit-immediately strategy in favor of one that does all the heavy lifting for the client.
vroldanbet force-pushed the crdb-emission-strategy branch from cbac7b9 to 6c4ceb2 (November 7, 2024 10:32)
adinhodovic approved these changes on Nov 7, 2024
depends on #2122

introduces `WatchOptions.EmissionStrategy` to handle limitations of CockroachDB changefeeds.

In CRDB, checkpointing for a span only occurs once its catch-up scan completes. As a result, resolved timestamps for the changefeed won't be emitted until all spans have been checkpointed and moved past the initial high-water mark.

Unfortunately, no workarounds exist to make the changefeed emit resolved timestamps during a catch-up scan. This is a known issue on CRL's radar.

This matters because, without checkpoints, changes accumulate in memory, which, depending on the use case, can lead to increased memory pressure and out-of-memory errors.

This commit introduces a new change emission strategy option into WatchOptions:
- `EmitWhenCheckpointedStrategy` is the original strategy, which accumulates changes until a checkpoint is emitted.
- `EmitImmediatelyStrategy` is the new strategy, which streams changes right away. Note that this makes the client responsible for:
  - deduplicating changes
  - ordering revisions
  - accumulating changes until a checkpoint is emitted

Essentially, the client is responsible for doing all the work that used to be done with the `Change` struct helper.

For every other datastore, calling the Watch API with `EmitImmediatelyStrategy` will return an error. We don't see an evident need for it in other datastores at the moment. Once we've proven this strategy is viable in production workloads, we will study implementing a Watch API strategy that buffers to disk to address out-of-memory issues, and potentially deprecate the emit-immediately strategy in favor of one that does all the heavy lifting for the client.
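The three client responsibilities listed above (deduplicating, ordering, and accumulating until a checkpoint) could be sketched roughly as follows. Everything in this block is hypothetical: the `change` type, the `buffer` type, and the use of a positive `int64` as a revision are simplifications chosen for illustration, not the types used by this PR, and the rule of ignoring anything at or below the last checkpoint is an assumption about how a client might treat late redeliveries.

```go
package main

import (
	"fmt"
	"sort"
)

// change is a hypothetical, simplified event: a revision plus an opaque key
// identifying the update it carries. Revisions are assumed to be positive.
type change struct {
	Revision int64
	Key      string
}

// buffer holds changes received under the emit-immediately strategy until a
// checkpoint covers them.
type buffer struct {
	seen           map[change]struct{}
	pending        []change
	lastCheckpoint int64
}

func newBuffer() *buffer { return &buffer{seen: make(map[change]struct{})} }

// add records a change, dropping exact duplicates and anything at or below
// the last checkpoint (assumed here to be a redelivery).
func (b *buffer) add(c change) {
	if c.Revision <= b.lastCheckpoint {
		return
	}
	if _, dup := b.seen[c]; dup {
		return
	}
	b.seen[c] = struct{}{}
	b.pending = append(b.pending, c)
}

// flush returns, in revision order, every buffered change at or below the
// checkpoint and keeps the remaining changes for a later checkpoint.
func (b *buffer) flush(checkpoint int64) []change {
	sort.SliceStable(b.pending, func(i, j int) bool {
		return b.pending[i].Revision < b.pending[j].Revision
	})
	// Index of the first change that the checkpoint does not cover yet.
	n := sort.Search(len(b.pending), func(i int) bool {
		return b.pending[i].Revision > checkpoint
	})
	ready := append([]change(nil), b.pending[:n]...)
	for _, c := range ready {
		delete(b.seen, c)
	}
	b.pending = append([]change(nil), b.pending[n:]...)
	if checkpoint > b.lastCheckpoint {
		b.lastCheckpoint = checkpoint
	}
	return ready
}

func main() {
	b := newBuffer()
	for _, c := range []change{
		{3, "doc:1#viewer@user:a"},
		{1, "doc:2#viewer@user:b"},
		{3, "doc:1#viewer@user:a"}, // duplicate delivery
		{5, "doc:3#viewer@user:c"},
	} {
		b.add(c)
	}
	for _, c := range b.flush(4) {
		fmt.Println(c.Revision, c.Key)
	}
}
```

Note that this sketch still buffers in memory on the client side; the strategy moves the accumulation out of the datastore's Watch implementation but does not by itself remove it.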