-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
streamingccl: improvements to the random stream test client #59441
streamingccl: improvements to the random stream test client #59441
Conversation
da8ca88
to
645dcec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated with what we talked about offline to track in the PR.
Reviewed 3 of 14 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru)
pkg/ccl/changefeedccl/cdctest/validator.go, line 38 at r1 (raw file):
// StreamClientValidatorWrapper wraps a Validator and exposes additional methods // used by stream ingestion to check for correctness. type StreamClientValidatorWrapper interface {
Making a note, although I think this is addressed in the follow up PR:
Would it make sense to move this somewhere within streamingccl
? If we do, does it make sense for this to just be a struct that embeds cdctest.Validator
?
pkg/ccl/streamingccl/streamclient/random_stream_client.go, line 216 at r1 (raw file):
} //// Generate namespace entry.
nit: double comment
pkg/ccl/streamingccl/streamclient/random_stream_client.go, line 295 at r1 (raw file):
systemKVs = systemKVs[1:] } else { // Generate a duplicate KVEvent, and update its timestamp to now().
CDC may emit exact duplicate of keys, don't we want to keep the timestamp the same as the original event?
pkg/ccl/streamingccl/streamingest/stream_ingestion_processor.go, line 73 at r1 (raw file):
// partitionEvent augments a normal event with the partition it came from. type partitionEvent struct {
As discussed offline, I think it makes sense to only wrap the event with the partition it came from when needed, and extend the interceptor interface to accept the partition addr.
6ab8ffa
to
7ae44ad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @pbardea)
pkg/ccl/changefeedccl/cdctest/validator.go, line 38 at r1 (raw file):
Previously, pbardea (Paul Bardea) wrote…
Making a note, although I think this is addressed in the follow up PR:
Would it make sense to move this somewhere within
streamingccl
? If we do, does it make sense for this to just be a struct that embedscdctest.Validator
?
Yep, I'll be sure to move it in the next PR.
pkg/ccl/streamingccl/streamclient/random_stream_client.go, line 216 at r1 (raw file):
Previously, pbardea (Paul Bardea) wrote…
nit: double comment
done.
pkg/ccl/streamingccl/streamclient/random_stream_client.go, line 295 at r1 (raw file):
Previously, pbardea (Paul Bardea) wrote…
CDC may emit exact duplicate of keys, don't we want to keep the timestamp the same as the original event?
done, also set the key to nil every time we encounter a resolved ts to ensure we don't emit a duplicate with a ts less than the resolved ts.
pkg/ccl/streamingccl/streamingest/stream_ingestion_processor.go, line 73 at r1 (raw file):
Previously, pbardea (Paul Bardea) wrote…
As discussed offline, I think it makes sense to only wrap the event with the partition it came from when needed, and extend the interceptor interface to accept the partition addr.
Done.
7ae44ad
to
37fcaf5
Compare
This change improves on the random stream client to allow for better testing of the various components of the stream ingestion job. Specifically: - Adds support for specifying number of partitions. For simplicity, a partition generates KVs for a particular table span. - Generates system KVs (descriptor and namespace) KVs, as the first two KVs on the partition stream. I played around with the idea of having a separate "system" and "table data" partition, but the code and tests became more convoluted, compared to the current approach. - Hookup the CDC orderValidator to the random stream client's output. This gives us some guarantees that the data being generated is semantically correct. - Maintain an in-memory copy of all the streamed events, that can be efficiently queried. This allows us to compare the ingested KVs to the streamed KVs and gain more confidence in our pipeline. Release note: None
37fcaf5
to
8247919
Compare
TFTR! bors r=pbardea |
Build succeeded: |
This change improves on the random stream client to allow for better
testing of the various components of the stream ingestion job.
Specifically:
Adds support for specifying number of partitions. For simplicity,
a partition generates KVs for a particular table span.
Generates system KVs (descriptor and namespace) KVs, as the first two
KVs on the partition stream. I played around with the idea of having a
separate "system" and "table data" partition, but the code and tests
became more convoluted, compared to the current approach.
Hookup the CDC orderValidator to the random stream client's output.
This gives us some guarantees that the data being generated is
semantically correct.
Maintain an in-memory copy of all the streamed events, that can be
efficiently queried. This allows us to compare the ingested KVs to the
streamed KVs and gain more confidence in our pipeline.
Infroms: #59175
Release note: None