Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

streamingccl: improvements to the random stream test client #59441

Merged
merged 1 commit into from
Feb 9, 2021

Conversation

adityamaru
Copy link
Contributor

@adityamaru adityamaru commented Jan 26, 2021

This change improves on the random stream client to allow for better
testing of the various components of the stream ingestion job.
Specifically:

  • Adds support for specifying number of partitions. For simplicity,
    a partition generates KVs for a particular table span.

  • Generates system KVs (descriptor and namespace) KVs, as the first two
    KVs on the partition stream. I played around with the idea of having a
    separate "system" and "table data" partition, but the code and tests
    became more convoluted, compared to the current approach.

  • Hookup the CDC orderValidator to the random stream client's output.
    This gives us some guarantees that the data being generated is
    semantically correct.

  • Maintain an in-memory copy of all the streamed events, that can be
    efficiently queried. This allows us to compare the ingested KVs to the
    streamed KVs and gain more confidence in our pipeline.

Infroms: #59175

Release note: None

@adityamaru adityamaru requested review from pbardea and a team January 26, 2021 20:25
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@pbardea pbardea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with what we talked about offline to track in the PR.

Reviewed 3 of 14 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru)


pkg/ccl/changefeedccl/cdctest/validator.go, line 38 at r1 (raw file):

// StreamClientValidatorWrapper wraps a Validator and exposes additional methods
// used by stream ingestion to check for correctness.
type StreamClientValidatorWrapper interface {

Making a note, although I think this is addressed in the follow up PR:

Would it make sense to move this somewhere within streamingccl? If we do, does it make sense for this to just be a struct that embeds cdctest.Validator?


pkg/ccl/streamingccl/streamclient/random_stream_client.go, line 216 at r1 (raw file):

	}

	//// Generate namespace entry.

nit: double comment


pkg/ccl/streamingccl/streamclient/random_stream_client.go, line 295 at r1 (raw file):

					systemKVs = systemKVs[1:]
				} else {
					// Generate a duplicate KVEvent, and update its timestamp to now().

CDC may emit exact duplicate of keys, don't we want to keep the timestamp the same as the original event?


pkg/ccl/streamingccl/streamingest/stream_ingestion_processor.go, line 73 at r1 (raw file):

// partitionEvent augments a normal event with the partition it came from.
type partitionEvent struct {

As discussed offline, I think it makes sense to only wrap the event with the partition it came from when needed, and extend the interceptor interface to accept the partition addr.

@adityamaru adityamaru force-pushed the random-client-augmentation branch 2 times, most recently from 6ab8ffa to 7ae44ad Compare February 9, 2021 15:20
Copy link
Contributor Author

@adityamaru adityamaru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @pbardea)


pkg/ccl/changefeedccl/cdctest/validator.go, line 38 at r1 (raw file):

Previously, pbardea (Paul Bardea) wrote…

Making a note, although I think this is addressed in the follow up PR:

Would it make sense to move this somewhere within streamingccl? If we do, does it make sense for this to just be a struct that embeds cdctest.Validator?

Yep, I'll be sure to move it in the next PR.


pkg/ccl/streamingccl/streamclient/random_stream_client.go, line 216 at r1 (raw file):

Previously, pbardea (Paul Bardea) wrote…

nit: double comment

done.


pkg/ccl/streamingccl/streamclient/random_stream_client.go, line 295 at r1 (raw file):

Previously, pbardea (Paul Bardea) wrote…

CDC may emit exact duplicate of keys, don't we want to keep the timestamp the same as the original event?

done, also set the key to nil every time we encounter a resolved ts to ensure we don't emit a duplicate with a ts less than the resolved ts.


pkg/ccl/streamingccl/streamingest/stream_ingestion_processor.go, line 73 at r1 (raw file):

Previously, pbardea (Paul Bardea) wrote…

As discussed offline, I think it makes sense to only wrap the event with the partition it came from when needed, and extend the interceptor interface to accept the partition addr.

Done.

@adityamaru adityamaru force-pushed the random-client-augmentation branch from 7ae44ad to 37fcaf5 Compare February 9, 2021 16:43
This change improves on the random stream client to allow for better
testing of the various components of the stream ingestion job.
Specifically:

- Adds support for specifying number of partitions. For simplicity,
  a partition generates KVs for a particular table span.

- Generates system KVs (descriptor and namespace) KVs, as the first two
  KVs on the partition stream. I played around with the idea of having a
separate "system" and "table data" partition, but the code and tests
became more convoluted, compared to the current approach.

- Hookup the CDC orderValidator to the random stream client's output.
  This gives us some guarantees that the data being generated is
semantically correct.

- Maintain an in-memory copy of all the streamed events, that can be
  efficiently queried. This allows us to compare the ingested KVs to the
streamed KVs and gain more confidence in our pipeline.

Release note: None
@adityamaru adityamaru force-pushed the random-client-augmentation branch from 37fcaf5 to 8247919 Compare February 9, 2021 17:34
@adityamaru
Copy link
Contributor Author

TFTR!

bors r=pbardea

@craig
Copy link
Contributor

craig bot commented Feb 9, 2021

Build succeeded:

@craig craig bot merged commit 4086c3e into cockroachdb:master Feb 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants