-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
spanconfig/sqlwatcher: hide factory pattern, speed up tests, surface no-op checkpoints periodically #73171
spanconfig/sqlwatcher: hide factory pattern, speed up tests, surface no-op checkpoints periodically #73171
Conversation
Using a more aggressive closed timestamp target duration brings the runtime for TestSQLWatcherReactsToUpdate down from 45s+ to single digits. To speed it up, we also just re-use the same sqlwatcher instead of creating a new one for every test case. Other tests in the package benefit from a similar treatment. Using the default target duration in fact masked a buggy test; in a future commit we end up rewrite that test so it's skipped for now. Release note: None
dea62dd
to
76d236d
Compare
The only mutable state across concurrent WatchForSQLUpdate calls was the internal buffer, which does not need to hang off the surrounding struct. This obviates the need for the factory pattern we were using -- callers can set up multiple SQLWatchers concurrently as is (see TestSQLWatcherMultiple). This commit simply hides the factory under the package boundary; in a future commit we'll shed it altogether. This has the benefit of reducing the number of symbols in pkg/spanconfig and making it symmetric with the other spanconfig dependencies typically found together (KVAccessor, SQLTranslator). It's also every-so-slightly less verbose to use in the upcoming spanconfig.Reconciler (cockroachdb#71994). Release note: None
A previous commit obviated the need for a factory altogether, but went only as far hiding it within the package boundary. This commit gets rid of it entirely. Release note: None
76d236d
to
6d738b9
Compare
6d738b9
to
f8ae233
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 2 files at r1, 10 of 10 files at r2, 1 of 1 files at r3, 6 of 6 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, and @irfansharif)
pkg/spanconfig/spanconfigsqlwatcher/sqlwatcher.go, line 165 at r4 (raw file):
return err } if len(events) == 0 && !checkpointNoops.ShouldProcess(timeutil.Now()) {
Would it be straightforward to write a test for this?
pkg/spanconfig/spanconfigsqlwatcher/sqlwatcher_test.go, line 371 at r1 (raw file):
return default: time.Sleep(100 * time.Millisecond)
Why was this needed?
pkg/spanconfig/spanconfigsqlwatcher/sqlwatcher_test.go, line 320 at r2 (raw file):
} wg.Wait()
nit: consider pulling this call into the branch that includes the goroutines it's waiting on.
If `system.{descriptor,zones}` is static, we would previously discard all checkpoint events. This is not what we want -- we'd still like to know how caught up we are, with ongoing updates or otherwise. MVCC GC, after all, can still happen; if our last checkpoint was when the last update occurred, we may end up doing a lot of unnecessary work when finding out our last checkpoint was GC-ed from underneath us. Lack of periodic checkpoints also makes tests difficult to write -- you'd have to induce a benign update to flush out all earlier ones. This latent flakiness was uncovered after speeding up some existing tests in an earlier commit. NB: For incremental (possibly noop) checkpoints, we need to ensure that the sqlwatcher's buffer is initialized with a low watermark. When flushing events, we take the most conservative timestamp. If not initialized to a high water, this might be 0 -- violating the sqlwatcher's monotonically increasing checkpoint ts invariant. Release note: None
f8ae233
to
d00baeb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Nathan! Merging on green.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, and @nvanbenschoten)
pkg/spanconfig/spanconfigsqlwatcher/sqlwatcher.go, line 165 at r4 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Would it be straightforward to write a test for this?
It would be, done.
pkg/spanconfig/spanconfigsqlwatcher/sqlwatcher_test.go, line 371 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Why was this needed?
We were busy looping before. The wait isn't "needed" for anything, but I don't think busy looping is what we wanted here either.
bors r+ |
Build succeeded: |
We introduce two testing knobs that we'll use in future commits: - SQLWatcherCheckpointNoopsEveryDurationOverride, if set, will override how often the SQLWatcher checkpoints no-ops (follow up to cockroachdb#73171) - ExcludeDroppedDescriptorsFromLookup will control whether the SQLTranslator ignores dropped descriptors. We'll also use this knob for the Reconciler (cockroachdb#71994), for the same purpose. Ignoring dropped descriptors in tests is a convenient (and faster!) alternative to waiting for the descriptor to actually get GC-ed by the GC job. It also makes it easier to order events transactionally -- it's sufficient for a checkpoint to cross the timestamp when the table is dropped, instead of waiting for the GC job to pick it up post TTL. This latter form is used in our current sqltranslator tests where we sleep arbitrarily: # Sleep for 5 seconds, which is more than the TTL on db.t1, so that # the gc job can delete the descriptor. sleep duration=5 This is unfortunate and makes for slower tests than necessary. We can get rid of it in a subsequent commit. Release note: None
We introduce two testing knobs that we'll use in future commits: - SQLWatcherCheckpointNoopsEveryDurationOverride, if set, will override how often the SQLWatcher checkpoints no-ops (follow up to cockroachdb#73171) - ExcludeDroppedDescriptorsFromLookup will control whether the SQLTranslator ignores dropped descriptors. We'll also use this knob for the Reconciler (cockroachdb#71994), for the same purpose. Ignoring dropped descriptors in tests is a convenient (and faster!) alternative to waiting for the descriptor to actually get GC-ed by the GC job. It also makes it easier to order events transactionally -- it's sufficient for a checkpoint to cross the timestamp when the table is dropped, instead of waiting for the GC job to pick it up post TTL. This latter form is used in our current sqltranslator tests where we sleep arbitrarily: # Sleep for 5 seconds, which is more than the TTL on db.t1, so that # the gc job can delete the descriptor. sleep duration=5 This is unfortunate and makes for slower tests than necessary. We can get rid of it in a subsequent commit. Release note: None
We introduce two testing knobs that we'll use in future commits: - SQLWatcherCheckpointNoopsEveryDurationOverride, if set, will override how often the SQLWatcher checkpoints no-ops (follow up to cockroachdb#73171) - ExcludeDroppedDescriptorsFromLookup will control whether the SQLTranslator ignores dropped descriptors. We'll also use this knob for the Reconciler (cockroachdb#71994), for the same purpose. Ignoring dropped descriptors in tests is a convenient (and faster!) alternative to waiting for the descriptor to actually get GC-ed by the GC job. It also makes it easier to order events transactionally -- it's sufficient for a checkpoint to cross the timestamp when the table is dropped, instead of waiting for the GC job to pick it up post TTL. This latter form is used in our current sqltranslator tests where we sleep arbitrarily: # Sleep for 5 seconds, which is more than the TTL on db.t1, so that # the gc job can delete the descriptor. sleep duration=5 This is unfortunate and makes for slower tests than necessary. We can get rid of it in a subsequent commit. Release note: None
73416: spanconfig/{sqltranslator,testcluster}: support multi-tenant tests r=irfansharif a=irfansharif **spanconfig/{sqltranslator,testcluster}: support multi-tenant tests** We introduce a nimble testing framework for multi-tenant span config tests, and use it to power tests for spanconfig.SQLTranslator when run on behalf of secondary tenants. This framework will see further addition when testing spanconfig.Reconciler -- we'll maintain a handle on each tenant's reconciler, kvaccessor, and use both to determine what span config mutations are made after SQL changes. This commit also leverages the ExcludeDroppedDescriptorsFromLookup testing knob we introduced in an earlier commit -- it obviates the need for arbitrary sleeps (which are both fragile and slow). It also packages a PrintSpanConfigDiffedAgainstDefaults helper that we'll use in a future PR (#71994). **sqltranslatorccl: carve standalone package for tests** Rename-only commit. We'll use subpackages within ccl/spanconfigccl when testing other components (spanconfig.Reconciler; [#71994](#71994)). **spanconfig/{watcher,translator}: add testing knobs** We introduce two testing knobs that we'll use in future commits: - SQLWatcherCheckpointNoopsEveryDurationOverride, if set, will override how often the SQLWatcher checkpoints no-ops (follow up to [#73171](#73171)) - ExcludeDroppedDescriptorsFromLookup will control whether the SQLTranslator ignores dropped descriptors. We'll also use this knob for the Reconciler ([#71994](#71994)), for the same purpose. Ignoring dropped descriptors in tests is a convenient (and faster!) alternative to waiting for the descriptor to actually get GC-ed by the GC job. It also makes it easier to order events transactionally -- it's sufficient for a checkpoint to cross the timestamp when the table is dropped, instead of waiting for the GC job to pick it up post TTL. This latter form is used in our current sqltranslator tests where we sleep arbitrarily: # Sleep for 5 seconds, which is more than the TTL on db.t1, so that # the gc job can delete the descriptor. sleep duration=5 This is unfortunate and makes for slower tests than necessary. We can get rid of it in a subsequent commit. Release note: None Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
See individual commits.
spanconfig/sqlwatcher: speed up tests
Using a more aggressive closed timestamp target duration brings the runtime for
TestSQLWatcherReactsToUpdate down from 45s+ to single digits. To speed it up,
we also just re-use the same SQLWatcher instead of creating a new one for every
test case.
Other tests in the package benefit from a similar treatment. Using the default
target duration in fact masked a buggy test; in a future commit we end up
rewrite that test so it's skipped for now.
spanconfig/sqlwatcher: hide the factory pattern
The only mutable state across concurrent WatchForSQLUpdate calls was the
internal buffer, which does not need to hang off the surrounding struct. This
obviates the need for the factory pattern we were using -- callers can set up
multiple SQLWatchers concurrently as is (see TestSQLWatcherMultiple).
This PR first hides the factory under the package boundary; in a later commit
it shed it altogether. This has the benefit of reducing the number of symbols
in pkg/spanconfig and making it symmetric with the other spanconfig
dependencies typically found together (KVAccessor, SQLTranslator). It's also
every-so-slightly less verbose to use in the upcoming spanconfig.Reconciler
(#71994).
spanconfig/sqlwatcher: checkpoint, even without updates
If
system.{descriptor,zones}
is static, we would previously discardall checkpoint events. This is not what we want -- we'd still like to
know how caught up we are, with ongoing updates or otherwise.
MVCC GC, after all, can still happen; if our last checkpoint was when
the last update occurred, we may end up doing a lot of unnecessary work
when finding out our last checkpoint was GC-ed from underneath us. Lack
of periodic checkpoints also makes tests difficult to write -- you'd
have to induce a benign update to flush out all earlier ones. This
latent flakiness was uncovered after speeding up some existing tests in
an earlier commit.
NB: For incremental (possibly noop) checkpoints, we need to ensure that
the sqlwatcher's buffer is initialized with a low watermark. When
flushing events, we take the most conservative timestamp. If not
initialized to a high water, this might be 0 -- violating the
sqlwatcher's monotonically increasing checkpoint ts invariant.
Release note: None