-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
spanconfig: visualize differences with gossiped SystemConfigSpan #73746
spanconfig: visualize differences with gossiped SystemConfigSpan #73746
Conversation
994a8b7
to
6929ad7
Compare
(Gentle bump here as well, it's a test only change + plus an entirely optional commit I'm happy to drop. It led to the realization in #73749 which we'll try and get to pre-22.1.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rebase this now that #71994 has merged? I think it but it'll be easy to take a last pass after the PR stands alone.
Reviewed 5 of 28 files at r1, 1 of 8 files at r2, 11 of 19 files at r5, 7 of 8 files at r6.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, @irfansharif, and @nvanbenschoten)
pkg/ccl/spanconfigccl/spanconfigcomparedccl/datadriven_test.go, line 168 at r6 (raw file):
Quoted 15 lines of code…
// Now that all tenants have reconciled their SQL changes, // read the current time and wait for the kvsubscriber to // advance past it. This lets us observe the effects of all: // (i) reconciliation processes; // (ii) tenant initializations (where seed span configs are // installed). now := systemTenant.Clock().Now() testutils.SucceedsSoon(t, func() error { lastUpdated := kvSubscriber.LastUpdated() if lastUpdated.Less(now) { return errors.Newf("kvsubscriber last updated timestamp (%s) lagging barrier timestamp (%s)", lastUpdated.GoTime(), now.GoTime()) } return nil })
does this SuceedsSoon
need to be nested in the other one?
6929ad7
to
ab2768f
Compare
We introduce a test-only package that compares the span configs infrastructure against the gossiped SystemConfigSpan it was designed to replace. The datadriven test spits out a visual "diff" of the two subsystems, indicating exactly where they differ (or not!) in terms of implied range boundaries and the configurations to apply. It does so by having both systems run side-by-side and scanning their contents. These range boundaries and configurations have second-order effects (various actions are executed on behalf of ranges by internal queues), but we don't bother with any of that in the test. Of particular note is the behavior of each system in the presence of secondary tenants that create arbitrary schemas and zone configs. This after all is what motivated the new infrastructure. Here's an example, as a diff, that shows how the system config span is unable to split within an active tenant's boundary unlike the new thing: configs version=current offset=37 limit=5 ---- ... /Tenant/10 range default /Tenant/11/Table/3 range default /Tenant/11/Table/4 range default /Tenant/11/Table/5 range default /Tenant/11/Table/6 range default ... configs version=legacy offset=43 ---- ... /Tenant/10 range default /Tenant/11 range default diff offset=31 limit=8 ---- --- gossiped system config span (legacy) +++ span config infrastructure (current) ... @@ -43,4 +37,37 @@ /Table/47 range system /Tenant/10 range default -/Tenant/11 range default +/Tenant/11/Table/3 range default +/Tenant/11/Table/4 range default +/Tenant/11/Table/5 range default +/Tenant/11/Table/6 range default ... This commit also gets rid of the ShadowReader. We were using it previously to manually diagnose differences between the two subsystems. With the improved testing, we can get rid of the painter's tape. Release note: None
In the previous commit we demonstrated visually the differences between the gossip-backed system config span and the new infrastructure. This commit gets rid of the benign differences. We could also just not do anything here, but it's only a minor change. Release note: None
ab2768f
to
f6bc6fd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebased.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @arulajmani, and @nvanbenschoten)
pkg/ccl/spanconfigccl/spanconfigcomparedccl/datadriven_test.go, line 168 at r6 (raw file):
Previously, ajwerner wrote…
// Now that all tenants have reconciled their SQL changes, // read the current time and wait for the kvsubscriber to // advance past it. This lets us observe the effects of all: // (i) reconciliation processes; // (ii) tenant initializations (where seed span configs are // installed). now := systemTenant.Clock().Now() testutils.SucceedsSoon(t, func() error { lastUpdated := kvSubscriber.LastUpdated() if lastUpdated.Less(now) { return errors.Newf("kvsubscriber last updated timestamp (%s) lagging barrier timestamp (%s)", lastUpdated.GoTime(), now.GoTime()) } return nil })
does this
SuceedsSoon
need to be nested in the other one?
It doesn't, un-nested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 19 files at r5, 5 of 59 files at r7, 8 of 54 files at r8, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, and @nvanbenschoten)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 19 files at r5, 5 of 59 files at r7, 8 of 54 files at r8, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, and @nvanbenschoten)
bors r+ |
bors r- |
Canceled. |
bors r+ |
Build succeeded: |
This commit is similar in spirit to cockroachdb#73746 in that it gets rid of a benign difference between the span configs infrastructure and the system config span. Whe nnabling span configs by default (cockroachdb#73876), we observed a wide blast radius with respect to test failures. This was in large part due to assumptions we've baked in regarding the number of splits we should expect at cluster start, often waiting for the same number of ranges to form before executing the rest of the test. The differences between the SystemConfigSpan and the span configs infrastructure in how they handled (empty) pseudo table keyspaces makes for an annoying large list of tests to adjust manually. Instead we take the lazy route and generate empty ranges for pseudo table IDs as we did before. We can get rid of this special handling once we actually get rid of the system config span, and adjust all these tests accordingly. Doing it later is also preferable should we need to revert pieces of cockroachdb#73876 due to unforeseen instability. Release note: None
This commit is similar in spirit to cockroachdb#73746 in that it gets rid of a benign difference between the span configs infrastructure and the system config span. Whe nnabling span configs by default (cockroachdb#73876), we observed a wide blast radius with respect to test failures. This was in large part due to assumptions we've baked in regarding the number of splits we should expect at cluster start, often waiting for the same number of ranges to form before executing the rest of the test. The differences between the SystemConfigSpan and the span configs infrastructure in how they handled (empty) pseudo table keyspaces makes for an annoying large list of tests to adjust manually. Instead we take the lazy route and generate empty ranges for pseudo table IDs as we did before. We can get rid of this special handling once we actually get rid of the system config span, and adjust all these tests accordingly. Doing it later is also preferable should we need to revert pieces of cockroachdb#73876 due to unforeseen instability. Release note: None
This commit is similar in spirit to cockroachdb#73746 in that it gets rid of a benign difference between the span configs infrastructure and the system config span. Whe nnabling span configs by default (cockroachdb#73876), we observed a wide blast radius with respect to test failures. This was in large part due to assumptions we've baked in regarding the number of splits we should expect at cluster start, often waiting for the same number of ranges to form before executing the rest of the test. The differences between the SystemConfigSpan and the span configs infrastructure in how they handled (empty) pseudo table keyspaces makes for an annoying large list of tests to adjust manually. Instead we take the lazy route and generate empty ranges for pseudo table IDs as we did before. We can get rid of this special handling once we actually get rid of the system config span, and adjust all these tests accordingly. Doing it later is also preferable should we need to revert pieces of cockroachdb#73876 due to unforeseen instability. Release note: None
This commit is similar in spirit to cockroachdb#73746 in that it gets rid of a benign difference between the span configs infrastructure and the system config span. Whe nnabling span configs by default (cockroachdb#73876), we observed a wide blast radius with respect to test failures. This was in large part due to assumptions we've baked in regarding the number of splits we should expect at cluster start, often waiting for the same number of ranges to form before executing the rest of the test. The differences between the SystemConfigSpan and the span configs infrastructure in how they handled (empty) pseudo table keyspaces makes for an annoying large list of tests to adjust manually. Instead we take the lazy route and generate empty ranges for pseudo table IDs as we did before. We can get rid of this special handling once we actually get rid of the system config span, and adjust all these tests accordingly. Doing it later is also preferable should we need to revert pieces of cockroachdb#73876 due to unforeseen instability. Release note: None
74171: spanconfig: carve empty ranges for pseudo-table IDs r=irfansharif a=irfansharif This commit is similar in spirit to #73746 in that it gets rid of a benign difference between the span configs infrastructure and the system config span. When enabling span configs by default (#73876), we observed a wide blast radius with respect to test failures. This was in large part due to assumptions we've baked in regarding the number of splits we should expect at cluster start, often waiting for the same number of ranges to form before executing the rest of the test. The differences between the SystemConfigSpan and the span configs infrastructure in how they handled (empty) pseudo table keyspaces makes for an annoying large list of tests to adjust manually. Instead we take the lazy route and generate empty ranges for pseudo table IDs as we did before. We can get rid of this special handling once we actually get rid of the system config span, and adjust all these tests accordingly. Doing it later is also preferable should we need to revert pieces of #73876 due to unforeseen instability. Release note: None Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
spanconfig: visualize differences with gossiped SystemConfigSpan
We introduce a test-only package that compares the span configs
infrastructure against the gossiped SystemConfigSpan it was designed to
replace. The datadriven test spits out a visual "diff" of the two
subsystems, indicating exactly where they differ (or not!) in terms of
implied range boundaries and the configurations to apply. It does so by
having both systems run side-by-side and scanning their contents. These
range boundaries and configurations have second-order effects (various
actions are executed on behalf of ranges by internal queues), but we
don't bother with any of that in the test.
Of particular note is the behavior of each system in the presence of
secondary tenants that create arbitrary schemas and zone configs. This
after all is what motivated the new infrastructure. Here's an example,
as a diff, that shows how the system config span is unable to split
within an active tenant's boundary unlike the new thing:
This commit also gets rid of the ShadowReader. We were using it
previously to manually diagnose differences between the two subsystems.
With the improved testing, we can get rid of the painter's tape.
spanconfig: rm benign differences with SystemConfigSpan
In the previous commit we demonstrated visually the differences between
the gossip-backed system config span and the new infrastructure. This
commit gets rid of the benign differences. We could also just not do
anything here, but it's only a minor change.
Release note: None
Only the last two commits are of interest here, the earlier ones are from #71994.