Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spanconfig: visualize differences with gossiped SystemConfigSpan #73746

Merged
merged 2 commits into from
Dec 16, 2021

Conversation

irfansharif
Copy link
Contributor

@irfansharif irfansharif commented Dec 13, 2021

spanconfig: visualize differences with gossiped SystemConfigSpan

We introduce a test-only package that compares the span configs
infrastructure against the gossiped SystemConfigSpan it was designed to
replace. The datadriven test spits out a visual "diff" of the two
subsystems, indicating exactly where they differ (or not!) in terms of
implied range boundaries and the configurations to apply. It does so by
having both systems run side-by-side and scanning their contents. These
range boundaries and configurations have second-order effects (various
actions are executed on behalf of ranges by internal queues), but we
don't bother with any of that in the test.

Of particular note is the behavior of each system in the presence of
secondary tenants that create arbitrary schemas and zone configs. This
after all is what motivated the new infrastructure. Here's an example,
as a diff, that shows how the system config span is unable to split
within an active tenant's boundary unlike the new thing:

configs version=current offset=37 limit=5
----
...
/Tenant/10                                 range default
/Tenant/11/Table/3                         range default
/Tenant/11/Table/4                         range default
/Tenant/11/Table/5                         range default
/Tenant/11/Table/6                         range default
...

configs version=legacy offset=43
----
...
/Tenant/10                                 range default
/Tenant/11                                 range default

diff offset=31 limit=8
----
--- gossiped system config span (legacy)
+++ span config infrastructure (current)
...
@@ -43,4 +37,37 @@
 /Table/47                                  range system
 /Tenant/10                                 range default
-/Tenant/11                                 range default
+/Tenant/11/Table/3                         range default
+/Tenant/11/Table/4                         range default
+/Tenant/11/Table/5                         range default
+/Tenant/11/Table/6                         range default
...

This commit also gets rid of the ShadowReader. We were using it
previously to manually diagnose differences between the two subsystems.
With the improved testing, we can get rid of the painter's tape.


spanconfig: rm benign differences with SystemConfigSpan

In the previous commit we demonstrated visually the differences between
the gossip-backed system config span and the new infrastructure. This
commit gets rid of the benign differences. We could also just not do
anything here, but it's only a minor change.

Release note: None


Only the last two commits are of interest here, the earlier ones are from #71994.

@irfansharif irfansharif requested a review from a team as a code owner December 13, 2021 17:44
@irfansharif irfansharif requested a review from a team December 13, 2021 17:44
@irfansharif irfansharif requested a review from a team as a code owner December 13, 2021 17:44
@irfansharif irfansharif requested review from otan, a team and stevendanna and removed request for a team, otan and stevendanna December 13, 2021 17:44
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@irfansharif
Copy link
Contributor Author

(Gentle bump here as well, it's a test only change + plus an entirely optional commit I'm happy to drop. It led to the realization in #73749 which we'll try and get to pre-22.1.)

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rebase this now that #71994 has merged? I think it :lgtm: but it'll be easy to take a last pass after the PR stands alone.

Reviewed 5 of 28 files at r1, 1 of 8 files at r2, 11 of 19 files at r5, 7 of 8 files at r6.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, @irfansharif, and @nvanbenschoten)


pkg/ccl/spanconfigccl/spanconfigcomparedccl/datadriven_test.go, line 168 at r6 (raw file):

Quoted 15 lines of code…
					// Now that all tenants have reconciled their SQL changes,
					// read the current time and wait for the kvsubscriber to
					// advance past it. This lets us observe the effects of all:
					// (i)  reconciliation processes;
					// (ii) tenant initializations (where seed span configs are
					//      installed).
					now := systemTenant.Clock().Now()
					testutils.SucceedsSoon(t, func() error {
						lastUpdated := kvSubscriber.LastUpdated()
						if lastUpdated.Less(now) {
							return errors.Newf("kvsubscriber last updated timestamp (%s) lagging barrier timestamp (%s)",
								lastUpdated.GoTime(), now.GoTime())
						}
						return nil
					})

does this SuceedsSoon need to be nested in the other one?

We introduce a test-only package that compares the span configs
infrastructure against the gossiped SystemConfigSpan it was designed to
replace. The datadriven test spits out a visual "diff" of the two
subsystems, indicating exactly where they differ (or not!) in terms of
implied range boundaries and the configurations to apply. It does so by
having both systems run side-by-side and scanning their contents. These
range boundaries and configurations have second-order effects (various
actions are executed on behalf of ranges by internal queues), but we
don't bother with any of that in the test.

Of particular note is the behavior of each system in the presence of
secondary tenants that create arbitrary schemas and zone configs. This
after all is what motivated the new infrastructure. Here's an example,
as a diff, that shows how the system config span is unable to split
within an active tenant's boundary unlike the new thing:

    configs version=current offset=37 limit=5
    ----
    ...
    /Tenant/10                                 range default
    /Tenant/11/Table/3                         range default
    /Tenant/11/Table/4                         range default
    /Tenant/11/Table/5                         range default
    /Tenant/11/Table/6                         range default
    ...

    configs version=legacy offset=43
    ----
    ...
    /Tenant/10                                 range default
    /Tenant/11                                 range default

    diff offset=31 limit=8
    ----
    --- gossiped system config span (legacy)
    +++ span config infrastructure (current)
    ...
    @@ -43,4 +37,37 @@
     /Table/47                                  range system
     /Tenant/10                                 range default
    -/Tenant/11                                 range default
    +/Tenant/11/Table/3                         range default
    +/Tenant/11/Table/4                         range default
    +/Tenant/11/Table/5                         range default
    +/Tenant/11/Table/6                         range default
    ...

This commit also gets rid of the ShadowReader. We were using it
previously to manually diagnose differences between the two subsystems.
With the improved testing, we can get rid of the painter's tape.

Release note: None
In the previous commit we demonstrated visually the differences between
the gossip-backed system config span and the new infrastructure. This
commit gets rid of the benign differences. We could also just not do
anything here, but it's only a minor change.

Release note: None
Copy link
Contributor Author

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebased.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @arulajmani, and @nvanbenschoten)


pkg/ccl/spanconfigccl/spanconfigcomparedccl/datadriven_test.go, line 168 at r6 (raw file):

Previously, ajwerner wrote…
					// Now that all tenants have reconciled their SQL changes,
					// read the current time and wait for the kvsubscriber to
					// advance past it. This lets us observe the effects of all:
					// (i)  reconciliation processes;
					// (ii) tenant initializations (where seed span configs are
					//      installed).
					now := systemTenant.Clock().Now()
					testutils.SucceedsSoon(t, func() error {
						lastUpdated := kvSubscriber.LastUpdated()
						if lastUpdated.Less(now) {
							return errors.Newf("kvsubscriber last updated timestamp (%s) lagging barrier timestamp (%s)",
								lastUpdated.GoTime(), now.GoTime())
						}
						return nil
					})

does this SuceedsSoon need to be nested in the other one?

It doesn't, un-nested.

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 5 of 19 files at r5, 5 of 59 files at r7, 8 of 54 files at r8, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, and @nvanbenschoten)

Copy link
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 19 files at r5, 5 of 59 files at r7, 8 of 54 files at r8, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, and @nvanbenschoten)

@irfansharif
Copy link
Contributor Author

bors r+

@irfansharif
Copy link
Contributor Author

bors r-

@craig
Copy link
Contributor

craig bot commented Dec 16, 2021

Canceled.

@irfansharif
Copy link
Contributor Author

bors r+

@craig
Copy link
Contributor

craig bot commented Dec 16, 2021

Build succeeded:

@craig craig bot merged commit 66c3ff0 into cockroachdb:master Dec 16, 2021
@irfansharif irfansharif deleted the 211211.diff-syscfg-spancfg branch December 16, 2021 14:05
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Dec 23, 2021
This commit is similar in spirit to cockroachdb#73746 in that it gets rid of a
benign difference between the span configs infrastructure and the system
config span. Whe nnabling span configs by default (cockroachdb#73876), we observed
a wide blast radius with respect to test failures. This was in large
part due to assumptions we've baked in regarding the number of splits we
should expect at cluster start, often waiting for the same number of
ranges to form before executing the rest of the test.

The differences between the SystemConfigSpan and the span configs
infrastructure in how they handled (empty) pseudo table keyspaces makes
for an annoying large list of tests to adjust manually. Instead we take
the lazy route and generate empty ranges for pseudo table IDs as we did
before. We can get rid of this special handling once we actually get rid
of the system config span, and adjust all these tests accordingly. Doing
it later is also preferable should we need to revert pieces of cockroachdb#73876
due to unforeseen instability.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Dec 24, 2021
This commit is similar in spirit to cockroachdb#73746 in that it gets rid of a
benign difference between the span configs infrastructure and the system
config span. Whe nnabling span configs by default (cockroachdb#73876), we observed
a wide blast radius with respect to test failures. This was in large
part due to assumptions we've baked in regarding the number of splits we
should expect at cluster start, often waiting for the same number of
ranges to form before executing the rest of the test.

The differences between the SystemConfigSpan and the span configs
infrastructure in how they handled (empty) pseudo table keyspaces makes
for an annoying large list of tests to adjust manually. Instead we take
the lazy route and generate empty ranges for pseudo table IDs as we did
before. We can get rid of this special handling once we actually get rid
of the system config span, and adjust all these tests accordingly. Doing
it later is also preferable should we need to revert pieces of cockroachdb#73876
due to unforeseen instability.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Jan 4, 2022
This commit is similar in spirit to cockroachdb#73746 in that it gets rid of a
benign difference between the span configs infrastructure and the system
config span. Whe nnabling span configs by default (cockroachdb#73876), we observed
a wide blast radius with respect to test failures. This was in large
part due to assumptions we've baked in regarding the number of splits we
should expect at cluster start, often waiting for the same number of
ranges to form before executing the rest of the test.

The differences between the SystemConfigSpan and the span configs
infrastructure in how they handled (empty) pseudo table keyspaces makes
for an annoying large list of tests to adjust manually. Instead we take
the lazy route and generate empty ranges for pseudo table IDs as we did
before. We can get rid of this special handling once we actually get rid
of the system config span, and adjust all these tests accordingly. Doing
it later is also preferable should we need to revert pieces of cockroachdb#73876
due to unforeseen instability.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Jan 5, 2022
This commit is similar in spirit to cockroachdb#73746 in that it gets rid of a
benign difference between the span configs infrastructure and the system
config span. Whe nnabling span configs by default (cockroachdb#73876), we observed
a wide blast radius with respect to test failures. This was in large
part due to assumptions we've baked in regarding the number of splits we
should expect at cluster start, often waiting for the same number of
ranges to form before executing the rest of the test.

The differences between the SystemConfigSpan and the span configs
infrastructure in how they handled (empty) pseudo table keyspaces makes
for an annoying large list of tests to adjust manually. Instead we take
the lazy route and generate empty ranges for pseudo table IDs as we did
before. We can get rid of this special handling once we actually get rid
of the system config span, and adjust all these tests accordingly. Doing
it later is also preferable should we need to revert pieces of cockroachdb#73876
due to unforeseen instability.

Release note: None
craig bot pushed a commit that referenced this pull request Jan 5, 2022
74171: spanconfig: carve empty ranges for pseudo-table IDs r=irfansharif a=irfansharif

This commit is similar in spirit to #73746 in that it gets rid of a
benign difference between the span configs infrastructure and the system
config span. When enabling span configs by default (#73876), we observed
a wide blast radius with respect to test failures. This was in large
part due to assumptions we've baked in regarding the number of splits we
should expect at cluster start, often waiting for the same number of
ranges to form before executing the rest of the test.

The differences between the SystemConfigSpan and the span configs
infrastructure in how they handled (empty) pseudo table keyspaces makes
for an annoying large list of tests to adjust manually. Instead we take
the lazy route and generate empty ranges for pseudo table IDs as we did
before. We can get rid of this special handling once we actually get rid
of the system config span, and adjust all these tests accordingly. Doing
it later is also preferable should we need to revert pieces of #73876
due to unforeseen instability.

Release note: None

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants