Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zonepb: make subzone DiffWithZone more accurate #62839

Merged
merged 1 commit into from
Apr 1, 2021

Conversation

otan
Copy link
Contributor

@otan otan commented Mar 31, 2021

  • Subzones may be defined in a different order. We did not take this
    into account which can cause bugs when e.g. ADD REGION adds a subzone
    in the end rather than in the old "expected" location in the subzones
    array. This has been fixed by comparing subzones using an unordered
    map.
  • The ApplyZoneConfig we previously did overwrote subzone fields on the
    original subzone array element, meaning that if there was a mismatch
    it would not be reported through validation. This is now fixed by
    applying the expected zone config to *zonepb.NewZoneConfig() instead.
  • Added logic to only check for zone config matches subzones from
    active subzone IDs.
  • Improve the error messaging when a subzone config is mismatching -
    namely, add index and partitioning information and differentiate
    between missing fields and missing / extraneous zone configs

Resolves #62790

Release note (bug fix): Fixed validation bugs during ALTER TABLE ... SET
LOCALITY / crdb_internal.validate_multi_region_zone_config where
validation errors could occur when the database of a REGIONAL BY ROW
table has a new region added. Also fix a validation bug partition zone
mismatches configs were not caught.

@otan otan requested review from ajstorm, a team and adityamaru and removed request for a team March 31, 2021 01:29
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor Author

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @ajstorm)


pkg/ccl/logictestccl/testdata/logic_test/multi_region_zone_configs, line 323 at r2 (raw file):

ALTER table regional_by_row CONFIGURE ZONE USING gc.ttlseconds = 10

statement error zone configuration for table regional_by_row contains incorrectly configured field "num_replicas"

oops this is wrong, one second

@otan otan force-pushed the check_index_correctly branch 5 times, most recently from ac77025 to f47e5c3 Compare March 31, 2021 11:54
Copy link
Collaborator

@ajstorm ajstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 4 files at r2, 5 of 5 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @otan)


pkg/ccl/logictestccl/testdata/logic_test/multi_region_zone_configs, line 323 at r2 (raw file):

Previously, otan (Oliver Tan) wrote…

oops this is wrong, one second

?


pkg/ccl/logictestccl/testdata/logic_test/multi_region_zone_configs, line 468 at r3 (raw file):

SET override_multi_region_zone_config = false

statement error attempting to update zone config which is missing an expected zone configuration for partition "ap-southeast-2" of regional_by_row@idx

Nit: Worth calling crdb_internal.validate_multi_region_zone_configs() before this call?


pkg/config/zonepb/zone.go, line 774 at r3 (raw file):

		if !found {
			// Zone configs could be extraneously defined as long as it does not
			// differ between fieldLists.

Nit: I found this comment a bit confusing. Do we mean that there can be an extra zone config defined so long as it doesn't have any fields in the fieldList set?


pkg/config/zonepb/zone.go, line 811 at r3 (raw file):

	}

	// Anything remaining in the map can be presumed to be missing.

Nit: ...provided that there's something set that matches a field in the fieldList?


pkg/config/zonepb/zone.go, line 826 at r3 (raw file):

		}
	}
	return true, DiffWithZoneMismatch{}, nil

Wow. Nice catch! The zone config possibilities are pretty intense.


pkg/sql/region_util.go, line 456 at r3 (raw file):

func isPlaceholderZoneConfigForMultiRegion(zc zonepb.ZoneConfig) bool {
	// Placeholders must have at least 1 subzone.
	if len(zc.Subzones) < 1 {

Nit: Should this be == 0, to be a bit more clear?


pkg/sql/region_util.go, line 570 at r3 (raw file):

	}

	// Mark the NumReplicas as 0 if we have a placeholder.

Nit: I actually like the old comment better. If you really prefer the new comment, can you add in a reference to "See zonepb.IsSubzonePlaceholder for why this is necessary." I personally find this whole placeholder concept tricky, and I suspect others will too.


pkg/sql/region_util.go, line 1349 at r3 (raw file):

	// Remove inactive subzones from the comparison.
	filteredSubzones := currentZoneConfig.Subzones[:0]

Nit: Did you add this out of an abundance of caution? Or did you hit a case where there were inactive indexes with defined zone configs? I'd think we'd want to root out those cases in addition to handling them here.


pkg/sql/region_util.go, line 1363 at r3 (raw file):

	// Mark the expected NumReplicas as 0 if we have a placeholder
	// and the current zone config is also a placeholder.
	// The latter check is requires as in cases where non-multiregion fields

Nit: required


pkg/sql/region_util.go, line 1364 at r3 (raw file):

	// and the current zone config is also a placeholder.
	// The latter check is requires as in cases where non-multiregion fields
	// is set on the current zone config, the expected zone config has needs

Nit: is -> are, also there's an extra "has"


pkg/sql/region_util.go, line 1367 at r3 (raw file):

	// the placeholder marked so that DiffWithZone does not error when
	// num_replicas is expectedly different.
	if currentZoneConfig.IsSubzonePlaceholder() && isPlaceholderZoneConfigForMultiRegion(expectedZoneConfig) {

I don't understand this check. If expected is a placeholder, won't its NumReplicas already be set to 0?

Copy link
Contributor Author

@otan otan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @ajstorm)


pkg/ccl/logictestccl/testdata/logic_test/multi_region_zone_configs, line 323 at r2 (raw file):

Previously, ajstorm (Adam Storm) wrote…

?

earlier revision had the wrong result a few lines up.


pkg/config/zonepb/zone.go, line 774 at r3 (raw file):

there can be an extra zone config defined so long as it doesn't have any fields in the fieldList set
much cleaner. comment change.


pkg/config/zonepb/zone.go, line 826 at r3 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Wow. Nice catch! The zone config possibilities are pretty intense.

image.png


pkg/sql/region_util.go, line 1349 at r3 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Nit: Did you add this out of an abundance of caution? Or did you hit a case where there were inactive indexes with defined zone configs? I'd think we'd want to root out those cases in addition to handling them here.

There were indexes that are being cleaned up during, e.g. transitioning from RBR -> Global which appears to not get cleaned up immediately, in which the zone config hangs around for a while. If these were a concern, I'd say delete this filtering and you'll see a few tests fail, e.g. go test ./pkg/ccl/telemetryccl -run 'TestTelemetry/multiregion/server'


pkg/sql/region_util.go, line 1367 at r3 (raw file):

Previously, ajstorm (Adam Storm) wrote…

I don't understand this check. If expected is a placeholder, won't its NumReplicas already be set to 0?

nope, it is nil. we have to set it by ourselves.
BUT if current zone config is not a placeholder because e.g. gc.ttlseconds is set, then we fudge expected zone config to be a placeholder.

* Subzones may be defined in a different order. We did not take this
  into account which can cause bugs when e.g. ADD REGION adds a subzone
  in the end rather than in the old "expected" location in the subzones
  array. This has been fixed by comparing subzones using an unordered
  map.
* The ApplyZoneConfig we previously did overwrote subzone fields on the
  original subzone array element, meaning that if there was a mismatch
  it would not be reported through validation. This is now fixed by
  applying the expected zone config to *zonepb.NewZoneConfig() instead.
* Added logic to only check for zone config matches subzones from
  active subzone IDs.
* Improve the error messaging when a subzone config is mismatching -
  namely, add index and partitioning information and differentiate
  between missing fields and missing / extraneous zone configs

Release note (bug fix): Fixed validation bugs during ALTER TABLE ... SET
LOCALITY / crdb_internal.validate_multi_region_zone_config where
validation errors could occur when the database of a REGIONAL BY ROW
table has a new region added. Also fix a validation bug partition zone
mismatches configs were not caught.
Copy link
Collaborator

@ajstorm ajstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 3 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajstorm, and @otan)


pkg/config/zonepb/zone.go, line 826 at r3 (raw file):

Previously, otan (Oliver Tan) wrote…

image.png

The zone configs giveth, and the zone configs taketh away. Now we just have to figure out how to get them to giveth a bit.


pkg/sql/region_util.go, line 1349 at r3 (raw file):

Previously, otan (Oliver Tan) wrote…

There were indexes that are being cleaned up during, e.g. transitioning from RBR -> Global which appears to not get cleaned up immediately, in which the zone config hangs around for a while. If these were a concern, I'd say delete this filtering and you'll see a few tests fail, e.g. go test ./pkg/ccl/telemetryccl -run 'TestTelemetry/multiregion/server'

Makes sense that we put this in now, but I'm wondering if there's more work to be done here to clean up those zone configs properly before the schema change, like we do for RBR.

@ajstorm ajstorm self-requested a review March 31, 2021 20:24
@otan
Copy link
Contributor Author

otan commented Mar 31, 2021


pkg/sql/region_util.go, line 1349 at r3 (raw file):

Previously, ajstorm (Adam Storm) wrote…

Makes sense that we put this in now, but I'm wondering if there's more work to be done here to clean up those zone configs properly before the schema change, like we do for RBR.

We can't do it before, because if the schema change fails we have to "put them back". they need to be live right until the atomic swap of the indexes.
Doing it after makes the most sense atm...

@ajstorm ajstorm self-requested a review March 31, 2021 21:15
Copy link
Collaborator

@ajstorm ajstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajstorm, and @otan)


pkg/sql/region_util.go, line 1349 at r3 (raw file):

Previously, otan (Oliver Tan) wrote…

We can't do it before, because if the schema change fails we have to "put them back". they need to be live right until the atomic swap of the indexes.
Doing it after makes the most sense atm...

The issue that we hit in the RBR->Global transition originally, was that the ZCs hung around while the indexes were waiting to be GCed. I thought that we caught those by removing the ZCs when transitioning from RBR. If you're still hitting them, then something's not right.

@otan
Copy link
Contributor Author

otan commented Mar 31, 2021


pkg/sql/region_util.go, line 1349 at r3 (raw file):

Previously, ajstorm (Adam Storm) wrote…

The issue that we hit in the RBR->Global transition originally, was that the ZCs hung around while the indexes were waiting to be GCed. I thought that we caught those by removing the ZCs when transitioning from RBR. If you're still hitting them, then something's not right.

Something indeed isn't right (found the bug). I'm inclined to keep this since a) the original bug for that is fixed and b) it doesn't make sense for me to error in this case, and will fix that in a separate issue.

@ajstorm ajstorm self-requested a review March 31, 2021 22:21
Copy link
Collaborator

@ajstorm ajstorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @ajstorm, and @otan)


pkg/sql/region_util.go, line 1349 at r3 (raw file):

Previously, otan (Oliver Tan) wrote…

Something indeed isn't right (found the bug). I'm inclined to keep this since a) the original bug for that is fixed and b) it doesn't make sense for me to error in this case, and will fix that in a separate issue.

🎉

@otan
Copy link
Contributor Author

otan commented Mar 31, 2021

bors r=ajstorm

@craig
Copy link
Contributor

craig bot commented Apr 1, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: multiregion validation does not check partition zone configs
3 participants