docs: add an Insights into Constraint Conformance RFC #38309
Conversation
Force-pushed from 0cc449c to 3062f24
cc @cockroachdb/rfc-prs
@@ -0,0 +1,256 @@
- Feature Name: Insights into Constrain Conformance
Suggested change:
- Feature Name: Insights into Constrain Conformance
- Feature Name: Insights into Constraint Conformance
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @awoods187, @bdarnell, @darinpp, @knz, @piyush-singh, and @tbg)
docs/RFCS/20190619_constraint_conformance_report.md, line 75 at r1 (raw file):
1. Testing fault tolerance: A scenario people ran into was testing CRDB’s resiliency to networking failures. A cluster is configured with 3 Availability Zones (AZs) and the data is constrained to require a replica in every AZ (or
I don't think there were any constraints in use for the test in question, only the implicit diversity requirement.
docs/RFCS/20190619_constraint_conformance_report.md, line 106 at r1 (raw file):
A new system table will be introduced: ```sql CREATE TABLE crdb_internal.constraint_violations (
`crdb_internal` is for virtual tables. If this is a regular table whose contents are written to storage, it goes in `system` instead (I think).
docs/RFCS/20190619_constraint_conformance_report.md, line 174 at r1 (raw file):
won’t create a job for it. Altering a zone config (for a table, partition or index) will similarly cause a
What about adding nodes in new localities that trigger moves for diversification? I think it would be nice to detect this and create a job for it, although this might be a v2 feature.
docs/RFCS/20190619_constraint_conformance_report.md, line 199 at r1 (raw file):
The data powering the virtual table and the jobs progress will be a process running on the leaseholder for the meta1 range. Every minute, this process will
s/meta1 range/range 1/ (meta1 is always in range 1)
docs/RFCS/20190619_constraint_conformance_report.md, line 210 at r1 (raw file):
that have been dead for 5 minutes (cluster setting server.time_until_store_dead) - that’s when it starts moving replicas away. Nodes that died more recently are as good as live ones. But not so for this report; we can’t claim that everything
Why can't we? I'd expect that the threshold for reporting constraint violations would be the same as the threshold for taking action to fix them.
docs/RFCS/20190619_constraint_conformance_report.md, line 242 at r1 (raw file):
leading to non-sensical counts. ## Unresolved questions
How exactly is under-diversification computed? Consider a cluster with one region, three AZs, and six nodes. The region level is specified in the locality flags in anticipation of future growth, but there's only one region now and we don't want to alert on the inability to handle region failures. Similarly, system ranges are replicated 5x, but there are only three AZs available.
docs/RFCS/20190619_constraint_conformance_report.md, line 244 at r1 (raw file):
## Unresolved questions 1. Should we report violations on a per-zone basis or per table/index (when there’s
In general I'd generate the report at the finest level possible, and then have aggregate/summary views in the UI.
docs/RFCS/20190619_constraint_conformance_report.md, line 248 at r1 (raw file):
1. It seems nice to report violations in terms of amount of data, not number of ranges, but how would we get the range sizes exactly since it’s obviously not part of the descriptors? Is it worth computing the sizes?
I wouldn't worry about getting the range sizes at this point.
docs/RFCS/20190619_constraint_conformance_report.md, line 249 at r1 (raw file):
ranges, but how would we get the range sizes exactly since it’s obviously not part of the descriptors? Is it worth computing the sizes? 1. Should we periodically compute this report even if no one is consuming it? We
I think we should generate it whether or not anyone is reading it; in addition to keeping the velocity up to date (I'm not sure this is critical) it will be good for proactive alerting when/if we support that. We also want to feed some summaries of this information into metrics so alerts can be generated from external monitoring systems.
Force-pushed from ab5c6cc to aced9f3
Force-pushed from 57851e2 to a951ecd
TFTR Ben. I've clarified some things and added a section on the collection of lease info. Please look again.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @bdarnell, @darinpp, @knz, @piyush-singh, @tbg, and @vilterp)
docs/RFCS/20190619_constraint_conformance_report.md, line 1 at r1 (raw file):
Previously, vilterp (Pete Vilter) wrote…
- Feature Name: Insights into Constraint Conformance
Done.
docs/RFCS/20190619_constraint_conformance_report.md, line 75 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
I don't think there were any constraints in use for the test in question, only the implicit diversity requirement.
right, but "partial constraints" could have also been used to require one replica in each AZ, right? So I think this text is fine, no?
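For concreteness, a hedged sketch of what such a per-AZ requirement can look like as "partial" (per-replica) zone-config constraints; the table name `t` and the AZ tier values are made up for illustration:

```sql
-- Hypothetical example: ask for exactly one replica in each of three AZs.
-- The tier names az1/az2/az3 and table t are assumptions, not from the RFC.
ALTER TABLE t CONFIGURE ZONE USING
    num_replicas = 3,
    constraints = '{"+az=az1": 1, "+az=az2": 1, "+az=az3": 1}';
```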
docs/RFCS/20190619_constraint_conformance_report.md, line 106 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
`crdb_internal` is for virtual tables. If this is a regular table whose contents are written to storage, it goes in `system` instead (I think).
right, but I think I want this to be a virtual table driven by data stored in a different format (and probably stored without multiple versions). What do you think?
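To make the two shapes under discussion concrete, a small sketch; the first name is the `crdb_internal` virtual table proposed in the RFC draft and the second is the stored `system` table the implementation below ended up using, so treat both as illustrative rather than a statement of what exists in any particular version:

```sql
-- Virtual table (computed on demand from separately stored data), per the RFC draft:
SELECT * FROM crdb_internal.constraint_violations;

-- Stored table in the system database, as in the eventual patch:
SELECT * FROM system.replication_constraint_stats;
```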
docs/RFCS/20190619_constraint_conformance_report.md, line 174 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
What about adding nodes in new localities that trigger moves for diversification? I think it would be nice to detect this and create a job for it, although this might be a v2 feature.
yeah, I had considered it, but it didn't seem straightforward. Added an "out of scope" section :)
docs/RFCS/20190619_constraint_conformance_report.md, line 199 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
s/meta1 range/range 1/ (meta1 is always in range 1)
done
docs/RFCS/20190619_constraint_conformance_report.md, line 210 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Why can't we? I'd expect that the threshold for reporting constraint violations would be the same as the threshold for taking action to fix them.
I dunno... I would not expect that. The allocator waits the 5 minutes as a tradeoff since there's a significant cost to it being too trigger happy.
But the truth of the matter is that ranges are underreplicated for those 5 minutes, and I think the report would better indicate that. I could add information to the report saying that no movement will happen for 5 minutes (and even add a "move now!" button to the UI). What do you think?
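The threshold being discussed is an ordinary cluster setting, so it can be inspected or tuned with SQL; a minimal sketch (the value shown is just the documented default):

```sql
-- The dead-store threshold the allocator waits for before moving replicas away.
SHOW CLUSTER SETTING server.time_until_store_dead;

-- Operators can lower it if they want replica movement (and reporting) to react sooner.
SET CLUSTER SETTING server.time_until_store_dead = '5m0s';
```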
docs/RFCS/20190619_constraint_conformance_report.md, line 242 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
How exactly is under-diversification computed? Consider a cluster with one region, three AZs, and six nodes. The region level is specified in the locality flags in anticipation of future growth, but there's only one region now and we don't want to alert on the inability to handle region failures. Similarly, system ranges are replicated 5x, but there are only three AZs available.
Added some clarifications under "How much data is under-diversified". Please take a look and comment there if need be.
docs/RFCS/20190619_constraint_conformance_report.md, line 244 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
In general I'd generate the report at the finest level possible, and then have aggregate/summary views in the UI.
Agreed. So I've modified the text to describe two virtual tables - one offering a per-zone view and one offering a per-object view.
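As a rough illustration of "finest level plus summary views", this is the kind of rollup a UI could run over the fine-grained report; the table and column names follow what was eventually implemented, but the query itself is only a sketch:

```sql
-- Summarize constraint violations per zone/subzone across all constraint configs.
SELECT zone_id, subzone_id, sum(violating_ranges) AS total_violating_ranges
FROM system.replication_constraint_stats
GROUP BY zone_id, subzone_id
ORDER BY total_violating_ranges DESC;
```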
docs/RFCS/20190619_constraint_conformance_report.md, line 248 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
I wouldn't worry about getting the range sizes at this point.
well I think the sizes can be reported naturally in the same way that leases are reported (and I've now included words on that). And I think it's fairly important. We should strive to stop talking about ranges to users; we should talk in higher-level terms. And the applicable higher-level concept is data size.
If you agree, I'll remove this from the open questions too.
docs/RFCS/20190619_constraint_conformance_report.md, line 249 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
I think we should generate it whether or not anyone is reading it; in addition to keeping the velocity up to date (I'm not sure this is critical) it will be good for proactive alerting when/if we support that. We also want to feed some summaries of this information into metrics so alerts can be generated from external monitoring systems.
agreed
were migrated away during the first outage.
As things stand, it’s entirely unclear how one is supposed to know when it’s
safe to bring down the 2nd AZ. (Of course, here we’re talking about disaster
testing, not the use of node decommissioning.)
maybe a comment about how it's extremely unlikely for this situation to play out in production, since you would need to lose multiple AZs/regions
meh
we do get customers reading the docs, but I'm fine if you want to leave it out
example, if all the nodes holding a partition go away, the ranges in that
partition are not counted (see
https://github.com/cockroachdb/cockroach/issues/19644).
I think we also need to discuss locality here. While it isn't technically a "constraint" it effectively is one to the user.
What are you referring to? Locality is "constraints".
well isn't the --locality flag technically not a part of the constraints within replication zones?
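A short sketch of the distinction being debated, with made-up names: `--locality` assigns locality tiers to a node at startup and is not itself part of any replication zone, but zone-config constraints (and the implicit diversity heuristic) are expressed against those tiers:

```sql
-- A node advertises its locality on the command line, e.g.
--   cockroach start --locality=region=us-east1,az=us-east1-b ...
-- That flag is not a zone-config constraint, but a constraint can reference its tiers
-- (table t and the region value are illustrative):
ALTER TABLE t CONFIGURE ZONE USING constraints = '[+region=us-east1]';
```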
Besides the general need for an administrator to know that everything is
We also learned from a customer this week that they would like to know if the cluster has experienced corruption. I could imagine something like zero failures of the consistency checker on any range in the last 24 hours.
how is that related to data location / constraint conformance?
it's not per se--and totally fine if we want to pull it out. It does relate to the notion of, is my cluster in a healthy state before I do X
1. Testing fault tolerance: A scenario people ran into was testing CRDB’s
resiliency to networking failures. A cluster is configured with 3 Availability
Zones (AZs) and the data is constrained to require a replica in every AZ (or
I assume we will apply this to single machine, data center/availability zone, and region failures as all are important for testing here
crdb doesn't understand region/az/dc specifically. --locality takes a list of key-value pairs, without particular semantics for each level in the list. So everything we do is generic.
But perhaps I'm not understanding what you're suggesting.
I just thought it would be worth explaining that it applies to all of them, but we can remove that if you think it's unnecessary
@piyush-singh probably worth having design take a look at this to make sure the virtual tables have what is needed
cc @Annebirzin (we can discuss in person on Monday)
Force-pushed from a951ecd to 4a370ce
thanks for making the changes--lgtm
Force-pushed from 4a370ce to e4b37f7
docs/RFCS/20190619_constraint_conformance_report.md, line 137 at r3 (raw file):
Should we include information about which constraints have cascaded from parent schema objects in these warnings? So for example, imagine I have table Not sure if the best way to handle is changing
This patch implements the "Insights into Constraint Conformance" RFC (cockroachdb#38309).
At the time of this patch, the RFC is still pending and also out of date.
Developed together with Darin.

The patch introduces the following tables in the system database, providing
information about constraint conformance, replication status and critical
localities (i.e. localities that, were they to become unavailable, would cause
some ranges to lose quorum):

CREATE TABLE replication_constraint_stats (
    zone_id INT8 NOT NULL,
    subzone_id INT8 NOT NULL,
    type STRING NOT NULL,
    config STRING NOT NULL,
    report_id INT8 NOT NULL,
    violation_start TIMESTAMP NULL,
    violating_ranges INT8 NOT NULL,
    CONSTRAINT "primary" PRIMARY KEY (zone_id ASC, subzone_id ASC, type ASC, config ASC),
    FAMILY "primary" (zone_id, subzone_id, type, config, report_id, violation_start, violating_ranges)
);

CREATE TABLE replication_stats (
    zone_id INT8 NOT NULL,
    subzone_id INT8 NOT NULL,
    report_id INT8 NOT NULL,
    total_ranges INT8 NOT NULL,
    unavailable_ranges INT8 NOT NULL,
    under_replicated_ranges INT8 NOT NULL,
    over_replicated_ranges INT8 NOT NULL,
    CONSTRAINT "primary" PRIMARY KEY (zone_id ASC, subzone_id ASC),
    FAMILY "primary" (zone_id, subzone_id, report_id, total_ranges, unavailable_ranges, under_replicated_ranges, over_replicated_ranges)
);

CREATE TABLE replication_critical_localities (
    zone_id INT8 NOT NULL,
    subzone_id INT8 NOT NULL,
    locality STRING NOT NULL,
    report_id INT8 NOT NULL,
    at_risk_ranges INT8 NOT NULL,
    CONSTRAINT "primary" PRIMARY KEY (zone_id ASC, subzone_id ASC, locality ASC),
    FAMILY "primary" (zone_id, subzone_id, locality, report_id, at_risk_ranges)
)

And also a system.report_meta table with metadata for all these reports (their
creation time).

The reports are generated periodically (once a minute by default, subject to
the cluster setting kv.replication_reports.interval) by a job running on the
leaseholder of range 1. The data is produced by joining range descriptor data
from meta2 with zone config information from the gossiped SystemConfig.

Release note (sql change): The following system tables containing reports about
replication status, constraint conformance and critical localities are
introduced: replication_constraint_stats, replication_stats,
replication_critical_localities.
40625: storage: introduce a couple of replication reports r=andreimatei a=andreimatei

This patch implements the "Insights into Constraint Conformance" RFC (#38309).
At the time of this patch, the RFC is still pending and also out of date.

The patch introduces the following tables in the system database, providing
information about constraint conformance, replication status and critical
localities (i.e. localities that, were they to become unavailable, would cause
some ranges to lose quorum):

```
CREATE TABLE replication_constraint_stats (
    zone_id INT8 NOT NULL,
    subzone_id INT8 NOT NULL,
    type STRING NOT NULL,
    config STRING NOT NULL,
    report_id INT8 NOT NULL,
    violation_start TIMESTAMP NULL,
    violating_ranges INT8 NOT NULL,
    CONSTRAINT "primary" PRIMARY KEY (zone_id ASC, subzone_id ASC, type ASC, config ASC),
    FAMILY "primary" (zone_id, subzone_id, type, config, report_id, violation_start, violating_ranges)
);

CREATE TABLE replication_stats (
    zone_id INT8 NOT NULL,
    subzone_id INT8 NOT NULL,
    report_id INT8 NOT NULL,
    total_ranges INT8 NOT NULL,
    unavailable_ranges INT8 NOT NULL,
    under_replicated_ranges INT8 NOT NULL,
    over_replicated_ranges INT8 NOT NULL,
    CONSTRAINT "primary" PRIMARY KEY (zone_id ASC, subzone_id ASC),
    FAMILY "primary" (zone_id, subzone_id, report_id, total_ranges, unavailable_ranges, under_replicated_ranges, over_replicated_ranges)
);

CREATE TABLE replication_critical_localities (
    zone_id INT8 NOT NULL,
    subzone_id INT8 NOT NULL,
    locality STRING NOT NULL,
    report_id INT8 NOT NULL,
    at_risk_ranges INT8 NOT NULL,
    CONSTRAINT "primary" PRIMARY KEY (zone_id ASC, subzone_id ASC, locality ASC),
    FAMILY "primary" (zone_id, subzone_id, locality, report_id, at_risk_ranges)
)
```

And also a system.report_meta table with metadata for all these reports (their
creation time).

The reports are generated periodically (once a minute by default, subject to
the cluster setting kv.replication_reports.interval) by a job running on the
leaseholder of range 1. The data is produced by joining range descriptor data
from meta2 with zone config information from the gossiped SystemConfig.

Release note (sql change): The following system tables containing reports about
replication status, constraint conformance and critical localities are
introduced: replication_constraint_stats, replication_stats,
replication_critical_localities.

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
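A hedged example of consuming these reports once the patch is in place; the table, column, and setting names come from the commit message above, while the predicate and the interval value are illustrative:

```sql
-- How often the reports are regenerated (once a minute is the default mentioned above).
SET CLUSTER SETTING kv.replication_reports.interval = '1m0s';

-- Zone configs that currently have ranges violating a replication constraint.
SELECT zone_id, subzone_id, type, config, violation_start, violating_ranges
FROM system.replication_constraint_stats
WHERE violating_ranges > 0;
```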
I would rebase the RFC and update it with the work that was actually implemented. Then change the status to completed and merge this.
Reviewed 1 of 1 files at r7.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @awoods187, @bdarnell, @darinpp, @nvanbenschoten, @petermattis, @piyush-singh, @tbg, and @vilterp)
docs/RFCS/20190619_constraint_conformance_report.md, line 37 at r7 (raw file):
**Replication zone configs and constraints:** key spans can be grouped into *“replication zones” and properties can be set for each zone through a “zone
Extraneous asterisks at the start of each line until below.
@andreimatei can you merge this?
The set of features described here aims to provide admins with visibility into aspects of replication and replica placement. In particular, admins will be able to access a report that details which replication constraints are violated.

Release note: None
Force-pushed from 21868b0 to bce0175
bors r+
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @awoods187, @bdarnell, @darinpp, @knz, @nvanbenschoten, @petermattis, @piyush-singh, and @vilterp)
Build failed (retrying...)
Build failed (retrying...)
@andreimatei gotta have your release justification, don't ya know it?
Build failed (retrying...)
Build failed (retrying...)
Build succeeded
The set of features described here aims to provide admins with visibility into
aspects of replication and replica placement. In particular, admins will be able
to access a report that details which replication constraints are violated.
Release justification: non-code change
Release note: None