
docs: add an Insights into Constraint Conformance RFC #38309

Merged
merged 1 commit into cockroachdb:master on Mar 1, 2022

Conversation

andreimatei
Contributor

@andreimatei andreimatei commented Jun 19, 2019

The set of features described here aims to provide admins with visibility into
aspects of replication and replica placement. In particular, admins will be able
to access a report that details which replication constraints are violated.

Release justification: non-code change
Release note: None

@andreimatei andreimatei requested a review from a team as a code owner June 19, 2019 23:43
@cockroach-teamcity
Member

This change is Reviewable

@andreimatei
Contributor Author

cc @cockroachdb/rfc-prs

@@ -0,0 +1,256 @@
- Feature Name: Insights into Constrain Conformance
@vilterp
Contributor

Suggested change
- Feature Name: Insights into Constrain Conformance
- Feature Name: Insights into Constraint Conformance

Contributor

@bdarnell bdarnell left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @awoods187, @bdarnell, @darinpp, @knz, @piyush-singh, and @tbg)


docs/RFCS/20190619_constraint_conformance_report.md, line 75 at r1 (raw file):

1. Testing fault tolerance: A scenario people ran into was testing CRDB’s
resiliency to networking failures. A cluster is configured with 3 Availability
Zones (AZs) and the data is constrained to require a replica in every AZ (or

I don't think there were any constraints in use for the test in question, only the implicit diversity requirement.


docs/RFCS/20190619_constraint_conformance_report.md, line 106 at r1 (raw file):

A new system table will be introduced:
```sql
CREATE TABLE crdb_internal.constraint_violations (

crdb_internal is for virtual tables. If this is a regular table whose contents are written to storage, it goes in system instead (I think).
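(A minimal illustration of the distinction being drawn here, using the existing crdb_internal.zones and system.zones tables as examples:)

```sql
-- Virtual table: rows are computed on demand; nothing is stored under this name.
SELECT * FROM crdb_internal.zones LIMIT 1;

-- System table: an ordinary stored table in the system database.
SELECT * FROM system.zones LIMIT 1;
```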


docs/RFCS/20190619_constraint_conformance_report.md, line 174 at r1 (raw file):

won’t create a job for it.

Altering a zone config (for a table, partition or index) will similarly cause a

What about adding nodes in new localities that trigger moves for diversification? I think it would be nice to detect this and create a job for it, although this might be a v2 feature.


docs/RFCS/20190619_constraint_conformance_report.md, line 199 at r1 (raw file):

The data powering the virtual table and the jobs progress will be a process
running on the leaseholder for the meta1 range. Every minute, this process will

s/meta1 range/range 1/ (meta1 is always in range 1)


docs/RFCS/20190619_constraint_conformance_report.md, line 210 at r1 (raw file):

that have been dead for 5 minutes (cluster setting server.time_until_store_dead)
- that’s when it starts moving replicas away. Nodes that died more recently are
as good as live ones. But not so for this report; we can’t claim that everything

Why can't we? I'd expect that the threshold for reporting constraint violations would be the same as the threshold for taking action to fix them.


docs/RFCS/20190619_constraint_conformance_report.md, line 242 at r1 (raw file):

leading to non-sensical counts.

## Unresolved questions

How exactly is under-diversification computed? Consider a cluster with one region, three AZs, and six nodes. The region level is specified in the locality flags in anticipation of future growth, but there's only one region now and we don't want to alert on the inability to handle region failures. Similarly, system ranges are replicated 5x, but there are only three AZs available.


docs/RFCS/20190619_constraint_conformance_report.md, line 244 at r1 (raw file):

## Unresolved questions

1. Should we report violations on a per-zone basis or per table/index (when there’s

In general I'd generate the report at the finest level possible, and then have aggregate/summary views in the UI.


docs/RFCS/20190619_constraint_conformance_report.md, line 248 at r1 (raw file):

1. It seems nice to report violations in terms of amount of data, not number of
ranges, but how would we get the range sizes exactly since it’s obviously not
part of the descriptors? Is it worth computing the sizes?

I wouldn't worry about getting the range sizes at this point.


docs/RFCS/20190619_constraint_conformance_report.md, line 249 at r1 (raw file):

ranges, but how would we get the range sizes exactly since it’s obviously not
part of the descriptors? Is it worth computing the sizes?
1. Should we periodically compute this report even if no one is consuming it? We

I think we should generate it whether or not anyone is reading it; in addition to keeping the velocity up to date (I'm not sure this is critical) it will be good for proactive alerting when/if we support that. We also want to feed some summaries of this information into metrics so alerts can be generated from external monitoring systems.

@andreimatei andreimatei force-pushed the rfc.constraints branch 4 times, most recently from ab5c6cc to aced9f3 on June 21, 2019 19:32
@andreimatei andreimatei changed the title docs: add an Insights into Constrain Conformance RFC docs: add an Insights into Constraint Conformance RFC Jun 21, 2019
@andreimatei andreimatei force-pushed the rfc.constraints branch 5 times, most recently from 57851e2 to a951ecd on June 21, 2019 19:38
Contributor Author

@andreimatei andreimatei left a comment


TFTR Ben. I've clarified some things and added a section on the collection of lease info. Please look again.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @awoods187, @bdarnell, @darinpp, @knz, @piyush-singh, @tbg, and @vilterp)


docs/RFCS/20190619_constraint_conformance_report.md, line 1 at r1 (raw file):

Previously, vilterp (Pete Vilter) wrote…
- Feature Name: Insights into Constraint Conformance

Done.


docs/RFCS/20190619_constraint_conformance_report.md, line 75 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I don't think there were any constraints in use for the test in question, only the implicit diversity requirement.

right, but "partial constraints" could have also been used to require one replica in each AZ, right? So I think this text is fine, no?


docs/RFCS/20190619_constraint_conformance_report.md, line 106 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

crdb_internal is for virtual tables. If this is a regular table whose contents are written to storage, it goes in system instead (I think).

right, but I think I want this to be a virtual table driven by data stored in a different format (and probably stored without multiple versions). What do you think?


docs/RFCS/20190619_constraint_conformance_report.md, line 174 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

What about adding nodes in new localities that trigger moves for diversification? I think it would be nice to detect this and create a job for it, although this might be a v2 feature.

Yeah, I had considered it, but it didn't seem straightforward. Added an "out of scope" section :)


docs/RFCS/20190619_constraint_conformance_report.md, line 199 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

s/meta1 range/range 1/ (meta1 is always in range 1)

done


docs/RFCS/20190619_constraint_conformance_report.md, line 210 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

Why can't we? I'd expect that the threshold for reporting constraint violations would be the same as the threshold for taking action to fix them.

I dunno... I would not expect that. The allocator waits the 5 minutes as a tradeoff since there's a significant cost to it being too trigger happy.
But the truth of the matter is that ranges are underreplicated for those 5 minutes, and I think the report would better indicate that. I could add information to the report saying that no movement will happen for 5 minutes (and even add a "move now!" button to the UI). What do you think?
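(For reference, the threshold under discussion is the existing server.time_until_store_dead cluster setting quoted above; a minimal sketch of inspecting and tuning it, with the 5-minute default mentioned in the RFC:)

```sql
-- The allocator's dead-store threshold discussed above.
SHOW CLUSTER SETTING server.time_until_store_dead;

-- Operators can raise or lower it; the open question is whether the
-- conformance report should use this same threshold or report sooner.
SET CLUSTER SETTING server.time_until_store_dead = '5m0s';
```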


docs/RFCS/20190619_constraint_conformance_report.md, line 242 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

How exactly is under-diversification computed? Consider a cluster with one region, three AZs, and six nodes. The region level is specified in the locality flags in anticipation of future growth, but there's only one region now and we don't want to alert on the inability to handle region failures. Similarly, system ranges are replicated 5x, but there are only three AZs available.

Added some clarifications under "How much data is under-diversified". Please take a look and comment there if need be.


docs/RFCS/20190619_constraint_conformance_report.md, line 244 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

In general I'd generate the report at the finest level possible, and then have aggregate/summary views in the UI.

Agreed. So I've modified the text to describe two virtual tables - one offering a per-zone view and one offering a per-object view.
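(As a sketch of such a rollup, aggregating a fine-grained report into a per-zone summary; the table and column names follow the system tables eventually introduced later in this PR and are used here only as an illustration:)

```sql
-- Per-zone summary derived from a finer-grained conformance report.
SELECT zone_id, sum(violating_ranges) AS violating_ranges
FROM system.replication_constraint_stats
GROUP BY zone_id
ORDER BY violating_ranges DESC;
```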


docs/RFCS/20190619_constraint_conformance_report.md, line 248 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I wouldn't worry about getting the range sizes at this point.

Well, I think the sizes can be reported naturally, in the same way that leases are reported (and I've now included words on that). And I think it's fairly important. We should strive to stop talking about ranges to users; we should talk in higher-level terms. And the applicable higher-level concept is data size.
If you agree, I'll remove this from the open questions too.


docs/RFCS/20190619_constraint_conformance_report.md, line 249 at r1 (raw file):

Previously, bdarnell (Ben Darnell) wrote…

I think we should generate it whether or not anyone is reading it; in addition to keeping the velocity up to date (I'm not sure this is critical) it will be good for proactive alerting when/if we support that. We also want to feed some summaries of this information into metrics so alerts can be generated from external monitoring systems.

agreed

were migrated away during the first outage.
As things stand, it’s entirely unclear how one is supposed to know when it’s
safe to bring down the 2nd AZ. (Of course, here we’re talking about disaster
testing, not the use of node decommissioning.)
Contributor

Maybe add a comment about how it's extremely unlikely for this situation to play out in production, since you would need to lose multiple AZs/regions.

Contributor Author

meh

Contributor

We do get customers reading the docs, but I'm fine if you want to leave it out.

example, if all the nodes holding a partition go away, the ranges in that
partition are not counted (see
https://github.com/cockroachdb/cockroach/issues/19644).

Contributor

I think we also need to discuss locality here. While it isn't technically a "constraint" it effectively is one to the user.

Contributor Author

What are you referring to? Locality is "constraints".

Contributor

Well, isn't the --locality flag technically not part of the constraints within replication zones?

partition are not counted (see
https://github.com/cockroachdb/cockroach/issues/19644).

Besides the general need for an administrator to know that everything is
Contributor

We also learned from a customer this week that they would like to know if the cluster has experienced corruption. I could imagine something like zero failures of the consistency checker on any range in the last 24 hours.

Contributor Author

how is that related to data location / constraint conformance?

Contributor

It's not, per se, and I'm totally fine if we want to pull it out. It does relate to the notion of "is my cluster in a healthy state before I do X?"


1. Testing fault tolerance: A scenario people ran into was testing CRDB’s
resiliency to networking failures. A cluster is configured with 3 Availability
Zones (AZs) and the data is constrained to require a replica in every AZ (or
Contributor

I assume we will apply this to single machine, data center/availability zone, and region failures as all are important for testing here

Contributor Author

crdb doesn't understand region/az/dc specifically. --locality takes a list of key-value pairs, without particular semantics for each level in the list. So everything we do is generic.
But perhaps I'm not understanding what you're suggesting.
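(For context, locality is supplied at node start as an ordered list of key=value tiers; the tier names below are just examples, and CRDB attaches no special meaning to them:)

```sql
-- Locality is passed at startup, e.g.:
--   cockroach start --locality=region=us-east1,az=us-east1-b ...
-- and the gateway node's locality can be inspected from SQL:
SHOW LOCALITY;
```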

Contributor

I just thought it would be worth explaining that it applies to all of them, but we can remove that if you think it's unnecessary.

@awoods187
Contributor

@piyush-singh probably worth having design take a look at this to make sure the virtual tables have what is needed

@piyush-singh

piyush-singh commented Jun 21, 2019

cc @Annebirzin (we can discuss in person on Monday)

Contributor
Contributor

@awoods187 awoods187 left a comment


Thanks for making the changes; LGTM.

@piyush-singh

piyush-singh commented Jun 24, 2019


docs/RFCS/20190619_constraint_conformance_report.md, line 137 at r3 (raw file):

    ```
    table foo, replication factor, 1/1000 ranges have only 2 replicas, 2/1000 ranges

Should we include information about which constraints have cascaded from parent schema objects in these warnings?

So for example, imagine I have a table a that has an index a.idx. If I set the zone config on table a to avoid a region ([-region=us-west-1b]), as a user I might be surprised when I see the constraint conformance report warning me that my index a.idx isn't conforming to my constraint by being in that region. In fact, when I run SHOW ZONE CONFIGURATIONS, because the zone constraint is inherited, I won't even see the constraint on a.idx listed, which will only add to the confusion (I would have to check SHOW ZONE CONFIGURATION FOR INDEX a.idx specifically).

Not sure if the best way to handle this is changing SHOW ZONE CONFIGURATIONS or doing something different here.
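(A sketch of the scenario described above, using the hypothetical table a and index idx from the comment; the exact SHOW ZONE CONFIGURATION syntax may vary by version:)

```sql
-- Constraint set on the table; the index inherits it implicitly.
ALTER TABLE a CONFIGURE ZONE USING constraints = '[-region=us-west-1b]';

-- Lists only explicitly set zone configs, so nothing shows up for the index here.
SHOW ZONE CONFIGURATIONS;

-- The inherited constraint is only visible when asking for the index directly.
SHOW ZONE CONFIGURATION FOR INDEX a@idx;
```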

andreimatei added a commit to andreimatei/cockroach that referenced this pull request Sep 10, 2019
This patch implements the "Insights into Constraint Conformance" RFC
(cockroachdb#38309). At the time of this patch, the RFC is still pending and also
out of date.
Developed together with Darin.

The patch introduces the following tables in the system database,
providing information about constraint conformance, replication status
and critical localities (i.e. localities that, were they to become
unavailable, would cause some ranges to lose quorum):
  CREATE TABLE replication_constraint_stats (
      zone_id INT8 NOT NULL,
      subzone_id INT8 NOT NULL,
      type STRING NOT NULL,
      config STRING NOT NULL,
      report_id INT8 NOT NULL,
      violation_start TIMESTAMP NULL,
      violating_ranges INT8 NOT NULL,
      CONSTRAINT "primary" PRIMARY KEY (zone_id ASC, subzone_id ASC, type ASC, config ASC),
      FAMILY "primary" (zone_id, subzone_id, type, config, report_id, violation_start, violating_ranges)
  );

  CREATE TABLE replication_stats (
      zone_id INT8 NOT NULL,
      subzone_id INT8 NOT NULL,
      report_id INT8 NOT NULL,
      total_ranges INT8 NOT NULL,
      unavailable_ranges INT8 NOT NULL,
      under_replicated_ranges INT8 NOT NULL,
      over_replicated_ranges INT8 NOT NULL,
      CONSTRAINT "primary" PRIMARY KEY (zone_id ASC, subzone_id ASC),
      FAMILY "primary" (zone_id, subzone_id, report_id, total_ranges, unavailable_ranges, under_replicated_ranges, over_replicated_ranges)
  );

    CREATE TABLE replication_critical_localities (
      zone_id INT8 NOT NULL,
      subzone_id INT8 NOT NULL,
      locality STRING NOT NULL,
      report_id INT8 NOT NULL,
      at_risk_ranges INT8 NOT NULL,
      CONSTRAINT "primary" PRIMARY KEY (zone_id ASC, subzone_id ASC, locality ASC),
      FAMILY "primary" (zone_id, subzone_id, locality, report_id, at_risk_ranges)
  )

And also a system.report_meta table with metadata for all these reports
(their creation time).

The reports are generated periodically (once a minute by default,
subject to the cluster setting kv.replication_reports.interval) by a job
running on the leaseholder of range 1.
The data is produced by joining range descriptor data from meta2 with zone
config information from the gossiped SystemConfig.

Release note (sql change): The following system tables containing reports about
replication status, constraint conformance and critical localities are
introduced: replication_constraint_stats, replication_stats,
replication_critical_localities.
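(For reference, once these tables exist, current violations can be surfaced with a query along these lines; a sketch only, using the columns from the schema above:)

```sql
-- Zones/subzones currently violating a replication constraint, oldest first.
SELECT zone_id, subzone_id, type, config, violation_start, violating_ranges
FROM system.replication_constraint_stats
WHERE violating_ranges > 0
ORDER BY violation_start;
```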
craig bot pushed a commit that referenced this pull request Sep 10, 2019
40625: storage: introduce a couple of replication reports r=andreimatei a=andreimatei

This patch implements the "Insights into Constraint Conformance" RFC
(#38309). At the time of this patch, the RFC is still pending and also
out of date.
The patch introduces the following tables in the system database,
providing information about constraint conformance, replication status
and critical localities (i.e. localities that, were they to become
unavailable, would cause some ranges to lose quorum):
```
  CREATE TABLE replication_constraint_stats (
      zone_id INT8 NOT NULL,
      subzone_id INT8 NOT NULL,
      type STRING NOT NULL,
      config STRING NOT NULL,
      report_id INT8 NOT NULL,
      violation_start TIMESTAMP NULL,
      violating_ranges INT8 NOT NULL,
      CONSTRAINT "primary" PRIMARY KEY (zone_id ASC, subzone_id ASC, type ASC, config ASC),
      FAMILY "primary" (zone_id, subzone_id, type, config, report_id, violation_start, violating_ranges)
  );

  CREATE TABLE replication_stats (
      zone_id INT8 NOT NULL,
      subzone_id INT8 NOT NULL,
      report_id INT8 NOT NULL,
      total_ranges INT8 NOT NULL,
      unavailable_ranges INT8 NOT NULL,
      under_replicated_ranges INT8 NOT NULL,
      over_replicated_ranges INT8 NOT NULL,
      CONSTRAINT "primary" PRIMARY KEY (zone_id ASC, subzone_id ASC),
      FAMILY "primary" (zone_id, subzone_id, report_id, total_ranges, unavailable_ranges, under_replicated_ranges, over_replicated_ranges)
  );

    CREATE TABLE replication_critical_localities (
      zone_id INT8 NOT NULL,
      subzone_id INT8 NOT NULL,
      locality STRING NOT NULL,
      report_id INT8 NOT NULL,
      at_risk_ranges INT8 NOT NULL,
      CONSTRAINT "primary" PRIMARY KEY (zone_id ASC, subzone_id ASC, locality ASC),
      FAMILY "primary" (zone_id, subzone_id, locality, report_id, at_risk_ranges)
  )
```

And also a system.report_meta table with metadata for all these reports
(their creation time).

The reports are generated periodically (once a minute by default,
subject to the cluster setting kv.replication_reports.interval) by a job
running on the leaseholder of range 1.
The data is produced by joining range descriptor data from meta2 with zone
config information from the gossiped SystemConfig.

Release note (sql change): The following system tables containing reports about
replication status, constraint conformance and critical localities are
introduced: replication_constraint_stats, replication_stats,
replication_critical_localities.

Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Contributor

@knz knz left a comment


I would rebase the RFC and update it with the work that was actually implemented.
Then change the status to completed and merge this.

Reviewed 1 of 1 files at r7.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @awoods187, @bdarnell, @darinpp, @nvanbenschoten, @petermattis, @piyush-singh, @tbg, and @vilterp)


docs/RFCS/20190619_constraint_conformance_report.md, line 37 at r7 (raw file):

**Replication zone configs and constraints:** key spans can be grouped into
*“replication zones” and properties can be set for each zone through a “zone

Extraneous asterisks at the start of each line until below.

@bobvawter
Member

bobvawter commented Mar 10, 2020

CLA assistant check
All committers have signed the CLA.

@tbg tbg removed their request for review June 25, 2020 09:48
@tbg tbg added the X-noremind Bots won't notify about PRs with X-noremind label May 6, 2021
@ajwerner
Contributor

@andreimatei can you merge this?

The set of features described here aims to provide admins with visibility into
aspects of replication and replica placement. In particular, admins will be able
to access a report that details which replication constraints are violated.

Release note: None
Contributor Author

@andreimatei andreimatei left a comment


bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @awoods187, @bdarnell, @darinpp, @knz, @nvanbenschoten, @petermattis, @piyush-singh, and @vilterp)

@craig
Contributor

craig bot commented Feb 28, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Feb 28, 2022

Build failed (retrying...):

@ajwerner
Contributor

@andreimatei gotta have your release justification, don't ya know it?

@craig
Contributor

craig bot commented Feb 28, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Mar 1, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Mar 1, 2022

Build succeeded:

@craig craig bot merged commit aeb1e28 into cockroachdb:master Mar 1, 2022
@andreimatei andreimatei deleted the rfc.constraints branch March 18, 2022 15:23