sql: better SHOW RANGES and changes to crdb_internal.ranges{,_no_leases} #93644
Merged
knz force-pushed the 20221214-show-ranges branch 5 times, most recently from 4236e48 to 9f83fca (December 15, 2022 21:20)

knz force-pushed the 20221214-show-ranges branch 2 times, most recently from 283d96b to 47a4ca1 (December 16, 2022 11:37)

todo: update commit message with the new WITH clause

knz force-pushed the 20221214-show-ranges branch 2 times, most recently from b30034d to 2a3c34a (December 16, 2022 19:46)

knz force-pushed the 20221214-show-ranges branch from 2a3c34a to dbc47d4 (December 16, 2022 20:47)

knz requested review from rhu713 and miretskiy and removed request for a team (December 16, 2022 20:48)

This is good to go now.
cc @ajwerner since we chatted about it.
This was referenced Dec 23, 2022

craig bot pushed a commit that referenced this pull request (Dec 23, 2022)

94233: roachtest: update `failover` tests to use `SHOW RANGES` r=erikgrinaker a=erikgrinaker

This patch updates the `failover` roachtests to use the new `SHOW RANGES` syntax, because of the recent `crdb_internal.ranges` changes.

Resolves #94218. Resolves #94217. Resolves #94216. Resolves #94215. Resolves #94214. Resolves #94213. Resolves #94212. Resolves #94210. Touches #93644.

Epic: none
Release note: None

Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
irfansharif added a commit to irfansharif/cockroach that referenced this pull request (May 7, 2023)

Fixes cockroachdb#81008.

We built the basic infrastructure to coalesce ranges across table boundaries back in 22.2 as part of cockroachdb#66063. We've enabled this optimization for secondary tenants since then, but hadn't for the system tenant because of two primary blockers:
- cockroachdb#93617: SHOW RANGES was broken by coalesced ranges.
- cockroachdb#84105: APIs to compute sizes for schema objects (used in our UI) were broken by coalesced ranges.

In both these cases we baked in assumptions about there being a minimum of one-{table,index,partition}-per-range. These blockers didn't apply to secondary tenants at the time since they didn't have access to SHOW RANGES, nor the UI pages where these schema statistics were displayed.

We've addressed both these blockers in the 23.1 cycle as part of bridging the compatibility between secondary tenants and yesteryear's system tenant.
- cockroachdb#93644 revised SHOW RANGES and crdb_internal.ranges{,_no_leases}, both internally and its external UX, to accommodate ranges straddling table/database boundaries.
- cockroachdb#96223 re-worked our SpanStats API to work in the face of coalesced ranges, addressing cockroachdb#84105.

Release note (general change): CockroachDB would previously use separate ranges for each table, index, or partition. This is no longer true -- it's possible now for multiple tables, indexes, and partitions to get packed into the same range. For users with many such "schema objects", this will reduce the total range count in their clusters. This is especially true if individual tables, indexes, or partitions are smaller than the default configured maximum range size (controlled using zone configs, specifically the range_max_bytes parameter). We made this change to improve scalability with respect to the number of schema objects, since the underlying range count is now no longer a bottleneck.

Users upgrading from 22.2, once finalizing their upgrade, may observe a round of range merges and snapshot transfers (to power said range merges) as a result of this change. If users want to opt out of this optimization, they can use the following cluster setting:

    SET CLUSTER SETTING spanconfig.storage_coalesce_adjacent.enabled = false;

irfansharif added a commit to irfansharif/cockroach that referenced this pull request (May 8, 2023), with the same commit message as the May 7, 2023 commit.

craig bot pushed a commit that referenced this pull request (May 8, 2023)

98820: spanconfig: enable range coalescing by default r=irfansharif a=irfansharif

Fixes #81008.

We built the basic infrastructure to coalesce ranges across table boundaries back in 22.2 as part of #66063. We've enabled this optimization for secondary tenants since then, but hadn't for the system tenant because of two primary blockers:
- #93617: SHOW RANGES was broken by coalesced ranges.
- #84105: APIs to compute sizes for schema objects (used in our UI) were broken by coalesced ranges.

In both these cases we baked in assumptions about there being a minimum of one-{table,index,partition}-per-range. These blockers didn't apply to secondary tenants at the time since they didn't have access to SHOW RANGES, nor the UI pages where these schema statistics were displayed.

We've addressed both these blockers in the 23.1 cycle as part of bridging the compatibility between secondary tenants and yesteryear's system tenant.
- #93644 revised SHOW RANGES and crdb_internal.ranges{,_no_leases}, both internally and its external UX, to accommodate ranges straddling table/database boundaries.
- #96223 re-worked our SpanStats API to work in the face of coalesced ranges, addressing #84105.

Release note (general change): CockroachDB would previously use separate ranges for each table, index, or partition. This is no longer true -- it's possible now for multiple tables, indexes, and partitions to get packed into the same range. For users with many such "schema objects", this will reduce the total range count in their clusters. This is especially true if individual tables, indexes, or partitions are smaller than the default configured maximum range size (controlled using zone configs, specifically the range_max_bytes parameter). We made this change to improve scalability with respect to the number of schema objects, since the underlying range count is now no longer a bottleneck.

Users upgrading from 22.2, once finalizing their upgrade, may observe a round of range merges and snapshot transfers (to power said range merges) as a result of this change. If users want to opt out of this optimization, they can use the following cluster setting:

```
SET CLUSTER SETTING spanconfig.storage_coalesce_adjacent.enabled = false;
```

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>

irfansharif added a commit to irfansharif/cockroach that referenced this pull request (May 9, 2023)

We enabled range coalescing by default in 23.2 as part of cockroachdb#98820. Alongside that change, we want to flip the SHOW RANGES behavior to be compatible with coalesced ranges, which this commit does. See cockroachdb#93644 and the accompanying release notes.

Release note (backward-incompatible change): The pre-v23.1 output produced by SHOW RANGES, crdb_internal.ranges, crdb_internal.ranges_no_leases was deprecated in 23.1, and is now replaced by default with output that's compatible with coalesced ranges (i.e. ranges that pack multiple tables/indexes/partitions into individual ranges). See the 23.1 release notes for SHOW RANGES for more details.

craig bot pushed a commit that referenced this pull request (May 9, 2023)

101825: sql: add crdb_internal.pretty_value r=Xiang-Gu,msbutler a=stevendanna

This composes nicely with crdb_internal.scan to allow inspection of table data and other command line tools like cockroach debug pebble.

Epic: none
Release note: None

102907: workload: remove initial prefix from bank workload payload r=herkolategan,srosenberg a=renatolabs

An `initial-` prefix is added to the payload column of the `bank` table when the workload is initialized. It was introduced about 6 years ago [1] and its purpose at the time is not clear. There are two main problems with it:
* the `initial-` prefix suggests the payload may be updated, but that actually never happens.
* as currently implemented, it assumes that the `payload-bytes` command line flag is at least `len([]byte("initial-"))`. Passing a lower value to that command line flag leads to a panic. This is an implicit assumption that should not exist.

This changes the row generation function so that `payload-bytes` bytes are randomly generated and inserted into the `payload` column, without the `initial-` prefix.

[1] d49d535

Epic: none
Release note: None

102961: sql: default sql.show_ranges_deprecated_behavior.enabled to true r=irfansharif a=irfansharif

We enabled range coalescing by default in 23.2 as part of #98820. Alongside that change, we want to flip the SHOW RANGES behavior to be compatible with coalesced ranges, which this commit does. See #93644 and the accompanying release notes.

Release note (backward-incompatible change): The pre-v23.1 output produced by SHOW RANGES, crdb_internal.ranges, crdb_internal.ranges_no_leases was deprecated in 23.1, and is now replaced by default with output that's compatible with coalesced ranges (i.e. ranges that pack multiple tables/indexes/partitions into individual ranges). See the 23.1 release notes for SHOW RANGES for more details.

Co-authored-by: Steven Danna <danna@cockroachlabs.com>
Co-authored-by: Renato Costa <renato@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>

Fixes #93617.
Fixes #80906.
Fixes #93198.
Epic: CRDB-22701

The output of `crdb_internal.ranges{,_no_leases}` and `SHOW RANGES` was irreparably broken by the introduction of range coalescing (ranges spanning multiple tables/databases). Moreover, the start/end keys of SHOW RANGES were often empty or NULL due to incorrect/excessive truncation.

This commit fixes this by introducing a new design for SHOW RANGES and tweaking the definition of `crdb_internal.ranges{,_no_leases}`.

Note: THIS IS A BREAKING CHANGE. See the "backward-incompatible change" release notes below for suggestions on updating client code.

Short documentation.
The revised syntax is now:
New syntax: `SHOW CLUSTER RANGES`, `SHOW RANGES` with no `FROM`, `FROM CURRENT_CATALOG`, `WITH` clause.

In summary, we have:
- `SHOW CLUSTER RANGES`, which includes all ranges, including those not belonging to any table.
- `SHOW RANGES [FROM DATABASE | FROM CURRENT_CATALOG]`, which includes only ranges overlapping with any table in the target db. Note: `SHOW RANGES` without target (NEW!) is an alias for `SHOW RANGES FROM CURRENT_CATALOG`.
- `SHOW RANGES FROM TABLE` selects only ranges that overlap with the given table.
- `SHOW RANGES FROM INDEX` selects only ranges that overlap with the given index.

Then:
- If `WITH TABLES` is specified, the rows are duplicated to detail each table included in each range (1 row per range-table intersection).
- If `WITH INDEXES` is specified, the rows are duplicated to detail each index included in each range (1 row per range-index intersection).
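
The "1 row per range-table intersection" rule can be pictured with a small toy model. The sketch below is not how CockroachDB computes its output; it assumes keys are plain integers and that both ranges and tables are half-open key intervals. The names `Span` and `rows_with_tables`, and the sample data, are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A half-open key interval [start, end); integers stand in for real keys."""
    name: str
    start: int
    end: int


def rows_with_tables(ranges, tables):
    """Emit one (range, table) row for every range/table pair that overlaps."""
    rows = []
    for r in ranges:
        for t in tables:
            # Two half-open intervals overlap when each starts before the other ends.
            if r.start < t.end and t.start < r.end:
                rows.append((r.name, t.name))
    return rows


# Made-up layout: range r1 holds all of t1 and the first part of t2;
# range r2 holds the rest of t2 and all of t3.
ranges = [Span("r1", 0, 30), Span("r2", 30, 60)]
tables = [Span("t1", 0, 10), Span("t2", 10, 40), Span("t3", 40, 50)]
rows = rows_with_tables(ranges, tables)
```

In this toy layout the two ranges produce four rows, because each range is repeated once per table it overlaps. The same reasoning applies to `WITH INDEXES`, with index spans in place of table spans.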

In summary:
- `SHOW RANGES FROM DATABASE`
- `SHOW RANGES FROM TABLE`
- `SHOW RANGES FROM INDEX`
- `SHOW RANGES FROM DATABASE ... WITH TABLES` (NEW)
- `SHOW RANGES FROM DATABASE ... WITH INDEXES` (NEW)
- `SHOW RANGES FROM TABLE ... WITH INDEXES` (NEW)
- `SHOW CLUSTER RANGES` (NEW)
- `SHOW CLUSTER RANGES WITH TABLES` (NEW)
- `SHOW CLUSTER RANGES WITH INDEXES` (NEW)

In any case, all the columns from `crdb_internal.ranges_no_leases` are included. By default, the start/end key boundaries are pretty-printed as in previous versions.

Then:
- If `WITH KEYS` is specified, the raw key bytes are exposed alongside the pretty-printed key boundaries.
- If `WITH DETAILS` is specified, extra expensive information is included in the result, as in `crdb_internal.ranges` (requires more roundtrips; makes the operation slower overall).

Then:
- If `WITH EXPLAIN` is specified, the statement simply returns the text of the SQL query it would use if `WITH EXPLAIN` was not specified. This can be used for learning or troubleshooting.
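
For client code that assembles these statements programmatically, a small helper can keep the option names in one place. This is only a sketch: the function name `show_ranges_stmt` is hypothetical, the comma-separated option list follows the `WITH TABLES, KEYS` form used in the release notes below, and rejecting a `FROM` target together with `CLUSTER` is an assumption based on the statement forms listed above. The target text is inserted verbatim, so it must come from trusted input.

```python
# Option names taken from the PR description.
VALID_OPTIONS = ("TABLES", "INDEXES", "KEYS", "DETAILS", "EXPLAIN")


def show_ranges_stmt(target="", options=(), cluster=False):
    """Build a SHOW RANGES statement string.

    target  -- e.g. "FROM TABLE x", "FROM DATABASE db", "FROM CURRENT_CATALOG",
               or "" for the no-target form; inserted verbatim (trusted input only)
    options -- any of VALID_OPTIONS, joined with commas after WITH
    cluster -- build SHOW CLUSTER RANGES instead of SHOW RANGES
    """
    unknown = [opt for opt in options if opt not in VALID_OPTIONS]
    if unknown:
        raise ValueError(f"unknown SHOW RANGES option(s): {unknown}")
    if cluster and target:
        # Assumption: the cluster-wide form is listed without any FROM target.
        raise ValueError("SHOW CLUSTER RANGES takes no FROM target")

    parts = ["SHOW CLUSTER RANGES" if cluster else "SHOW RANGES"]
    if target:
        parts.append(target)
    if options:
        parts.append("WITH " + ", ".join(options))
    return " ".join(parts)


stmt_keys = show_ranges_stmt("FROM TABLE x", ["KEYS"])
stmt_catalog = show_ranges_stmt("FROM CURRENT_CATALOG", ["TABLES", "KEYS"])
stmt_cluster = show_ranges_stmt(options=["INDEXES"], cluster=True)
```

Adding `"EXPLAIN"` to the option list is a convenient way to inspect the underlying query while troubleshooting, as the description suggests.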

See text of release notes below for more details; also the explanatory comment at the top of `pkg/sql/delegate/show_ranges.go`.

Example use.
To test this, use for example the following setup:
Example output for `SHOW RANGES FROM DATABASE`:
New syntax: `WITH TABLES` / `WITH INDEXES`:
Example output for `SHOW RANGES FROM TABLE`:
New syntax: `SHOW RANGES FROM TABLE ... WITH INDEXES`:
Example output for `SHOW RANGES FROM INDEX`:
See release notes below for details.

Backward-incompatible changes.
Release note (backward-incompatible change): CockroachDB now supports sharing storage ranges across multiple indexes/tables. As a result, there is no more guarantee that there is at most one SQL object (e.g. table/index/sequence/materialized view) per storage range.

Therefore, the columns `table_id`, `database_name`, `schema_name`, `table_name` and `index_name` in `crdb_internal.ranges` and `.ranges_no_leases` have become nonsensical: a range cannot be attributed to a single table/index any more.

As a result:
- The aforementioned columns in the `crdb_internal` virtual tables have been removed. Existing code can use the SHOW RANGES statement instead, optionally using WITH KEYS to expose the raw start/end keys.
- `SHOW RANGES FROM DATABASE` continues to report one row per range, but stops returning the database / schema / table / index name.
- `SHOW RANGES FROM TABLE` continues to report one row per range, but stops returning the index name.

Suggested replacements:
- Instead of: `SELECT range_id FROM crdb_internal.ranges WHERE table_name = 'x'`
  Use: `SELECT range_id FROM [SHOW RANGES FROM TABLE x]`
- Instead of: `SELECT range_id FROM crdb_internal.ranges WHERE table_name = $1 OR table_id = $2` (variable / unpredictable table name or ID)
  Use: `SELECT range_id FROM [SHOW RANGES FROM CURRENT_CATALOG WITH TABLES] WHERE table_name = $1 OR table_id = $2`
- Instead of: `SELECT start_key FROM crdb_internal.ranges WHERE table_name = 'x'`
  Use: `SELECT raw_start_key FROM [SHOW RANGES FROM TABLE x WITH KEYS]`
- Instead of: `SELECT start_key FROM crdb_internal.ranges WHERE table_name = $1 OR table_id = $2` (unpredictable / variable table name or ID)
  Use: `SELECT raw_start_key FROM [SHOW RANGES FROM CURRENT_CATALOG WITH TABLES, KEYS] WHERE table_name = $1 OR table_id = $2`
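
As a rough illustration of the second replacement (table name only known at run time), client code might look like the sketch below. It assumes a Python DB-API style cursor and the `%s` placeholder style used by drivers such as psycopg2, whereas the release note writes `$1`. `range_ids_for_table` and `StubCursor` are hypothetical names; the stub and its rows are made up so the snippet runs without a database.

```python
# Replacement query from the release note, filtered by table name only.
# The placeholder sits in the WHERE clause, outside the bracketed SHOW RANGES.
QUERY = (
    "SELECT range_id "
    "FROM [SHOW RANGES FROM CURRENT_CATALOG WITH TABLES] "
    "WHERE table_name = %s"
)


def range_ids_for_table(cursor, table_name):
    """Return the sorted, de-duplicated range IDs reported for table_name."""
    cursor.execute(QUERY, (table_name,))
    return sorted({row[0] for row in cursor.fetchall()})


class StubCursor:
    """Hypothetical stand-in for a real driver cursor, for illustration only."""

    def __init__(self, rows):
        self._rows = rows        # made-up (range_id, table_name) pairs
        self.executed = []       # records (sql, params) for inspection
        self._result = []

    def execute(self, sql, params=()):
        self.executed.append((sql, params))
        # Mimic the WHERE table_name = ... filter on the made-up rows.
        self._result = [(rid,) for rid, name in self._rows if name == params[0]]

    def fetchall(self):
        return list(self._result)


cursor = StubCursor([(70, "accounts"), (70, "orders"), (71, "orders")])
orders_ranges = range_ids_for_table(cursor, "orders")
```

The table name travels as a bound parameter here, which is the point of the `WITH TABLES ... WHERE table_name = $1` form: `SHOW RANGES FROM TABLE x` needs the table name written into the statement text.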

Release note (backward-incompatible change): The format of the columns `start_key` and `end_key` for `SHOW RANGES FROM DATABASE` and `SHOW RANGES FROM TABLE` has been extended to include which table/index the key belongs to. This is necessary because a range can now contain data from more than one table/index.

Release note (backward-incompatible change): The format of the columns `start_key` and `end_key` for `SHOW RANGE ... FOR ROW` has been changed to stay consistent with the output of `SHOW RANGES FROM INDEX`.

Release note (backward-incompatible change): The output of `SHOW RANGES` does not include `range_size`, `range_size_mb`, `lease_holder` and `lease_holder_localities` any more by default. This ensures that `SHOW RANGES` remains fast in the common case. Use the (NEW) option `WITH DETAILS` to include these columns.

Other changes.
Release note (bug fix): In some cases the start/end key columns of the output of `SHOW RANGES` were missing. This was corrected.

Release note (sql change): Two new virtual tables `crdb_internal.index_spans` and `.table_spans` have been introduced, which list the logical keyspace used by each index/table.

New features.
Release note (sql change): The following new statements are introduced:
- `SHOW RANGES FROM CURRENT_CATALOG` and `SHOW RANGES` without parameter: alias for `SHOW RANGES FROM DATABASE` on the session's current database.
- `SHOW RANGES FROM DATABASE ... WITH TABLES`: Reports at least one row per table. It's possible for the same range ID to be repeated across multiple rows, when a range spans multiple tables.
- `SHOW RANGES FROM DATABASE ... WITH INDEXES`: Reports at least one row per index. It's possible for the same range ID to be repeated across multiple rows, when a range spans multiple indexes.
- `SHOW RANGES FROM TABLE ... WITH INDEXES`: Reports at least one row per index. It's possible for the same range ID to be repeated across multiple rows, when a range spans multiple indexes.
- `SHOW CLUSTER RANGES [ WITH { INDEXES | TABLES } ]`: Reports ranges across the entire cluster, including ranges that don't contain table data. The behavior of `WITH INDEXES` and `WITH TABLES` is the same as for `SHOW RANGES FROM DATABASE`.

Additionally, the following new options have been added to the `SHOW RANGES` statement:
- `WITH KEYS`: produce the raw bytes of the start/end key boundaries.
- `WITH DETAILS`: produce more details, using computations that require extra network roundtrips. Makes the operation slower overall.
- `WITH EXPLAIN`: produce the text of the SQL query used to run the statement.