Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql,kv: extend SHOW RANGES to include table/index size estimates #97858

Closed
knz opened this issue Mar 1, 2023 · 4 comments · Fixed by #103128
Closed

sql,kv: extend SHOW RANGES to include table/index size estimates #97858

knz opened this issue Mar 1, 2023 · 4 comments · Fixed by #103128
Assignees
Labels
A-kv-observability C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@knz
Copy link
Contributor

knz commented Mar 1, 2023

Describe the problem

Currently SHOW RANGES WITH DETAILS only includes range sizes, not object sizes.

When a range contains multiple table/indexes, the range size cannot be used to estimate the table/index size any more.

So we need a replacement for where people would previously use SUM on the range size.

This can use the new span stats computation that Zach implemented in #96223 .

Jira issue: CRDB-24925
Epic: CRDB-24928

@knz knz added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Mar 1, 2023
@blathers-crl blathers-crl bot added the T-kv KV Team label Mar 1, 2023
@knz
Copy link
Contributor Author

knz commented Mar 1, 2023

NB: we may have a package at https://github.com/cockroachlabs/metrics-exporter that needs to be updated accordingly

@Santamaura
Copy link
Contributor

Santamaura commented Mar 8, 2023

Maybe it's my lack of familiarity with SHOW RANGES but isn't it all generated in one statement in the delegate framework? If so then it's unclear to me how we would use a SpanStatsResponse directly unless the way forward is to make a span stats builtin and then add it to the statement?`
Edit: Seems like a builtin was already created 👍

@rafiss
Copy link
Collaborator

rafiss commented Apr 27, 2023

Does this issue replace #20712? Maybe also #85834? Also, does it make #86017 easier to resolve?

@knz
Copy link
Contributor Author

knz commented Apr 28, 2023

It might replace #20712 depending on where #98339 gets to.

It will not replace #85834 nor make #86017 easier to resolve. For that, we need a stored cache of the computed approximations (presumably alongside system.descriptors) and expose that data via the pg-compatible builtin function. Building that cache is out of scope here and in #98339 as far as i understand.

zachlite pushed a commit to zachlite/cockroach that referenced this issue May 17, 2023
This commit adds object size estimates to `SHOW RANGES WITH DETAILS`.
A new override of `crdb_internal.tenant_span_stats` is introduced
to leverage batched span statistics requests to KV, so only 1 fan-out is
required.

Resolves: cockroachdb#97858
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-24928

Release note (sql change): The SHOW RANGES command will now
emit span statistics when the DETAILS option is specified.
The statistics are included in a new column named 'span_stats',
as a JSON value.

The statistics are calculated for the identifier of each row:
`SHOW RANGES WITH DETAILS` will compute span statistics for each range.
`SHOW RANGES WITH TABLES, DETAILS` will compute span statistics
for each table, etc.

In the JSON value, the 'range_count' key refers to the number of ranges
that the object spans. The 'total_stats' key refers to the MVCC statistics
of the object's span.
zachlite pushed a commit to zachlite/cockroach that referenced this issue May 22, 2023
This commit adds object size estimates to `SHOW RANGES WITH DETAILS`.
A new override of `crdb_internal.tenant_span_stats` is introduced
to leverage batched span statistics requests to KV, so only 1 fan-out is
required.

Resolves: cockroachdb#97858
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-24928

Release note (sql change): The SHOW RANGES command will now
emit span statistics when the DETAILS option is specified.
The statistics are included in a new column named 'span_stats',
as a JSON value.

The statistics are calculated for the identifier of each row:
`SHOW RANGES WITH DETAILS` will compute span statistics for each range.
`SHOW RANGES WITH TABLES, DETAILS` will compute span statistics
for each table, etc.

In the JSON value, the 'range_count' key refers to the number of ranges
that the object spans. The 'total_stats' key refers to the MVCC statistics
of the object's span.
zachlite pushed a commit to zachlite/cockroach that referenced this issue Jun 15, 2023
This commit adds object size estimates to `SHOW RANGES WITH DETAILS`.
A new override of `crdb_internal.tenant_span_stats` is introduced
to leverage batched span statistics requests to KV, so only 1 fan-out is
required.

Resolves: cockroachdb#97858
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-24928

Release note (sql change): The SHOW RANGES command will now
emit span statistics when the DETAILS option is specified.
The statistics are included in a new column named 'span_stats',
as a JSON value.

The statistics are calculated for the identifier of each row:
`SHOW RANGES WITH DETAILS` will compute span statistics for each range.
`SHOW RANGES WITH TABLES, DETAILS` will compute span statistics
for each table, etc.

In the JSON value, the 'range_count' key refers to the number of ranges
that the object spans. The 'total_stats' key refers to the MVCC statistics
of the object's span.
zachlite pushed a commit to zachlite/cockroach that referenced this issue Jun 20, 2023
This commit adds object size estimates to `SHOW RANGES WITH DETAILS`.
A new override of `crdb_internal.tenant_span_stats` is introduced
to leverage batched span statistics requests to KV, so only 1 fan-out is
required.

Resolves: cockroachdb#97858
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-24928

Release note (sql change): The SHOW RANGES command will now
emit span statistics when the DETAILS option is specified.
The statistics are included in a new column named 'span_stats',
as a JSON object.

The statistics are calculated for the identifier of each row:
`SHOW RANGES WITH DETAILS` will compute span statistics for each range.
`SHOW RANGES WITH TABLES, DETAILS` will compute span statistics
for each table, etc.

The 'span_stats' JSON object has the following keys:
- approximate_disk_bytes
- [key|val|sys|live|intent]_count
- [key|val|sys|live|intent]_bytes

'approximate_disk_bytes' is an approximation of the total on-disk size
of the given object.

'[key|val|sys|live|intent]_count' and '[key|val|sys|live|intent]_bytes'
are defined in enginepb.MVCCStats.
zachlite pushed a commit to zachlite/cockroach that referenced this issue Jun 20, 2023
This commit adds object size estimates to `SHOW RANGES WITH DETAILS`.
A new override of `crdb_internal.tenant_span_stats` is introduced
to leverage batched span statistics requests to KV, so only 1 fan-out is
required.

Resolves: cockroachdb#97858
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-24928

Release note (sql change): The SHOW RANGES command will now
emit span statistics when the DETAILS option is specified.
The statistics are included in a new column named 'span_stats',
as a JSON object.

The statistics are calculated for the identifier of each row:
`SHOW RANGES WITH DETAILS` will compute span statistics for each range.
`SHOW RANGES WITH TABLES, DETAILS` will compute span statistics
for each table, etc.

The 'span_stats' JSON object has the following keys:
- approximate_disk_bytes
- [key|val|sys|live|intent]_count
- [key|val|sys|live|intent]_bytes

'approximate_disk_bytes' is an approximation of the total on-disk size
of the given object.

'[key|val|sys|live|intent]_count' and '[key|val|sys|live|intent]_bytes'
are defined in enginepb.MVCCStats.
craig bot pushed a commit that referenced this issue Jun 20, 2023
103128: sql: extend SHOW RANGES to include object size estimates r=zachlite a=zachlite

This commit adds object size estimates to `SHOW RANGES WITH DETAILS`.
A new override of `crdb_internal.tenant_span_stats` is introduced
to leverage batched span statistics requests to KV, so only 1 fan-out is
required.

Resolves: #97858
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-24928

Release note (sql change): The SHOW RANGES command will now
emit span statistics when the DETAILS option is specified.
The statistics are included in a new column named 'span_stats',
as a JSON object.

The statistics are calculated for the identifier of each row:
`SHOW RANGES WITH DETAILS` will compute span statistics for each range.
`SHOW RANGES WITH TABLES, DETAILS` will compute span statistics
for each table, etc.

The 'span_stats' JSON object has the following keys:
- approximate_disk_bytes
- [key|val|sys|live|intent]_count
- [key|val|sys|live|intent]_bytes

'approximate_disk_bytes' is an approximation of the total on-disk size
of the given object.

'[key|val|sys|live|intent]_count' and '[key|val|sys|live|intent]_bytes'
are defined in enginepb.MVCCStats.

105114: sql: zero out crdb_internal.node_statement_statistics(anonymized) column r=knz a=rafiss

This column is not used, and is expensive to compute. In the past, we
used to redact identifiers from queries for data privacy reasons, but
now we no longer have this policy. (Constants and user-supplied data are
still redacted.) To mitigate impact, we zero out the column value
instead of removing the column completely.

Epic: None
Release note: None

105116: sql: add an advisory check between default_tenant and alter tenant r=yuzefovich a=knz

Fixes #105115.
Epic: CRDB-26691

Prior to this patch, it was just too easy to mistakenly configure `default_tenant` to a non-existent tenant or one without service; or to disable the service for a tenant currently serving default traffic.

This patch makes this mistake harder, by adding a coherence check between the cluster setting and the service mode.

The coherence check can be disabled via the (new) cluster setting `server.controller.default_tenant.check_service.enabled`.

Release note: None

Co-authored-by: Zach Lite <zach@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
@craig craig bot closed this as completed in 4263471 Jun 20, 2023
zachlite pushed a commit to zachlite/cockroach that referenced this issue Jun 21, 2023
This commit adds object size estimates to `SHOW RANGES WITH DETAILS`.
A new override of `crdb_internal.tenant_span_stats` is introduced
to leverage batched span statistics requests to KV, so only 1 fan-out is
required.

Resolves: cockroachdb#97858
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-24928

Release note (sql change): The SHOW RANGES command will now
emit span statistics when the DETAILS option is specified.
The statistics are included in a new column named 'span_stats',
as a JSON object.

The statistics are calculated for the identifier of each row:
`SHOW RANGES WITH DETAILS` will compute span statistics for each range.
`SHOW RANGES WITH TABLES, DETAILS` will compute span statistics
for each table, etc.

The 'span_stats' JSON object has the following keys:
- approximate_disk_bytes
- [key|val|sys|live|intent]_count
- [key|val|sys|live|intent]_bytes

'approximate_disk_bytes' is an approximation of the total on-disk size
of the given object.

'key_count' is the number of meta keys tracked under key_bytes.

'key_bytes' is the number of bytes stored in all non-system
point keys, including live, meta, old, and deleted keys.
Only meta keys really account for the "full" key; value
keys only for the timestamp suffix.

'val_count' is the number of meta values tracked under val_bytes.

'val_bytes' is the number of bytes in all non-system version
values, including meta values.

'sys_count' is the number of meta keys tracked under sys_bytes.

'sys_bytes' is the number of bytes stored in system-local kv-pairs.
This tracks the same quantity as (key_bytes + val_bytes), but
for system-local metadata keys (which aren't counted in either
key_bytes or val_bytes). Each of the keys falling into this group
is documented in keys/constants.go under the LocalPrefix constant
and is prefixed by either LocalRangeIDPrefix or LocalRangePrefix.

'live_count' is the number of meta keys tracked under live_bytes.

'live_bytes' is the number of bytes stored in keys and values which can
in principle be read by means of a Scan or Get in the far future,
including intents but not deletion tombstones (or their intents).
Note that the size of the meta kv pair (which could be explicit or implicit)
is included in this. Only the meta kv pair counts for the actual length
of the encoded key (regular pairs only count the timestamp suffix).

'intent_count' is the number of keys tracked under intent_bytes.
It is equal to the number of meta keys in the system with
a non-empty Transaction proto.

'intent_bytes' is the number of bytes in intent key-value
pairs (without their meta keys).
zachlite pushed a commit to zachlite/cockroach that referenced this issue Jun 21, 2023
This commit adds object size estimates to `SHOW RANGES WITH DETAILS`.
A new override of `crdb_internal.tenant_span_stats` is introduced
to leverage batched span statistics requests to KV, so only 1 fan-out is
required.

Resolves: cockroachdb#97858
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-24928

Release note (sql change): The SHOW RANGES command will now
emit span statistics when the DETAILS option is specified.
The statistics are included in a new column named 'span_stats',
as a JSON object.

The statistics are calculated for the identifier of each row:
`SHOW RANGES WITH DETAILS` will compute span statistics for each range.
`SHOW RANGES WITH TABLES, DETAILS` will compute span statistics
for each table, etc.

The 'span_stats' JSON object has the following keys:
- approximate_disk_bytes
- [key|val|sys|live|intent]_count
- [key|val|sys|live|intent]_bytes

'approximate_disk_bytes' is an approximation of the total on-disk size
of the given object.

'key_count' is the number of meta keys tracked under key_bytes.

'key_bytes' is the number of bytes stored in all non-system
point keys, including live, meta, old, and deleted keys.
Only meta keys really account for the "full" key; value
keys only for the timestamp suffix.

'val_count' is the number of meta values tracked under val_bytes.

'val_bytes' is the number of bytes in all non-system version
values, including meta values.

'sys_count' is the number of meta keys tracked under sys_bytes.

'sys_bytes' is the number of bytes stored in system-local kv-pairs.
This tracks the same quantity as (key_bytes + val_bytes), but
for system-local metadata keys (which aren't counted in either
key_bytes or val_bytes). Each of the keys falling into this group
is documented in keys/constants.go under the LocalPrefix constant
and is prefixed by either LocalRangeIDPrefix or LocalRangePrefix.

'live_count' is the number of meta keys tracked under live_bytes.

'live_bytes' is the number of bytes stored in keys and values which can
in principle be read by means of a Scan or Get in the far future,
including intents but not deletion tombstones (or their intents).
Note that the size of the meta kv pair (which could be explicit or implicit)
is included in this. Only the meta kv pair counts for the actual length
of the encoded key (regular pairs only count the timestamp suffix).

'intent_count' is the number of keys tracked under intent_bytes.
It is equal to the number of meta keys in the system with
a non-empty Transaction proto.

'intent_bytes' is the number of bytes in intent key-value
pairs (without their meta keys).
zachlite pushed a commit to zachlite/cockroach that referenced this issue Jun 21, 2023
This commit adds object size estimates to `SHOW RANGES WITH DETAILS`.
A new override of `crdb_internal.tenant_span_stats` is introduced
to leverage batched span statistics requests to KV, so only 1 fan-out is
required.

Resolves: cockroachdb#97858
Epic: https://cockroachlabs.atlassian.net/browse/CRDB-24928

Release note (sql change): The SHOW RANGES command will now
emit span statistics when the DETAILS option is specified.
The statistics are included in a new column named 'span_stats',
as a JSON object.

The statistics are calculated for the identifier of each row:
`SHOW RANGES WITH DETAILS` will compute span statistics for each range.
`SHOW RANGES WITH TABLES, DETAILS` will compute span statistics
for each table, etc.

The 'span_stats' JSON object has the following keys:
- approximate_disk_bytes
- [key|val|sys|live|intent]_count
- [key|val|sys|live|intent]_bytes

'approximate_disk_bytes' is an approximation of the total on-disk size
of the given object.

'key_count' is the number of meta keys tracked under key_bytes.

'key_bytes' is the number of bytes stored in all non-system
point keys, including live, meta, old, and deleted keys.
Only meta keys really account for the "full" key; value
keys only for the timestamp suffix.

'val_count' is the number of meta values tracked under val_bytes.

'val_bytes' is the number of bytes in all non-system version
values, including meta values.

'sys_count' is the number of meta keys tracked under sys_bytes.

'sys_bytes' is the number of bytes stored in system-local kv-pairs.
This tracks the same quantity as (key_bytes + val_bytes), but
for system-local metadata keys (which aren't counted in either
key_bytes or val_bytes). Each of the keys falling into this group
is documented in keys/constants.go under the LocalPrefix constant
and is prefixed by either LocalRangeIDPrefix or LocalRangePrefix.

'live_count' is the number of meta keys tracked under live_bytes.

'live_bytes' is the number of bytes stored in keys and values which can
in principle be read by means of a Scan or Get in the far future,
including intents but not deletion tombstones (or their intents).
Note that the size of the meta kv pair (which could be explicit or implicit)
is included in this. Only the meta kv pair counts for the actual length
of the encoded key (regular pairs only count the timestamp suffix).

'intent_count' is the number of keys tracked under intent_bytes.
It is equal to the number of meta keys in the system with
a non-empty Transaction proto.

'intent_bytes' is the number of bytes in intent key-value
pairs (without their meta keys).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-observability C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
5 participants