Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spanconfig: enable range coalescing by default #98820

Merged
merged 1 commit into from
May 8, 2023

Conversation

irfansharif
Copy link
Contributor

@irfansharif irfansharif commented Mar 16, 2023

Fixes #81008.

We built the basic infrastructure to coalesce ranges across table boundaries back in 22.2 as part of #66063. We've enabled this optimization for secondary tenants since then, but hadn't for the system tenant because of two primary blockers:

In both these cases we baked in assumptions about there being a minimum of one-{table,index,partition}-per-range. These blockers didn't apply to secondary tenants at the time since they didn't have access to SHOW RANGES, nor the UI pages where these schema statistics were displayed.

We've addressed both these blockers in the 23.1 cycle as part of bridging the compatibility between secondary tenants and yesteryear's system tenant.

Release note (general change): CockroachDB would previously use separate ranges for each table, index, or partition. This is no longer true -- it's possible now to have multiple tables, indexes, and partitions to get packed into the same range. For users with many such "schema objects", this will reduce the total range count in their clusters. This is especially true if individual tables, indexes, or partitions are smaller than the default configured maximum range size (controlled using zone configs, specifically the range_max_bytes parameter). We made this change to improve scalability with respect to the number of schema objects, since the underlying range count now no longer a bottleneck. Users upgrading from 22.2, once finalizing their upgrade, may observe a round of range merges and snapshot transfers (to power said range merges) as a result of this change. If users want to opt-out of this optimization, they can use the following cluster setting:

  SET CLUSTER SETTING spanconfig.storage_coalesce_adjacent.enabled = false; 

@irfansharif irfansharif requested a review from a team as a code owner March 16, 2023 21:58
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@irfansharif irfansharif force-pushed the 230316.enable-range-coalesce branch 2 times, most recently from 050822f to c58ff8c Compare March 16, 2023 22:06
@irfansharif
Copy link
Contributor Author

Hm, looks like #97942 is another dependency. Our data distribution page in the advanced debug page is making the same minimum-one-table-per-range assumption here (+cc @dhartunian). Womp womp.

_, tableID, err := keys.MakeSQLCodec(tenID).DecodeTablePrefix(rangeDesc.StartKey.AsRawKey())

@knz
Copy link
Contributor

knz commented Mar 16, 2023

For UX we want #97858 (already being worked on) and #97861 too.

@irfansharif
Copy link
Contributor Author

CI failure is #98555.

@irfansharif irfansharif changed the title spanconfig: enable range coalescing by default [v23.2] spanconfig: enable range coalescing by default Mar 20, 2023
craig bot pushed a commit that referenced this pull request Mar 26, 2023
98979: sql: backward-compatibility for SHOW RANGES / crdb_internal.ranges r=cucaroach,fqazi a=knz

Requested by `@mwang1026`  and `@nvanbenschoten` .

Fixes #97861.
Fixes #99073.
Needed for #98820.
Epic: CRDB-24928

See the individual commits for details; exceprt from the 2nd commit:

TLDR: the pre-v23.1 behavior of SHOW RANGES and
`crdb_internal.ranges{_no_leases}` is made available conditional on a
new cluster setting `sql.show_ranges_deprecated_behavior.enabled`.

It is set to true by default in v23.1, however any use of the feature
will prompt a SQL notice with the following message:

```
NOTICE: attention! the pre-23.1 behavior of SHOW RANGES and crdb_internal.ranges{,_no_leases} is deprecated!
HINT: Consider enabling the new functionality by setting 'sql.show_ranges_deprecated_behavior.enabled' to 'false'.
The new SHOW RANGES statement has more options. Refer to the online
documentation or execute 'SHOW RANGES ??' for details.
```

The deprecated behavior is also disabled in the test suite and by default in
`cockroach demo`, i.e. the tests continue to exercise the new
behavior.

Release note (backward-incompatible change): The pre-v23.1 output
produced by `SHOW RANGES`, `crdb_internal.ranges`,
`crdb_internal.ranges_no_leases` is deprecated. The new interface and
functionality for `SHWO RANGES` can be enabled by changing the cluster
setting `sql.show_ranges_deprecated_behavior.enabled` to `false`. It
will become default in v23.2.


Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
@knz knz marked this pull request as draft March 30, 2023 14:02
abarganier added a commit to abarganier/cockroach that referenced this pull request Apr 4, 2023
The DataDistribution endpoint reports replica counts by
database and table. When it was built, it operated off
the assumption that a range would only ever contain a
single table's data within.

Now that we have coalesced ranges, a single range can
span multiple tables. Unfortunately, the DataDistribution
endpoint does not take this fact into account, meaning
it reports garbled and inaccurate data, unless the
`spanconfig.storage_coalesce_adjacent.enabled` setting
is set to false (see cockroachdb#98820).

For secondary tenants, ranges are *always* coalesced,
so this endpoint in its current state could never report
meaningful data for a tenant.

Given all of this, we have decided to make this endpoint
only available for the system tenant. This patch
accomplishes this by moving the endpoint away from the
adminServer and into the systemAdminServer, making
it effectively unimplemented for secondary tenants.

Release note: none
craig bot pushed a commit that referenced this pull request Apr 7, 2023
99663: sql: update connExecutor logic for pausable portals r=ZhouXing19 a=ZhouXing19

This PR replaces #96358 and is part of the initial implementation of multiple active portals.

----

This PR is to add limited support for multiple active portals. Now portals satisfying all following restrictions can be paused and resumed (i.e., with other queries interleaving it):

1. Not an internal query;
2. Read-only query;
3. No sub-queries or post-queries.

And such a portal will only have the statement executed with a _non-distributed_ plan. 

This feature is gated by a session variable `multiple_active_portals_enabled`. When it's set `true`, all portals that satisfy the restrictions above will automatically become "pausable" when being created via the pgwire `Bind` stmt. 

The core idea of this implementation is 
1. Add a `switchToAnotherPortal` status to the result-consumption state machine. When we receive an `ExecPortal` message for a different portal, we simply return the control to the connExecutor. (#99052)
2. Persist `flow` `queryID` `span` and `instrumentationHelper` for the portal, and reuse it when we re-execute a portal. This is to ensure we _continue_ the fetching rather than starting all over. (#99173)
3. To enable 2, we need to delay the clean-up of resources till we close the portal. For this we introduced the stacks of cleanup functions. (This PR)

Note that we kept the implementation of the original "un-pausable" portal, as we'd like to limit this new functionality only to a small set of statements. Eventually some of them should be replaced (e.g. the limitedCommandResult's lifecycle) with the new code. 

Also, we don't support distributed plan yet, as it involves much more complicated changes. See `Start with an entirely local plan` section in the [design doc](https://docs.google.com/document/d/1SpKTrTqc4AlGWBqBNgmyXfTweUUsrlqIaSkmaXpznA8/edit). Support for this will come as a follow-up.

Epic: CRDB-17622

Release note (sql change): initial support for multiple active portals. Now with session variable `multiple_active_portals_enabled` set to true,  portals satisfying all following restrictions can be executed in an interleaving manner:  1. Not an internal query; 2. Read-only query; 3. No sub-queries or post-queries. And such a portal will only have the statement executed with an entirely local plan. 





99947: ui: small fixes to DB Console charts shown for secondary tenants r=dhartunian a=abarganier

#97995 updated the
DB Console to filter out KV-specific charts from the metrics page
when viewing DB Console as a secondary application tenant.

The PR missed a couple small details. This patch cleans those
up with the following:

- Removes KV latency charts for app tenants
- Adds a single storage graph for app tenants showing livebytes
- Removes the "Capacity" chart on the Overview dashboard for app
  tenants

Release note: none

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-12100

NB: Please only review the final commit. 1st commit is being reviewed separately @ #99860

100188: changefeedccl: pubsub sink refactor to batching sink r=rickystewart a=samiskin

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-13237

This change is a followup to #99086 which moves the Pubsub sink to the batching sink framework.

The changes involve:
1. Moves the Pubsub code to match the `SinkClient` interface, moving to using the lower level v1 pubsub API that lets us publish batches manually
3. Removing the extra call to json.Marshal
4. Moving to using the `pstest` package for validating results in unit tests
5. Adding topic handling to the batching sink, where batches are created per-topic
6. Added a pubsub_sink_config since it can now handle Retry and Flush config settings
7. Added metrics even to the old pubsub for the purpose of comparing the two versions

At default settings, this resulted in a peak of 90k messages per second on a single node with throughput at 27.6% cpu usage, putting it at a similar level to kafka.

Running pubsub v2 across all of TPCC (nodes ran out of ranges at different speeds):
<img width="637" alt="Screenshot 2023-03-30 at 3 38 25 PM" src="https://user-images.githubusercontent.com/6236424/229863386-edaee27d-9762-4806-bab6-e18b8a6169d6.png">

Running pubsub v1 (barely visible, 2k messages per second) followed by v2 on tpcc.order_line (in v2 only 2 nodes ended up having ranges assigned to them):
<img width="642" alt="Screenshot 2023-04-04 at 12 53 45 PM" src="https://user-images.githubusercontent.com/6236424/229863507-1883ea45-d8ce-437b-9b9c-550afec68752.png">

In the following graphs from the cloud console, where v1 was ran followed by v2, you can see how the main reason v1 was slow was that it wasn't able to batch different keys together.
<img width="574" alt="Screenshot 2023-04-04 at 12 59 51 PM" src="https://user-images.githubusercontent.com/6236424/229864083-758c0814-d53c-447e-84c3-471cf5d56c44.png">

Publish requests remained the same despite way more messages in v2
<img width="1150" alt="Screenshot 2023-04-04 at 1 46 51 PM" src="https://user-images.githubusercontent.com/6236424/229875314-6e07177e-62c4-4c15-b13f-f75e8143e011.png">



Release note (performance improvement): pubsub sink changefeeds can now support higher throughputs by enabling the changefeed.new_pubsub_sink_enabled cluster setting.

100620: pkg/server: move DataDistribution to systemAdminServer r=dhartunian a=abarganier

The DataDistribution endpoint reports replica counts by database and table. When it was built, it operated off the assumption that a range would only ever contain a single table's data within.

Now that we have coalesced ranges, a single range can span multiple tables. Unfortunately, the DataDistribution endpoint does not take this fact into account, meaning it reports garbled and inaccurate data, unless the `spanconfig.storage_coalesce_adjacent.enabled` setting is set to false (see #98820).

For secondary tenants, ranges are *always* coalesced, so this endpoint in its current state could never report meaningful data for a tenant.

Given all of this, we have decided to make this endpoint only available for the system tenant. This patch
accomplishes this by moving the endpoint away from the adminServer and into the systemAdminServer, making it effectively unimplemented for secondary tenants.

Release note: none

Informs: #97942

Co-authored-by: Jane Xing <zhouxing@uchicago.edu>
Co-authored-by: Alex Barganier <abarganier@cockroachlabs.com>
Co-authored-by: Shiranka Miskin <shiranka.miskin@gmail.com>
@irfansharif irfansharif force-pushed the 230316.enable-range-coalesce branch from b19d25c to b2c550f Compare May 7, 2023 23:07
@irfansharif irfansharif changed the title [v23.2] spanconfig: enable range coalescing by default spanconfig: enable range coalescing by default May 7, 2023
@irfansharif
Copy link
Contributor Author

irfansharif commented May 7, 2023

Discussed offline with David H. + Alex B., who are ok with just the following warning in the advanced debug page.

image

@arulajmani, can I get a review?

@irfansharif irfansharif force-pushed the 230316.enable-range-coalesce branch from b2c550f to 494b561 Compare May 7, 2023 23:42
pkg/spanconfig/spanconfigstore/span_store.go Outdated Show resolved Hide resolved
@@ -182,6 +182,15 @@ export class DataDistributionPage extends React.Component<
<BackToAdvanceDebug history={this.props.history} />
<section className="section">
<h1 className="base-heading">Data Distribution</h1>
<p style={{ maxWidth: 300, paddingTop: 10 }}>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a tracking issue to fix the data distribution screen or remove it outright?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was #97942, but the Obs-Inf folks closed it. I'll defer to them if they want something to fix this and/or removing it outright.

Copy link
Collaborator

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 10 of 11 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @irfansharif)


pkg/spanconfig/spanconfigstore/span_store.go line 159 at r1 (raw file):

			if roachpb.Key(rem).Compare(systemTableUpperBound) < 0 ||
				!(StorageCoalesceAdjacentSetting.Get(&s.settings.SV) &&
					s.settings.Version.IsActive(ctx, clusterversion.V23_2_EnableRangeCoalescingForSystemTenant)) {

Seems like the range coalescing cluster setting would be a no-op in 23.1 <-> 23.2 mixed version clusters; is the thinking that this is okay because it was a hidden cluster setting previously?

Fixes cockroachdb#81008.

We built the basic infrastructure to coalesce ranges across table
boundaries back in 22.2 as part of cockroachdb#66063. We've enabled this
optimization for secondary tenants since then, but hadn't for the system
tenant because of two primary blockers:

- cockroachdb#93617: SHOW RANGES was broken by coalesced ranges.
- cockroachdb#84105: APIs to compute sizes for schema objects (used in our UI) was
  broken by coalesced ranges.

In both these cases we baked in assumptions about there being a minimum
of one-{table,index,partition}-per-range. These blockers didn't apply to
secondary tenants at the time since they didn't have access to SHOW
RANGES, nor the UI pages where these schema statistics were displayed.

We've addressed both these blockers in the 23.1 cycle as part of
bridging the compatibility between secondary tenants and yesteryear's
system tenant.
- cockroachdb#93644 revised SHOW RANGES and crdb_internal.ranges{,_no_leases}, both
  internally and its external UX, to accommodate ranges straddling
  table/database boundaries.
- cockroachdb#96223 re-worked our SpanStats API to work in the face of coalesced
  ranges, addressing cockroachdb#84105.

Release note (general change): CockroachDB would previously use separate
ranges for each table, index, or partition. This is no longer true --
it's possible now to have multiple tables, indexes, and partitions to
get packed into the same range. For users with many such "schema
objects", this will reduce the total range count in their clusters. This
is especially true if individual tables, indexes, or partitions are
smaller than the default configured maximum range size (controlled using
zone configs, specifically the range_max_bytes parameter).
We made this change to improve scalability with respect to the number of
schema objects, since the underlying range count now no longer a
bottleneck. Users upgrading from 22.2, once finalizing their upgrade,
may observe a round of range merges and snapshot transfers (to power
said range merges) as a result of this change. If users want to opt-out
of this optimization, they can use the following cluster setting:

  SET CLUSTER SETTING spanconfig.storage_coalesce_adjacent.enabled = false;
@irfansharif irfansharif force-pushed the 230316.enable-range-coalesce branch from 494b561 to 7a57e7e Compare May 8, 2023 22:13
Copy link
Contributor Author

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR.

bors r+

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @arulajmani, and @knz)


pkg/spanconfig/spanconfigstore/span_store.go line 36 at r1 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

Can you make this one public too then thanks

Done.


pkg/spanconfig/spanconfigstore/span_store.go line 159 at r1 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Seems like the range coalescing cluster setting would be a no-op in 23.1 <-> 23.2 mixed version clusters; is the thinking that this is okay because it was a hidden cluster setting previously?

Yes.

@irfansharif irfansharif marked this pull request as ready for review May 8, 2023 22:14
@irfansharif irfansharif requested a review from a team May 8, 2023 22:14
@craig
Copy link
Contributor

craig bot commented May 8, 2023

Build succeeded:

@craig craig bot merged commit 439c515 into cockroachdb:master May 8, 2023
@irfansharif irfansharif deleted the 230316.enable-range-coalesce branch May 8, 2023 22:44
irfansharif added a commit to irfansharif/cockroach that referenced this pull request May 9, 2023
We enabled range coalescing by default in 23.2 as part of cockroachdb#98820.
Alongside that change, we want to flip the SHOW RANGES behavior to be
compatible with coalesced ranges, which this commit does. See cockroachdb#93644 and
the accompanying release notes.

Release note (backward-incompatible change): The pre-v23.1 output
produced by SHOW RANGES, crdb_internal.ranges,
crdb_internal.ranges_no_leases was deprecated in 23.1, and is now
replaced by default with output that's compatible with coalesced ranges
(i.e. ranges that pack multiple tables/indexes/partitions into
individual ranges). See the 23.1 release notes for SHOW RANGES for more
details.
craig bot pushed a commit that referenced this pull request May 9, 2023
101825: sql: add crdb_internal.pretty_value r=Xiang-Gu,msbutler a=stevendanna

This composes nicely with crdb_internal.scan to allow inspection of table data and other command line tools like cockroach debug pebble.

Epic: none

Release note: None

102907: workload: remove initial prefix from bank workload payload r=herkolategan,srosenberg a=renatolabs

An `initial-` prefix is added to the payload column of the `bank` table when the workload is initialized. It was introduced about 6 years ago [1] and its purpose at the time is not clear. There are two main problems with it:

* the `initial-` prefix suggests the payload may be updated, but that actually never happens.
* as currently implemented, it assumes that the `payload-bytes` command line flag is at least `len([]byte("initial-"))`. Passing a lower value to that command line flag leads to a panic. This is an implicit assumption that should not exist.

This changes the row generation function so that `payload-bytes` bytes are randomly generated and inserted into the `payload` column, without the `initial-` prefix.

[1] d49d535

Epic: none

Release note: None

102961: sql: default sql.show_ranges_deprecated_behavior.enabled to true r=irfansharif a=irfansharif

We enabled range coalescing by default in 23.2 as part of #98820. Alongside that change, we want to flip the SHOW RANGES behavior to be compatible with coalesced ranges, which this commit does. See #93644 and the accompanying release notes.

Release note (backward-incompatible change): The pre-v23.1 output produced by SHOW RANGES, crdb_internal.ranges,
crdb_internal.ranges_no_leases was deprecated in 23.1, and is now replaced by default with output that's compatible with coalesced ranges (i.e. ranges that pack multiple tables/indexes/partitions into individual ranges). See the 23.1 release notes for SHOW RANGES for more details.

Co-authored-by: Steven Danna <danna@cockroachlabs.com>
Co-authored-by: Renato Costa <renato@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

spanconfig,kv: enable spanconfig.storage_coalesce_adjacent by default
4 participants