
sql: enable resumption of a flow for pausable portals #99173

Merged · 3 commits merged into cockroachdb:master · Mar 22, 2023

Conversation

@ZhouXing19 (Collaborator) commented on Mar 21, 2023

This PR is part of the implementation of multiple active portals. (Extracted from #96358)

We now introduce a `Resume()` method for flows: when a pausable portal is re-executed, rather than generating a new flow, we resume the persisted flow and continue the previous execution.
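For intuition, here is a minimal, self-contained sketch of that pattern; the names below (`portal`, `countingFlow`, `rowReceiver`) are illustrative stand-ins, not the actual CockroachDB types:

```go
package main

import "fmt"

// rowReceiver stands in for the output receiver that each (re-)execution
// of a portal hands to the flow.
type rowReceiver func(row int)

// flow models the part of a distributed SQL flow that a pausable portal
// cares about: it is started once with Run and continued with Resume.
type flow interface {
	Run(recv rowReceiver)
	Resume(recv rowReceiver)
}

// countingFlow emits `batch` rows per execution and remembers its position,
// so Resume continues where the previous execution stopped.
type countingFlow struct {
	next, batch int
}

func (f *countingFlow) emit(recv rowReceiver) {
	for i := 0; i < f.batch; i++ {
		recv(f.next)
		f.next++
	}
}

func (f *countingFlow) Run(recv rowReceiver)    { f.emit(recv) }
func (f *countingFlow) Resume(recv rowReceiver) { f.emit(recv) }

// portal persists its flow across executions instead of rebuilding it.
type portal struct{ persisted flow }

func (p *portal) exec(newFlow func() flow, recv rowReceiver) {
	if p.persisted == nil {
		p.persisted = newFlow() // first execution: build and run the flow
		p.persisted.Run(recv)
		return
	}
	p.persisted.Resume(recv) // re-execution: continue the previous run
}

func main() {
	p := &portal{}
	show := func(row int) { fmt.Println("row", row) }
	p.exec(func() flow { return &countingFlow{batch: 3} }, show) // rows 0..2
	p.exec(nil, show)                                            // rows 3..5, same flow reused
}
```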


sql: add telemetry MultipleActivePortalCounter

This commit adds a telemetry counter `MultipleActivePortalCounter` that is
incremented each time the cluster setting
`sql.pgwire.multiple_active_portals.enabled` is set to true.
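Roughly, the shape of this is a boolean setting with an on-change hook that bumps a counter. A toy, self-contained sketch (not the real settings or telemetry packages):

```go
package main

import "fmt"

// multipleActivePortalCounter stands in for the telemetry counter.
var multipleActivePortalCounter int

// boolSetting is a toy stand-in for a boolean cluster setting that
// supports an on-change callback.
type boolSetting struct {
	value    bool
	onChange func()
}

func (s *boolSetting) SetOnChange(fn func()) { s.onChange = fn }

func (s *boolSetting) Set(v bool) {
	s.value = v
	if s.onChange != nil {
		s.onChange()
	}
}

func main() {
	enableMultipleActivePortals := &boolSetting{}
	// Bump the counter every time the setting is flipped to true.
	enableMultipleActivePortals.SetOnChange(func() {
		if enableMultipleActivePortals.value {
			multipleActivePortalCounter++
		}
	})
	enableMultipleActivePortals.Set(true)
	enableMultipleActivePortals.Set(false)
	enableMultipleActivePortals.Set(true)
	fmt.Println("counter =", multipleActivePortalCounter) // counter = 2
}
```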


sql: add Resume method for flowinfra.Flow and execinfra.Processor

For pausable portals, each execution needs to resume the processor with a new
output receiver. We don't need to restart the processors, and this `Resume()`
step can be called many times after `Run()` is called.


sql: reuse flow for pausable portal

To execute portals in an interleaving manner, we need to persist the flow and
queryID so that we can continue fetching the result when we come back to the same
portal.

We now introduce a `pauseInfo` field in `sql.PreparedPortal` that stores this
metadata. It is set during the first execution, and all following executions
reuse the flow and the queryID. This also implies that these resources should
not be cleaned up at the end of each execution; the implementation of the
clean-up steps is included in the next commit.

Also, in this commit we attach a `*PreparedPortal` to the planner; how it is
set can also be seen in the next commit.
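To make the data layout concrete, here is an illustrative, self-contained sketch; the stand-in types below loosely mirror the idea of `sql.PreparedPortal`, its pause metadata, and the persisted flow, and are not the actual implementation:

```go
package main

import "fmt"

// flow and queryID stand in for flowinfra.Flow and the execution's query ID.
type flow struct{ executions int }
type queryID string

// portalPauseInfo mirrors the idea of the pauseInfo metadata: everything the
// portal must keep alive between executions so fetching can continue.
type portalPauseInfo struct {
	qid  queryID
	flow *flow
}

// preparedPortal is a stand-in for sql.PreparedPortal.
type preparedPortal struct {
	name      string
	pauseInfo *portalPauseInfo // nil until the first execution
}

// planner is a stand-in for the planner that now carries a portal pointer.
type planner struct {
	pausablePortal *preparedPortal
}

func (p *planner) execPortal() {
	portal := p.pausablePortal
	if portal.pauseInfo == nil {
		// First execution: create and persist the flow and query ID.
		portal.pauseInfo = &portalPauseInfo{qid: "q-1", flow: &flow{}}
	}
	// Every execution (including the first) uses the persisted flow;
	// cleanup is deferred until the portal itself is closed.
	portal.pauseInfo.flow.executions++
	fmt.Printf("portal %s: query %s, executions so far: %d\n",
		portal.name, portal.pauseInfo.qid, portal.pauseInfo.flow.executions)
}

func main() {
	pl := &planner{pausablePortal: &preparedPortal{name: "p1"}}
	pl.execPortal()
	pl.execPortal()
	pl.execPortal()
}
```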

Epic: CRDB-17622

Release note: None

@ZhouXing19 ZhouXing19 requested review from yuzefovich and a team March 21, 2023 23:13
@ZhouXing19 ZhouXing19 requested review from a team as code owners March 21, 2023 23:13
@cockroach-teamcity (Member) commented:

This change is Reviewable

@ZhouXing19 (Collaborator, Author) commented:

@yuzefovich since vectorizedFlow.Resume() will always call f.FlowBase.Resume(recv), maybe we don't need to have resumeCtx for vectorizedFlow?

@yuzefovich (Member) left a comment:

I have a few more comments, and it seems like the linter is unhappy about some of the unused stuff.

> since vectorizedFlow.Resume() will always call f.FlowBase.Resume(recv), maybe we don't need to have resumeCtx for vectorizedFlow?

Yeah, sounds good.

Reviewed 3 of 3 files at r1, 11 of 11 files at r2, 6 of 6 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ZhouXing19)


pkg/sql/conn_executor.go line 1077 at r1 (raw file):

	ex.extraTxnState.createdSequences = make(map[descpb.ID]struct{})

	enableMultipleActivePortals.SetOnChange(&s.cfg.Settings.SV, func(ctx context.Context) {

I know I was the one who suggested to put this into newConnExecutor, but this is incorrect. newConnExecutor is called for each new session which results in adding a new callback to this cluster setting for each session, and we don't want that. We want to add a callback only once per SQL server, so ideally we'd do so when evaluating server.newSQLServer.

Perhaps we should define a new exported method next to where enableMultipleActivePortals is defined, and that method would be called by server.newSQLServer to add the callback. Another alternative is to export enableMultipleActivePortals itself so that we'd call SetOnChange directly from server/server_sql.go. I probably like the latter option the most.
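To illustrate why the placement matters, a toy sketch (hypothetical names, not the real server wiring): callbacks attached per session pile up with every connection, while attaching once in the server constructor leaves exactly one.

```go
package main

import "fmt"

// toySettings stands in for a cluster setting's callback registry.
type toySettings struct {
	callbacks []func()
}

func (s *toySettings) SetOnChange(fn func()) { s.callbacks = append(s.callbacks, fn) }

// newSQLServer is the right place to register: it runs once per server.
func newSQLServer(st *toySettings) {
	st.SetOnChange(func() { /* bump the telemetry counter here */ })
}

// newConnExecutor runs once per session; registering here would add a new
// callback for every connection, which is the bug being pointed out above.
func newConnExecutor(st *toySettings, registerHere bool) {
	if registerHere {
		st.SetOnChange(func() { /* duplicate callback per session */ })
	}
}

func main() {
	perServer := &toySettings{}
	newSQLServer(perServer)
	for i := 0; i < 3; i++ {
		newConnExecutor(perServer, false)
	}
	fmt.Println("register in newSQLServer:", len(perServer.callbacks), "callback") // 1

	perSession := &toySettings{}
	for i := 0; i < 3; i++ {
		newConnExecutor(perSession, true)
	}
	fmt.Println("register in newConnExecutor:", len(perSession.callbacks), "callbacks") // 3
}
```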


pkg/sql/colflow/vectorized_flow.go line 299 at r2 (raw file):

				Err: errors.AssertionFailedf(
					"batchFlowCoordinator should be nil for vectorizedFlow",
				)})

nit: we should call recv.ProducerDone here.


pkg/sql/physicalplan/aggregator_funcs_test.go line 87 at r3 (raw file):

		t.Fatal(err)
	}
	flow.Run(ctx, false /* noWait */)

nit: this change belongs in the second commit.

@ZhouXing19 (Collaborator, Author) left a comment:

> the linter is unhappy about some of the unused stuff.

I removed fields that are unused for now.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @yuzefovich)


pkg/sql/conn_executor.go line 1077 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

I know I was the one who suggested to put this into newConnExecutor, but this is incorrect. newConnExecutor is called for each new session which results in adding a new callback to this cluster setting for each session, and we don't want that. We want to add a callback only once per SQL server, so ideally we'd do so when evaluating server.newSQLServer.

Perhaps we should define a new exported method next to where enableMultipleActivePortals is defined, and that method would be called by server.newSQLServer to add the callback. Another alternative is to export enableMultipleActivePortals itself so that we'd call SetOnChange directly from server/server_sql.go. Probably I like the latter option the most.

Yup, I took the export solution.

@yuzefovich (Member) left a comment:

:lgtm:

Also, we'll need to backport this to 23.1, so add the corresponding label and then request an approval from Rafi in the release-backports Slack channel (I'm not a TL).

Reviewed 20 of 20 files at r4, 13 of 13 files at r5, 6 of 6 files at r6, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ZhouXing19)

@ZhouXing19 ZhouXing19 added the backport-23.1.x Flags PRs that need to be backported to 23.1 label Mar 22, 2023
@ZhouXing19 (Collaborator, Author) commented:

TFTR!!

bors r=yuzefovich

@craig (bot) commented Mar 22, 2023

Build succeeded:

@craig craig bot merged commit 704f5ad into cockroachdb:master Mar 22, 2023
@blathers-crl (bot) commented Mar 22, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


Backport to branch 23.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

craig bot pushed a commit that referenced this pull request Apr 7, 2023
99663: sql: update connExecutor logic for pausable portals r=ZhouXing19 a=ZhouXing19

This PR replaces #96358 and is part of the initial implementation of multiple active portals.

----

This PR adds limited support for multiple active portals. Portals satisfying all of the following restrictions can now be paused and resumed (i.e., with other queries interleaved between their executions):

1. Not an internal query;
2. Read-only query;
3. No sub-queries or post-queries.

Such a portal will only have its statement executed with a _non-distributed_ plan.

This feature is gated by a session variable `multiple_active_portals_enabled`. When it is set to `true`, all portals that satisfy the restrictions above automatically become "pausable" when created via the pgwire `Bind` message.

The core ideas of this implementation are:
1. Add a `switchToAnotherPortal` status to the result-consumption state machine. When we receive an `ExecPortal` message for a different portal, we simply return control to the connExecutor. (#99052)
2. Persist the `flow`, `queryID`, `span`, and `instrumentationHelper` for the portal, and reuse them when we re-execute it. This is to ensure we _continue_ the fetching rather than starting all over. (#99173)
3. To enable 2, we need to delay the clean-up of resources till we close the portal. For this we introduce stacks of cleanup functions (see the sketch after this list). (This PR)
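A self-contained sketch of the cleanup-stack idea in item 3 (illustrative only, not the actual connExecutor code): cleanup steps accumulated during executions run only when the portal is closed, in LIFO order like stacked defers.

```go
package main

import "fmt"

// cleanupStack collects deferred cleanup steps so that, for a pausable
// portal, they run when the portal is closed rather than at the end of
// each execution.
type cleanupStack struct {
	fns []func()
}

func (s *cleanupStack) push(name string, fn func()) {
	s.fns = append(s.fns, func() {
		fmt.Println("cleanup:", name)
		fn()
	})
}

func (s *cleanupStack) run() {
	for i := len(s.fns) - 1; i >= 0; i-- { // LIFO, like stacked defers
		s.fns[i]()
	}
	s.fns = nil
}

func main() {
	var stack cleanupStack

	execute := func(n int) {
		fmt.Println("execution", n, "fetches some rows")
		if n == 1 {
			// Resources acquired on the first execution are not released
			// here; their cleanup is pushed onto the portal's stack.
			stack.push("stop flow", func() {})
			stack.push("finish span", func() {})
		}
	}

	execute(1)
	execute(2) // re-execution reuses the same resources
	fmt.Println("portal closed")
	stack.run() // cleanup runs only now, in reverse order
}
```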

Note that we kept the implementation of the original "un-pausable" portal, as we'd like to limit this new functionality to a small set of statements. Eventually some of it (e.g., the limitedCommandResult's lifecycle) should be replaced with the new code.

Also, we don't support distributed plans yet, as that involves much more complicated changes. See the `Start with an entirely local plan` section in the [design doc](https://docs.google.com/document/d/1SpKTrTqc4AlGWBqBNgmyXfTweUUsrlqIaSkmaXpznA8/edit). Support for this will come as a follow-up.

Epic: CRDB-17622

Release note (sql change): initial support for multiple active portals. Now, with the session variable `multiple_active_portals_enabled` set to true, portals satisfying all of the following restrictions can be executed in an interleaving manner: 1. Not an internal query; 2. Read-only query; 3. No sub-queries or post-queries. Such a portal will only have its statement executed with an entirely local plan.





99947: ui: small fixes to DB Console charts shown for secondary tenants r=dhartunian a=abarganier

#97995 updated the
DB Console to filter out KV-specific charts from the metrics page
when viewing DB Console as a secondary application tenant.

The PR missed a couple of small details. This patch cleans those
up with the following:

- Removes KV latency charts for app tenants
- Adds a single storage graph for app tenants showing livebytes
- Removes the "Capacity" chart on the Overview dashboard for app
  tenants

Release note: none

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-12100

NB: Please only review the final commit. The 1st commit is being reviewed separately at #99860

100188: changefeedccl: pubsub sink refactor to batching sink r=rickystewart a=samiskin

Epic: https://cockroachlabs.atlassian.net/browse/CRDB-13237

This change is a follow-up to #99086, moving the Pubsub sink to the batching sink framework.

The changes involve:
1. Moving the Pubsub code to match the `SinkClient` interface, switching to the lower-level v1 pubsub API that lets us publish batches manually
2. Removing the extra call to json.Marshal
3. Moving to the `pstest` package for validating results in unit tests
4. Adding topic handling to the batching sink, where batches are created per-topic (see the sketch below)
5. Adding a `pubsub_sink_config` since the sink can now handle Retry and Flush config settings
6. Adding metrics to the old pubsub sink as well, for the purpose of comparing the two versions
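As a rough illustration of the per-topic batching in item 4 (toy types only; the real sink works with encoded changefeed messages and uses the Retry/Flush settings from `pubsub_sink_config`):

```go
package main

import "fmt"

// message is a toy stand-in for an encoded changefeed message.
type message struct {
	topic string
	key   string
}

// batchingSink groups buffered messages per topic and publishes each
// topic's batch with a single request once it reaches flushSize.
type batchingSink struct {
	flushSize int
	batches   map[string][]message
	publish   func(topic string, batch []message)
}

func (s *batchingSink) emit(m message) {
	if s.batches == nil {
		s.batches = make(map[string][]message)
	}
	s.batches[m.topic] = append(s.batches[m.topic], m)
	if len(s.batches[m.topic]) >= s.flushSize {
		s.flushTopic(m.topic)
	}
}

func (s *batchingSink) flushTopic(topic string) {
	if batch := s.batches[topic]; len(batch) > 0 {
		s.publish(topic, batch)
		s.batches[topic] = nil
	}
}

func main() {
	sink := &batchingSink{
		flushSize: 2,
		publish: func(topic string, batch []message) {
			fmt.Printf("publish %d messages to %s in one request\n", len(batch), topic)
		},
	}
	sink.emit(message{topic: "orders", key: "a"})
	sink.emit(message{topic: "users", key: "b"})
	sink.emit(message{topic: "orders", key: "c"}) // triggers a flush of "orders"
}
```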

At default settings, this resulted in a peak throughput of 90k messages per second on a single node at 27.6% CPU usage, putting it at a similar level to Kafka.

Running pubsub v2 across all of TPCC (nodes ran out of ranges at different speeds):
<img width="637" alt="Screenshot 2023-03-30 at 3 38 25 PM" src="https://user-images.githubusercontent.com/6236424/229863386-edaee27d-9762-4806-bab6-e18b8a6169d6.png">

Running pubsub v1 (barely visible, 2k messages per second) followed by v2 on tpcc.order_line (in v2 only 2 nodes ended up having ranges assigned to them):
<img width="642" alt="Screenshot 2023-04-04 at 12 53 45 PM" src="https://user-images.githubusercontent.com/6236424/229863507-1883ea45-d8ce-437b-9b9c-550afec68752.png">

In the following graphs from the cloud console, where v1 was run followed by v2, you can see that the main reason v1 was slow is that it wasn't able to batch different keys together.
<img width="574" alt="Screenshot 2023-04-04 at 12 59 51 PM" src="https://user-images.githubusercontent.com/6236424/229864083-758c0814-d53c-447e-84c3-471cf5d56c44.png">

Publish requests remained the same despite way more messages in v2
<img width="1150" alt="Screenshot 2023-04-04 at 1 46 51 PM" src="https://user-images.githubusercontent.com/6236424/229875314-6e07177e-62c4-4c15-b13f-f75e8143e011.png">



Release note (performance improvement): pubsub sink changefeeds can now support higher throughputs by enabling the changefeed.new_pubsub_sink_enabled cluster setting.

100620: pkg/server: move DataDistribution to systemAdminServer r=dhartunian a=abarganier

The DataDistribution endpoint reports replica counts by database and table. When it was built, it operated on the assumption that a range would only ever contain a single table's data.

Now that we have coalesced ranges, a single range can span multiple tables. Unfortunately, the DataDistribution endpoint does not take this fact into account, meaning it reports garbled and inaccurate data, unless the `spanconfig.storage_coalesce_adjacent.enabled` setting is set to false (see #98820).

For secondary tenants, ranges are *always* coalesced, so this endpoint in its current state could never report meaningful data for a tenant.

Given all of this, we have decided to make this endpoint only available for the system tenant. This patch
accomplishes this by moving the endpoint away from the adminServer and into the systemAdminServer, making it effectively unimplemented for secondary tenants.

Release note: none

Informs: #97942

Co-authored-by: Jane Xing <zhouxing@uchicago.edu>
Co-authored-by: Alex Barganier <abarganier@cockroachlabs.com>
Co-authored-by: Shiranka Miskin <shiranka.miskin@gmail.com>