
release-23.1: sql: update connExecutor logic for pausable portals #101026

Merged: 8 commits merged into cockroachdb:release-23.1 on Apr 10, 2023

Conversation

ZhouXing19 (Collaborator) commented Apr 10, 2023

Backport 8/8 commits from #99663


This PR adds limited support for multiple active portals. Portals satisfying all of the following restrictions can now be paused and resumed (i.e., have other queries interleaved between their executions):

  1. Not an internal query;
  2. Read-only query;
  3. No sub-queries or post-queries.

Such a portal will only have its statement executed with a non-distributed plan.

This feature is gated by a session variable, multiple_active_portals_enabled. When it's set to true, all portals that satisfy the restrictions above automatically become "pausable" when created via the pgwire Bind stmt.

The core idea of this implementation is

  1. Add a switchToAnotherPortal status to the result-consumption state machine. When we receive an ExecPortal message for a different portal, we simply return control to the connExecutor; a minimal sketch of this idea follows this list. (sql: add switchToAnotherPortal signal for result consumer #99052)
  2. Persist the flow, queryID, span, and instrumentationHelper for the portal, and reuse them when we re-execute the portal. This ensures we continue fetching rather than starting all over. (sql: enable resumption of a flow for pausable portals #99173)
  3. To enable 2, we need to delay the clean-up of resources until we close the portal. For this we introduced stacks of cleanup functions. (This PR)
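
For illustration only, here is a minimal, self-contained sketch of idea 1. The identifiers (`consumerStatus`, `execPortalCmd`, `nextStatus`) are assumptions made for the sketch, not the actual connExecutor code:

``` go
package main

import "fmt"

// Hypothetical types mirroring idea 1: the result consumer reports a dedicated
// status when the incoming ExecPortal targets a different portal, so the
// connExecutor regains control instead of draining the current portal.
type consumerStatus int

const (
	statusContinue consumerStatus = iota
	statusDone
	statusSwitchToAnotherPortal
)

type execPortalCmd struct {
	portalName string
}

// nextStatus decides what the consumer of curPortal should do when a new
// ExecPortal command arrives.
func nextStatus(curPortal string, cmd execPortalCmd) consumerStatus {
	if cmd.portalName != curPortal {
		// Pause the current portal; the executor will switch to the other one.
		return statusSwitchToAnotherPortal
	}
	return statusContinue
}

func main() {
	fmt.Println(nextStatus("p1", execPortalCmd{portalName: "p2"}) == statusSwitchToAnotherPortal) // true
}
```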

Note that we kept the implementation of the original "un-pausable" portals, as we'd like to limit this new functionality to a small set of statements. Eventually some of the old code paths (e.g. the limitedCommandResult's lifecycle) should be replaced with the new code.

Also, we don't support distributed plans yet, as that involves much more complicated changes. See the "Start with an entirely local plan" section in the design doc. Support for this will come as a follow-up.

Epic: CRDB-17622

Release note (sql change): Initial support for multiple active portals. With the session variable multiple_active_portals_enabled set to true, portals satisfying all of the following restrictions can now be executed in an interleaved manner: 1. Not an internal query; 2. Read-only query; 3. No sub-queries or post-queries. Such a portal will only have its statement executed with an entirely local plan.

Release justification: this is the implementation of an important feature.

With the introduction of pausable portals, the comment for `limitedCommandResult`
needs to be updated to reflect the current behavior.

Release note: None
This change introduces a new session variable for a preview feature. When set to `true`,
all non-internal portals with read-only [`SELECT`](../v23.1/selection-queries.html)
queries without sub-queries or post-queries can be paused and resumed in an interleaved
manner, but are executed with a local plan.

Release note (SQL change): Added the session variable `multiple_active_portals_enabled`.
This setting gates a preview feature. When set to `true`, it allows
multiple portals to be open at the same time, with their execution interleaved
with each other. In other words, these portals can be paused. The underlying
statement for a pausable portal must be a read-only `SELECT` query without
sub-queries or post-queries, and such a portal is always executed with a local
plan.
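
For illustration, a client could opt in per session with a plain `SET` statement. Below is a minimal sketch using Go's `database/sql`; the driver choice and connection string are placeholders, not part of this change:

``` go
package main

import (
	"context"
	"database/sql"
	"log"

	_ "github.com/lib/pq" // any PostgreSQL-compatible driver works here
)

func main() {
	ctx := context.Background()

	// The connection string is illustrative.
	db, err := sql.Open("postgres", "postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Pin a single connection so the session variable applies to the same
	// session that later creates the portals.
	conn, err := db.Conn(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Opt in to the preview feature; portals created afterwards via pgwire
	// Bind become pausable if they satisfy the restrictions above.
	if _, err := conn.ExecContext(ctx, "SET multiple_active_portals_enabled = true"); err != nil {
		log.Fatal(err)
	}
}
```
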
…e persistence

This commit is part of the implementation of multiple active portals. In order to
execute portals in an interleaved manner, certain resources need to be persisted and their
clean-up must be delayed until the portal is closed. Additionally, these resources
don't need to be set up again when the portal is re-executed.

To achieve this, we store the cleanup steps in the `cleanup` function stacks in
`portalPauseInfo`, and they are called when any of the following events occur:

1. SQL transaction is committed or rolled back
2. Connection executor is closed
3. An error is encountered when executing the portal
4. The portal is explicitly closed by the user

The cleanup functions should be called in the same order as for a normal (un-pausable) portal.
Since a portal's execution follows the `execPortal() -> execStmtInOpenState() ->
dispatchToExecutionEngine() -> flow.Run()` function flow, we categorize the cleanup
functions into 4 "layers", which are stored accordingly in `PreparedPortal.pauseInfo`.
The cleanup is always LIFO, following the

- `resumableFlow.cleanup`
- `dispatchToExecutionEngine.cleanup`
- `execStmtInOpenState.cleanup`
- `exhaustPortal.cleanup`

order. Additionally, if an error occurs in any layer, we clean up the current and
subsequent layers. For example, if an error occurs in `execStmtInOpenState()`, we
perform `resumableFlow.cleanup` and `dispatchToExecutionEngine.cleanup` (the subsequent
layers) and then `execStmtInOpenState.cleanup` (the current layer) before returning the
error to `execPortal()`, where `exhaustPortal.cleanup` will eventually be called. This is
to maintain the previous clean-up process for portals as much as possible.
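
A minimal sketch of this layered, LIFO cleanup, with hypothetical names; the real stacks live in `portalPauseInfo` and use CockroachDB's own types:

``` go
package main

import "fmt"

// cleanupStack is a sketch of a per-layer cleanup stack: functions are pushed
// in setup order and run in LIFO order.
type cleanupStack struct {
	fns []func()
}

func (s *cleanupStack) push(f func()) { s.fns = append(s.fns, f) }

func (s *cleanupStack) run() {
	for i := len(s.fns) - 1; i >= 0; i-- {
		s.fns[i]()
	}
	s.fns = nil
}

func main() {
	// Hypothetical layers named after the list above.
	var exhaustPortal, execStmtInOpenState, dispatchToExecutionEngine, resumableFlow cleanupStack

	exhaustPortal.push(func() { fmt.Println("exhaustPortal.cleanup") })
	execStmtInOpenState.push(func() { fmt.Println("execStmtInOpenState.cleanup") })
	dispatchToExecutionEngine.push(func() { fmt.Println("dispatchToExecutionEngine.cleanup") })
	resumableFlow.push(func() { fmt.Println("resumableFlow.cleanup") })

	// Closing the portal runs the layers innermost-first, matching the LIFO
	// order in the list above.
	for _, layer := range []*cleanupStack{
		&resumableFlow, &dispatchToExecutionEngine, &execStmtInOpenState, &exhaustPortal,
	} {
		layer.run()
	}
}
```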

We also pass the `PreparedPortal` as a reference to the planner in
`execStmtInOpenState()`, so that the portal's flow can be set and reused.

Release note: None
When executing or cleaning up a pausable portal, we may encounter an error and
need to run the corresponding clean-up steps; those steps must see the latest
`retErr` and `retPayload` rather than the values captured when the cleanup
functions were created.

To address this, we use `portal.pauseInfo.retErr` and `.retPayload` to record the
latest error and payload. They need to be updated on each execution.

Specifically,

1. If the error happens during portal execution, we ensure `portal.pauseInfo`
records the error by adding the following code to the main body of
`execStmtInOpenState()`:

``` go
defer func() {
	updateRetErrAndPayload(retErr, retPayload)
}()
```

Note that this defer must always be registered _after_ the defer that runs the cleanups, so that it executes first and the cleanups see the updated values.

2. If the error occurs during a certain cleanup step for the pausable portal,
we ensure that cleanup steps after it can see the error by always having
`updateRetErrAndPayload(retErr, retPayload)` run at the end of each cleanup step.
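
A minimal sketch of this pattern; `pauseInfo` and `runCleanupStep` below are illustrative stand-ins for `portal.pauseInfo` and the real `updateRetErrAndPayload` helper, not the actual code:

``` go
package main

import (
	"errors"
	"fmt"
)

// pauseInfo records the latest error and payload observed during execution or
// cleanup, so later cleanup steps can act on them.
type pauseInfo struct {
	retErr     error
	retPayload any
}

// updateRetErrAndPayload mirrors the helper described above: it runs at the
// end of each cleanup step (and is deferred in the main execution path).
func (p *pauseInfo) updateRetErrAndPayload(retErr error, retPayload any) {
	p.retErr = retErr
	p.retPayload = retPayload
}

// runCleanupStep runs one cleanup step and always ends it by recording the
// latest error and payload, as pattern 2 requires.
func (p *pauseInfo) runCleanupStep(name string, step func(prevErr error) (error, any)) {
	retErr, retPayload := step(p.retErr)
	p.updateRetErrAndPayload(retErr, retPayload)
	fmt.Printf("after %s: latest err = %v\n", name, p.retErr)
}

func main() {
	p := &pauseInfo{}

	// The first step fails; the error is recorded at the end of the step.
	p.runCleanupStep("flow cleanup", func(prevErr error) (error, any) {
		return errors.New("flow shutdown failed"), nil
	})

	// The second step sees the error left by the first step and keeps it.
	p.runCleanupStep("result cleanup", func(prevErr error) (error, any) {
		return prevErr, nil
	})
}
```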

Release note: None
This commit adds several restrictions to pausable portals to ensure that they
work properly with the current changes to the consumer-receiver model.
Specifically, pausable portals must meet the following criteria:

1. Not be internal queries
2. Be read-only queries
3. Not contain sub-queries or post-queries
4. Only use local plans

These restrictions are necessary because the current changes to the
consumer-receiver model only consider the local push-based case.
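
For illustration, these restrictions amount to a simple predicate; `portalMeta` and `isPausable` below are hypothetical names for the sketch, not the actual check in the connExecutor:

``` go
package main

import "fmt"

// portalMeta captures the properties of a portal that the restrictions above
// depend on.
type portalMeta struct {
	isInternal        bool
	isReadOnly        bool
	hasSubOrPostQuery bool
	isLocalPlan       bool
}

// isPausable mirrors the four restrictions: not internal, read-only, no
// sub-queries or post-queries, and a local plan only.
func isPausable(m portalMeta) bool {
	return !m.isInternal && m.isReadOnly && !m.hasSubOrPostQuery && m.isLocalPlan
}

func main() {
	fmt.Println(isPausable(portalMeta{isReadOnly: true, isLocalPlan: true}))  // true
	fmt.Println(isPausable(portalMeta{isReadOnly: true, isLocalPlan: false})) // false: distributed plan
}
```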

Release note: None
When resuming a portal, we always reset the planner. However, we still need the
planner to respect the outer txn's state, as we did in cockroachdb#98120.

Release note: None
We now only support multiple active portals with local plans, so we explicitly
disable the feature for this test for now.

Release note: None
blathers-crl bot commented Apr 10, 2023

Thanks for opening a backport.

Please check the backport criteria before merging:

  • Patches should only be created for serious issues or test-only changes.
  • Patches should not break backwards-compatibility.
  • Patches should change as little code as possible.
  • Patches should not change on-disk formats or node communication protocols.
  • Patches should not add new functionality.
  • Patches must not add, edit, or otherwise modify cluster versions; or add version gates.
If some of the basic criteria cannot be satisfied, ensure that the exceptional criteria are satisfied within.
  • There is a high priority need for the functionality that cannot wait until the next release and is difficult to address in another way.
  • The new functionality is additive-only and only runs for clusters which have specifically “opted in” to it (e.g. by a cluster setting).
  • New code is protected by a conditional check that is trivial to verify and ensures that it only runs for opt-in clusters.
  • The PM and TL on the team that owns the changed code have signed off that the change obeys the above rules.

Add a brief release justification to the body of your PR to justify this backport.

Some other things to consider:

  • What did we do to ensure that a user that doesn’t know & care about this backport, has no idea that it happened?
  • Will this work in a cluster of mixed patch versions? Did we test that?
  • If a user upgrades a patch version, uses this feature, and then downgrades, what happens?

cockroach-teamcity (Member) commented:

This change is Reviewable

rafiss (Collaborator) left a comment

thanks for getting it through!

ZhouXing19 merged commit 57c6f6e into cockroachdb:release-23.1 on Apr 10, 2023
yuzefovich (Member) commented:

Great work on getting this in! 🎉
