-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[tmp]sql: fix bugs for multiple active portals #100851
Conversation
With the introduction of pausable portals, the comment for `limitedCommandResult` needs to be updated to reflect the current behavior. Release note: None
This change introduces a new session variable for a preview feature. When set to `true`, all non-internal portals with read-only [`SELECT`](../v23.1/selection-queries.html) queries without sub-queries or post-queries can be paused and resumed in an interleaving manner, but are executed with a local plan. Release note (SQL change): Added the session variable `multiple_active_portals_enabled`. This setting is only for a preview feature. When set to `true`, it allows multiple portals to be open at the same time, with their execution interleaved with each other. In other words, these portals can be paused. The underlying statement for a pausable portal must be a read-only `SELECT` query without sub-queries or post-queries, and such a portal is always executed with a local plan.
…e persistence This commit is part of the implementation of multiple active portals. In order to execute portals interleavingly, certain resources need to be persisted and their clean-up must be delayed until the portal is closed. Additionally, these resources don't need to be re-setup when the portal is re-executed. To achieve this, we store the cleanup steps in the `cleanup` function stacks in `portalPauseInfo`, and they are called when any of the following events occur: 1. SQL transaction is committed or rolled back 2. Connection executor is closed 3. An error is encountered when executing the portal 4. The portal is explicited closed by the user The cleanup functions should be called in the original order of a normal portal. Since a portal's execution follows the `execPortal() -> execStmtInOpenState() -> dispatchToExecutionEngine() -> flow.Run()` function flow, we categorize the cleanup functions into 4 "layers", which are stored accordingly in `PreparedPortal.pauseInfo`. The cleanup is always LIFO, following the - resumableFlow.cleanup - dispatchToExecutionEngine.cleanup - execStmtInOpenState.cleanup - exhaustPortal.cleanup` order. Additionally, if an error occurs in any layer, we clean up the current and proceeding layers. For example, if an error occurs in `execStmtInOpenState()`, we perform `resumableFlow.cleanup` and `dispatchToExecutionEngine.cleanup` (proceeding) and then `execStmtInOpenState.cleanup` (current) before returning the error to `execPortal()`, where `exhaustPortal.cleanup` will eventually be called. This is to maintain the previous clean-up process for portals as much as possible. We also pass the `PreparedPortal` as a reference to the planner in `execStmtInOpenState()`,so that the portal's flow can be set and reused. Release note: None
When executing or cleaning up a pausable portal, we may encounter an error and need to run the corresponding clean-up steps, where we need to check the latest `retErr` and `retPayload` rather than the ones evaluated when creating the cleanup functions. To address this, we use portal.pauseInfo.retErr and .retPayload to record the latest error and payload. They need to be updated for each execution. Specifically, 1. If the error happens during portal execution, we ensure `portal.pauseInfo` records the error by adding the following code to the main body of `execStmtInOpenState()`: ``` go defer func() { updateRetErrAndPayload(retErr, retPayload) }() ``` Note this defer should always happen _after_ the defer of running the cleanups. 2. If the error occurs during a certain cleanup step for the pausable portal, we ensure that cleanup steps after it can see the error by always having `updateRetErrAndPayload(retErr, retPayload)` run at the end of each cleanup step. Release note: None
This commit adds several restrictions to pausable portals to ensure that they work properly with the current changes to the consumer-receiver model. Specifically, pausable portals must meet the following criteria: 1. Not be internal queries 2. Be read-only queries 3. Not contain sub-queries or post-queries 4. Only use local plans These restrictions are necessary because the current changes to the consumer-receiver model only consider the local push-based case. Release note: None
When resuming a portal, we always reset the planner. However we still need the planner to respect the outer txn's situation, as we did in cockroachdb#98120. Release note: None
Release note: None
(This commit will be added to the previous commits once it's approved.) This patch fixes a missed case where the last pgwire statement is to execute a pausable portal _without_ a row limit, and then we close the connection (i.e. without committing or rolling back the sql transaction). In this case, the latest result stored in `PreparedPortal.pauseInfo` is a `pgwire.commandResult`, instead of a `pgwire.limitedCommandResult`. It means that in the cleanup step we need to consider the case with `pgwire.commandResult`. To limit the scope of changes, I introduced this `ErrAllowReleased()` method, and it is expected to be used only for pausable portals. Release note: None
When we encounter an `eventRetriableErrPayload`, we will eventually rollback the transaction, so it makes sense to close all pausable portals. Otherwise, we will get `leftover bytes` error. Release note: None
…underlying stmt A portal's "pausability" can be revoked in `dispatchToExecutionEngine()` if the underlying statement contains sub/post queries. Thus, we should evaluate whether a portal is pausable when executing the cleanup step. Release note: None
Release note: None
If `stmt == ''`, the current `tree.IsAllowedToPause()` will get a nil pointer error. This commit is to fix it. Release note: None
We now have this new session var, so update the related logictests. Release note: None
We now only support multiple active portals with local plan, so explicitly disable it for this test for now. Release note: None
Now we support limited version of multiple active portals, so updating the existing tests here. Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice fixes, and thanks for pulling everything out into separate commits!
Reviewed 12 of 15 files at r2, 3 of 7 files at r3, 3 of 6 files at r5, 1 of 1 files at r6, 4 of 4 files at r9, 1 of 1 files at r10, 1 of 1 files at r11, 2 of 2 files at r12, 1 of 1 files at r13, 3 of 3 files at r14, 1 of 1 files at r15, 1 of 1 files at r16, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @rafiss)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ZhouXing19)
pkg/sql/logictest/testdata/logic_test/information_schema
line 5263 at r14 (raw file):
max_identifier_length 128 max_index_keys 32 multiple_active_portals_enabled on
is this showing the default value? hm, that could be tricky since the default value is metamorphic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR! I'm closing the PR now and will fixup the commits in #99663.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @rafiss)
pkg/sql/logictest/testdata/logic_test/information_schema
line 5263 at r14 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
is this showing the default value? hm, that could be tricky since the default value is metamorphic.
I think this could be done by having multiple_active_portals_enabled
in the NOT IN
list of the SELECT * FROM information_schema.session_variables
stmt. Will push the changes in #99663.
Can start from the commit with message:
--------- please review commits after this one ---------
This PR is based on #99663, and it contains fix for bugs revealed by enabling
multiple_active_portals_enabled
for metamorphic testing. I made this separated PR so that it's easier to review those the new bug-fix commits.This PR will eventually be closed and the new commits will be merged with those for the main implementation in #99663.
Epic: None
Release note: None