Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tmp]sql: fix bugs for multiple active portals #100851

Closed
wants to merge 16 commits into from

Conversation

ZhouXing19
Copy link
Collaborator

@ZhouXing19 ZhouXing19 commented Apr 6, 2023

Can start from the commit with message:

--------- please review commits after this one ---------


This PR is based on #99663, and it contains fix for bugs revealed by enabling multiple_active_portals_enabled for metamorphic testing. I made this separated PR so that it's easier to review those the new bug-fix commits.

This PR will eventually be closed and the new commits will be merged with those for the main implementation in #99663.

Epic: None
Release note: None

With the introduction of pausable portals, the comment for `limitedCommandResult`
needs to be updated to reflect the current behavior.

Release note: None
This change introduces a new session variable for a preview feature. When set to `true`,
all non-internal portals with read-only [`SELECT`](../v23.1/selection-queries.html)
queries without sub-queries or post-queries can be paused and resumed in an interleaving
manner, but are executed with a local plan.

Release note (SQL change): Added the session variable `multiple_active_portals_enabled`.
This setting is only for a preview feature. When set to `true`, it allows
multiple portals to be open at the same time, with their execution interleaved
with each other. In other words, these portals can be paused. The underlying
statement for a pausable portal must be a read-only `SELECT` query without
sub-queries or post-queries, and such a portal is always executed with a local
plan.
…e persistence

This commit is part of the implementation of multiple active portals. In order to
execute portals interleavingly, certain resources need to be persisted and their
clean-up must be delayed until the portal is closed. Additionally, these resources
don't need to be re-setup when the portal is re-executed.

To achieve this, we store the cleanup steps in the `cleanup` function stacks in
`portalPauseInfo`, and they are called when any of the following events occur:

1. SQL transaction is committed or rolled back
2. Connection executor is closed
3. An error is encountered when executing the portal
4. The portal is explicited closed by the user

The cleanup functions should be called in the original order of a normal portal.
Since a portal's execution follows the `execPortal() -> execStmtInOpenState() ->
dispatchToExecutionEngine() -> flow.Run()` function flow, we categorize the cleanup
functions into 4 "layers", which are stored accordingly in `PreparedPortal.pauseInfo`.
The cleanup is always LIFO, following the

- resumableFlow.cleanup
- dispatchToExecutionEngine.cleanup
- execStmtInOpenState.cleanup
- exhaustPortal.cleanup`

order. Additionally, if an error occurs in any layer, we clean up the current and
proceeding layers. For example, if an error occurs in `execStmtInOpenState()`, we
perform `resumableFlow.cleanup` and `dispatchToExecutionEngine.cleanup` (proceeding)
and then `execStmtInOpenState.cleanup` (current) before returning the error to
`execPortal()`, where `exhaustPortal.cleanup` will eventually be called. This is to
maintain the previous clean-up process for portals as much as possible.

We also pass the `PreparedPortal` as a reference to the planner in
`execStmtInOpenState()`,so that the portal's flow can be set and reused.

Release note: None
When executing or cleaning up a pausable portal, we may encounter an error and
need to run the corresponding clean-up steps, where we need to check the latest
`retErr` and `retPayload` rather than the ones evaluated when creating the
cleanup functions.

To address this, we use portal.pauseInfo.retErr and .retPayload to record the
latest error and payload. They need to be updated for each execution.

Specifically,

1. If the error happens during portal execution, we ensure `portal.pauseInfo`
records the error by adding the following code to the main body of
`execStmtInOpenState()`:

``` go
defer func() {
	updateRetErrAndPayload(retErr, retPayload)
}()
```

Note this defer should always happen _after_ the defer of running the cleanups.

2. If the error occurs during a certain cleanup step for the pausable portal,
we ensure that cleanup steps after it can see the error by always having
`updateRetErrAndPayload(retErr, retPayload)` run at the end of each cleanup step.

Release note: None
This commit adds several restrictions to pausable portals to ensure that they
work properly with the current changes to the consumer-receiver model.
Specifically, pausable portals must meet the following criteria:

1. Not be internal queries
2. Be read-only queries
3. Not contain sub-queries or post-queries
4. Only use local plans

These restrictions are necessary because the current changes to the
consumer-receiver model only consider the local push-based case.

Release note: None
When resuming a portal, we always reset the planner. However we still need the
planner to respect the outer txn's situation, as we did in cockroachdb#98120.

Release note: None
(This commit will be added to the previous commits once it's approved.)

This patch fixes a missed case where the last pgwire statement is to execute
a pausable portal _without_ a row limit, and then we close the connection (i.e.
without committing or rolling back the sql transaction). In this case, the
latest result stored in `PreparedPortal.pauseInfo` is a `pgwire.commandResult`,
instead of a `pgwire.limitedCommandResult`.

It means that in the cleanup step we need to consider the case with
`pgwire.commandResult`. To limit the scope of changes, I introduced this
`ErrAllowReleased()` method, and it is expected to be used only for pausable
portals.

Release note: None
When we encounter an `eventRetriableErrPayload`, we will eventually rollback
the transaction, so it makes sense to close all pausable portals. Otherwise,
we will get `leftover bytes` error.

Release note: None
…underlying stmt

A portal's "pausability" can be revoked in `dispatchToExecutionEngine()` if the
underlying statement contains sub/post queries. Thus, we should evaluate
whether a portal is pausable when executing the cleanup step.

Release note: None
If `stmt == ''`, the current `tree.IsAllowedToPause()` will get a nil pointer
error. This commit is to fix it.

Release note: None
We now have this new session var, so update the related logictests.

Release note: None
We now only support multiple active portals with local plan, so explicitly
disable it for this test for now.

Release note: None
Now we support limited version of multiple active portals, so updating the
existing tests here.

Release note: None
@ZhouXing19 ZhouXing19 requested a review from a team as a code owner April 6, 2023 18:52
@ZhouXing19 ZhouXing19 requested a review from a team April 6, 2023 18:52
@ZhouXing19 ZhouXing19 requested review from a team as code owners April 6, 2023 18:52
@ZhouXing19 ZhouXing19 requested review from rytaft, yuzefovich and rafiss and removed request for rytaft April 6, 2023 18:52
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ZhouXing19 ZhouXing19 changed the title sql: fix bugs for multiple active portals [tmp]sql: fix bugs for multiple active portals Apr 6, 2023
Copy link
Member

@yuzefovich yuzefovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fixes, and thanks for pulling everything out into separate commits! :lgtm:

Reviewed 12 of 15 files at r2, 3 of 7 files at r3, 3 of 6 files at r5, 1 of 1 files at r6, 4 of 4 files at r9, 1 of 1 files at r10, 1 of 1 files at r11, 2 of 2 files at r12, 1 of 1 files at r13, 3 of 3 files at r14, 1 of 1 files at r15, 1 of 1 files at r16, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rafiss)

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ZhouXing19)


pkg/sql/logictest/testdata/logic_test/information_schema line 5263 at r14 (raw file):

max_identifier_length                                 128
max_index_keys                                        32
multiple_active_portals_enabled                       on

is this showing the default value? hm, that could be tricky since the default value is metamorphic.

Copy link
Collaborator Author

@ZhouXing19 ZhouXing19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR! I'm closing the PR now and will fixup the commits in #99663.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rafiss)


pkg/sql/logictest/testdata/logic_test/information_schema line 5263 at r14 (raw file):

Previously, rafiss (Rafi Shamim) wrote…

is this showing the default value? hm, that could be tricky since the default value is metamorphic.

I think this could be done by having multiple_active_portals_enabled in the NOT IN list of the SELECT * FROM information_schema.session_variables stmt. Will push the changes in #99663.

@ZhouXing19 ZhouXing19 closed this Apr 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants