sql: make sure that "inner" plans use the LeafTxn if the "outer" does #98120
Conversation
This commit fixes a bug where "inner" plans could incorrectly use the RootTxn when the "outer" plan used the LeafTxn. One example of such a situation is when the "main" query is using the streamer (and thus is using the LeafTxn) and also has an apply join, but the apply join iteration plans would use the RootTxn. This could lead to "concurrent txn usage" detected on the RootTxn. This problem is fixed by auditing all code paths that might run plans that can spin up "inner" plans and plumbing the information that the LeafTxn must be used by those "inner" plans via the planner (we don't really have any other more convenient place to do that plumbing).

Note that when creating the flow for the main query we know for sure whether it'll use the LeafTxn only after the flow setup is complete, so we adjust an existing `finishedSetupFn` callback to check the type of the txn that the flow ends up using and update the planner accordingly.

This bug reliably reproduces when creating a materialized view, but for some (unknown to me) reason just running the query as is doesn't seem to trigger the bug (I tried stressing the query with no luck and decided it wasn't worth spending more time on it). I also believe that even though the underlying mechanism for the bug has been present since forever, it was really introduced only when we enabled the streamer by default in 22.2 (since without the streamer we always use the RootTxn for flows with apply joins or UDFs - they must be local).

Release note (bug fix): CockroachDB previously could encounter a "concurrent txn use detected" internal error in some rare cases, and this is now fixed. The bug was introduced in 22.2.0.
Reviewed 14 of 14 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)
TFTR! And nice job figuring that bug out! bors r+
Build failed:
Unrelated flake. bors r+
Build succeeded:
Encountered an error creating backports. Some common things that can go wrong:

You might need to create your backport manually using the backport tool.

error creating merge commit from ed7feb0 to blathers/backport-release-22.2-98120: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
When resuming a portal, we always reset the planner. However we still need the planner to respect the outer txn's situation, as we did in cockroachdb#98120. Release note: None
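The follow-up portal fix can be pictured with a small Go sketch: on every resume the planner is reset wholesale, so the outer txn's requirement has to be re-applied right after the reset. This is a hedged illustration only, not CockroachDB's actual code; `planner`, `mustUseLeafTxn`, and `resetForPortalResume` are hypothetical names standing in for the real machinery.

```go
package main

import "fmt"

// planner stands in for per-statement planning state. Hypothetical.
type planner struct {
	mustUseLeafTxn bool
	// ...other per-statement state that a reset would clear...
}

// resetForPortalResume models the pattern: the planner is fully reset
// on every resume, then the outer txn's situation is re-plumbed so that
// "inner" plans keep using the LeafTxn when the outer flow does.
func resetForPortalResume(p *planner, outerUsesLeafTxn bool) {
	*p = planner{} // full reset, as happens on every portal resume
	p.mustUseLeafTxn = outerUsesLeafTxn
}

func main() {
	p := &planner{mustUseLeafTxn: true}
	resetForPortalResume(p, true)
	fmt.Println(p.mustUseLeafTxn) // the requirement survives the reset
}
```

Without the second line of `resetForPortalResume`, the reset would silently drop the flag and reintroduce the original bug for resumed portals.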
This commit fixes a bug where "inner" plans could incorrectly use
the RootTxn when the "outer" plan used the LeafTxn. One example of such a
situation is when the "main" query is using the streamer (and thus is
using the LeafTxn) and also has an apply join, but the apply join
iteration plans would use the RootTxn. This could lead to "concurrent
txn usage" detected on the RootTxn. This problem is fixed by auditing
all code paths that might run plans that can spin up "inner" plans and
plumbing the information that the LeafTxn must be used by those "inner"
plans via the planner (we don't really have any other more convenient
place to do that plumbing).
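
The plumbing idea can be sketched in a few lines of Go. This is a minimal illustration under assumed names, not CockroachDB's actual code: `planner`, `mustUseLeafTxn`, and `txnForInnerPlan` are hypothetical stand-ins for carrying the outer flow's txn choice down to "inner" plans such as apply-join iterations or UDF bodies.

```go
package main

import "fmt"

// txnType distinguishes the root txn from read-only leaf txns.
type txnType int

const (
	rootTxn txnType = iota
	leafTxn
)

// planner carries per-statement planning state. The hypothetical
// mustUseLeafTxn flag is the plumbing point: once the "outer" flow is
// known to use a LeafTxn, every "inner" plan it spins up must too.
type planner struct {
	mustUseLeafTxn bool
}

// txnForInnerPlan picks the txn type for an "inner" plan based on
// what the outer plan ended up using.
func (p *planner) txnForInnerPlan() txnType {
	if p.mustUseLeafTxn {
		return leafTxn
	}
	return rootTxn
}

func main() {
	p := &planner{}
	fmt.Println(p.txnForInnerPlan() == rootTxn) // default: RootTxn

	// The outer flow turned out to use the streamer, hence a LeafTxn,
	// so inner plans must follow suit.
	p.mustUseLeafTxn = true
	fmt.Println(p.txnForInnerPlan() == leafTxn)
}
```

Routing the flag through the planner matches the commit's observation that the planner is the one object every "inner" plan already has access to.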
Note that when creating the flow for the main query we know for sure
whether it'll use the LeafTxn only after the flow setup is complete, so
we adjust an existing `finishedSetupFn` callback to check the type of
the txn that the flow ends up using and update the planner accordingly.
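
That callback pattern might look roughly like the following Go sketch. The names are assumptions for illustration only: `flow`, `setupFlow`, and the `usesLeafTxn` field are not the real distsql API, but they show why the check has to live in a post-setup callback rather than at planning time.

```go
package main

import "fmt"

// flow stands in for a distsql flow; whether it ended up with a
// LeafTxn is only known after setup completes. Hypothetical type.
type flow struct {
	usesLeafTxn bool
}

// planner carries the hypothetical flag consulted by "inner" plans.
type planner struct {
	mustUseLeafTxn bool
}

// setupFlow performs flow setup and then invokes finishedSetupFn,
// mirroring the pattern of inspecting the txn only once setup is done.
func setupFlow(f *flow, finishedSetupFn func(*flow)) {
	// ...flow setup happens here; it may or may not pick a LeafTxn...
	finishedSetupFn(f)
}

func main() {
	p := &planner{}
	f := &flow{usesLeafTxn: true} // e.g. the streamer forced a LeafTxn

	// The callback records the outer flow's txn choice on the planner
	// so that any "inner" plans spun up later consult it.
	setupFlow(f, func(f *flow) {
		if f.usesLeafTxn {
			p.mustUseLeafTxn = true
		}
	})
	fmt.Println(p.mustUseLeafTxn)
}
```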
This bug reliably reproduces when creating a materialized view, but for
some (unknown to me) reason just running the query as is doesn't seem to
trigger the bug (I tried stressing the query with no luck and decided it
wasn't worth spending more time on it). I also believe that even though
the underlying mechanism for the bug has been present since forever, it
was really introduced only when we enabled the streamer by default in
22.2 (since without the streamer we always use the RootTxn for flows
with apply joins or UDFs - they must be local).
Fixes: #97989.
Release note (bug fix): CockroachDB previously could encounter
"concurrent txn use detected" internal error in some rare cases, and
this is now fixed. The bug was introduced in 22.2.0.