sql: make sure that "inner" plans use the LeafTxn if the "outer" does #98120

Merged
merged 1 commit into from
Mar 10, 2023

@yuzefovich yuzefovich commented Mar 7, 2023

This commit fixes a bug where "inner" plans could incorrectly use
the RootTxn when the "outer" plan used the LeafTxn. One example of such
a situation is when the "main" query uses the streamer (and thus the
LeafTxn) and also has an apply join, but the apply join iteration plans
use the RootTxn. This could lead to "concurrent txn usage" being
detected on the RootTxn. The problem is fixed by auditing all code
paths that might run plans that can spin up "inner" plans and plumbing
the information that the LeafTxn must be used by those "inner" plans
via the planner (we don't really have a more convenient place to do
that plumbing).
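
To illustrate the idea of plumbing this information via the planner, here is a minimal sketch. It is not the code from this PR: `txnKind`, `planner`, `mustUseLeafTxn`, and `txnForInnerPlan` are hypothetical names invented for the example, and the real planner carries far more state.

```go
package main

import "fmt"

// txnKind is a hypothetical stand-in for the RootTxn / LeafTxn distinction.
type txnKind int

const (
	rootTxn txnKind = iota
	leafTxn
)

// String makes the kind readable when printed.
func (k txnKind) String() string {
	if k == leafTxn {
		return "LeafTxn"
	}
	return "RootTxn"
}

// planner is a hypothetical, heavily simplified planner. The only state it
// carries here is whether "inner" plans must use the LeafTxn.
type planner struct {
	mustUseLeafTxn bool
}

// txnForInnerPlan picks the txn kind for an "inner" plan (for example an
// apply join iteration) based on what the "outer" plan recorded.
func (p *planner) txnForInnerPlan() txnKind {
	if p.mustUseLeafTxn {
		return leafTxn
	}
	return rootTxn
}

func main() {
	p := &planner{}
	fmt.Println(p.txnForInnerPlan()) // outer plan uses the RootTxn

	// The outer plan turned out to use the LeafTxn (say, due to the streamer),
	// so every inner plan must follow suit.
	p.mustUseLeafTxn = true
	fmt.Println(p.txnForInnerPlan())
}
```

The point of the sketch is only that the decision is made once for the "outer" plan and then consulted by every "inner" plan, rather than each inner plan defaulting to the RootTxn.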

Note that when we create the flow for the main query, we know for sure
whether it'll use the LeafTxn only after the flow setup is complete, so
we adjust an existing `finishedSetupFn` callback to check the type of
the txn that the flow ends up using and update the planner accordingly.
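
The callback-based update can be sketched roughly as follows. Again, this is a simplified illustration under assumptions, not the actual code: `flowTxnType`, `sketchFlow`, `plannerState`, and `setupFlow` are hypothetical, and only the name `finishedSetupFn` comes from the description above (its real signature may differ).

```go
package main

import "fmt"

// flowTxnType is a hypothetical stand-in for the type of txn a flow uses.
type flowTxnType int

const (
	flowRootTxn flowTxnType = iota
	flowLeafTxn
)

// sketchFlow is a hypothetical flow whose txn type is decided during setup.
type sketchFlow struct {
	txnType flowTxnType
}

// plannerState is a hypothetical slice of planner state.
type plannerState struct {
	innerPlansMustUseLeafTxn bool
}

// setupFlow simulates flow setup. The txn type is only known once setup is
// done, so the callback runs at the very end. The usesStreamer parameter is
// an assumption used here to model "the flow ended up with the LeafTxn".
func setupFlow(usesStreamer bool, finishedSetupFn func(f *sketchFlow)) *sketchFlow {
	f := &sketchFlow{txnType: flowRootTxn}
	if usesStreamer {
		f.txnType = flowLeafTxn
	}
	if finishedSetupFn != nil {
		finishedSetupFn(f)
	}
	return f
}

func main() {
	p := &plannerState{}
	setupFlow(true, func(f *sketchFlow) {
		// Record on the planner which txn type the main flow ended up with.
		p.innerPlansMustUseLeafTxn = f.txnType == flowLeafTxn
	})
	fmt.Println(p.innerPlansMustUseLeafTxn)
}
```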

This bug reliably reproduces when creating a materialized view, but for
some (unknown to me) reason just running the query as is doesn't seem to
trigger the bug (I tried stressing the query with no luck and decided it
wasn't worth spending more time on it). I also believe that even though
the underlying mechanism for the bug has always been present, the bug
only became reachable when we enabled the streamer by default in 22.2
(without the streamer we always use the RootTxn for flows with apply
joins or UDFs, as those flows must be local).

Fixes: #97989.

Release note (bug fix): CockroachDB could previously encounter a
"concurrent txn use detected" internal error in some rare cases. This
is now fixed. The bug was introduced in 22.2.0.

@yuzefovich yuzefovich force-pushed the fix-apply-join-txn branch 2 times, most recently from 584f523 to c7f3b6a Compare March 8, 2023 17:31
@yuzefovich yuzefovich marked this pull request as ready for review March 8, 2023 17:31
@yuzefovich yuzefovich requested a review from a team as a code owner March 8, 2023 17:31
@yuzefovich yuzefovich requested review from a team March 8, 2023 17:31
@yuzefovich yuzefovich requested review from miretskiy, msirek, mgartner and DrewKimball and removed request for a team, miretskiy and msirek March 8, 2023 17:31
@DrewKimball DrewKimball left a comment

:lgtm: Nice work!

Reviewed 14 of 14 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner)

@yuzefovich

TFTR! And nice job figuring that bug out!

bors r+

@craig

craig bot commented Mar 10, 2023

Build failed:

@yuzefovich

Unrelated flake.

bors r+

@craig

craig bot commented Mar 10, 2023

Build succeeded:

@craig craig bot merged commit d4a584e into cockroachdb:master Mar 10, 2023
@blathers-crl
blathers-crl bot commented Mar 10, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from ed7feb0 to blathers/backport-release-22.2-98120: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

ZhouXing19 added a commit to ZhouXing19/cockroach that referenced this pull request Apr 3, 2023
When resuming a portal, we always reset the planner. However we still need the
planner to respect the outer txn's situation, as we did in cockroachdb#98120.

Release note: None
Successfully merging this pull request may close these issues.

distsql: apply-join can capture the wrong transaction