sql/catalog/descs: unify and fix TxnWithExecutor #86427

Merged: craig bot merged 4 commits into cockroachdb:master from ajwerner/fix-descs-txn on Aug 20, 2022

ajwerner
Contributor

The code was bogus before because it was trying to use the transaction
after committing. That's not allowed. Also, the retry behavior was unsound.
We make it sound by tracking when a two-version invariant violation occurs
and propagating that up to trigger a true restart.

Release justification: fixes very broken and needed code

Release note: None
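The retry fix described above can be illustrated with a minimal, self-contained sketch. It does not use CockroachDB's real APIs: `fakeTxn`, `errRestartRequired`, and `runWithRestart` are hypothetical names chosen for illustration. The idea it shows is that the commit step reports a two-version invariant violation as an explicit, detectable error after rolling back, and the caller reacts by abandoning that transaction and rerunning the whole closure on a fresh one, rather than using the finished transaction again.

```go
package main

import (
	"errors"
	"fmt"
)

// errRestartRequired is a hypothetical sentinel standing in for the
// "two-version invariant violated; restart the transaction" signal.
var errRestartRequired = errors.New("two-version invariant violated; restart required")

// fakeTxn is a hypothetical stand-in for a KV transaction. Once it has
// finished (committed or rolled back), it must not be used again.
type fakeTxn struct {
	finished bool
}

// use returns an error if the transaction is touched after it finished.
func (t *fakeTxn) use() error {
	if t.finished {
		return errors.New("transaction used after it finished")
	}
	return nil
}

// runWithRestart runs fn in a fresh transaction and then calls commit.
// When commit reports errRestartRequired (having already rolled back),
// the old transaction is abandoned rather than reused, and fn is rerun
// from scratch on a new transaction, up to maxAttempts times.
func runWithRestart(
	maxAttempts int, fn func(*fakeTxn) error, commit func(*fakeTxn) error,
) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		txn := &fakeTxn{}
		if err = fn(txn); err != nil {
			return attempt, err
		}
		err = commit(txn)
		txn.finished = true // never touch this txn again
		if errors.Is(err, errRestartRequired) {
			continue // true restart: new txn, rerun fn
		}
		return attempt, err
	}
	return maxAttempts, fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

func main() {
	violations := 1 // hypothetical: the first commit violates the invariant
	attempts, err := runWithRestart(3,
		func(txn *fakeTxn) error { return txn.use() },
		func(txn *fakeTxn) error {
			if violations > 0 {
				violations--
				return errRestartRequired
			}
			return nil
		})
	fmt.Println(attempts, err) // prints "2 <nil>"
}
```

The design point is that the restart decision is made by the caller that owns the closure, so no code path ever touches a transaction after it has been committed or rolled back.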

@ajwerner ajwerner requested a review from a team August 18, 2022 22:49
@cockroach-teamcity
Member

This change is Reviewable

@ajwerner ajwerner force-pushed the ajwerner/fix-descs-txn branch 2 times, most recently from b446e62 to 2bf92b3 Compare August 19, 2022 02:57
@ajwerner ajwerner requested a review from a team as a code owner August 19, 2022 02:57
@ajwerner ajwerner requested a review from a team August 19, 2022 02:57
@ajwerner ajwerner requested review from a team as code owners August 19, 2022 02:57
@ajwerner ajwerner requested review from miretskiy and removed request for a team August 19, 2022 02:57
@ajwerner ajwerner force-pushed the ajwerner/fix-descs-txn branch 2 times, most recently from e07e4ee to 1eb7cc6 Compare August 19, 2022 04:39
We need to track the job records, and we need to run the jobs.

Release note: None

Release justification: testing only change

Release note: None
It was not being used.

Release justification: Import cleanup as part of a bug fix.

Release note: None
@ajwerner
Contributor Author

@ZhouXing19 and/or @rafiss can y'all review this? I'm reasonably pleased with where it ended up. @maryliag would very much like #86433 merged ASAP.

Collaborator

@ZhouXing19 ZhouXing19 left a comment

Thanks for fixing up the new interface, and LGTM! But maybe we want a stamp from @rafiss too.

Just wanted to confirm my understanding of some critical changes in this PR:

  1. Previously there were two sets of logic for validating the schema changes: one in CollectionFactory.Txn() and the other in ex.commitSQLTransactionInternal(). Even worse, CollectionFactory.TxnWithExecutor() used both of them. We now unify them into the latter.
  2. The retry invoked by a two-version invariant violation has been changed to first roll back the current kv.Txn, then propagate a retry error to the sql layer, and finally trigger the conn executor to reset the txn.

Do I understand it correctly?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @miretskiy)


-- commits line 20 at r4:
nit: release justification is missing

Collaborator

@ZhouXing19 ZhouXing19 left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @miretskiy)


pkg/sql/internal.go line 1042 at r1 (raw file):

	}
	defer ex.close(ctx, externalTxnClose)
	return ex.commitSQLTransactionInternal(ctx)

Should we call ex.commitSQLTransaction() instead? Otherwise we're not applying the changes to provoke conn executor to reset txn when the two-version invariant is violated.

Contributor Author

@ajwerner ajwerner left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, and @ZhouXing19)


pkg/sql/internal.go line 1042 at r1 (raw file):

Otherwise we're not applying the changes to provoke conn executor to reset txn when the two-version invariant is violated.

The internal executor in question doesn't live past this function call, so what value does resetting its transaction have?

@ajwerner
Contributor Author

The retry invoked by a two-version invariant violation has been changed to first roll back the current kv.Txn, then propagate a retry error to the sql layer, and finally trigger the conn executor to reset the txn.

This isn't quite right. We always rolled back the transaction when the invariant was violated; we just did it with a more confusing API. Before this change, we called (*kv.Txn).CleanupOnError, and we had to synthesize an error to pass to it. We then depended implicitly, rather than explicitly, on how that error was handled, and we also depended on an accompanying bool return value.

See CleanupOnError:

cockroach/pkg/kv/txn.go

Lines 691 to 700 in 00af7d7

// CleanupOnError cleans up the transaction as a result of an error.
func (txn *Txn) CleanupOnError(ctx context.Context, err error) {
	if txn.typ != RootTxn {
		panic(errors.WithContextTags(errors.AssertionFailedf("CleanupOnError() called on leaf txn"), ctx))
	}
	if err == nil {
		panic(errors.WithContextTags(errors.AssertionFailedf("CleanupOnError() called with nil error"), ctx))
	}
	if replyErr := txn.rollback(ctx); replyErr != nil {

And Rollback:

cockroach/pkg/kv/txn.go

Lines 835 to 840 in 00af7d7

func (txn *Txn) Rollback(ctx context.Context) error {
	if txn.typ != RootTxn {
		return errors.WithContextTags(errors.AssertionFailedf("Rollback() called on leaf txn"), ctx)
	}
	return txn.rollback(ctx).GoError()
}

The code doesn't change the behavior in CheckTwoVersionInvariant; it just makes the way the result is communicated more explicit. Rather than relying on a combination of a fake restart error and a boolean, it relies on an explicit error that can be detected.
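To make the contrast concrete, here is a rough sketch of the two ways of signaling the violation. Every name in it (`checkOldStyle`, `checkNewStyle`, `TwoVersionInvariantViolationError`) is hypothetical and does not reproduce the real CheckTwoVersionInvariant signature; it only assumes the general shapes described in the comment above: a synthesized error plus a bool before, and an explicit, detectable error after.

```go
package main

import (
	"errors"
	"fmt"
)

// TwoVersionInvariantViolationError is a hypothetical explicit error type
// carrying the IDs of descriptors whose old versions are still leased.
type TwoVersionInvariantViolationError struct {
	IDs []int
}

func (e *TwoVersionInvariantViolationError) Error() string {
	return fmt.Sprintf("two-version invariant violated for descriptors %v", e.IDs)
}

// checkOldStyle mimics the criticized shape: an error is synthesized only
// so it can be handed to a cleanup routine, and the caller must interpret
// the returned bool to know that a retry is needed.
func checkOldStyle(violated bool, cleanupOnError func(cause error)) (retry bool) {
	if violated {
		cleanupOnError(errors.New("synthetic restart error"))
		return true
	}
	return false
}

// checkNewStyle rolls back and then reports the violation as an explicit
// error value that callers can detect with errors.As.
func checkNewStyle(violated bool, ids []int, rollback func() error) error {
	if !violated {
		return nil
	}
	if err := rollback(); err != nil {
		return err
	}
	return &TwoVersionInvariantViolationError{IDs: ids}
}

func main() {
	err := checkNewStyle(true, []int{52}, func() error { return nil })
	// Wrapping by higher layers does not hide the error from errors.As.
	wrapped := fmt.Errorf("committing: %w", err)
	var violation *TwoVersionInvariantViolationError
	if errors.As(wrapped, &violation) {
		fmt.Println("restart needed for", violation.IDs) // prints "restart needed for [52]"
	}
}
```

With the explicit error, the information travels with the value itself, so any layer that wraps it can still recognize it, whereas the old shape required every caller to keep the bool and the error in sync.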

Collaborator

@rafiss rafiss left a comment

nice! thanks for the new datadriven test too

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @miretskiy and @ZhouXing19)


-- commits line 20 at r4:

Previously, ZhouXing19 (Jane Xing) wrote…

nit: release justification is missing

i believe the justification is only needed in the PR (but confusingly there is a git hook that adds it to the commit msg too)

Collaborator

@ZhouXing19 ZhouXing19 left a comment

I see, thanks for the detailed explanation!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @miretskiy, and @ZhouXing19)


pkg/sql/internal.go line 1042 at r1 (raw file):

Previously, ajwerner wrote…

Otherwise we're not applying the changes to provoke conn executor to reset txn when the two-version invariant is violated.

The internal executor in question doesn't live past this function call, so what value does resetting its transaction have?

Ah I see, sorry I misunderstood.

@ajwerner
Contributor Author

TFTR!

bors r+

@craig
Contributor

craig bot commented Aug 19, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Aug 20, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Aug 20, 2022

Build failed (retrying...):

@craig
Contributor

craig bot commented Aug 20, 2022

Build succeeded:

@craig craig bot merged commit a7fafb5 into cockroachdb:master Aug 20, 2022
craig bot pushed a commit that referenced this pull request Aug 20, 2022
86433: server: proper transaction state management in sql-over-http  r=ajwerner a=ajwerner

First 4 commits are #86427.
Next commit is #86461.

We need to construct the internal executor in the context of the transaction
so that we can make sure that its side-effects are properly managed. Without
this change, we'd be throwing away all of the extraTxnState between each
statement. We'd fail to create the jobs (which we defer to the end of the
transaction), and we'd fail to run those jobs and check for errors. We'd
also fail to validate the two-version invariant or wait for one version.

Fixes #86332

Release justification: Fixes critical bugs in new functionality.

Release note: None

Co-authored-by: Andrew Werner <awerner32@gmail.com>
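The sql-over-http bug described in that commit message (per-statement internal executors discarding extraTxnState) can be pictured with a toy model. This is only a sketch under simplifying assumptions: `extraTxnState` here is a hypothetical two-field struct, not the real conn executor type, and `executor`, `exec`, and `commit` are illustrative names. Counting descriptors stands in for the two-version check, and counting queued jobs stands in for creating and running them at commit.

```go
package main

import "fmt"

// extraTxnState is a hypothetical stand-in for per-transaction state that
// must survive across statements: jobs whose creation is deferred to
// commit, and descriptors modified during the transaction.
type extraTxnState struct {
	pendingJobs   []string
	modifiedDescs map[int]bool
}

func newState() *extraTxnState {
	return &extraTxnState{modifiedDescs: map[int]bool{}}
}

// executor is a hypothetical internal executor bound to one state.
type executor struct {
	state *extraTxnState
}

func newExecutor(state *extraTxnState) *executor {
	return &executor{state: state}
}

// exec simulates a schema-changing statement: it records the modified
// descriptor and queues a job to be created at commit time.
func (e *executor) exec(stmt string, descID int) {
	e.state.modifiedDescs[descID] = true
	e.state.pendingJobs = append(e.state.pendingJobs, stmt)
}

// commit performs the deferred work. Counting descriptors stands in for
// the two-version check / wait-for-one-version step; counting jobs
// stands in for creating and running them.
func commit(state *extraTxnState) (validated, jobsRun int) {
	return len(state.modifiedDescs), len(state.pendingJobs)
}

func main() {
	stmts := []string{"CREATE INDEX idx ON t (a)", "ALTER TABLE t ADD COLUMN b INT"}

	// Buggy shape: a fresh executor and state per statement, so the state
	// seen at commit never observes the statements' side effects.
	commitState := newState()
	for i, s := range stmts {
		newExecutor(newState()).exec(s, i)
	}
	v, j := commit(commitState)
	fmt.Println("per-statement:", v, j) // prints "per-statement: 0 0"

	// Fixed shape: one state bound to the transaction, shared by all statements.
	txnState := newState()
	ex := newExecutor(txnState)
	for i, s := range stmts {
		ex.exec(s, i)
	}
	v, j = commit(txnState)
	fmt.Println("txn-scoped:", v, j) // prints "txn-scoped: 2 2"
}
```

In the toy model, binding the executor to state owned by the transaction is what lets the commit step see every statement's deferred work.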
4 participants