Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: remove PrepareRetryableError from the (*kv.Txn) API #86361

Conversation

ajwerner
Copy link
Contributor

@ajwerner ajwerner commented Aug 18, 2022

This commit removes the PrepareRetryableError API from the *kv.Txn and the
TxnSender. This error was never what anybody wanted. It has the odd side-effect
of leaving the transaction usable -- and, worse, the (*kv.Txn).exec loop would
retry but would not restart or replace the transaction if such an error were
returned.

It then renames GenerateForcedRetriableError to GenerateRetriableAbortedError.
The advantage of aborting the transaction is that it means that we can use this after
rolling back the transaction to trigger a restart. The alternative API, CleanupOnError
also aborts the transaction, so that's no different. The difference is that the one cannot
create a retriable error from an already aborted transaction unless it itself came from
a TransactionAbortedError.

All of this change feels important, but a little off. The core motivation is that we'd
like to unify the logic related to descriptor management between the sql layer
and the descs layer. This is part of an initiative to make the InternalExecutor
more sane. We've realized that in many cases, we'd like to couple not just the
*kv.Txn to the descs.Collection, but also to an InternalExecutor. The change
in #82477 makes progress in that
direction. It, however, somewhat awkwardly leaves two nearly identical code paths
in (*descs.CollectionFactory).Txn and (*descs.CollectionFactory).TxnWithExecutor.
In #86301, we attempt to unify these and are almost successful (see here).

The problem is that the new, and nicer method TxnWithExecutor is buggy in two ways (but
it's okay because literally nothing currently uses that better API). The first bug is that the
commitTxnFn is going to, well, commit the *Txn, so it's weird for us to fully redo the
same work (and it is the same work) after the transaction is committed. The second bug is
that if that work fails, we're going to get one of those errors from PrepareRetryableError
and return it, but that won't lead to the transaction getting restarted because of this quirky
line in (*kv.Txn).PrepareForRetry:

cockroach/pkg/kv/txn.go

Lines 1018 to 1021 in bd8bc01

retryErr := txn.mu.sender.GetTxnRetryableErr(ctx)
if retryErr == nil {
return
}

Even if we called PrepareForRetry because we thought we had a retryable error, we don't
have a claim that the transaction thinks it has a retryable error. This change happened in
#74563. It seems to me like something should either complain if the transaction does not think
it has a retryable error but we get one or it should use the one it gets. Maybe that's all I really
need to do here.

Release justification: important correctness change as part of making the
SQL-over-HTTP APIs work for index recommendations.

Release note: None

Release justification: Does not change production code, only comments.

Release note: None
@ajwerner ajwerner requested a review from a team as a code owner August 18, 2022 01:09
@ajwerner ajwerner requested a review from a team August 18, 2022 01:09
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@ajwerner
Copy link
Contributor Author

I don't know that this is right either.

@ajwerner ajwerner force-pushed the ajwerner/kv-remove-PrepareRetryableError branch from e21b109 to bd8bc01 Compare August 18, 2022 02:17
This error was never what anybody wanted. It has the odd side-effect
of leaving the transaction usable -- and, worse, the (*kv.Txn).exec loop would
retry but would not restart or replace the transaction if such an error were
returned.

Release justification: important correctness change as part of making the
SQL-over-HTTP APIs work for index recommendations.

Release note: None
@ajwerner ajwerner force-pushed the ajwerner/kv-remove-PrepareRetryableError branch 2 times, most recently from 92f3ff6 to c475cc7 Compare August 18, 2022 03:25
@ajwerner ajwerner marked this pull request as draft August 18, 2022 04:47
@ajwerner
Copy link
Contributor Author

I'm going to need help figuring out the right thing here.

@ajwerner ajwerner closed this Aug 19, 2022
@ajwerner
Copy link
Contributor Author

The logic in #86427 avoids any hazards due to the weird behavior here. A follow-up to that PR will remove (*kv.Txn).PrepareRetryableError and, ideally, make the rough edge of retrying in the loo harder to hit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants