-
Notifications
You must be signed in to change notification settings - Fork 150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: base transaction retries on error codes #953
Conversation
c757ce5
to
65c61cb
Compare
65c61cb
to
5c77e0c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions. Also, I don't feel competent to review dev/test/transaction.ts
. There's some kind of higher-order ninjitsu going on there that I don't grok.
@@ -419,6 +422,9 @@ export class Transaction { | |||
); | |||
} | |||
|
|||
this._writeBatch._reset(); | |||
await this.maybeBackoff(lastError); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
async/await
is amazing. ❤️
dev/src/transaction.ts
Outdated
this._backoff.resetToMax(); | ||
} else if (error.code === Status.ABORTED) { | ||
// We don't backoff for ABORTED to avoid starvation | ||
this._backoff.reset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This contradicts comments in go/transaction-retry-matrix. Do we have experimental data to support that starvation is a problem? Given multiple clients contending on a write, multiple clients failing to back off (with randomness) means they'll never make progress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked with the Firestore team and removed the special-casing on ABORTED, which simplifies the code a bit since we now always back off. This also allowed me to simplify the test suite.
if (error.code !== undefined) { | ||
// This list is based on https://github.com/firebase/firebase-js-sdk/blob/master/packages/firestore/src/core/transaction_runner.ts#L112 | ||
switch (error.code) { | ||
case Status.ABORTED: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth distinguishing retry-the-rpc vs retry-the-whole-transaction errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The per RPC retry is actually driven by GAX. The one downside of this approach is that if say a RunQuery RPC fails with DEADLINE_EXCEEDED, the RPC will retried by GAX first. If GAX retries don't resolve the issue, then we will restart the transaction.
It's probably possible to turn off GAX retries, but it is a pretty invasive change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Codecov Report
@@ Coverage Diff @@
## master #953 +/- ##
==========================================
+ Coverage 97.39% 97.45% +0.06%
==========================================
Files 25 25
Lines 15875 15896 +21
Branches 1258 1193 -65
==========================================
+ Hits 15461 15491 +30
+ Misses 411 402 -9
Partials 3 3
Continue to review full report at Codecov.
|
Port of googleapis/nodejs-firestore#953 Implements go/transaction-retry-matrix for Java. Transaction retries are now dependent upon error code, and will only be retried on specific error codes. This change takes care of all of the state tracking and cleanup necessary to retry transactions.
This PR implements consistent retry behavior all transaction methods:
As per go/transaction-retry-matrix, it implements the following behavior by error code and RPC:
RPC: BeginTransaction, Rollback
RPC: RunQuery, BatchGetDocuments, Commit
A restart consists of a rollback and a BeginTransaction RPC with a
retryTransaction
.I also updated the test cases to allow us to verify the backoff behavior.
Fixes #889
Fixes #827