Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: base transaction retries on error codes #953

Merged
merged 7 commits into from
Mar 9, 2020

Conversation

schmidt-sebastian
Copy link
Contributor

@schmidt-sebastian schmidt-sebastian commented Mar 4, 2020

This PR implements consistent retry behavior all transaction methods:

As per go/transaction-retry-matrix, it implements the following behavior by error code and RPC:

RPC: BeginTransaction, Rollback

  • Implicit retry for DEADLINE_EXCEEDED, INTERNAL and UNAVAILABLE with backoff (not in this PR since this is done by google-gax)

RPC: RunQuery, BatchGetDocuments, Commit

  • Restart transaction for ABORTED (no backoff).
  • Restart transaction for CANCELLED, UNKNOWN, DEADLINE_EXCEEDED, INTERNAL, UNAVAILABLE, UNAUTHENTICATED, RESOURCE_EXHAUSTED (with backoff).
    • Use maximum backoff for RESOURCE_EXHAUSTED.

A restart consists of a rollback and a BeginTransaction RPC with a retryTransaction.

I also updated the test cases to allow us to verify the backoff behavior.

Fixes #889
Fixes #827

@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Mar 4, 2020
@schmidt-sebastian schmidt-sebastian force-pushed the mrschmidt/transactions branch 2 times, most recently from c757ce5 to 65c61cb Compare March 5, 2020 00:58
@schmidt-sebastian schmidt-sebastian changed the title feat: use error codes for transaction retries feat: base transaction retries on error codes Mar 5, 2020
Copy link
Contributor

@wilhuff wilhuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few questions. Also, I don't feel competent to review dev/test/transaction.ts. There's some kind of higher-order ninjitsu going on there that I don't grok.

@@ -419,6 +422,9 @@ export class Transaction {
);
}

this._writeBatch._reset();
await this.maybeBackoff(lastError);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

async/await is amazing. ❤️

this._backoff.resetToMax();
} else if (error.code === Status.ABORTED) {
// We don't backoff for ABORTED to avoid starvation
this._backoff.reset();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This contradicts comments in go/transaction-retry-matrix. Do we have experimental data to support that starvation is a problem? Given multiple clients contending on a write, multiple clients failing to back off (with randomness) means they'll never make progress.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked with the Firestore team and removed the special-casing on ABORTED, which simplifies the code a bit since we now always back off. This also allowed me to simplify the test suite.

if (error.code !== undefined) {
// This list is based on https://github.com/firebase/firebase-js-sdk/blob/master/packages/firestore/src/core/transaction_runner.ts#L112
switch (error.code) {
case Status.ABORTED:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth distinguishing retry-the-rpc vs retry-the-whole-transaction errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The per RPC retry is actually driven by GAX. The one downside of this approach is that if say a RunQuery RPC fails with DEADLINE_EXCEEDED, the RPC will retried by GAX first. If GAX retries don't resolve the issue, then we will restart the transaction.

It's probably possible to turn off GAX retries, but it is a pretty invasive change.

Copy link
Contributor

@wilhuff wilhuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@wilhuff wilhuff assigned schmidt-sebastian and unassigned wilhuff Mar 9, 2020
@codecov
Copy link

codecov bot commented Mar 9, 2020

Codecov Report

Merging #953 into master will increase coverage by 0.06%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #953      +/-   ##
==========================================
+ Coverage   97.39%   97.45%   +0.06%     
==========================================
  Files          25       25              
  Lines       15875    15896      +21     
  Branches     1258     1193      -65     
==========================================
+ Hits        15461    15491      +30     
+ Misses        411      402       -9     
  Partials        3        3              
Impacted Files Coverage Δ
dev/src/transaction.ts 96.47% <100.00%> (-1.97%) ⬇️
dev/src/order.ts 98.09% <0.00%> (-1.15%) ⬇️
dev/src/pool.ts 97.75% <0.00%> (-0.90%) ⬇️
dev/src/watch.ts 98.77% <0.00%> (-0.37%) ⬇️
dev/src/serializer.ts 98.77% <0.00%> (-0.25%) ⬇️
dev/src/validate.ts 98.05% <0.00%> (-0.25%) ⬇️
dev/src/index.ts 98.62% <0.00%> (-0.07%) ⬇️
dev/src/reference.ts 99.83% <0.00%> (-0.05%) ⬇️
dev/src/v1/firestore_admin_client.ts 91.09% <0.00%> (+0.70%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3e1647c...8d13959. Read the comment docs.

@schmidt-sebastian schmidt-sebastian merged commit 4a30820 into master Mar 9, 2020
schmidt-sebastian added a commit to schmidt-sebastian/java-firestore that referenced this pull request Mar 11, 2020
schmidt-sebastian added a commit to schmidt-sebastian/java-firestore that referenced this pull request Mar 12, 2020
schmidt-sebastian added a commit to schmidt-sebastian/java-firestore that referenced this pull request Mar 12, 2020
BenWhitehead pushed a commit to googleapis/java-firestore that referenced this pull request Apr 1, 2020
Port of googleapis/nodejs-firestore#953

Implements go/transaction-retry-matrix for Java.

Transaction retries are now dependent upon error code, and will only be retried on specific error codes.
This change takes care of all of the state tracking and cleanup necessary to retry transactions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
3 participants