txnwait: increase TxnLivenessThreshold significantly, reduce txn aborts under load #36748

Merged

Commits on Apr 10, 2019

  1. kv: ignore result of txn heartbeat after EndTransaction

    The changes in cockroachdb#33396 made it so that a `HeartbeatTxnRequest` that finds
    a missing transaction record will attempt to create the record. This
    means that it will discover if a transaction is not "committable" and
    return a TransactionAbortedError. Unfortunately, after a transaction
    commits and GCs its transaction record, it will also be considered not
    "committable".
    
    There will always be cases where a heartbeat request races with an
    EndTransaction request and incorrectly considers the transaction
    aborted (which is touched upon in a TODO a few lines down). However,
    in many of these cases, the coordinator already knows that the transaction
    is committed, so it doesn't need to attempt to roll back the transaction
    and clean up its intents.
    
    This commit checks for these cases and avoids sending useless rollbacks.
    The intention is to backport this to 19.1.
    
    Release note: None
    nvanbenschoten committed Apr 10, 2019
    ee59435
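
The check described in this commit message can be sketched roughly as follows. This is a minimal illustration and not the actual CockroachDB code: the types `txnStatus` and `action` and the function `onHeartbeatResult` are hypothetical names invented for this sketch. It assumes the coordinator tracks its own view of the transaction's status and consults it before reacting to a heartbeat that reports the transaction as aborted.

```go
package main

import "fmt"

// txnStatus is the coordinator's local view of the transaction.
// Hypothetical type, for illustration only.
type txnStatus int

const (
	pending txnStatus = iota
	committed
)

// action is what the heartbeat loop does after a heartbeat response.
// Hypothetical type, for illustration only.
type action int

const (
	keepHeartbeating action = iota
	ignoreResult
	rollbackAndCleanup
)

// onHeartbeatResult decides how to react to a heartbeat response.
// If the coordinator already knows the transaction committed, an
// "aborted" result is a race with EndTransaction (the record was GCed),
// so no rollback is sent.
func onHeartbeatResult(coordView txnStatus, heartbeatSaysAborted bool) action {
	if coordView == committed {
		return ignoreResult
	}
	if heartbeatSaysAborted {
		return rollbackAndCleanup
	}
	return keepHeartbeating
}

func main() {
	// Heartbeat raced with a commit: the stale "aborted" result is ignored.
	fmt.Println(onHeartbeatResult(committed, true) == ignoreResult)
	// Transaction still pending and really aborted: roll back.
	fmt.Println(onHeartbeatResult(pending, true) == rollbackAndCleanup)
}
```

The sketch covers only the case where the coordinator already knows the outcome; as the commit message notes, races where it does not yet know remain possible.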

Commits on Apr 15, 2019

  1. txnwait: increase TxnLivenessThreshold significantly

    This change increases the time a transaction may go without a heartbeat
    before it is considered expired from 2 seconds to 5 seconds.
    This has been found to dramatically reduce the frequency of transaction
    aborts when a cluster is under significant load. This is not expected
    to noticeably hurt cluster availability in the presence of dead nodes
    because we already have availability loss on the order of 9 seconds due
    to the epoch-based lease duration.
    
    This is especially important now that the hack in cockroachdb#25034 is gone. That
    hack was hiding some of this badness and giving transactions a bit more
    room to avoid being aborted.
    
    Release note: None
    nvanbenschoten committed Apr 15, 2019
    9147c58