Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kv: integrate raft async storage writes #94165

Merged

Commits on Feb 2, 2023

  1. kv: integrate raft async storage writes

    Fixes cockroachdb#17500.
    Waiting on github.com/cockroachdb/pebble/pull/2117.
    
    This commit integrates with the `AsyncStorageWrites` functionality that
    we added to Raft in github.com/etcd-io/raft/pull/8.
    
    \## Approach
    
    The commit makes the minimal changes needed to integrate with async
    storage writes and pull fsyncs out of the raft state machine loop. It
    does not make an effort to extract the non-durable portion of raft log
    writes or raft log application onto separate goroutine pools, as was
    described in cockroachdb#17500. Those changes will also be impactful, but they're
    non trivial and bump into a pipelining vs. batching trade-off, so they
    are left as future work items (TODO(nvanbenschoten): open new issues).
    
    With this change, asynchronous Raft log syncs are enabled by the new
    `DB.ApplyNoSyncWait` Pebble API introduced in github.com/cockroachdb/pebble/pull/2117.
    The `handleRaftReady` state machine loop continues to initiate Raft log
    writes, but it uses the Pebble API to offload waiting on durability to a
    separate goroutine. This separate goroutine then sends the corresponding
    `MsgStorageAppend`'s response messages where they need to go (locally
    and/or to the Raft leader) when the fsync completes. The async storage
    writes functionality in Raft makes this all safe.
    
    \## Benchmark Results
    
    The result of this change is reduced interference between Raft
    proposals. As a result, it reduces end-to-end commit latency.
    
    github.com/etcd-io/raft/pull/8 presented a collection of benchmark
    results captured from integrating async storage writes with rafttoy.
    
    When integrated into CockroachDB, we see similar improvements to average
    and tail latency. However, it doesn't provide the throughput
    improvements at the top end because log appends and state machine
    application have not yet been extracted into separate goroutine pools,
    which would facilitate increased opportunity for batching.
    
    TODO: add images
    
    ----
    
    Release note (performance improvement): The Raft proposal pipeline
    has been optimized to reduce interference between Raft proposals.
    This improves average and tail write latency at high concurrency.
    nvanbenschoten committed Feb 2, 2023
    Configuration menu
    Copy the full SHA
    702ff6f View commit details
    Browse the repository at this point in the history
  2. kv: stop limiting AckCommittedEntriesBeforeApplication to Raft log's …

    …previous LastIndex
    
    This commit removes logic in `apply.Task.AckCommittedEntriesBeforeApplication`
    which was ensuring that we don't acknowledge committed Raft log entries before
    they were durable in a node's local Raft log. This is now ensured inside of the
    etcd/raft library when AsyncStorageWrites is enabled, as the comment added here
    describes.
    nvanbenschoten committed Feb 2, 2023
    Configuration menu
    Copy the full SHA
    c181563 View commit details
    Browse the repository at this point in the history
  3. kv: exercise blocking and non-blocking log sync interleaving in tests

    This commit adds a testing facility to randomly disable non-blocking log syncs
    in tests. This ensures that we properly handle the case where non-blocking log
    sync is disabled while some log syncs are still in-progress.
    nvanbenschoten committed Feb 2, 2023
    Configuration menu
    Copy the full SHA
    cb1efdc View commit details
    Browse the repository at this point in the history
  4. kv: shuffle local raft messages before delivery in tests

    This commit adds a testing facility to shuffle local raft messages before
    delivering them to the raft state machine. These are not required to be in
    order, so we shuffle in tests ensure that re-ordering is handled properly.
    nvanbenschoten committed Feb 2, 2023
    Configuration menu
    Copy the full SHA
    14ebedc View commit details
    Browse the repository at this point in the history
  5. kv: rename r.mu.last{Index,Term} to r.mu.last{Index,Term}NotDurable

    Simple rename to make the semantics of these fields more clear and
    harder to misinterpret.
    nvanbenschoten committed Feb 2, 2023
    Configuration menu
    Copy the full SHA
    ebc33fa View commit details
    Browse the repository at this point in the history