
kvserver: circuit-break requests to unavailable ranges #71806

Merged 2 commits on Jan 14, 2022

Conversation

@tbg (Member) commented Oct 21, 2021

Fixes #33007.
Closes #61311.

This PR uses the circuit breaker package introduced in #73641 to address
issue #33007: When a replica becomes unavailable, it should eagerly
refuse traffic that it believes would simply hang.

Concretely, every request to the Replica gets a cancelable context that
is sensitive to the circuit breaker tripping (relying on context
cancellation makes sure we don't end up with a second channel that needs
to be plumbed to everyone and their dog; Go makes it really difficult to
join two cancellation chains); if the breaker is tripped when the
request arrives, it is refused right away. In either case, the outgoing
error is augmented to carry information about the tripped breaker. This
isn't 100% trivial, since retreating from an in-flight replication
attempt typically causes an AmbiguousResultError, and while we can avoid
it in some cases, we can't in all. A similar problem occurs when the
canceled request was waiting on a lease, in which case the result is a
NotLeaseHolderError.
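A minimal sketch of what such a joined cancellation can look like (the name is hypothetical; per the discussion further down, the PR initially implements this with one goroutine per request, to be replaced in #74707):

```go
package sketch

import "context"

// joinBreakerCancellation derives a cancelable context from the incoming
// request context and ties it to the breaker's trip signal, so a trip
// cancels the request without plumbing a second channel everywhere.
func joinBreakerCancellation(
	parent context.Context, tripped <-chan struct{},
) (context.Context, context.CancelFunc) {
	ctx, cancel := context.WithCancel(parent)
	go func() {
		select {
		case <-tripped:
			cancel() // breaker tripped: fail the request fast
		case <-ctx.Done():
			// Request finished or parent canceled; the goroutine exits.
		}
	}()
	return ctx, cancel
}
```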

For any request that made it in, if it enters replication but does not
manage to succeed within base.SlowRequestThreshold (15s at time of
writing), the breaker is tripped, cancelling all inflight and future
requests until the breaker heals. Perhaps surprisingly, the existing
"slowness" detection (the informational "have been waiting ... for
proposal" warning in executeWriteBatch) was moved deeper into
replication (refreshProposalsLocked), where it now trips the breaker.
This has the advantage of providing a unified path for lease requests
(which don't go through executeWriteBatch) and pipelined writes (which
return before waiting on the inflight replication process). To make this
work, we need to pick a little fight with how leases are (not) refreshed
(#74711) and we need to remember the ticks at which a proposal was first
inserted (despite potential reproposals).
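A rough sketch of the tick-based slowness check described here (all names are hypothetical; the real logic lives in refreshProposalsLocked and uses base.SlowRequestThreshold, and the 200ms tick interval is an assumption taken from the ticking discussion below):

```go
package sketch

import (
	"errors"
	"time"
)

const tickInterval = 200 * time.Millisecond // assumed Raft tick interval

type proposal struct {
	// createdAtTicks is the tick at which the proposal was first
	// inserted; it must survive reproposals.
	createdAtTicks int64
}

// maybeTripBreaker trips the breaker once a proposal has been inflight
// for longer than the threshold, measured in ticks.
func maybeTripBreaker(ticksNow int64, p *proposal, threshold time.Duration, trip func(error)) {
	if time.Duration(ticksNow-p.createdAtTicks)*tickInterval > threshold {
		trip(errors.New("have been waiting too long for proposal"))
	}
}
```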

A tripped breaker deploys a background probe which sends a
ProbeRequest (introduced in #72972) to determine when to heal; this is
roughly the case whenever replicating a command through Raft is possible
for the Replica, either by appending to the log as the Raft leader, or
by forwarding to the Raft leader. A tiny bit of special casing allows
requests made by the probe to bypass the breaker.

As of this PR, all of this is opt-in via the
kv.replica_circuit_breakers.enabled cluster setting while we determine
which of the many follow-up items to target for 22.1 (see #74705). For
example, the breaker-sensitive context cancellation is a toy
implementation that comes with too much performance overhead (one extra
goroutine per request); #74707 will address that.
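For reference, the opt-in gate presumably looks something like the following; the setting name comes from the text above, while the description string and the `RegisterBoolSetting` signature are assumptions about the settings package at the time:

```go
package sketch

import "github.com/cockroachdb/cockroach/pkg/settings"

// Assumed registration of the opt-in cluster setting; exact signature
// and wording may differ from the actual code.
var replicaCircuitBreakersEnabled = settings.RegisterBoolSetting(
	"kv.replica_circuit_breakers.enabled",
	"when true, replicas fail-fast requests to unavailable ranges",
	false, // off by default while follow-ups (#74705) are triaged
)
```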

Other things not done here that we certainly want in the 22.1 release
are UI work (#74713) and metrics (#74505).

The release note is deferred to #74705 (where breakers are turned on).

Release note: None

@cockroach-teamcity (Member) commented: This change is Reviewable

@tbg force-pushed the circ-break-unavailable-ranges branch from 8e40068 to e784823 on November 8, 2021 22:18
@tbg (Member, Author) commented Nov 8, 2021

@cockroachdb/repl-prs this is still a draft, but it is ready for a high-level review. There is a lot going on, and I can easily pull 5+ pieces of this into separate PRs and am planning to do so. However, before I start that, I would like to get both of you on board with the basic approach. I think we've sufficiently discussed the circuit breaker API and are happy with it; I will rewrite the internals to avoid all of the unsafe stuff, so ignore the internals, but give the API another look and voice any larger concerns. For the NoopWrite, there is an alternative where we don't bypass the lease. I think bypassing the lease has worked out nicely and keeps NoopWrite simple and possibly reusable, but on the other hand, the probe may pass while for some unknown reason a lease may never be acquired. My preference is to keep NoopWrite as is but make the probe use two stages, where it issues the noop write first and a lease-requiring write second, but possibly this is a bad idea and I need to sleep on it and talk it through.
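A sketch of that two-stage probe idea (the helpers are hypothetical closures standing in for the noop write and a lease-requiring write):

```go
package sketch

import "context"

// twoStageProbe first verifies that replication works at all via a
// write that bypasses the lease, then verifies that a lease can
// actually be acquired via a lease-requiring write.
func twoStageProbe(
	ctx context.Context,
	noopWrite func(context.Context) error, // bypasses the lease
	leaseRequiringWrite func(context.Context) error,
) error {
	if err := noopWrite(ctx); err != nil {
		return err // replication itself is unhealthy; stay tripped
	}
	return leaseRequiringWrite(ctx)
}
```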

@tbg force-pushed the circ-break-unavailable-ranges branch from e784823 to 8d559f0 on November 9, 2021 11:39
@erikgrinaker (Contributor) left a comment

Reviewed 5 of 5 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @tbg)


pkg/util/circuit/circuitbreaker.go, line 28 at r1 (raw file):

// from Breaker.Err(), i.e. `errors.Is(err, ErrBreakerOpen())` can be
// used to check whether an error originated from some Breaker.
func ErrBreakerOpen() error {

nit: why not simply export the variable?


pkg/util/circuit/circuitbreaker.go, line 56 at r1 (raw file):

	Name         string
	ShouldTrip   func(err error) error
	AsyncProbe   func(report func(error), done func())

Why does this need to take a done closure? Since the probe is expected to keep running, can't we just clean up when the closure returns?

Ah, I suppose it's because it's the probe's responsibility to spawn the async task. Is there a reason for that, i.e. why can't the breaker spawn the task? The only reason I can see is for the caller to pass in its context, but that seems like an antipattern anyway?
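For illustration, a sketch of an AsyncProbe matching the signature quoted above (runAsyncTask and sendProbe are hypothetical stand-ins for the caller's task runner and the actual probe request, and the report(nil)-heals-the-breaker contract is an assumption based on this discussion):

```go
package sketch

import "context"

// makeAsyncProbe returns an AsyncProbe-shaped closure. The probe spawns
// its own async task and must call done() when that task exits, so the
// breaker knows probing has stopped regardless of how it was launched.
func makeAsyncProbe(
	runAsyncTask func(func(context.Context)) error,
	sendProbe func(context.Context) error,
) func(report func(error), done func()) {
	return func(report func(error), done func()) {
		if err := runAsyncTask(func(ctx context.Context) {
			defer done()
			// A nil error heals the breaker; a non-nil error keeps it tripped.
			report(sendProbe(ctx))
		}); err != nil {
			done() // the task never started; still signal completion
		}
	}
}
```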

@tbg force-pushed the circ-break-unavailable-ranges branch 2 times, most recently from 0215e1b to dbd4ce3 on November 18, 2021 14:16
tbg added a commit to tbg/cockroach that referenced this pull request Dec 9, 2021
This defaults to `base.SlowRequestThreshold` but is customizable, which
will be helpful for testing cockroachdb#71806.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Dec 9, 2021
These are initially for use in cockroachdb#71806 but were originally conceived of
for cockroachdb#70485, which we are not currently prioritizing.

Importantly, this circuit breaker does not recruit a fraction of
requests to do the probing, which is desirable for both PR cockroachdb#71806 and
PR cockroachdb#70485; requests recruited as probes tend to incur high latency and
errors, and we don't want SQL client traffic to experience those.

Release note: None
@tbg force-pushed the circ-break-unavailable-ranges branch from dbd4ce3 to 85486f2 on December 9, 2021 17:12
craig bot pushed a commit that referenced this pull request Dec 10, 2021
73639: kvserver: introduce sCfg.SlowReplicationThreshold r=erikgrinaker a=tbg

This defaults to `base.SlowRequestThreshold` but is customizable, which
will be helpful for testing #71806.

Release note: None


Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
craig bot pushed a commit that referenced this pull request Dec 14, 2021
73362: kv: don't unquiesce uninitialized replicas r=tbg a=nvanbenschoten

In a [support issue](https://github.com/cockroachlabs/support/issues/1340), we saw that 10s of thousands of uninitialized replicas were being ticked regularly and creating a large amount of background work on a node, driving up CPU. This commit updates the Raft quiescence logic to disallow uninitialized replicas from being unquiesced and Tick()'ing themselves.

Keeping uninitialized replicas quiesced even in the presence of Raft traffic avoids wasted work. We could Tick() these replicas, but doing so is unnecessary because uninitialized replicas can never win elections, so there is no reason for them to ever call an election. In fact, uninitialized replicas do not even know who their peers are, so there would be no way for them to call an election or for them to send any other non-reactive message. As a result, all work performed by an uninitialized replica is reactive and in response to incoming messages (see `processRequestQueue`).

There are multiple ways for an uninitialized replica to be created and then abandoned, and we don't do a good job garbage collecting them at a later point (see #73424), so it is important that they are cheap. Keeping them quiesced instead of letting them unquiesce and tick every 200ms indefinitely avoids a meaningful amount of periodic work for each uninitialized replica.

Release notes (bug fix): uninitialized replicas that are abandoned after an unsuccessful snapshot no longer perform periodic background work, so they no longer have a non-negligible cost.

73641: circuit: add probing-based circuit breaker r=erikgrinaker a=tbg

These are initially for use in #71806 but were originally conceived of
for #70485, which we are not currently prioritizing.

Importantly, this circuit breaker does not recruit a fraction of
requests to do the probing, which is desirable for both PR #71806 and
PR #70485; requests recruited as probes tend to incur high latency and
errors, and we don't want SQL client traffic to experience those.

Release note: None


73718: kv: pass roachpb.Header by pointer to DeclareKeysFunc r=nvanbenschoten a=nvanbenschoten

The `roachpb.Header` struct is up to 160 bytes in size. That's a little too
large to be passing by value repeatedly when doing so is easy to avoid. This
commit switches to passing roachpb.Header structs by pointer through the
DeclareKeysFunc implementations.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
@tbg force-pushed the circ-break-unavailable-ranges branch from 85486f2 to ba48f8b on December 23, 2021 16:17
@tbg (Member, Author) commented Dec 23, 2021

Spent some time on this again today. While putting together an end-to-end test (lose quorum, try to write, and see the fail-fast), I realized that we do need to make all parts of the read/write pipeline aware of the circuit breaker to avoid getting stuck. I knew that we needed lease proposals to be sensitive to the breaker (so that you wouldn't get stuck trying to acquire a lease over and over when there wasn't a valid lease to begin with), but I wasn't aware of a much more basic problem: when one command fails fast after having proposed a command, the latches stick around. They have to; but this means that the next command will get stuck on the latches. So we need to short-circuit the latch acquisition as well when the breaker trips. Of course we don't want to plumb the circuit breaker everywhere; you'd really want to rely on context.Context for that sort of thing. I've hacked something together in the current version of the PR (as pushed right now) which I can polish into something that should get the job done for v1. Down the road, it would be interesting to explore #66884 as an ingredient here, but I don't want this to become a prerequisite.

@tbg force-pushed the circ-break-unavailable-ranges branch from ba48f8b to d6c7cc6 on January 3, 2022 16:21
@@ -119,6 +120,15 @@ func TestReplicaCircuitBreaker(t *testing.T) {
tc.RequireIsNotLeaseholderError(t, tc.Read(n2))
tc.RequireIsNotLeaseholderError(t, tc.Write(n2))
})

runCircuitBreakerTest(t, "leaseholder-unavailable", func(t *testing.T, ctx context.Context, tc *circuitBreakerTest) {
Contributor:

This should probably go into the next commit, "wip: add leaseholder-unavailable test".

@erikgrinaker (Contributor):

> when one command fails fast after having proposed a command, the latches stick around. They have to; but this means that the next command will get stuck on the latches. So we need to short-circuit the latch acquisition as well when the breaker trips. Of course we don't want to plumb the circuit breaker everywhere, you'd really want to rely on context.Context for that sort of thing

Ugh, this is somewhat annoying. Conceptually, cancelling the context in Send() when the breaker trips makes sense -- it's essentially what we want to happen. But the bookkeeping gets a bit unwieldy, as you point out. It'd be nice if we could inject a cancel trigger into the context somehow, without needing to keep track of the individual cancel functions.

I'll have to remind myself of how the latch guards are managed, but if the latches turn out to be the only problem here, is it terrible to come up with a solution at/below the Raft level that causes us to abandon proposals/commands and release latches without having to do context management higher up? Although I realize that the context approach generalizes better across failure modes.

@tbg force-pushed the circ-break-unavailable-ranges branch from d6c7cc6 to 1671b8d on January 4, 2022 16:29
@tbg (Member, Author) commented Jan 5, 2022

Discussed this with Erik yesterday. Despite the potential perf overhead we decided that it was an important property that requests get canceled reliably on breaker tripping. This means making all ctxs cancelable and stashing the cancels. If we don't do that, we can sort of reliably refuse requests once the breaker has solidly tripped, but while the tripping is being detected, some requests might sneak in and get stuck on the latch manager (or really anywhere else that doesn't have a dedicated breaker check). We didn't want to plumb the breaker signal into the latch manager, as context cancellation is the "Go way" of doing what we're trying to do. As we have a few times before, we were dissatisfied with what the Go ecosystem provides in terms of joining multiple cancellation chains in a performant way; there are always more allocations involved than seems reasonable (related: we don't use Stopper.WithCancelOnQuiesce liberally because it's also expensive-ish).

We are abandoning our initial ambition to let reads that can be served succeed: this is just hard to build correctly. The main reason for reads to get stuck (no lease) is handled adequately by only circuit breaking on writes (lease acquisition is a write), but it's trickier if there is a stuck in-flight write that then prevents reads from getting their latches. We'd basically have to tell the latch manager to fail-fast a read but only if it is transitively waiting on a write's latch (write latches have to stick around on unavailable ranges, even if the client has long been fail-fasted). The "transitively" here is not something the latch manager is even concerned with today, see #73916 for a related discussion.

So:

  • early in Replica.Send, check the breaker, make the context cancelable, and stash the cancel func somewhere on the Replica;
  • on trip, have the probe cancel everyone (see the sketch below).
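A minimal sketch of that registry approach, assuming hypothetical names (the polished version later landed in #75224 as a sharded mutex map):

```go
package sketch

import (
	"context"
	"sync"
)

// cancelStorage holds the cancel func of every inflight request on a
// Replica. Replica.Send registers on the way in and unregisters on the
// way out; the probe calls cancelAll when the breaker trips.
type cancelStorage struct {
	mu     sync.Mutex
	m      map[int64]context.CancelFunc
	nextID int64
}

func (s *cancelStorage) register(cancel context.CancelFunc) (id int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.m == nil {
		s.m = map[int64]context.CancelFunc{}
	}
	s.nextID++
	s.m[s.nextID] = cancel
	return s.nextID
}

func (s *cancelStorage) unregister(id int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.m, id)
}

func (s *cancelStorage) cancelAll() {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, cancel := range s.m {
		cancel()
	}
}
```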

nicktrav pushed a commit that referenced this pull request Jan 28, 2022
On the leaseholder, `ctx` passed to `computeChecksumPostApply` is that
of the proposal. As of #71806, this context is canceled right after the
corresponding proposal is signaled (and the client goroutine returns
from `sendWithRangeID`). This effectively prevents most consistency
checks from succeeding (they previously were not affected by
higher-level cancellation because the consistency check is triggered
from a local queue that talks directly to the replica, i.e. had
something like a minutes-long timeout).

This caused disastrous behavior in the `clearrange` suite of roachtests.
That test imports a large table. After the import, most ranges have
estimates (due to the ctx cancellation preventing the consistency
checks, which as a byproduct trigger stats adjustments) and their stats
claim that they are very small. Before recent PR #74674, `ClearRange` on
such ranges would use individual point deletions instead of the much
more efficient pebble range deletions, effectively writing a lot of data
and running the nodes out of disk.

Failures of `clearrange` with #74674 were also observed, but they did
not involve out-of-disk situations, so are possibly an alternative
failure mode (that may still be related to the newly introduced presence
of context cancellation).

Touches #68303.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Jan 31, 2022
…egistry

PR cockroachdb#71806 originally landed with a very simple solution for the problem
of joining the breaker's cancellation chain with that of the incoming
context: spawning a goroutine. This was never meant to be the solution
in prod (spawning goroutines is not cheap) and this PR switches to a
real solution (which may still have some room for improvement).

Replica circuit breakers are still off (cockroachdb#74705) and so this change
should not change any behavior, unless breakers are turned on. The
provided benchmarks show both the "end-to-end" `Replica.Send` perf and
the micro-bench `breaker.{R,Un}egister` perf, and contrast them with
having the breakers disabled.

This sets the stage for evaluating and, if necessary, minimizing the
overhead, which (along with some additional end-to-end testing) then
allows us to turn breakers on by default (cockroachdb#74705).

`BenchmarkReplicaCircuitBreakerSendOverhead` measures `Replica.Send` of
a no-result single-point read. We see that we lose a few percent,
though this benchmark is pretty micro already.

`BenchmarkReplicaCircuitBreaker_Register` is even more micro, measuring
the overhead of making the context cancelable and calling `Register()` followed by
`Unregister()`. The "old" measurement is essentially just creating a
cancelable context (so the +XXXX% numbers don't mean a lot; we're doing
work for which there was previously no analog). Here we can clearly see
that sharding the mutex map can help, though note that this would
already only be visible at the `Replica.Send` level at much higher
concurrencies than in this test (16).

I also ran the kv95 roachtest with what corresponds to the `mutexmap-1`
strategy above (i.e. a single mutex); [raw data here]. In this
configuration (relative to disabling breakers) `kv95/enc=false/nodes=1`
and `kv95/enc=false/nodes=1/cpu=32` sustained a -1.62% hit (in qps,
median over five runs). Once we've chosen a suitable shard count, I
expect that to drop a bit as well, and think we can turn on breakers in
production, at least from a performance point of view.

[raw data here]: https://gist.github.com/tbg/44b53a10d18490a6dabd57e1d1d3225e

old = disabled, new = enabled:

```
name                                                old time/op    new time/op    delta
ReplicaCircuitBreakerSendOverhead/X/mutexmap-1-16     1.63µs ± 4%    1.77µs ± 5%     +8.38%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-2-16     1.66µs ± 4%    1.75µs ± 5%     +5.74%  (p=0.000 n=10+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-4-16     1.63µs ± 6%    1.75µs ± 4%     +7.48%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-8-16     1.58µs ± 7%    1.73µs ± 4%     +9.47%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-12-16    1.57µs ± 4%    1.62µs ± 6%     +3.19%  (p=0.046 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-16-16    1.56µs ± 5%    1.63µs ± 7%     +4.20%  (p=0.007 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-20-16    1.53µs ± 5%    1.62µs ± 8%     +6.01%  (p=0.001 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-24-16    1.55µs ± 7%    1.61µs ± 5%     +3.50%  (p=0.049 n=11+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-32-16    1.57µs ± 7%    1.62µs ± 6%     +3.53%  (p=0.042 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-64-16    1.53µs ± 7%    1.64µs ± 5%     +6.84%  (p=0.000 n=11+12)

name                                                old alloc/op   new alloc/op   delta
ReplicaCircuitBreakerSendOverhead/X/mutexmap-1-16     1.42kB ± 0%    1.44kB ± 0%     +1.25%  (p=0.000 n=7+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-2-16     1.42kB ± 0%    1.44kB ± 0%     +1.19%  (p=0.000 n=9+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-4-16     1.42kB ± 0%    1.43kB ± 0%     +1.14%  (p=0.000 n=11+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-8-16     1.42kB ± 0%    1.43kB ± 0%     +1.24%  (p=0.000 n=9+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-12-16    1.42kB ± 0%    1.43kB ± 0%     +1.08%  (p=0.000 n=11+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-16-16    1.42kB ± 0%    1.43kB ± 0%     +1.14%  (p=0.000 n=10+10)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-20-16    1.42kB ± 0%    1.43kB ± 0%     +1.12%  (p=0.000 n=10+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-24-16    1.42kB ± 0%    1.43kB ± 0%     +1.09%  (p=0.000 n=11+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-32-16    1.41kB ± 0%    1.43kB ± 0%     +1.13%  (p=0.000 n=9+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-64-16    1.41kB ± 0%    1.43kB ± 0%     +1.23%  (p=0.000 n=9+12)

name                                                old allocs/op  new allocs/op  delta
ReplicaCircuitBreakerSendOverhead/X/mutexmap-1-16       11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-2-16       11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-4-16       11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-8-16       11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-12-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-16-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-20-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-24-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-32-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-64-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
```

```
name                                                old time/op    new time/op    delta
ReplicaCircuitBreaker_Register/X/mutexmap-1-16        30.8ns ±21%   787.6ns ± 2%  +2457.74%  (p=0.000 n=11+11)
ReplicaCircuitBreaker_Register/X/mutexmap-2-16        31.6ns ± 3%   782.0ns ± 2%  +2376.38%  (p=0.000 n=9+12)
ReplicaCircuitBreaker_Register/X/mutexmap-4-16        31.0ns ± 0%   778.8ns ± 2%  +2409.61%  (p=0.000 n=9+12)
ReplicaCircuitBreaker_Register/X/mutexmap-8-16        31.0ns ± 0%   775.8ns ± 1%  +2403.11%  (p=0.000 n=10+10)
ReplicaCircuitBreaker_Register/X/mutexmap-12-16       31.0ns ± 1%   288.8ns ± 2%   +833.06%  (p=0.000 n=10+12)
ReplicaCircuitBreaker_Register/X/mutexmap-16-16       31.1ns ± 1%   324.6ns ± 4%   +945.05%  (p=0.000 n=10+12)
ReplicaCircuitBreaker_Register/X/mutexmap-20-16       31.1ns ± 0%   193.3ns ± 2%   +522.26%  (p=0.000 n=10+12)
ReplicaCircuitBreaker_Register/X/mutexmap-24-16       31.0ns ± 0%   286.1ns ± 1%   +822.80%  (p=0.000 n=9+12)
ReplicaCircuitBreaker_Register/X/mutexmap-32-16       31.0ns ± 0%   194.4ns ± 2%   +527.12%  (p=0.000 n=10+12)
ReplicaCircuitBreaker_Register/X/mutexmap-64-16       31.0ns ± 1%   121.3ns ± 2%   +291.01%  (p=0.000 n=10+12)

name                                                old alloc/op   new alloc/op   delta
ReplicaCircuitBreaker_Register/X/mutexmap-1-16         80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-2-16         80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-4-16         80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-8-16         80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-12-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-16-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-20-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-24-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-32-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-64-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)

name                                                old allocs/op  new allocs/op  delta
ReplicaCircuitBreaker_Register/X/mutexmap-1-16          2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-2-16          2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-4-16          2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-8-16          2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-12-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-16-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-20-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-24-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-32-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-64-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
```

Created via:

```
$ ./dev bench ./pkg/kv/kvserver/ -f BenchmarkReplicaCircuitBreaker --count 10 | tee out.txt
$ for f in enabled=true enabled=false; do grep -F "${f}" out.txt | sed "s/${f}/X/" > "${f}"; done && benchstat 'enabled=false' 'enabled=true'
```

For the record, I also tried a `sync.Map`, but it was vastly inferior
for this use case as it is allocation-heavy and its writes are fairly
slow.

Touches cockroachdb#33007.
Fixes cockroachdb#74707.

Release note: None
craig bot pushed a commit that referenced this pull request Feb 4, 2022
75224: kvserver: replace circuit breaker cancel goroutine with per-Replica registry r=erikgrinaker a=tbg

tbg added a commit to tbg/cockroach that referenced this pull request Mar 2, 2022
This commit revamps an earlier implementation (cockroachdb#71806) of per-Replica
circuit breakers (cockroachdb#33007). The earlier implementation relied on context
cancellation and coarsely failed all requests addressing the Replica
when the breaker was tripped.

This had two downsides: First, there was a (small) performance overhead
for implementing the cancellation that was paid even in the common case
of a healthy Replica. Second, and more importantly, the coarseness meant
that we'd potentially fail many requests that would otherwise succeed,
and in particular follower reads.

@nvanbenschoten suggested in cockroachdb#74799 that latching could be extended with
the concept of "poisoning" and that this could result in fine-grained
circuit breaker behavior where only requests that are truly affected by
unavailability (at the replication layer) would be rejected.

This commit implements that strategy:

A request's latches are poisoned if its completion is predicated on the
replication layer being healthy.  In other words, when the breaker
trips, all inflight proposals have their latches poisoned and new
proposals are failed fast. However, and this is the big difference,
reads can potentially still be accepted in either of two scenarios:

- a valid follower read remains valid regardless of the circuit breaker
  status, and also regardless of inflight proposals (follower reads
  don't observe latches).
- a read that can be served under the current lease and which does
  not conflict with any of the stuck proposals in the replication
  layer (= poisoned latches) can also be served.

In short, reads only fail fast if they encounter a poisoned latch or
need to request a lease. (If they opted out of fail-fast behavior,
they behave as they do today.)

Latch poisoning is added as a first-class concept in the `concurrency`
package, and a structured error `PoisonError` is introduced. This error
in particular contains the span and timestamp of the poisoned latch that
prompted the fail-fast.

Lease proposals now always use `poison.Policy_Wait`, leaving the
fail-fast behavior to the caller. This simplifies things since multiple
callers with their own `poison.Policy` can multiplex onto a single
inflight lease proposal.

Addresses cockroachdb#74799.

Release note: None
Release justification: 22.1 project work
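A sketch of the poisoning behavior the commit describes, using hypothetical stand-ins for the real types in pkg/kv/kvserver/concurrency/poison:

```go
package sketch

import "fmt"

type (
	Policy    int
	Span      struct{ Key, EndKey string } // stand-in for roachpb.Span
	Timestamp struct{ WallTime int64 }     // stand-in for hlc.Timestamp
)

const (
	PolicyWait  Policy = iota // keep waiting on poisoned latches
	PolicyError               // fail fast on poisoned latches
)

// PoisonedError mirrors the structured error described above: it carries
// the span and timestamp of the poisoned latch that prompted the fail-fast.
type PoisonedError struct {
	Span      Span
	Timestamp Timestamp
}

func (e *PoisonedError) Error() string {
	return fmt.Sprintf("encountered poisoned latch %v@%v", e.Span, e.Timestamp)
}

// onPoisonedLatch sketches what a waiter does when a latch it waits on is
// poisoned: fail fast under PolicyError (regular requests), keep waiting
// under PolicyWait (lease proposals, which leave fail-fast to callers).
func onPoisonedLatch(policy Policy, span Span, ts Timestamp) error {
	if policy == PolicyError {
		return &PoisonedError{Span: span, Timestamp: ts}
	}
	return nil // PolicyWait: keep waiting
}
```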
tbg added a commit to tbg/cockroach that referenced this pull request Mar 3, 2022
This commit revamps an earlier implementation (cockroachdb#71806) of per-Replica
circuit breakers (cockroachdb#33007). The earlier implementation relied on context
cancellation and coarsely failed all requests addressing the Replica
when the breaker was tripped.

This had two downsides: First, there was a (small) performance overhead
for implementing the cancellation that was paid even in the common case
of a healthy Replica. Second, and more importantly, the coarseness meant
that we'd potentially fail many requests that would otherwise succeed,
and in particular follower reads.

@nvanbenschoten suggested in cockroachdb#74799 that latching could be extended with
the concept of "poisoning" and that this could result in fine-grained
circuit breaker behavior where only requests that are truly affected by
unavailability (at the replication layer) would be rejected.

This commit implements that strategy:

A request's latches are poisoned if its completion is predicated on the
replication layer being healthy.  In other words, when the breaker
trips, all inflight proposals have their latches poisoned and new
proposals are failed fast. However, and this is the big difference,
reads can potentially still be accepted in either of two scenarios:

- a valid follower read remains valid regardless of the circuit breaker
  status, and also regardless of inflight proposals (follower reads
  don't observe latches).
- a read that can be served under the current lease and which does
  not conflict with any of the stuck proposals in the replication
  layer (= poisoned latches) can also be served.

In short, reads only fail fast if they encounter a poisoned latch or
need to request a lease. (If they opted out of fail-fast behavior,
they behave as today).

Latch poisoning is added as a first-class concept in the `concurrency`
package, and a structured error `PoisonError` is introduced. This error
in particular contains the span and timestamp of the poisoned latch that
prompted the fail-fast.

Lease proposals now always use `poison.Policy_Wait`, leaving the
fail-fast behavior to the caller. This simplifies things since multiple
callers with their own `poison.Policy` can multiplex onto a single
inflight lease proposal.

Addresses cockroachdb#74799.

Release note: None
Release justification: 22.1 project work
tbg added a commit to tbg/cockroach that referenced this pull request Mar 3, 2022
This commit revamps an earlier implementation (cockroachdb#71806) of per-Replica
circuit breakers (cockroachdb#33007). The earlier implementation relied on context
cancellation and coarsely failed all requests addressing the Replica
when the breaker was tripped.

This had two downsides: First, there was a (small) performance overhead
for implementing the cancellation that was paid even in the common case
of a healthy Replica. Second, and more importantly, the coarseness meant
that we'd potentially fail many requests that would otherwise succeed,
and in particular follower reads.

@nvanbenschoten suggested in cockroachdb#74799 that latching could be extended with
the concept of "poisoning" and that this could result in fine-grained
circuit breaker behavior where only requests that are truly affected by
unavailability (at the replication layer) would be rejected.

This commit implements that strategy:

A request's latches are poisoned if its completion is predicated on the
replication layer being healthy.  In other words, when the breaker
trips, all inflight proposals have their latches poisoned and new
proposals are failed fast. However, and this is the big difference,
reads can potentially still be accepted in either of two scenarios:

- a valid follower read remains valid regardless of the circuit breaker
  status, and also regardless of inflight proposals (follower reads
  don't observe latches).
- a read that can be served under the current lease and which does
  not conflict with any of the stuck proposals in the replication
  layer (= poisoned latches) can also be served.

In short, reads only fail fast if they encounter a poisoned latch or
need to request a lease. (If they opted out of fail-fast behavior,
they behave as today).

Latch poisoning is added as a first-class concept in the `concurrency`
package, and a structured error `PoisonError` is introduced. This error
in particular contains the span and timestamp of the poisoned latch that
prompted the fail-fast.

Lease proposals now always use `poison.Policy_Wait`, leaving the
fail-fast behavior to the caller. This simplifies things since multiple
callers with their own `poison.Policy` can multiplex onto a single
inflight lease proposal.

Addresses cockroachdb#74799.

Release note: None
Release justification: 22.1 project work
craig bot pushed a commit that referenced this pull request Mar 3, 2022
76858: kvserver: allow circuit-breaker to serve reads r=erikgrinaker a=tbg

77221: changefeedccl: Fix flaky test. r=miretskiy a=miretskiy

Fix the flaky TestChangefeedHandlesDrainingNodes test.
The source of the flake was that cluster setting updates propagate
asynchronously to the other nodes in the cluster, so the test could
fail when some of the nodes were still observing the old value for
the setting.

The flake is fixed by introducing a testing utility function that sets
the setting and ensures that it propagates to all nodes in the test
cluster.
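
A minimal sketch of such a helper, with a stand-in `node` type in place
of the real test-cluster API: update the value, then poll until every
node observes it:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// node stands in for a test-cluster node's view of a cluster setting.
type node struct {
	mu  sync.Mutex
	val string
}

func (n *node) observed() string {
	n.mu.Lock()
	defer n.mu.Unlock()
	return n.val
}

func (n *node) gossip(v string) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.val = v
}

// waitForSetting polls until all nodes observe want, removing the race
// between the setting update and the test's use of the new value.
func waitForSetting(nodes []*node, want string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		ok := true
		for _, n := range nodes {
			if n.observed() != want {
				ok = false
				break
			}
		}
		if ok {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("setting did not propagate within %s", timeout)
		}
		time.Sleep(time.Millisecond)
	}
}

func main() {
	nodes := []*node{{val: "old"}, {val: "old"}}
	go func() { // settings propagate asynchronously, as in a real cluster
		time.Sleep(5 * time.Millisecond)
		for _, n := range nodes {
			n.gossip("new")
		}
	}()
	fmt.Println(waitForSetting(nodes, "new", time.Second)) // <nil>
}
```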

Fixes #76806

Release note: None

77317: util/log: remove Safe in favor of redact.Safe r=yuzefovich a=yuzefovich

The motivation for this change is to break the dependency of
`treewindow` on `util/log` (part of the effort to clean up the
dependencies of `execgen`).
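
For illustration, the migration amounts to using the `redact` package
directly instead of the `util/log` wrapper. A small sketch, assuming
the public `github.com/cockroachdb/redact` API:

```go
package main

import (
	"fmt"

	"github.com/cockroachdb/redact"
)

func main() {
	nodeID := 7
	// Previously: log.Safe(nodeID). Now the value is marked as safe for
	// redacted logs via redact.Safe, with no util/log dependency.
	s := redact.Sprintf("node %v is draining", redact.Safe(nodeID))
	fmt.Println(s)          // node 7 is draining
	fmt.Println(s.Redact()) // unsafe arguments would be replaced by ‹×›
}
```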

Addresses: #77234.

Release note: None

Release justification: low risk change to clean up the dependencies
a bit.

Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>