
kvserver: circuit-break requests to unavailable ranges #71806

Merged 2 commits on Jan 14, 2022

Conversation

@tbg (Member) commented Oct 21, 2021

Fixes #33007.
Closes #61311.

This PR uses the circuit breaker package introduced in #73641 to address
issue #33007: When a replica becomes unavailable, it should eagerly
refuse traffic that it believes would simply hang.

Concretely, every request to the Replica gets a cancelable context that
is sensitive to the circuit breaker tripping (relying on context
cancellation makes sure we don't end up with a second channel that needs
to be plumbed to everyone and their dog; Go makes it really difficult to
join two cancellation chains); if the breaker is tripped when the
request arrives, it is refused right away. In either case, the outgoing
error is augmented to carry information about the tripped breaker. This
isn't 100% trivial, since retreating from an in-flight replication
attempt typically causes an AmbiguousResultError, and while we can avoid
it in some cases, we can't in all. A similar problem occurs when the
canceled request was waiting on a lease, in which case the result is a
NotLeaseHolderError.
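A minimal sketch of what such a joined cancellation can look like (the name is hypothetical; per the discussion further down, the PR initially implements this with one goroutine per request, to be replaced in #74707):

```go
package sketch

import "context"

// joinBreakerCancellation derives a cancelable context from the incoming
// request context and ties it to the breaker's trip signal, so a trip
// cancels the request without plumbing a second channel everywhere.
func joinBreakerCancellation(
	parent context.Context, tripped <-chan struct{},
) (context.Context, context.CancelFunc) {
	ctx, cancel := context.WithCancel(parent)
	go func() {
		select {
		case <-tripped:
			cancel() // breaker tripped: fail the request fast
		case <-ctx.Done():
			// Request finished or parent canceled; the goroutine exits.
		}
	}()
	return ctx, cancel
}
```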

For any request that made it in, if it enters replication but does not
manage to succeed within base.SlowRequestThreshold (15s at time of
writing), the breaker is tripped, cancelling all inflight and future
requests until the breaker heals. Perhaps surprisingly, the existing
"slowness" detection (the informational "have been waiting ... for
proposal" warning in executeWriteBatch) was moved deeper into
replication (refreshProposalsLocked), where it now trips the breaker.
This has the advantage of providing a unified path for lease requests
(which don't go through executeWriteBatch) and pipelined writes (which
return before waiting on the inflight replication process). To make this
work, we need to pick a little fight with how leases are (not) refreshed
(#74711) and we need to remember the ticks at which a proposal was first
inserted (despite potential reproposals).
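A rough sketch of the tick-based slowness check described here (all names are hypothetical; the real logic lives in refreshProposalsLocked and uses base.SlowRequestThreshold, and the 200ms tick interval is an assumption taken from the ticking discussion below):

```go
package sketch

import (
	"errors"
	"time"
)

const tickInterval = 200 * time.Millisecond // assumed Raft tick interval

type proposal struct {
	// createdAtTicks is the tick at which the proposal was first
	// inserted; it must survive reproposals.
	createdAtTicks int64
}

// maybeTripBreaker trips the breaker once a proposal has been inflight
// for longer than the threshold, measured in ticks.
func maybeTripBreaker(ticksNow int64, p *proposal, threshold time.Duration, trip func(error)) {
	if time.Duration(ticksNow-p.createdAtTicks)*tickInterval > threshold {
		trip(errors.New("have been waiting too long for proposal"))
	}
}
```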

A tripped breaker deploys a background probe which sends a
ProbeRequest (introduced in #72972) to determine when to heal; this is
roughly the case whenever replicating a command through Raft is possible
for the Replica, either by appending to the log as the Raft leader, or
by forwarding to the Raft leader. A tiny bit of special casing allows
requests made by the probe to bypass the breaker.

As of this PR, all of this is opt-in via the
kv.replica_circuit_breakers.enabled cluster setting while we determine
which of the many follow-up items to target for 22.1 (see #74705). For
example, the breaker-sensitive context cancellation is a toy
implementation that comes with too much performance overhead (one extra
goroutine per request); #74707 will address that.
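For reference, the opt-in gate presumably looks something like the following; the setting name comes from the text above, while the description string and the `RegisterBoolSetting` signature are assumptions about the settings package at the time:

```go
package sketch

import "github.com/cockroachdb/cockroach/pkg/settings"

// Assumed registration of the opt-in cluster setting; exact signature
// and wording may differ from the actual code.
var replicaCircuitBreakersEnabled = settings.RegisterBoolSetting(
	"kv.replica_circuit_breakers.enabled",
	"when true, replicas fail-fast requests to unavailable ranges",
	false, // off by default while follow-ups (#74705) are triaged
)
```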

Other things not done here that we certainly want in the 22.1 release
are UI work (#74713) and metrics (#74505).

The release note is deferred to #74705 (where breakers are turned on).

Release note: None

@cockroach-teamcity (Member) commented: This change is Reviewable

@tbg force-pushed the circ-break-unavailable-ranges branch from 8e40068 to e784823 on November 8, 2021 22:18
@tbg (Member, Author) commented Nov 8, 2021

@cockroachdb/repl-prs this is still a draft, but it is ready for a high-level review. There is a lot going on, and I can easily pull 5+ pieces of this into separate PRs and am planning to do so. However, before I start that, I would like to get both of you on board with the basic approach. I think we've sufficiently discussed the circuit breaker API and are happy with it; I will rewrite the internals to avoid all of the unsafe stuff, so ignore the internals, but give the API another look and voice any larger concerns. For the NoopWrite, there is an alternative where we don't bypass the lease. I think bypassing the lease has worked out nicely and keeps NoopWrite simple and possibly reusable, but on the other hand, the probe may pass while for some unknown reason a lease may never be acquired. My preference is to keep NoopWrite as is but make the probe use two stages, where it issues the noop write first and a lease-requiring write second, but possibly this is a bad idea and I need to sleep on it and talk it through.
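A sketch of that two-stage probe idea (the helpers are hypothetical closures standing in for the noop write and a lease-requiring write):

```go
package sketch

import "context"

// twoStageProbe first verifies that replication works at all via a
// write that bypasses the lease, then verifies that a lease can
// actually be acquired via a lease-requiring write.
func twoStageProbe(
	ctx context.Context,
	noopWrite func(context.Context) error, // bypasses the lease
	leaseRequiringWrite func(context.Context) error,
) error {
	if err := noopWrite(ctx); err != nil {
		return err // replication itself is unhealthy; stay tripped
	}
	return leaseRequiringWrite(ctx)
}
```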

@tbg force-pushed the circ-break-unavailable-ranges branch from e784823 to 8d559f0 on November 9, 2021 11:39
@erikgrinaker (Contributor) left a comment

Reviewed 5 of 5 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @erikgrinaker and @tbg)


pkg/util/circuit/circuitbreaker.go, line 28 at r1 (raw file):

// from Breaker.Err(), i.e. `errors.Is(err, ErrBreakerOpen())` can be
// used to check whether an error originated from some Breaker.
func ErrBreakerOpen() error {

nit: why not simply export the variable?


pkg/util/circuit/circuitbreaker.go, line 56 at r1 (raw file):

	Name         string
	ShouldTrip   func(err error) error
	AsyncProbe   func(report func(error), done func())

Why does this need to take a done closure? Since the probe is expected to keep running, can't we just clean up when the closure returns?

Ah, I suppose it's because it's the probe's responsibility to spawn the async task. Is there a reason for that, i.e. why can't the breaker spawn the task? The only reason I can see is for the caller to pass in its context, but that seems like an antipattern anyway?
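For illustration, a sketch of an AsyncProbe matching the signature quoted above (runAsyncTask and sendProbe are hypothetical stand-ins for the caller's task runner and the actual probe request, and the report(nil)-heals-the-breaker contract is an assumption based on this discussion):

```go
package sketch

import "context"

// makeAsyncProbe returns an AsyncProbe-shaped closure. The probe spawns
// its own async task and must call done() when that task exits, so the
// breaker knows probing has stopped regardless of how it was launched.
func makeAsyncProbe(
	runAsyncTask func(func(context.Context)) error,
	sendProbe func(context.Context) error,
) func(report func(error), done func()) {
	return func(report func(error), done func()) {
		if err := runAsyncTask(func(ctx context.Context) {
			defer done()
			// A nil error heals the breaker; a non-nil error keeps it tripped.
			report(sendProbe(ctx))
		}); err != nil {
			done() // the task never started; still signal completion
		}
	}
}
```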

@tbg force-pushed the circ-break-unavailable-ranges branch 2 times, most recently from 0215e1b to dbd4ce3 on November 18, 2021 14:16
tbg added a commit to tbg/cockroach that referenced this pull request Dec 9, 2021
This defaults to `base.SlowRequestThreshold` but is customizable, which
will be helpful for testing cockroachdb#71806.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Dec 9, 2021
These are initially for use in cockroachdb#71806 but were originally conceived of
for cockroachdb#70485, which we are not currently prioritizing.

Importantly, this circuit breaker does not recruit a fraction of
requests to do the probing, which is desirable for both PR cockroachdb#71806 and
PR cockroachdb#70485; requests recruited as probes tend to incur high latency and
errors, and we don't want SQL client traffic to experience those.

Release note: None
@tbg force-pushed the circ-break-unavailable-ranges branch from dbd4ce3 to 85486f2 on December 9, 2021 17:12
craig bot pushed a commit that referenced this pull request Dec 10, 2021
73639: kvserver: introduce sCfg.SlowReplicationThreshold r=erikgrinaker a=tbg

This defaults to `base.SlowRequestThreshold` but is customizable, which
will be helpful for testing #71806.

Release note: None


Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
craig bot pushed a commit that referenced this pull request Dec 14, 2021
73362: kv: don't unquiesce uninitialized replicas r=tbg a=nvanbenschoten

In a [support issue](https://github.com/cockroachlabs/support/issues/1340), we saw that 10s of thousands of uninitialized replicas were being ticked regularly and creating a large amount of background work on a node, driving up CPU. This commit updates the Raft quiescence logic to disallow uninitialized replicas from being unquiesced and Tick()'ing themselves.

Keeping uninitialized replicas quiesced even in the presence of Raft traffic avoids wasted work. We could Tick() these replicas, but doing so is unnecessary because uninitialized replicas can never win elections, so there is no reason for them to ever call an election. In fact, uninitialized replicas do not even know who their peers are, so there would be no way for them to call an election or for them to send any other non-reactive message. As a result, all work performed by an uninitialized replica is reactive and in response to incoming messages (see `processRequestQueue`).

There are multiple ways for an uninitialized replica to be created and then abandoned, and we don't do a good job garbage collecting them at a later point (see #73424), so it is important that they are cheap. Keeping them quiesced instead of letting them unquiesce and tick every 200ms indefinitely avoids a meaningful amount of periodic work for each uninitialized replica.

Release notes (bug fix): uninitialized replicas that are abandoned after an unsuccessful snapshot no longer perform periodic background work, so they no longer have a non-negligible cost.

73641: circuit: add probing-based circuit breaker r=erikgrinaker a=tbg

These are initially for use in #71806 but were originally conceived of
for #70485, which we are not currently prioritizing.

Importantly, this circuit breaker does not recruit a fraction of
requests to do the probing, which is desirable for both PR #71806 and
PR #70485; requests recruited as probes tend to incur high latency and
errors, and we don't want SQL client traffic to experience those.

Release note: None


73718: kv: pass roachpb.Header by pointer to DeclareKeysFunc r=nvanbenschoten a=nvanbenschoten

The `roachpb.Header` struct is up to 160 bytes in size. That's a little too
large to be passing by value repeatedly when doing so is easy to avoid. This
commit switches to passing roachpb.Header structs by pointer through the
DeclareKeysFunc implementations.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
@tbg force-pushed the circ-break-unavailable-ranges branch from 85486f2 to ba48f8b on December 23, 2021 16:17
@tbg (Member, Author) commented Dec 23, 2021

Spent some time on this again today. While putting together an end-to-end test (lose quorum, try to write, and see the fail-fast), I realized that we do need to make all parts of the read/write pipeline aware of the circuit breaker to avoid getting stuck. I knew that we needed lease proposals to be sensitive to the breaker (so that you wouldn't get stuck trying to acquire a lease over and over when there wasn't a valid lease to begin with), but I wasn't aware of a much more basic problem: when one command fails fast after having proposed a command, the latches stick around. They have to; but this means that the next command will get stuck on the latches. So we need to short-circuit the latch acquisition as well when the breaker trips. Of course we don't want to plumb the circuit breaker everywhere; you'd really want to rely on context.Context for that sort of thing. I've hacked something together in the current version of the PR (as pushed right now) which I can polish into something that should get the job done for v1. Down the road, it would be interesting to explore #66884 as an ingredient here, but I don't want this to become a prerequisite.

@tbg force-pushed the circ-break-unavailable-ranges branch from ba48f8b to d6c7cc6 on January 3, 2022 16:21
@@ -119,6 +120,15 @@ func TestReplicaCircuitBreaker(t *testing.T) {
tc.RequireIsNotLeaseholderError(t, tc.Read(n2))
tc.RequireIsNotLeaseholderError(t, tc.Write(n2))
})

runCircuitBreakerTest(t, "leaseholder-unavailable", func(t *testing.T, ctx context.Context, tc *circuitBreakerTest) {
Contributor:

This should probably go into the next commit, "wip: add leaseholder-unavailable test".

@erikgrinaker (Contributor):

> when one command fails fast after having proposed a command, the latches stick around. They have to; but this means that the next command will get stuck on the latches. So we need to short-circuit the latch acquisition as well when the breaker trips. Of course we don't want to plumb the circuit breaker everywhere, you'd really want to rely on context.Context for that sort of thing

Ugh, this is somewhat annoying. Conceptually, cancelling the context in Send() when the breaker trips makes sense -- it's essentially what we want to happen. But the bookkeeping gets a bit unwieldy, as you point out. It'd be nice if we could inject a cancel trigger into the context somehow, without needing to keep track of the individual cancel functions.

I'll have to remind myself of how the latch guards are managed, but if the latches turn out to be the only problem here, is it terrible to come up with a solution at/below the Raft level that causes us to abandon proposals/commands and release latches without having to do context management higher up? Although I realize that the context approach generalizes better across failure modes.

@tbg force-pushed the circ-break-unavailable-ranges branch from d6c7cc6 to 1671b8d on January 4, 2022 16:29
@tbg (Member, Author) commented Jan 5, 2022

Discussed this with Erik yesterday. Despite the potential perf overhead we decided that it was an important property that requests get canceled reliably on breaker tripping. This means making all ctxs cancelable and stashing the cancels. If we don't do that, we can sort of reliably refuse requests once the breaker has solidly tripped, but while the tripping is being detected, some requests might sneak in and get stuck on the latch manager (or really anywhere else that doesn't have a dedicated breaker check). We didn't want to plumb the breaker signal into the latch manager, as context cancellation is the "Go way" of doing what we're trying to do. As we have a few times before, we were dissatisfied with what the Go ecosystem provides in terms of joining multiple cancellation chains in a performant way; there are always more allocations involved than seems reasonable (related: we don't use Stopper.WithCancelOnQuiesce liberally because it's also expensive-ish).

We are abandoning our initial ambition to let reads that can be served succeed: this is just hard to build correctly. The main reason for reads to get stuck (no lease) is handled adequately by only circuit breaking on writes (lease acquisition is a write), but it's trickier if there is a stuck in-flight write that then prevents reads from getting their latches. We'd basically have to tell the latch manager to fail-fast a read but only if it is transitively waiting on a write's latch (write latches have to stick around on unavailable ranges, even if the client has long been fail-fasted). The "transitively" here is not something the latch manager is even concerned with today, see #73916 for a related discussion.

So:

  • early in Replica.Send, check the breaker, make the context cancelable, and stash the cancel func somewhere on the Replica;
  • on trip, have the probe cancel everyone (see the sketch below).
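A minimal sketch of that registry approach, assuming hypothetical names (the polished version later landed in #75224 as a sharded mutex map):

```go
package sketch

import (
	"context"
	"sync"
)

// cancelStorage holds the cancel func of every inflight request on a
// Replica. Replica.Send registers on the way in and unregisters on the
// way out; the probe calls cancelAll when the breaker trips.
type cancelStorage struct {
	mu     sync.Mutex
	m      map[int64]context.CancelFunc
	nextID int64
}

func (s *cancelStorage) register(cancel context.CancelFunc) (id int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.m == nil {
		s.m = map[int64]context.CancelFunc{}
	}
	s.nextID++
	s.m[s.nextID] = cancel
	return s.nextID
}

func (s *cancelStorage) unregister(id int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.m, id)
}

func (s *cancelStorage) cancelAll() {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, cancel := range s.m {
		cancel()
	}
}
```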

nicktrav pushed a commit that referenced this pull request Jan 28, 2022
On the leaseholder, `ctx` passed to `computeChecksumPostApply` is that
of the proposal. As of #71806, this context is canceled right after the
corresponding proposal is signaled (and the client goroutine returns
from `sendWithRangeID`). This effectively prevents most consistency
checks from succeeding (they previously were not affected by
higher-level cancellation because the consistency check is triggered
from a local queue that talks directly to the replica, i.e. had
something like a minutes-long timeout).

This caused disastrous behavior in the `clearrange` suite of roachtests.
That test imports a large table. After the import, most ranges have
estimates (due to the ctx cancellation preventing the consistency
checks, which as a byproduct trigger stats adjustments) and their stats
claim that they are very small. Before recent PR #74674, `ClearRange` on
such ranges would use individual point deletions instead of the much
more efficient pebble range deletions, effectively writing a lot of data
and running the nodes out of disk.

Failures of `clearrange` with #74674 were also observed, but they did
not involve out-of-disk situations, so are possibly an alternative
failure mode (that may still be related to the newly introduced presence
of context cancellation).

Touches #68303.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Jan 31, 2022
…egistry

PR cockroachdb#71806 originally landed with a very simple solution for the problem
of joining the breaker's cancellation chain with that of the incoming
context: spawning a goroutine. This was never meant to be the solution
in prod (spawning goroutines is not cheap) and this PR switches to a
real solution (which may still have some room for improvement).

Replica circuit breakers are still off (cockroachdb#74705) and so this change
should not change any behavior, unless breakers are turned on. The
provided benchmarks show both the "end-to-end" `Replica.Send` perf and
the micro-bench `breaker.{R,Un}egister` perf, and contrast them with
having the breakers disabled.

This sets the stage for evaluating and, if necessary, minimizing the
overhead, which (along with some additional end-to-end testing) then
allows us to turn breakers on by default (cockroachdb#74705).

`BenchmarkReplicaCircuitBreakerSendOverhead` measures `Replica.Send` of
a no-result single-point read. We see that we lose a few percent,
though this benchmark is pretty micro already.

`BenchmarkReplicaCircuitBreaker_Register` is even more micro, measuring
the overhead of making the context cancelable and calling `Register()` followed by
`Unregister()`. The "old" measurement is essentially just creating a
cancelable context (so the +XXXX% numbers don't mean a lot; we're doing
work for which there was previously no analog). Here we can clearly see
that sharding the mutex map can help, though note that this would
already only be visible at the `Replica.Send` level at much higher
concurrencies than in this test (16).

I also ran the kv95 roachtest with what corresponds to the `mutexmap-1`
strategy above (i.e. a single mutex); [raw data here]. In this
configuration (relative to disabling breakers) `kv95/enc=false/nodes=1`
and `kv95/enc=false/nodes=1/cpu=32` sustained a -1.62% hit (in qps,
median over five runs). Once we've chosen a suitable shard count, I
expect that to drop a bit as well, and think we can turn on breakers in
production, at least from a performance point of view.

[raw data here]: https://gist.github.com/tbg/44b53a10d18490a6dabd57e1d1d3225e

old = disabled, new = enabled:

```
name                                                old time/op    new time/op    delta
ReplicaCircuitBreakerSendOverhead/X/mutexmap-1-16     1.63µs ± 4%    1.77µs ± 5%     +8.38%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-2-16     1.66µs ± 4%    1.75µs ± 5%     +5.74%  (p=0.000 n=10+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-4-16     1.63µs ± 6%    1.75µs ± 4%     +7.48%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-8-16     1.58µs ± 7%    1.73µs ± 4%     +9.47%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-12-16    1.57µs ± 4%    1.62µs ± 6%     +3.19%  (p=0.046 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-16-16    1.56µs ± 5%    1.63µs ± 7%     +4.20%  (p=0.007 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-20-16    1.53µs ± 5%    1.62µs ± 8%     +6.01%  (p=0.001 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-24-16    1.55µs ± 7%    1.61µs ± 5%     +3.50%  (p=0.049 n=11+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-32-16    1.57µs ± 7%    1.62µs ± 6%     +3.53%  (p=0.042 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-64-16    1.53µs ± 7%    1.64µs ± 5%     +6.84%  (p=0.000 n=11+12)

name                                                old alloc/op   new alloc/op   delta
ReplicaCircuitBreakerSendOverhead/X/mutexmap-1-16     1.42kB ± 0%    1.44kB ± 0%     +1.25%  (p=0.000 n=7+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-2-16     1.42kB ± 0%    1.44kB ± 0%     +1.19%  (p=0.000 n=9+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-4-16     1.42kB ± 0%    1.43kB ± 0%     +1.14%  (p=0.000 n=11+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-8-16     1.42kB ± 0%    1.43kB ± 0%     +1.24%  (p=0.000 n=9+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-12-16    1.42kB ± 0%    1.43kB ± 0%     +1.08%  (p=0.000 n=11+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-16-16    1.42kB ± 0%    1.43kB ± 0%     +1.14%  (p=0.000 n=10+10)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-20-16    1.42kB ± 0%    1.43kB ± 0%     +1.12%  (p=0.000 n=10+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-24-16    1.42kB ± 0%    1.43kB ± 0%     +1.09%  (p=0.000 n=11+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-32-16    1.41kB ± 0%    1.43kB ± 0%     +1.13%  (p=0.000 n=9+11)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-64-16    1.41kB ± 0%    1.43kB ± 0%     +1.23%  (p=0.000 n=9+12)

name                                                old allocs/op  new allocs/op  delta
ReplicaCircuitBreakerSendOverhead/X/mutexmap-1-16       11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-2-16       11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-4-16       11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-8-16       11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-12-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-16-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-20-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-24-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-32-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
ReplicaCircuitBreakerSendOverhead/X/mutexmap-64-16      11.0 ± 0%      12.0 ± 0%     +9.09%  (p=0.000 n=11+12)
```

```
name                                                old time/op    new time/op    delta
ReplicaCircuitBreaker_Register/X/mutexmap-1-16        30.8ns ±21%   787.6ns ± 2%  +2457.74%  (p=0.000 n=11+11)
ReplicaCircuitBreaker_Register/X/mutexmap-2-16        31.6ns ± 3%   782.0ns ± 2%  +2376.38%  (p=0.000 n=9+12)
ReplicaCircuitBreaker_Register/X/mutexmap-4-16        31.0ns ± 0%   778.8ns ± 2%  +2409.61%  (p=0.000 n=9+12)
ReplicaCircuitBreaker_Register/X/mutexmap-8-16        31.0ns ± 0%   775.8ns ± 1%  +2403.11%  (p=0.000 n=10+10)
ReplicaCircuitBreaker_Register/X/mutexmap-12-16       31.0ns ± 1%   288.8ns ± 2%   +833.06%  (p=0.000 n=10+12)
ReplicaCircuitBreaker_Register/X/mutexmap-16-16       31.1ns ± 1%   324.6ns ± 4%   +945.05%  (p=0.000 n=10+12)
ReplicaCircuitBreaker_Register/X/mutexmap-20-16       31.1ns ± 0%   193.3ns ± 2%   +522.26%  (p=0.000 n=10+12)
ReplicaCircuitBreaker_Register/X/mutexmap-24-16       31.0ns ± 0%   286.1ns ± 1%   +822.80%  (p=0.000 n=9+12)
ReplicaCircuitBreaker_Register/X/mutexmap-32-16       31.0ns ± 0%   194.4ns ± 2%   +527.12%  (p=0.000 n=10+12)
ReplicaCircuitBreaker_Register/X/mutexmap-64-16       31.0ns ± 1%   121.3ns ± 2%   +291.01%  (p=0.000 n=10+12)

name                                                old alloc/op   new alloc/op   delta
ReplicaCircuitBreaker_Register/X/mutexmap-1-16         80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-2-16         80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-4-16         80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-8-16         80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-12-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-16-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-20-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-24-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-32-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-64-16        80.0B ± 0%     96.0B ± 0%    +20.00%  (p=0.000 n=11+12)

name                                                old allocs/op  new allocs/op  delta
ReplicaCircuitBreaker_Register/X/mutexmap-1-16          2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-2-16          2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-4-16          2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-8-16          2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-12-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-16-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-20-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-24-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-32-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
ReplicaCircuitBreaker_Register/X/mutexmap-64-16         2.00 ± 0%      3.00 ± 0%    +50.00%  (p=0.000 n=11+12)
```

Created via:

```
$ ./dev bench ./pkg/kv/kvserver/ -f BenchmarkReplicaCircuitBreaker --count 10 | tee out.txt
$ for f in enabled=true enabled=false; do grep -F "${f}" out.txt | sed "s/${f}/X/" > "${f}"; done && benchstat 'enabled=false' 'enabled=true'
```

For the record, I also tried a `sync.Map`, but it was vastly inferior
for this use case as it is allocation-heavy and its writes are fairly
slow.

Touches cockroachdb#33007.
Fixes cockroachdb#74707.

Release note: None
craig bot pushed a commit that referenced this pull request Feb 4, 2022
75224: kvserver: replace circuit breaker cancel goroutine with per-Replica registry r=erikgrinaker a=tbg

tbg added a commit to tbg/cockroach that referenced this pull request Mar 2, 2022
This commit revamps an earlier implementation (cockroachdb#71806) of per-Replica
circuit breakers (cockroachdb#33007). The earlier implementation relied on context
cancellation and coarsely failed all requests addressing the Replica
when the breaker was tripped.

This had two downsides: First, there was a (small) performance overhead
for implementing the cancellation that was paid even in the common case
of a healthy Replica. Second, and more importantly, the coarseness meant
that we'd potentially fail many requests that would otherwise succeed,
and in particular follower reads.

@nvanbenschoten suggested in cockroachdb#74799 that latching could be extended with
the concept of "poisoning" and that this could result in fine-grained
circuit breaker behavior where only requests that are truly affected by
unavailability (at the replication layer) would be rejected.

This commit implements that strategy:

A request's latches are poisoned if its completion is predicated on the
replication layer being healthy.  In other words, when the breaker
trips, all inflight proposals have their latches poisoned and new
proposals are failed fast. However, and this is the big difference,
reads can potentially still be accepted in either of two scenarios:

- a valid follower read remains valid regardless of the circuit breaker
  status, and also regardless of inflight proposals (follower reads
  don't observe latches).
- a read that can be served under the current lease and which does
  not conflict with any of the stuck proposals in the replication
  layer (= poisoned latches) can also be served.

In short, reads only fail fast if they encounter a poisoned latch or
need to request a lease. (If they opted out of fail-fast behavior,
they behave as they do today.)

Latch poisoning is added as a first-class concept in the `concurrency`
package, and a structured error `PoisonError` is introduced. This error
in particular contains the span and timestamp of the poisoned latch that
prompted the fail-fast.

Lease proposals now always use `poison.Policy_Wait`, leaving the
fail-fast behavior to the caller. This simplifies things since multiple
callers with their own `poison.Policy` can multiplex onto a single
inflight lease proposal.

Addresses cockroachdb#74799.

Release note: None
Release justification: 22.1 project work
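A sketch of the poisoning behavior the commit describes, using hypothetical stand-ins for the real types in pkg/kv/kvserver/concurrency/poison:

```go
package sketch

import "fmt"

type (
	Policy    int
	Span      struct{ Key, EndKey string } // stand-in for roachpb.Span
	Timestamp struct{ WallTime int64 }     // stand-in for hlc.Timestamp
)

const (
	PolicyWait  Policy = iota // keep waiting on poisoned latches
	PolicyError               // fail fast on poisoned latches
)

// PoisonedError mirrors the structured error described above: it carries
// the span and timestamp of the poisoned latch that prompted the fail-fast.
type PoisonedError struct {
	Span      Span
	Timestamp Timestamp
}

func (e *PoisonedError) Error() string {
	return fmt.Sprintf("encountered poisoned latch %v@%v", e.Span, e.Timestamp)
}

// onPoisonedLatch sketches what a waiter does when a latch it waits on is
// poisoned: fail fast under PolicyError (regular requests), keep waiting
// under PolicyWait (lease proposals, which leave fail-fast to callers).
func onPoisonedLatch(policy Policy, span Span, ts Timestamp) error {
	if policy == PolicyError {
		return &PoisonedError{Span: span, Timestamp: ts}
	}
	return nil // PolicyWait: keep waiting
}
```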
tbg added a commit to tbg/cockroach that referenced this pull request Mar 3, 2022
This commit revamps an earlier implementation (cockroachdb#71806) of per-Replica
circuit breakers (cockroachdb#33007). The earlier implementation relied on context
cancellation and coarsely failed all requests addressing the Replica
when the breaker was tripped.

This had two downsides: First, there was a (small) performance overhead
for implementing the cancellation that was paid even in the common case
of a healthy Replica. Second, and more importantly, the coarseness meant
that we'd potentially fail many requests that would otherwise succeed,
and in particular follower reads.

@nvanbenschoten suggested in cockroachdb#74799 that latching could be extended with
the concept of "poisoning" and that this could result in fine-grained
circuit breaker behavior where only requests that are truly affected by
unavailability (at the replication layer) would be rejected.

This commit implements that strategy:

A request's latches are poisoned if its completion is predicated on the
replication layer being healthy.  In other words, when the breaker
trips, all inflight proposals have their latches poisoned and new
proposals are failed fast. However, and this is the big difference,
reads can potentially still be accepted in either of two scenarios:

- a valid follower read remains valid regardless of the circuit breaker
  status, and also regardless of inflight proposals (follower reads
  don't observe latches).
- a read that can be served under the current lease and which does
  not conflict with any of the stuck proposals in the replication
  layer (= poisoned latches) can also be served.

In short, reads only fail fast if they encounter a poisoned latch or
need to request a lease. (If they opted out of fail-fast behavior,
they behave as today).

Latch poisoning is added as a first-class concept in the `concurrency`
package, and a structured error `PoisonError` is introduced. This error
in particular contains the span and timestamp of the poisoned latch that
prompted the fail-fast.

Lease proposals now always use `poison.Policy_Wait`, leaving the
fail-fast behavior to the caller. This simplifies things since multiple
callers with their own `poison.Policy` can multiplex onto a single
inflight lease proposal.

Addresses cockroachdb#74799.

Release note: None
Release justification: 22.1 project work
tbg added a commit to tbg/cockroach that referenced this pull request Mar 3, 2022
This commit revamps an earlier implementation (cockroachdb#71806) of per-Replica
circuit breakers (cockroachdb#33007). The earlier implementation relied on context
cancellation and coarsely failed all requests addressing the Replica
when the breaker was tripped.

This had two downsides: First, there was a (small) performance overhead
for implementing the cancellation that was paid even in the common case
of a healthy Replica. Second, and more importantly, the coarseness meant
that we'd potentially fail many requests that would otherwise succeed,
and in particular follower reads.

@nvanbenschoten suggested in cockroachdb#74799 that latching could be extended with
the concept of "poisoning" and that this could result in fine-grained
circuit breaker behavior where only requests that are truly affected by
unavailability (at the replication layer) would be rejected.

This commit implements that strategy:

A request's latches are poisoned if its completion is predicated on the
replication layer being healthy.  In other words, when the breaker
trips, all inflight proposals have their latches poisoned and new
proposals are failed fast. However, and this is the big difference,
reads can potentially still be accepted in either of two scenarios:

- a valid follower read remains valid regardless of the circuit breaker
  status, and also regardless of inflight proposals (follower reads
  don't observe latches).
- a read that can be served under the current lease and which does
  not conflict with any of the stuck proposals in the replication
  layer (= poisoned latches) can also be served.

In short, reads only fail fast if they encounter a poisoned latch or
need to request a lease. (If they opted out of fail-fast behavior,
they behave as today).

Latch poisoning is added as a first-class concept in the `concurrency`
package, and a structured error `PoisonError` is introduced. This error
in particular contains the span and timestamp of the poisoned latch that
prompted the fail-fast.

Lease proposals now always use `poison.Policy_Wait`, leaving the
fail-fast behavior to the caller. This simplifies things since multiple
callers with their own `poison.Policy` can multiplex onto a single
inflight lease proposal.

Addresses cockroachdb#74799.

Release note: None
Release justification: 22.1 project work
craig bot pushed a commit that referenced this pull request Mar 3, 2022
76858: kvserver: allow circuit-breaker to serve reads r=erikgrinaker a=tbg

77221: changefeedccl: Fix flaky test. r=miretskiy a=miretskiy

Fix the flaky TestChangefeedHandlesDrainingNodes test.
The source of the flake was that cluster setting updates propagate
asynchronously to the other nodes in the cluster, so the test could
fail when some of the nodes were still observing the old value for
the setting.

The flake is fixed by introducing a testing utility function that sets
the setting and ensures that it propagates to all nodes in the test
cluster.
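
A minimal sketch of such a helper, with a stand-in `node` type in place
of the real test-cluster API: update the value, then poll until every
node observes it:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// node stands in for a test-cluster node's view of a cluster setting.
type node struct {
	mu  sync.Mutex
	val string
}

func (n *node) observed() string {
	n.mu.Lock()
	defer n.mu.Unlock()
	return n.val
}

func (n *node) gossip(v string) {
	n.mu.Lock()
	defer n.mu.Unlock()
	n.val = v
}

// waitForSetting polls until all nodes observe want, removing the race
// between the setting update and the test's use of the new value.
func waitForSetting(nodes []*node, want string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		ok := true
		for _, n := range nodes {
			if n.observed() != want {
				ok = false
				break
			}
		}
		if ok {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("setting did not propagate within %s", timeout)
		}
		time.Sleep(time.Millisecond)
	}
}

func main() {
	nodes := []*node{{val: "old"}, {val: "old"}}
	go func() { // settings propagate asynchronously, as in a real cluster
		time.Sleep(5 * time.Millisecond)
		for _, n := range nodes {
			n.gossip("new")
		}
	}()
	fmt.Println(waitForSetting(nodes, "new", time.Second)) // <nil>
}
```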

Fixes #76806

Release note: None

77317: util/log: remove Safe in favor of redact.Safe r=yuzefovich a=yuzefovich

The motivation for this change is to break the dependency of
`treewindow` on `util/log` (part of the effort to clean up the
dependencies of `execgen`).
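
For illustration, the migration amounts to using the `redact` package
directly instead of the `util/log` wrapper. A small sketch, assuming
the public `github.com/cockroachdb/redact` API:

```go
package main

import (
	"fmt"

	"github.com/cockroachdb/redact"
)

func main() {
	nodeID := 7
	// Previously: log.Safe(nodeID). Now the value is marked as safe for
	// redacted logs via redact.Safe, with no util/log dependency.
	s := redact.Sprintf("node %v is draining", redact.Safe(nodeID))
	fmt.Println(s)          // node 7 is draining
	fmt.Println(s.Redact()) // unsafe arguments would be replaced by ‹×›
}
```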

Addresses: #77234.

Release note: None

Release justification: low risk change to clean up the dependencies
a bit.

Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>