-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ccl/changefeedccl: TestChangefeedHandlesDrainingNodes failed #76806
Comments
ccl/changefeedccl.TestChangefeedHandlesDrainingNodes failed with artifacts on master @ 1a89121d67fa02d5c674766ea01d3c78455fd28b:
Help
See also: How To Investigate a Go Test Failure (internal)
|
ccl/changefeedccl.TestChangefeedHandlesDrainingNodes failed with artifacts on master @ a7449866f3229872d92e1de973e209838f07de35:
Help
See also: How To Investigate a Go Test Failure (internal)
|
ccl/changefeedccl.TestChangefeedHandlesDrainingNodes failed with artifacts on master @ 8e7248296a2b8b2a4d07dd82678c687a02d56c4f:
Help
See also: How To Investigate a Go Test Failure (internal)
|
ccl/changefeedccl.TestChangefeedHandlesDrainingNodes failed with artifacts on master @ 573b3b05458667308bafeb9db1611832dca3025f:
Help
See also: How To Investigate a Go Test Failure (internal)
|
76858: kvserver: allow circuit-breaker to serve reads r=erikgrinaker a=tbg This commit revamps an earlier implementation (#71806) of per-Replica circuit breakers (#33007). The earlier implementation relied on context cancellation and coarsely failed all requests addressing the Replica when the breaker was tripped. This had two downsides: First, there was a (small) performance overhead for implementing the cancellation that was paid even in the common case of a healthy Replica. Second, and more importantly, the coarseness meant that we'd potentially fail many requests that would otherwise succeed, and in particular follower reads. `@nvanbenschoten` suggested in #74799 that latching could be extended with the concept of "poisoning" and that this could result in fine-grained circuit breaker behavior where only requests that are truly affected by unavailability (at the replication layer) would be rejected. This commit implements that strategy: A request's latches are poisoned if its completion is predicated on the replication layer being healthy. In other words, when the breaker trips, all inflight proposals have their latches poisoned and new proposals are failed fast. However, and this is the big difference, reads can potentially still be accepted in either of two scenarios: - a valid follower read remains valid regardless of the circuit breaker status, and also regardless of inflight proposals (follower reads don't observe latches). - a read that can be served under the current lease and which does not conflict with any of the stuck proposals in the replication layer (= poisoned latches) can also be served. In short, reads only fail fast if they encounter a poisoned latch or need to request a lease. (If they opted out of fail-fast behavior, they behave as today). Latch poisoning is added as a first-class concept in the `concurrency` package, and a structured error `PoisonError` is introduced. This error in particular contains the span and timestamp of the poisoned latch that prompted the fail-fast. Lease proposals now always use `poison.Policy_Wait`, leaving the fail-fast behavior to the caller. This simplifies things since multiple callers with their own `poison.Policy` can multiplex onto a single inflight lease proposal. Addresses #74799. Release note: None Release justification: 22.1 project work 77221: changefeedccl: Fix flaky test. r=miretskiy a=miretskiy Fix flaky TestChangefeedHandlesDrainingNodes test. The source of the flake was that cluster setting updates propagate asynchronously to the other nodes in the cluster. Thus, it was possible for the test to flake because some of the nodes were observing the old value for the setting. The flake is fixed by introducing testing utility function that sets the setting and ensures the setting propagates to all nodes in the test cluster. Fixes #76806 Release Notes: none 77317: util/log: remove Safe in favor of redact.Safe r=yuzefovich a=yuzefovich My desire to make this change is to break the dependency of `treewindow` on `util/log` (which is a part of the effort to clean up the dependencies of `execgen`). Addresses: #77234. Release note: None Release justification: low risk change to clean up the dependencies a bit. Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com> Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com> Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
ccl/changefeedccl.TestChangefeedHandlesDrainingNodes failed with artifacts on master @ 8d0a1773a6b55f06c04ea77e6a39426e32ba9a63:
Help
See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:
This test on roachdash | Improve this report!
Jira issue: CRDB-13280
The text was updated successfully, but these errors were encountered: