Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ccl/changefeedccl: TestChangefeedHandlesDrainingNodes failed #76806

Closed
cockroach-teamcity opened this issue Feb 19, 2022 · 4 comments · Fixed by #77221
Closed

ccl/changefeedccl: TestChangefeedHandlesDrainingNodes failed #76806

cockroach-teamcity opened this issue Feb 19, 2022 · 4 comments · Fixed by #77221
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-cdc

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Feb 19, 2022

ccl/changefeedccl.TestChangefeedHandlesDrainingNodes failed with artifacts on master @ 8d0a1773a6b55f06c04ea77e6a39426e32ba9a63:

=== RUN   TestChangefeedHandlesDrainingNodes
    test_log_scope.go:79: test logs captured to: /artifacts/tmp/_tmp/a77002d7c9453d7cd2d382f907780e13/logTestChangefeedHandlesDrainingNodes2686424424
    test_log_scope.go:80: use -show-logs to present logs inline
    changefeed_test.go:4427: pq: rangefeeds require the kv.rangefeed.enabled setting. See https://www.cockroachlabs.com/docs/dev/change-data-capture.html#enable-rangefeeds-to-reduce-latency
    panic.go:642: -- test log scope end --
--- FAIL: TestChangefeedHandlesDrainingNodes (7.64s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • TAGS=bazel,gss

/cc @cockroachdb/cdc

This test on roachdash | Improve this report!

Jira issue: CRDB-13280

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Feb 19, 2022
@blathers-crl blathers-crl bot added the T-cdc label Feb 19, 2022
@cockroach-teamcity
Copy link
Member Author

ccl/changefeedccl.TestChangefeedHandlesDrainingNodes failed with artifacts on master @ 1a89121d67fa02d5c674766ea01d3c78455fd28b:

=== RUN   TestChangefeedHandlesDrainingNodes
    test_log_scope.go:79: test logs captured to: /artifacts/tmp/_tmp/a77002d7c9453d7cd2d382f907780e13/logTestChangefeedHandlesDrainingNodes3361977065
    test_log_scope.go:80: use -show-logs to present logs inline
    changefeed_test.go:4427: pq: rangefeeds require the kv.rangefeed.enabled setting. See https://www.cockroachlabs.com/docs/dev/change-data-capture.html#enable-rangefeeds-to-reduce-latency
    panic.go:642: -- test log scope end --
--- FAIL: TestChangefeedHandlesDrainingNodes (8.91s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • TAGS=bazel,gss

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

ccl/changefeedccl.TestChangefeedHandlesDrainingNodes failed with artifacts on master @ a7449866f3229872d92e1de973e209838f07de35:

=== RUN   TestChangefeedHandlesDrainingNodes
    test_log_scope.go:79: test logs captured to: /artifacts/tmp/_tmp/a77002d7c9453d7cd2d382f907780e13/logTestChangefeedHandlesDrainingNodes4055445209
    test_log_scope.go:80: use -show-logs to present logs inline
    changefeed_test.go:4489: pq: rangefeeds require the kv.rangefeed.enabled setting. See https://www.cockroachlabs.com/docs/dev/change-data-capture.html#enable-rangefeeds-to-reduce-latency
    panic.go:642: -- test log scope end --
--- FAIL: TestChangefeedHandlesDrainingNodes (12.71s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • TAGS=bazel,gss

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

ccl/changefeedccl.TestChangefeedHandlesDrainingNodes failed with artifacts on master @ 8e7248296a2b8b2a4d07dd82678c687a02d56c4f:

=== RUN   TestChangefeedHandlesDrainingNodes
    test_log_scope.go:79: test logs captured to: /artifacts/tmp/_tmp/a77002d7c9453d7cd2d382f907780e13/logTestChangefeedHandlesDrainingNodes781529958
    test_log_scope.go:80: use -show-logs to present logs inline
    changefeed_test.go:4490: pq: rangefeeds require the kv.rangefeed.enabled setting. See https://www.cockroachlabs.com/docs/dev/change-data-capture.html#enable-rangefeeds-to-reduce-latency
    panic.go:642: -- test log scope end --
--- FAIL: TestChangefeedHandlesDrainingNodes (3.55s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • TAGS=bazel,gss

This test on roachdash | Improve this report!

@cockroach-teamcity
Copy link
Member Author

ccl/changefeedccl.TestChangefeedHandlesDrainingNodes failed with artifacts on master @ 573b3b05458667308bafeb9db1611832dca3025f:

=== RUN   TestChangefeedHandlesDrainingNodes
    test_log_scope.go:79: test logs captured to: /artifacts/tmp/_tmp/a77002d7c9453d7cd2d382f907780e13/logTestChangefeedHandlesDrainingNodes157410923
    test_log_scope.go:80: use -show-logs to present logs inline
    changefeed_test.go:4490: pq: rangefeeds require the kv.rangefeed.enabled setting. See https://www.cockroachlabs.com/docs/dev/change-data-capture.html#enable-rangefeeds-to-reduce-latency
    panic.go:642: -- test log scope end --
--- FAIL: TestChangefeedHandlesDrainingNodes (3.57s)
Help

See also: How To Investigate a Go Test Failure (internal)
Parameters in this failure:

  • TAGS=bazel,gss

This test on roachdash | Improve this report!

craig bot pushed a commit that referenced this issue Mar 3, 2022
76858: kvserver: allow circuit-breaker to serve reads r=erikgrinaker a=tbg

This commit revamps an earlier implementation (#71806) of per-Replica
circuit breakers (#33007). The earlier implementation relied on context
cancellation and coarsely failed all requests addressing the Replica
when the breaker was tripped.

This had two downsides: First, there was a (small) performance overhead
for implementing the cancellation that was paid even in the common case
of a healthy Replica. Second, and more importantly, the coarseness meant
that we'd potentially fail many requests that would otherwise succeed,
and in particular follower reads.

`@nvanbenschoten` suggested in #74799 that latching could be extended with
the concept of "poisoning" and that this could result in fine-grained
circuit breaker behavior where only requests that are truly affected by
unavailability (at the replication layer) would be rejected.

This commit implements that strategy:

A request's latches are poisoned if its completion is predicated on the
replication layer being healthy.  In other words, when the breaker
trips, all inflight proposals have their latches poisoned and new
proposals are failed fast. However, and this is the big difference,
reads can potentially still be accepted in either of two scenarios:

- a valid follower read remains valid regardless of the circuit breaker
  status, and also regardless of inflight proposals (follower reads
  don't observe latches).
- a read that can be served under the current lease and which does
  not conflict with any of the stuck proposals in the replication
  layer (= poisoned latches) can also be served.

In short, reads only fail fast if they encounter a poisoned latch or
need to request a lease. (If they opted out of fail-fast behavior,
they behave as today).

Latch poisoning is added as a first-class concept in the `concurrency`
package, and a structured error `PoisonError` is introduced. This error
in particular contains the span and timestamp of the poisoned latch that
prompted the fail-fast.

Lease proposals now always use `poison.Policy_Wait`, leaving the
fail-fast behavior to the caller. This simplifies things since multiple
callers with their own `poison.Policy` can multiplex onto a single
inflight lease proposal.

Addresses #74799.

Release note: None
Release justification: 22.1 project work

77221: changefeedccl: Fix flaky test. r=miretskiy a=miretskiy

Fix flaky TestChangefeedHandlesDrainingNodes test.
The source of the flake was that cluster setting updates propagate
asynchronously to the other nodes in the cluster.  Thus, it was possible
for the test to flake because some of the nodes were observing the
old value for the setting.

The flake is fixed by introducing testing utility function that
sets the setting and ensures the setting propagates to all nodes in
the test cluster.

Fixes #76806

Release Notes: none

77317: util/log: remove Safe in favor of redact.Safe r=yuzefovich a=yuzefovich

My desire to make this change is to break the dependency of `treewindow`
on `util/log` (which is a part of the effort to clean up the
dependencies of `execgen`).

Addresses: #77234.

Release note: None

Release justification: low risk change to clean up the dependencies
a bit.

Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
@craig craig bot closed this as completed in 8850dc1 Mar 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. T-cdc
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants