[docdb] QLStressTest.OldLeaderCatchUpAfterNetworkPartition flaky in master #3465

Closed
bmatican opened this issue Jan 28, 2020 · 3 comments

bmatican commented Jan 28, 2020

https://detective-gcp.dev.yugabyte.com/job/github-yugabyte-db-phabricator/23596/artifact/build/debug-clang-dynamic-ninja/yb-test-logs/tests-client__ql-stress-test/QLStressTest_OldLeaderCatchUpAfterNetworkPartition.log
 

```
src/yb/client/ql-stress-test.cc:673: Failure
Expected: pre_isolate_op_id.term
Which is: 2
To be equal to: 1
```

https://detective-gcp.dev.yugabyte.com/job/github-yugabyte-db-mac-master-clang-release/787/artifact/build/release-clang-dynamic-ninja/yb-test-logs/tests-client__ql-stress-test/QLStressTest_OldLeaderCatchUpAfterNetworkPartition.log

Expected: (pre_isolate_op_id.index) > (key), actual: 1050 vs 1050


https://detective-gcp.dev.yugabyte.com/job/github-yugabyte-db-mac-master-clang-debug/874/artifact/build/debug-clang-dynamic-ninja/yb-test-logs/tests-client__ql-stress-test/QLStressTest_OldLeaderCatchUpAfterNetworkPartition.log

Expected: (leader) != (nullptr), actual: NULL vs 8-byte object <00-00 00-00 00-00 00-00>

@bmatican bmatican added kind/failing-test Tests and testing infra area/docdb YugabyteDB core features labels Jan 28, 2020
@bmatican bmatican self-assigned this Jan 28, 2020
@bmatican bmatican assigned robertsami and unassigned bmatican Jul 30, 2020

bmatican commented Jul 30, 2020

passing this to you @robertsami since you were looking at another QLStressTest recently.

edit: actually, looking at the 100 buckets view, seems like stability for this has improved recently: https://detective-gcp.dev.yugabyte.com/stability/test?buckets=100&class=QLStressTest&name=OldLeaderCatchUpAfterNetworkPartition

not sure if this is still a concern


bmatican commented Oct 7, 2020

Can confirm this is still a problem: the test appears to fail on 11 of 25 commits.

robertsami added a commit that referenced this issue Oct 19, 2020
Summary:
Previously, this test would fail with two kinds of transient failures:
1) Assertion that chosen leader had an election term of 1
2) Assertion that disconnected leader's election term did not advance

To address 1), we relax this assertion, since it should be expected that the chosen leader's election term could be greater than 1 by the time we get to choosing it. This does not affect the validity of the test.
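
As a rough illustration of what relaxing the term assertion could look like, here is a minimal sketch. It is not the code from the actual diff: `OpId` is a hypothetical stand-in for the real operation id type, the helper names are invented for this example, and the choice of `>= 1` as the relaxed condition is an assumption based on the summary above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a Raft operation id; not YugabyteDB's actual type.
struct OpId {
  int64_t term;
  int64_t index;
};

// Strict check (the flaky one): assumes no re-election happened before the
// leader was chosen, so the term must be exactly 1.
inline bool StrictTermCheck(const OpId& pre_isolate_op_id) {
  return pre_isolate_op_id.term == 1;
}

// Relaxed check (assumed form): any valid term is accepted, since an extra
// election before the leader is chosen does not invalidate the test.
inline bool RelaxedTermCheck(const OpId& pre_isolate_op_id) {
  return pre_isolate_op_id.term >= 1;
}

// The property the test still needs: the isolated leader's term must not
// advance while it is cut off from the rest of the cluster.
inline bool TermUnchangedWhileIsolated(const OpId& before, const OpId& after) {
  return before.term == after.term;
}
```

With the values from the first failure log (term 2), `StrictTermCheck` rejects the run while `RelaxedTermCheck` accepts it; `TermUnchangedWhileIsolated` is unaffected by the relaxation.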

2) was caused by unbroken incoming connections to the "isolated" leader. When we set a tserver to isolated, we call TEST_Isolated, which had a bug whereby incoming connections were not properly terminated. This diff corrects that behavior.
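
The isolation bug can be pictured with a toy model. The sketch below is only an illustration under assumptions: `ToyMessenger`, `Connection`, and both `Isolate...` methods are hypothetical names, and the thread does not show how `TEST_Isolated` is actually implemented. It only shows why leaving established incoming connections open defeats the isolation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only; these names do not mirror YugabyteDB's RPC classes.
enum class Direction { kOutbound, kInbound };

struct Connection {
  std::string peer;
  Direction direction;
  bool open;
};

class ToyMessenger {
 public:
  void AddConnection(const std::string& peer, Direction direction) {
    connections_.push_back(Connection{peer, direction, true});
  }

  // Buggy isolation (assumed shape of the bug): only outbound connections
  // are closed, so peers that connected earlier can still reach this node.
  void IsolateOutboundOnly() {
    for (Connection& c : connections_) {
      if (c.direction == Direction::kOutbound) c.open = false;
    }
  }

  // Corrected isolation: every established connection is closed,
  // regardless of which side initiated it.
  void IsolateFully() {
    for (Connection& c : connections_) c.open = false;
  }

  // Counts connections in the given direction that are still open.
  std::size_t OpenCount(Direction direction) const {
    std::size_t count = 0;
    for (const Connection& c : connections_) {
      if (c.open && c.direction == direction) ++count;
    }
    return count;
  }

 private:
  std::vector<Connection> connections_;
};
```

In this model, a node isolated with `IsolateOutboundOnly` still has open inbound connections, so a new leader could keep sending it requests carrying a higher term. That would be consistent with the "isolated" leader's term advancing as described in 2).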

Test Plan: `ybd --cxx-test client_ql-stress-test --gtest_filter QLStressTest.OldLeaderCatchUpAfterNetworkPartition --clang -n 100`

Reviewers: sergei

Reviewed By: sergei

Subscribers: ybase, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D9628
@robertsami robertsami reopened this Oct 20, 2020
@robertsami

last 15 commits seem good for this test after 1274b42 -- closing for now
