TestDockerCLI/test_demo_partitioning.tcl failed #96797

Closed
tbg opened this issue Feb 8, 2023 · 5 comments · Fixed by #106722
Labels: branch-master, C-test-failure, skipped-test, T-kv

Comments

@tbg
Member

tbg commented Feb 8, 2023

expect: does "/cockroach/cockroach demo --no-line-editor --geo-partitioned-replicas\r\n#\r\n# --geo-partitioned replicas operates on a 9 node cluster.\r\n# The cluster size has been changed from the default to 9 nodes.\r\n#\r\n# Welcome to the CockroachDB demo database!\r\n#\r\n# You are connected to a temporary, in-memory CockroachDB cluster of 9 nodes.\r\n#\r\n# You are connected to tenant "demo-tenant", but can connect to the system tenant with\r\n# '\connect' and the SQL url printed via '\demo ls'.\r\n#\r\n# This demo session will send telemetry to Cockroach Labs in the background.\r\n# To disable this behavior, set the environment variable\r\n# COCKROACH_SKIP_ENABLING_DIAGNOSTIC_REPORTING=true.\r\n*\r\n* ERROR: pq: update-setting: failed to send RPC: sending to all replicas failed; last error: routing information detected to be stale: [NotLeaseHolderError] lease held by different store; r59: replica (n4,s4):2 not lease holder; current lease is repl=(n2,s2):4VOTER_INCOMING seq=3 start=1675862669.825552515,0 epo=1 pro=1675862669.826726640,0\r\n*\r\nERROR: pq: update-setting: failed to send RPC: sending to all replicas failed; last error: routing information detected to be stale: [NotLeaseHolderError] lease held by different store; r59: replica (n4,s4):2 not lease holder; current lease is repl=(n2,s2):4VOTER_INCOMING seq=3 start=1675862669.825552515,0 epo=1 pro=1675862669.826726640,0\r\nFailed running "demo"\r\n:/# " (spawn_id exp3) match glob pattern "operation is unsupported in multi-tenancy mode"? no

ERROR: pq: update-setting: failed to send RPC: sending to all replicas failed; last error: routing information detected to be stale: [NotLeaseHolderError] lease held by different store; r59: replica (n4,s4):2 not lease holder; current lease is repl=(n2,s2):4VOTER_INCOMING seq=3 start=1675862669.825552515,0 epo=1 pro=1675862669.826726640,0

This error shouldn't bubble up to the client, I don't think.

https://teamcity.cockroachdb.com/viewLog.html?buildId=8633421&tab=buildResultsDiv&buildTypeId=Cockroach_Ci_Tests_Acceptance

from unrelated PR #96785.

Jira issue: CRDB-24338
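
For readers unfamiliar with expect output: the final `match glob pattern ... ? no` in the log above means the text the test was waiting for never showed up in the demo output. Here is a minimal Python sketch of that check. It assumes an unanchored glob match approximates what expect does, and `glob_matches` is a hypothetical helper, not part of the test suite. The `ACTUAL` string is a shortened excerpt of the error quoted above.

```python
from fnmatch import fnmatchcase

# The pattern the test waited for, copied from the expect output above.
EXPECTED = "operation is unsupported in multi-tenancy mode"

# Shortened excerpt of what the demo command actually printed.
ACTUAL = (
    "ERROR: pq: update-setting: failed to send RPC: "
    "sending to all replicas failed; last error: "
    "routing information detected to be stale: [NotLeaseHolderError] "
    "lease held by different store"
)


def glob_matches(output: str, pattern: str) -> bool:
    """Unanchored glob match (assumption: approximates expect's behavior)."""
    return fnmatchcase(output, f"*{pattern}*")


matched = glob_matches(ACTUAL, EXPECTED)
print("match" if matched else "no match")
```

The demo printed the lease error instead of the multi-tenancy error, so the pattern cannot match and the test times out.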

@tbg tbg added C-test-failure Broken test (automatically or manually discovered). branch-master Failures and bugs on the master branch. T-kv KV Team labels Feb 8, 2023
@tbg
Member Author

tbg commented Feb 9, 2023

Another one #94825 (comment)

herkolategan added a commit to herkolategan/cockroach that referenced this issue Feb 13, 2023
Refs: cockroachdb#96797

Reason: flaky test

Generated by bin/skip-test.

Release justification: non-production code changes

Release note: None
herkolategan added a commit to herkolategan/cockroach that referenced this issue Feb 13, 2023
Renamed `test_demo_partitioning.tcl` to `test_demo_partitioning.tcl.disabled`
which will cause TestDockerCLI to skip the test file.

Refs: cockroachdb#96797

Reason: flaky test

Epic: None

Release note: None
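
A rough sketch of why the rename is enough to skip the file, assuming the runner discovers test files with a glob along the lines of `test*.tcl`. The pattern, the `discovered` helper, and the second file name are hypothetical, used only for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical discovery pattern; the real runner's pattern may differ.
TEST_GLOB = "test*.tcl"


def discovered(filenames, pattern=TEST_GLOB):
    """Return the files a glob-based runner would pick up, sorted."""
    return sorted(name for name in filenames if fnmatchcase(name, pattern))


files = [
    "test_demo_partitioning.tcl.disabled",  # renamed, so it no longer matches
    "test_example.tcl",                     # hypothetical file name
]
selected = discovered(files)
print(selected)
```

Because the glob requires the name to end in `.tcl`, the `.disabled` suffix drops the file from the run without deleting it, and unskipping is just renaming it back.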
craig bot pushed a commit that referenced this issue Feb 13, 2023
96995: changefeedccl: Fix initial scan checkpointing r=miretskiy a=miretskiy

A change made more than two years ago (#71848), which added support for checkpointing during backfill after a schema change, inadvertently broke initial scan checkpointing.

Exacerbating the problem, the existing test `TestChangefeedBackfillCheckpoint` continued to pass. It passed because it looked for a checkpoint whose timestamp matched the backfill timestamp, and the bug involved incorrect initialization and use of a 0 timestamp. It just so happens that after the initial scan completes, the rangefeed starts, and the very first thing it does is generate a 0-timestamp checkpoint. The test observed this event and so kept passing.
This PR does not have a dedicated test because the existing tests work fine, provided we ignore the 0-timestamp checkpoint, which is what this PR does in addition to addressing the root cause of the bug.
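
The masking effect described above can be illustrated with a small model. This is only a sketch under assumptions: timestamps are plain integers, the buggy backfill timestamp is modelled as 0, and `Checkpoint` and `saw_backfill_checkpoint` are hypothetical names rather than changefeed code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    timestamp: int  # 0 stands for the zero ("empty") timestamp


def saw_backfill_checkpoint(events, backfill_ts, ignore_zero):
    """Return True if any checkpoint carries the backfill timestamp."""
    for cp in events:
        if ignore_zero and cp.timestamp == 0:
            continue  # skip the zero-timestamp checkpoint from rangefeed start
        if cp.timestamp == backfill_ts:
            return True
    return False


# Modelled buggy scenario (assumption): the backfill timestamp is left at 0
# and no checkpoint is emitted during the scan, so the only event is the
# rangefeed's initial zero-timestamp checkpoint.
events = [Checkpoint(0)]
buggy_backfill_ts = 0

old_check = saw_backfill_checkpoint(events, buggy_backfill_ts, ignore_zero=False)
new_check = saw_backfill_checkpoint(events, buggy_backfill_ts, ignore_zero=True)
print(old_check, new_check)
```

In this model the old check is satisfied by the unrelated zero-timestamp event, while the check that ignores zero timestamps correctly reports that no backfill checkpoint was seen.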

Informs #96959

Release note (enterprise change): Fix a bug in changefeeds where long-running initial scans fail to generate checkpoints. Failing to generate checkpoints is particularly bad if the changefeed restarts for any reason: without checkpoints, the changefeed restarts from the beginning, and in the worst case, when exporting large tables, the initial scan may have a hard time completing.

97037: acceptance: skip TestDockerCLI test_demo_partitioning.tcl only r=tbg a=herkolategan

Renamed `test_demo_partitioning.tcl` to `test_demo_partitioning.tcl.disabled` which will cause TestDockerCLI to skip the test file.

Refs: #96797

Reason: flaky test

Epic: None

Release note: None

Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Herko Lategan <herko@cockroachlabs.com>
@github-actions

We have marked this test failure issue as stale because it has been
inactive for 1 month. If this failure is still relevant, removing the
stale label or adding a comment will keep it active. Otherwise,
we'll close it in 5 days to keep the test failure queue tidy.

@rafiss
Collaborator

rafiss commented Jul 7, 2023

Still skipped

@rafiss rafiss reopened this Jul 7, 2023
craig bot pushed a commit that referenced this issue Jul 8, 2023
106463: interactive_tests: preserve demo logs in tests r=knz a=rafiss

Now we pass in the --log-dir option so that logs are saved when the test fails. Otherwise, the demo command does not log.

This also unskips two tests that should be working now.

informs #96450
informs #102257
informs #100319
informs #106462
informs #106461
informs #96797
informs #96239


Release note: None

Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
@aadityasondhi aadityasondhi self-assigned this Jul 12, 2023
@aadityasondhi
Collaborator

Ran it locally with no failure.

.230712 16:17:19.N EXPECT TEST: START TEST: Expect partitioning succeeds
.230712 16:17:27.N EXPECT TEST: EOF TO FOREGROUND PROCESS
.230712 16:17:33.N EXPECT TEST: END TEST
.230712 16:17:33.N EXPECT TEST: START TEST: test multi-region setup
.230712 16:17:40.N EXPECT TEST: EOF TO FOREGROUND PROCESS
.230712 16:17:53.N EXPECT TEST: EOF TO FOREGROUND PROCESS
.230712 16:17:58.N EXPECT TEST: END TEST
.230712 16:17:59.N EXPECT TEST: START TEST: Ensure we can run licensed commands with -e
.230712 16:18:05.N EXPECT TEST: END TEST
.230712 16:18:05.N EXPECT TEST: START TEST: Ensure that licensed commands with -e error when license acquisition is disabled
.230712 16:18:10.N EXPECT TEST: END TEST
.230712 16:18:10.N EXPECT TEST: START TEST: Expect an error if geo-partitioning is requested with multitenant mode
.230712 16:18:20.N EXPECT TEST: END TEST
.230712 16:18:20.N EXPECT TEST: START TEST: Expect an error if geo-partitioning is requested and license acquisition is disabled
.230712 16:18:21.N EXPECT TEST: END TEST
.230712 16:18:21.N EXPECT TEST: EOF TO FOREGROUND PROCESS
.230712 16:18:21.N EXPECT TEST: END TEST

aadityasondhi added a commit to aadityasondhi/cockroach that referenced this issue Jul 31, 2023
@aadityasondhi
Collaborator

I am having trouble reproducing this, even on a GCE worker:

aadityasondhi@gceworker-aadityasondhi:~/go/src/github.com/cockroachdb/cockroach$ bazel test //pkg/acceptance:acceptance_test --test_filter test_demo_partitioning --runs_per_test=100
INFO: Invocation ID: c1f07d01-b110-43e2-842c-92b7ec7a9c5b
INFO: Analyzed target //pkg/acceptance:acceptance_test (0 packages loaded, 219 targets configured).
INFO: Found 1 test target...
Target //pkg/acceptance:acceptance_test up-to-date:
  _bazel/bin/pkg/acceptance/acceptance_test_/acceptance_test
INFO: Elapsed time: 134.512s, Critical Path: 3.27s
INFO: 1601 processes: 1 internal, 1600 linux-sandbox.
INFO: Build completed successfully, 1601 total actions
//pkg/acceptance:acceptance_test                                         PASSED in 3.2s
  Stats over 1600 runs: max = 3.2s, min = 1.2s, avg = 2.0s, dev = 0.3s
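
As a rough way to read a result like "1600 runs, no failures": the sketch below computes the largest per-run flake rate still consistent with that many consecutive passes. It assumes the runs are independent and that every run actually exercised the partitioning test, neither of which the output above confirms. `max_flake_rate` is a hypothetical helper.

```python
def max_flake_rate(passing_runs: int, confidence: float = 0.95) -> float:
    """Largest per-run failure probability still consistent, at the given
    confidence, with observing `passing_runs` consecutive passes."""
    if passing_runs <= 0:
        raise ValueError("passing_runs must be positive")
    # Solve (1 - p) ** n = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / passing_runs)


bound = max_flake_rate(1600)
print(f"{bound:.4%}")
```

Under those assumptions the bound comes out at about 0.19% per run, so a flake rarer than roughly one in five hundred runs could still slip through a clean run of this size.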

craig bot pushed a commit that referenced this issue Aug 17, 2023
106722: cli: unskip test_demo_partitioning interactive test r=AlexTalks a=aadityasondhi

Fixes #96797.

Release note: None

Co-authored-by: Aaditya Sondhi <20070511+aadityasondhi@users.noreply.github.com>
@craig craig bot closed this as completed in c175eb7 Aug 17, 2023