backupccl: TestFullClusterBackup/ensure_that_jobs_are_restored flakes under span configs #75060

Closed
Tracked by #75639
irfansharif opened this issue Jan 18, 2022 · 3 comments · Fixed by #79228
Labels
A-zone-configs branch-master Failures and bugs on the master branch. branch-release-22.1 Used to mark GA and release blockers, technical advisories, and bugs for 22.1 C-test-failure Broken test (automatically or manually discovered). GA-blocker T-disaster-recovery

Comments

@irfansharif
Contributor

irfansharif commented Jan 18, 2022

Describe the problem

Saw a TestFullClusterBackup/ensure_that_jobs_are_restored failure here that looks like fallout from #73876. This issue tracks the investigation into why.

------- Stdout: -------
=== RUN   TestFullClusterBackup/ensure_that_jobs_are_restored
    full_cluster_backup_restore_test.go:334: 
        	Error Trace:	full_cluster_backup_restore_test.go:334
        	Error:      	Not equal: 
        	            	expected: []string{"728894745672974337", "running", "2022-01-18 13:10:23.459824 +0000 +0000", "\n\x1freconciling span configurations\x12\x04root\x18\x9b\xc6\xcaﯻ\xf5\x02\xa0\x01\x01\xda\x01\x00", "\x10\xf5\xb3\xc6ﯻ\xf5\x02\xa8\x01\xf2\xf4\xc1\xe9\xb0ւ\xfes\xb2\x01\x02\n\x00", "NULL", "NULL", "1"}
        	            	actual  : []string{"728894745672974337", "running", "2022-01-18 13:10:23.459824 +0000 +0000", "\n\x1freconciling span configurations\x12\x04root\x18\x9b\xc6\xcaﯻ\xf5\x02\xa0\x01\x01\x82\x02\x89!\n\arunning\x10鳥\xfe\xaf\xbb\xf5\x02\x18֝\xe8\xfe\xaf\xbb\xf5\x02 \x01*\xe9 \x12\xe6 \n\xf9\x1a\x12\xf6\x1a\n\xe4\x18\x12\xe1\x18\n\xd3\x17\x12\xd0\x17\n\xb4\x02\n\xb1\x02\n(expected to delete 11 row(s), deleted 10\x12\x84\x02\n8github.com/cockroachdb/errors/errutil/*errutil.leafError\x12:\n8github.com/cockroachdb/errors/errutil/*errutil.leafError\x1a(expected to delete 11 row(s), deleted 10\"b\n4type.googleapis.com/cockroach.errorspb.StringPayload\x12*\n(expected to delete 11 row(s), deleted 10\x1a\x96\x15\n<github.com/cockroachdb/errors/withstack/*withstack.withStack\x12>\n<github.com/cockroachdb/errors/withstack/*withstack.withStack\x1a\x95\x14\ngithub.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigkvaccessor.(*KVAccessor).updateSpanConfigEntriesWithTxn\n\t/go/src/github.com/cockroachd
b/cockroach/pkg/spanconfig/spanconfigkvaccessor/kvaccessor.go:186\ngithub.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigkvaccessor.(*KVAccessor).UpdateSpanConfigEntries.func1\n\t/go/src/github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigkvaccessor/kvaccessor.go:128\ngithub.com/cockroachdb/cockroach/pkg/kv.runTxn.func1\n\t/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:849\ngithub.com/cockroachdb/cockroach/pkg/kv.(*Txn).exec\n\t/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:972\ngithub.com/cockroachdb/cockroach/pkg/kv.runTxn\n\t/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:848\ngithub.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn\n\t/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:830\ngithub.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigkvaccessor.(*KVAccessor).UpdateSpanConfigEntries\n\t/go/src/github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigkvaccessor/kvaccessor.go:127\ngithub.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigreconciler.(*fullReconciler).d
eleteExtraneousSpanConfigs\n\t/go/src/github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigreconciler/reconciler.go:330\ngithub.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigreconciler.(*fullReconciler).reconcile\n\t/go/src/github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigreconciler/reconciler.go:253\ngithub.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigreconciler.(*Reconciler).Reconcile\n\t/go/src/github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigreconciler/reconciler.go:150\ngithub.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigjob.(*resumer).Resume\n\t/go/src/github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigjob/job.go:74\ngithub.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine.func2\n\t/go/src/github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1123\ngithub.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine\n\t/go/src/github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1124\ngithub.com/cockroachdb/cockroach/pkg/jobs.
(*Registry).runJob\n\t/go/src/github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:401\ngithub.com/cockroachdb/cockroach/pkg/jobs.(*Registry).resumeJob.func1\n\t/go/src/github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:323\ngithub.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2\n\t/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:488\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581\x1a\x88\x01\nAgithub.com/cockroachdb/errors/assert/*assert.withAssertionFailure\x12C\nAgithub.com/cockroachdb/errors/assert/*assert.withAssertionFailure\x128non-cancelable: expected to delete 11 row(s), deleted 10\x1a\xd2\x01\n9github.com/cockroachdb/errors/errutil/*errutil.withPrefix\x12;\n9github.com/cockroachdb/errors/errutil/*errutil.withPrefix\x1a\x0enon-cancelable\"H\n4type.googleapis.com/cockroach.errorspb.StringPayload\x12\x10\n\x0enon-cancelable\x1a\xe7\x05\n<github.com/cockroachdb/errors/withstack/*withstack.withStack\x12>\n<github.com/cockroachdb/errors/withstack/*withs
tack.withStack\x1a\xe6\x04\ngithub.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine\n\t/go/src/github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1148\ngithub.com/cockroachdb/cockroach/pkg/jobs.(*Registry).runJob\n\t/go/src/github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:401\ngithub.com/cockroachdb/cockroach/pkg/jobs.(*Registry).resumeJob.func1\n\t/go/src/github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:323\ngithub.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2\n\t/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:488\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581\xda\x01\x00", "\x10\x84\xfb\xa0\xfe\xaf\xbb\xf5\x02\xa8\x01\xec\xda\xea\xb3ó\xea\x9dv\xb2\x01\x02\n\x00", "NULL", "NULL", "1"}
        	            	
        	            	Diff:
        	            	--- Expected
        	            	+++ Actual
        	            	@@ -4,4 +4,4 @@
        	            	  (string) (len=38) "2022-01-18 13:10:23.459824 +0000 +0000",
        	            	- (string) (len=54) "\n\x1freconciling span configurations\x12\x04root\x18\x9b\xc6\xcaﯻ\xf5\x02\xa0\x01\x01\xda\x01\x00",
        	            	- (string) (len=25) "\x10\xf5\xb3\xc6ﯻ\xf5\x02\xa8\x01\xf2\xf4\xc1\xe9\xb0ւ\xfes\xb2\x01\x02\n\x00",
        	            	+ (string) (len=4291) "\n\x1freconciling span configurations\x12\x04root\x18\x9b\xc6\xcaﯻ\xf5\x02\xa0\x01\x01\x82\x02\x89!\n\arunning\x10鳥\xfe\xaf\xbb\xf5\x02\x18֝\xe8\xfe\xaf\xbb\xf5\x02 \x01*\xe9 \x12\xe6 \n\xf9\x1a\x12\xf6\x1a\n\xe4\x18\x12\xe1\x18\n\xd3\x17\x12\xd0\x17\n\xb4\x02\n\xb1\x02\n(expected to delete 11 row(s), deleted 10\x12\x84\x02\n8github.com/cockroachdb/errors/errutil/*errutil.leafError\x12:\n8github.com/cockroachdb/errors/errutil/*errutil.leafError\x1a(expected to delete 11 row(s), deleted 10\"b\n4type.googleapis.com/cockroach.errorspb.StringPayload\x12*\n(expected to delete 11 row(s), deleted 10\x1a\x96\x15\n<github.com/cockroachdb/errors/withstack/*withstack.withStack\x12>\n<github.com/cockroachdb/errors/withstack/*withstack.withStack\x1a\x95\x14\ngithub.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigkvaccessor.(*KVAccessor).updateSpanConfigEntriesWithTxn\n\t/go/src/github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigkvaccessor/kvaccessor.go:186\ngithu
b.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigkvaccessor.(*KVAccessor).UpdateSpanConfigEntries.func1\n\t/go/src/github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigkvaccessor/kvaccessor.go:128\ngithub.com/cockroachdb/cockroach/pkg/kv.runTxn.func1\n\t/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:849\ngithub.com/cockroachdb/cockroach/pkg/kv.(*Txn).exec\n\t/go/src/github.com/cockroachdb/cockroach/pkg/kv/txn.go:972\ngithub.com/cockroachdb/cockroach/pkg/kv.runTxn\n\t/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:848\ngithub.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn\n\t/go/src/github.com/cockroachdb/cockroach/pkg/kv/db.go:830\ngithub.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigkvaccessor.(*KVAccessor).UpdateSpanConfigEntries\n\t/go/src/github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigkvaccessor/kvaccessor.go:127\ngithub.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigreconciler.(*fullReconciler).deleteExtraneousSpanConfigs\n\t/go/src/github.com/cockroachdb/cockroach/p
kg/spanconfig/spanconfigreconciler/reconciler.go:330\ngithub.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigreconciler.(*fullReconciler).reconcile\n\t/go/src/github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigreconciler/reconciler.go:253\ngithub.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigreconciler.(*Reconciler).Reconcile\n\t/go/src/github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigreconciler/reconciler.go:150\ngithub.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigjob.(*resumer).Resume\n\t/go/src/github.com/cockroachdb/cockroach/pkg/spanconfig/spanconfigjob/job.go:74\ngithub.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine.func2\n\t/go/src/github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1123\ngithub.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine\n\t/go/src/github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1124\ngithub.com/cockroachdb/cockroach/pkg/jobs.(*Registry).runJob\n\t/go/src/github.com/cockroachdb/cockroach/pkg/jobs/
adopt.go:401\ngithub.com/cockroachdb/cockroach/pkg/jobs.(*Registry).resumeJob.func1\n\t/go/src/github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:323\ngithub.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2\n\t/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:488\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581\x1a\x88\x01\nAgithub.com/cockroachdb/errors/assert/*assert.withAssertionFailure\x12C\nAgithub.com/cockroachdb/errors/assert/*assert.withAssertionFailure\x128non-cancelable: expected to delete 11 row(s), deleted 10\x1a\xd2\x01\n9github.com/cockroachdb/errors/errutil/*errutil.withPrefix\x12;\n9github.com/cockroachdb/errors/errutil/*errutil.withPrefix\x1a\x0enon-cancelable\"H\n4type.googleapis.com/cockroach.errorspb.StringPayload\x12\x10\n\x0enon-cancelable\x1a\xe7\x05\n<github.com/cockroachdb/errors/withstack/*withstack.withStack\x12>\n<github.com/cockroachdb/errors/withstack/*withstack.withStack\x1a\xe6\x04\ngithub.com/cockroachdb/cockroach/pkg/jobs.(*
Registry).stepThroughStateMachine\n\t/go/src/github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1148\ngithub.com/cockroachdb/cockroach/pkg/jobs.(*Registry).runJob\n\t/go/src/github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:401\ngithub.com/cockroachdb/cockroach/pkg/jobs.(*Registry).resumeJob.func1\n\t/go/src/github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:323\ngithub.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2\n\t/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:488\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1581\xda\x01\x00",
        	            	+ (string) (len=25) "\x10\x84\xfb\xa0\xfe\xaf\xbb\xf5\x02\xa8\x01\xec\xda\xea\xb3ó\xea\x9dv\xb2\x01\x02\n\x00",
        	            	  (string) (len=4) "NULL",
        	Test:       	TestFullClusterBackup/ensure_that_jobs_are_restored
    --- FAIL: TestFullClusterBackup/ensure_that_jobs_are_restored (0.03s)

To Reproduce

dev test pkg/ccl/backupccl -f=TestFullClusterBackup/ensure_that_jobs_are_restored

Jira issue: CRDB-12446

@irfansharif irfansharif added C-test-failure Broken test (automatically or manually discovered). A-zone-configs labels Jan 18, 2022
@irfansharif irfansharif changed the title backupccl: TestFullClusterBackup/ensure_that_jobs_are_restored flakes under span configs backupccl: TestFullClusterBackup/ensure_that_jobs_are_restored flakes under span configs Jan 18, 2022
@irfansharif
Contributor Author

irfansharif commented Jan 20, 2022

Bit amazed it fails so frequently in #75089; I'm not having much luck reproducing the failure under stress on my GCE worker.

$ dev test pkg/ccl/backupccl -f=TestFullClusterBackup/ensure_that_jobs_are_restored -v --stress
[...]
385 runs so far, 0 failures, over 46m20s

@blathers-crl

blathers-crl bot commented Jan 31, 2022

cc @cockroachdb/bulk-io

@irfansharif
Contributor Author

PS: I'm not actively investigating this. Letting it sit on the bulk-IO board to remind ourselves to address it during stability at the latest.

@arulajmani arulajmani added branch-master Failures and bugs on the master branch. GA-blocker branch-release-22.1 Used to mark GA and release blockers, technical advisories, and bugs for 22.1 labels Mar 25, 2022
irfansharif added a commit to irfansharif/cockroach that referenced this issue Apr 1, 2022
Closes cockroachdb#75060; speculating that cockroachdb#79222 fixed the underlying issue.
Previously the reconciliation job errored out with "expected to delete
11 row(s), deleted 10", which could conceivably occur if there were two
instances of the reconciliation job running post-restore (something
cockroachdb#79222 fixes).

Release note: None
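
For illustration, here is a minimal Go sketch of how two overlapping reconcilers could yield the "expected to delete 11 row(s), deleted 10" mismatch described in the commit message above. It is not CockroachDB code: `store`, `deleteExpecting`, and `simulate` are hypothetical names, and the in-memory map only stands in for the span configurations table. The sketch assumes each reconciler first computes the set of extraneous rows and later deletes them while asserting on the deleted-row count, which is what the error text and stack trace suggest.

```go
package main

import (
	"fmt"
	"sort"
)

// store is a toy stand-in for the span configurations table (hypothetical).
type store struct {
	rows map[string]bool
}

// newStore creates a store holding n extraneous rows.
func newStore(n int) *store {
	s := &store{rows: make(map[string]bool)}
	for i := 0; i < n; i++ {
		s.rows[fmt.Sprintf("span-%02d", i)] = true
	}
	return s
}

// keys returns the current row keys in sorted order. Each reconciler
// uses this as its "plan" of rows to delete.
func (s *store) keys() []string {
	out := make([]string, 0, len(s.rows))
	for k := range s.rows {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

// deleteExpecting deletes the given keys and returns an error if the
// number of rows actually deleted differs from the number requested.
func (s *store) deleteExpecting(keys []string) error {
	deleted := 0
	for _, k := range keys {
		if s.rows[k] {
			delete(s.rows, k)
			deleted++
		}
	}
	if deleted != len(keys) {
		return fmt.Errorf("expected to delete %d row(s), deleted %d", len(keys), deleted)
	}
	return nil
}

// simulate interleaves two reconcilers: both plan against the same 11
// rows, B deletes one row first, then A attempts to delete all 11.
func simulate() error {
	s := newStore(11)
	planA := s.keys() // reconciler A sees 11 extraneous rows
	planB := s.keys() // reconciler B sees the same 11 rows

	// B gets as far as deleting a single row before A runs.
	if err := s.deleteExpecting(planB[:1]); err != nil {
		return err
	}
	// A's plan is now stale: one of its 11 rows is already gone.
	return s.deleteExpecting(planA)
}

func main() {
	fmt.Println(simulate())
}
```

Running the sketch prints `expected to delete 11 row(s), deleted 10`. With only a single reconciler instance the plan cannot go stale in this way, which is consistent with the speculation that preventing duplicate job instances removes the failure.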
craig bot pushed a commit that referenced this issue Apr 1, 2022
79228: backupccl: run TestFullClusterBackup with span configs r=irfansharif a=irfansharif

Closes #75060; speculating that #79222 fixed the underlying issue.
Previously the reconciliation job errored out with "expected to delete
11 row(s), deleted 10", which could conceivably occur if there were two
instances of the reconciliation job running post-restore (something
#79222 fixes).

Release note: None

---

First two commits are from #79139 and #79222 respectively.

Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
@craig craig bot closed this as completed in f475361 Apr 2, 2022
blathers-crl bot pushed a commit that referenced this issue Apr 2, 2022
Closes #75060; speculating that the previous commit fixed the underlying
issue. Previously the reconciliation job errored out with "expected to
delete 11 row(s), deleted 10", which could conceivably occur if there
were two instances of the reconciliation job running post-restore.

Release note: None
fqazi pushed a commit to fqazi/cockroach that referenced this issue Apr 4, 2022
Closes cockroachdb#75060; speculating that the previous commit fixed the underlying
issue. Previously the reconciliation job errored out with "expected to
delete 11 row(s), deleted 10", which could conceivably occur if there
were two instances of the reconciliation job running post-restore.

Release note: None