roachtest: scaledata/distributed-semaphore/nodes=6 failed [unsafe tscache update] #60580

Closed
cockroach-teamcity opened this issue Feb 15, 2021 · 7 comments · Fixed by #60835 or #61130
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.

Comments

@cockroach-teamcity
Member

(roachtest).scaledata/distributed-semaphore/nodes=6 failed on master@5971ecb9dd1a25c81cd6012d6be1ff922802eae5:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/scaledata/distributed-semaphore/nodes=6/run_1
	cluster.go:2687,scaledata.go:126,scaledata.go:56,test_runner.go:767: monitor failure: unexpected node event: 1: dead
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2675
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2683
		  | main.runSqlapp
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:126
		  | main.registerScaleData.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:56
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:767
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 1: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1666,context.go:140,cluster.go:1655,test_runner.go:848: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2676655-1613372337-71-n7cpu4 --oneshot --ignore-empty-nodes: exit status 1 7: skipped
		4: 4952
		2: 5311
		3: 5370
		6: 5127
		1: dead
		5: 5081
		Error: UNCLASSIFIED_PROBLEM: 1: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1852
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 1: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError


Artifacts: /scaledata/distributed-semaphore/nodes=6

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Feb 15, 2021
@irfansharif
Contributor

Unsafe timestamp cache update! Cannot add timestamp 1613390687.219037956,0 to timestamp cache after evaluating PushTxn(940a67fe->872d32ef) [‹×›,‹×›) (resp=‹×›; err=) with local hlc clock at timestamp 1613390687.214958745,0. Non-synthetic timestamps should always lag the local hlc clock.

@nvanbenschoten, perhaps this might be of interest to you.
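For readers unfamiliar with the assertion, here is a minimal sketch of the invariant the error message describes: a non-synthetic timestamp must not lead the local HLC clock when it enters the timestamp cache. The `Timestamp` type and `checkTSCacheUpdate` function are hypothetical stand-ins, not CockroachDB's actual code; the two wall times are the ones from the message above.

```go
package main

import "fmt"

// Timestamp is a simplified, hypothetical stand-in for an HLC timestamp.
type Timestamp struct {
	WallTime  int64 // nanoseconds since the Unix epoch
	Logical   int32
	Synthetic bool // synthetic timestamps are allowed to lead the local clock
}

// Less reports whether t orders strictly before o.
func (t Timestamp) Less(o Timestamp) bool {
	return t.WallTime < o.WallTime ||
		(t.WallTime == o.WallTime && t.Logical < o.Logical)
}

// checkTSCacheUpdate returns an error when a non-synthetic timestamp leads
// the local clock reading, mirroring the condition in the error message.
func checkTSCacheUpdate(ts, localNow Timestamp) error {
	if !ts.Synthetic && localNow.Less(ts) {
		return fmt.Errorf(
			"unsafe timestamp cache update: timestamp leads local clock by %dns",
			ts.WallTime-localNow.WallTime)
	}
	return nil
}

func main() {
	ts := Timestamp{WallTime: 1613390687219037956}       // timestamp being added
	localNow := Timestamp{WallTime: 1613390687214958745} // local hlc clock
	fmt.Println(checkTSCacheUpdate(ts, localNow))
}
```

With the values from this failure, the timestamp leads the local clock by roughly 4ms, which is what trips the check.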

@cockroach-teamcity
Member Author

(roachtest).scaledata/distributed-semaphore/nodes=6 failed on master@3c223f5f5162103110a790743b687ef2bf952489:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/scaledata/distributed-semaphore/nodes=6/run_1
	cluster.go:2687,scaledata.go:126,scaledata.go:56,test_runner.go:767: monitor failure: unexpected node event: 3: dead
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2675
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2683
		  | main.runSqlapp
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:126
		  | main.registerScaleData.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:56
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:767
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 3: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1666,context.go:140,cluster.go:1655,test_runner.go:848: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2688408-1613631814-67-n7cpu4 --oneshot --ignore-empty-nodes: exit status 1 7: skipped
		5: 4552
		2: 4555
		4: 4708
		1: 6162
		6: 4522
		3: dead
		Error: UNCLASSIFIED_PROBLEM: 3: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1852
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 3: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError


Artifacts: /scaledata/distributed-semaphore/nodes=6

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).scaledata/distributed-semaphore/nodes=6 failed on master@64c4aef909f4382523cd9248341ca9f4448d841a:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/scaledata/distributed-semaphore/nodes=6/run_1
	cluster.go:2688,scaledata.go:126,scaledata.go:56,test_runner.go:767: monitor failure: unexpected node event: 1: dead
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2676
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2684
		  | main.runSqlapp
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:126
		  | main.registerScaleData.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:56
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:767
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 1: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1667,context.go:140,cluster.go:1656,test_runner.go:848: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2699361-1613891087-61-n7cpu4 --oneshot --ignore-empty-nodes: exit status 1 7: skipped
		5: 5132
		2: 4510
		4: 6168
		3: 5510
		6: 5037
		1: dead
		Error: UNCLASSIFIED_PROBLEM: 1: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1852
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 1: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError


Artifacts: /scaledata/distributed-semaphore/nodes=6

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).scaledata/distributed-semaphore/nodes=6 failed on master@bf9744bad5a416a4b06907f0f3dd42896f7342f3:

		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2676
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2684
		  | main.runSqlapp
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:126
		  | main.registerScaleData.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:56
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:767
		Wraps: (2) monitor failure
		Wraps: (3) attached stack trace
		  -- stack trace:
		  | main.(*monitor).wait.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2732
		Wraps: (4) monitor task failed
		Wraps: (5) attached stack trace
		  -- stack trace:
		  | main.(*cluster).RunE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2332
		  | main.runSqlapp.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:122
		  | main.(*monitor).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2666
		  | golang.org/x/sync/errgroup.(*Group).Go.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/golang.org/x/sync/errgroup/errgroup.go:57
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (6) output in run_121038.033_n7_distributedsemaphore_
		Wraps: (7) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-2702231-1613977007-64-n7cpu4:7 -- ./distributed-semaphore  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.132:26257,10.128.0.110:26257,10.128.0.190:26257,10.128.0.208:26257,10.128.0.111:26257,10.128.0.112:26257'  returned
		  | stderr:
		  | <... some data truncated by circular buffer; go to artifacts for details ...>
		  | 158149124/0 pri=0.21091495 epo=17161 ts=1613997007.763139796,2 min=1613996170.562815598,0 seq=2} lock=true stat=PENDING rts=1613997007.703603175,2 wto=false gul=1613996171.062815598,0
		  | 2021/02/22 12:30:10 error/Users/irfansharif/Software/src/github.com/cockroachdb/rksql/pkg/sqlapp/tx.go:30pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE - failed preemptive refresh): "sql txn" meta={id=4c16e320 key=/Table/54/1/635455754143531011/0 pri=0.24174647 epo=11620 ts=1613997007.791025951,2 min=1613996292.296595275,0 seq=2} lock=true stat=PENDING rts=1613997007.724798792,2 wto=false gul=1613996292.796595275,0
		  | 2021/02/22 12:30:10 error/Users/irfansharif/Software/src/github.com/cockroachdb/rksql/pkg/sqlapp/tx.go:30pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE - failed preemptive refresh): "sql txn" meta={id=88e85c30 key=/Table/53/1/635454877612310534/0 pri=0.19086557 epo=10484 ts=1613997007.819815227,2 min=1613996179.881697853,0 seq=1} lock=true stat=PENDING rts=1613997007.755278227,2 wto=false gul=1613996180.381697853,0
		  | 2021/02/22 12:30:10 error/Users/irfansharif/Software/src/github.com/cockroachdb/rksql/pkg/sqlapp/tx.go:30pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE - failed preemptive refresh): "sql txn" meta={id=ddf99e48 key=/Table/54/1/635455951158149124/0 pri=0.21091495 epo=17162 ts=1613997007.826840172,2 min=1613996170.562815598,0 seq=2} lock=true stat=PENDING rts=1613997007.763139796,2 wto=false gul=1613996171.062815598,0
		  | 2021/02/22 12:30:10 error/Users/irfansharif/Software/src/github.com/cockroachdb/rksql/pkg/sqlapp/tx.go:30pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE - failed preemptive refresh): "sql txn" meta={id=88e85c30 key=/Table/53/1/635454877612310534/0 pri=0.19086557 epo=10485 ts=1613997007.876257225,2 min=1613996179.881697853,0 seq=1} lock=true stat=PENDING rts=1613997007.819815227,2 wto=false gul=1613996180.381697853,0
		  | 2021/02/22 12:30:10 error/Users/irfansharif/Software/src/github.com/cockroachdb/rksql/pkg/sqlapp/tx.go:30pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE - failed preemptive refresh): "sql txn" meta={id=4c16e320 key=/Table/54/1/635455754143531011/0 pri=0.24174647 epo=11621 ts=1613997007.856747453,2 min=1613996292.296595275,0 seq=2} lock=true stat=PENDING rts=1613997007.791025951,2 wto=false gul=1613996292.796595275,0
		  | 2021/02/22 12:30:10 error/Users/irfansharif/Software/src/github.com/cockroachdb/rksql/pkg/sqlapp/tx.go:30pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE - failed preemptive refresh): "sql txn" meta={id=ddf99e48 key=/Table/54/1/635455951158149124/0 pri=0.21091495 epo=17163 ts=1613997007.882569873,2 min=1613996170.562815598,0 seq=2} lock=true stat=PENDING rts=1613997007.826840172,2 wto=false gul=1613996171.062815598,0
		  | 2021/02/22 12:30:10 error/Users/irfansharif/Software/src/github.com/cockroachdb/rksql/pkg/sqlapp/tx.go:30pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE - failed preemptive refresh): "sql txn" meta={id=88e85c30 key=/Table/53/1/635454877612310534/0 pri=0.19086557 epo=10486 ts=1613997007.927989078,2 min=1613996179.881697853,0 seq=1} lock=true stat=PENDING rts=1613997007.876257225,2 wto=false gul=1613996180.381697853,0
		  | 2021/02/22 12:30:10 error/Users/irfansharif/Software/src/github.com/cockroachdb/rksql/pkg/sqlapp/tx.go:30pq: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE - failed preemptive refresh): "sql txn" meta={id=4c16e320 key=/Table/54/1/635455754143531011/0 pri=0.24174647 epo=11622 ts=1613997007.903769051,2 min=1613996292.296595275,0 seq=2} lock=true stat=PENDING rts=1613997007.856747453,2 wto=false gul=1613996292.796595275,0
		  |
		  | stdout:
		  | pq: restart transaction: TransactionRetryWithProtoRefreshError: WriteTooOldError: write at timestamp 1613995838.703547509,0 too old; wrote at 1613995838.721914295,1: "sql txn" meta={id=9be6084f pri=0.00334596 epo=0 ts=1613995838.721914295,1 min=1613995838.703547509,0 seq=3} lock=true stat=PENDING rts=1613995838.703547509,0 wto=false gul=1613995839.203547509,0
		  | pq: restart transaction: TransactionRetryWithProtoRefreshError: WriteTooOldError: write at timestamp 1613995838.703583362,0 too old; wrote at 1613995838.721914295,1: "sql txn" meta={id=4b06d2c2 pri=0.02901480 epo=0 ts=1613995838.721914295,1 min=1613995838.703583362,0 seq=3} lock=true stat=PENDING rts=1613995838.703583362,0 wto=false gul=1613995839.203583362,0
		Wraps: (8) secondary error attachment
		  | signal: killed
		  | (1) signal: killed
		  | Error types: (1) *exec.ExitError
		Wraps: (9) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *main.withCommandDetails (8) *secondary.withSecondaryError (9) *errors.errorString


Artifacts: /scaledata/distributed-semaphore/nodes=6

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).scaledata/distributed-semaphore/nodes=6 failed on master@5cfd7e5553a3072a1490d392390dddf968844215:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/scaledata/distributed-semaphore/nodes=6/run_1
	cluster.go:2688,scaledata.go:126,scaledata.go:56,test_runner.go:767: monitor failure: unexpected node event: 1: dead
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2676
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2684
		  | main.runSqlapp
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:126
		  | main.registerScaleData.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:56
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:767
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 1: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1667,context.go:140,cluster.go:1656,test_runner.go:848: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2707822-1614064242-62-n7cpu4 --oneshot --ignore-empty-nodes: exit status 1 7: skipped
		2: 4957
		3: 5554
		5: 4477
		1: dead
		6: 5141
		4: 5183
		Error: UNCLASSIFIED_PROBLEM: 1: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1852
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 1: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError


Artifacts: /scaledata/distributed-semaphore/nodes=6

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).scaledata/distributed-semaphore/nodes=6 failed on master@ec011620c7cf299fdbb898db692b36454defc4a2:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/scaledata/distributed-semaphore/nodes=6/run_1
	cluster.go:2688,scaledata.go:126,scaledata.go:56,test_runner.go:767: monitor failure: unexpected node event: 6: dead
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2676
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2684
		  | main.runSqlapp
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:126
		  | main.registerScaleData.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:56
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:767
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 6: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1667,context.go:140,cluster.go:1656,test_runner.go:848: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2712399-1614149800-71-n7cpu4 --oneshot --ignore-empty-nodes: exit status 1 7: skipped
		3: 4404
		6: dead
		2: 4680
		5: 4468
		4: 5090
		1: 5133
		Error: UNCLASSIFIED_PROBLEM: 6: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1852
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 6: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError


Artifacts: /scaledata/distributed-semaphore/nodes=6

See this test on roachdash
powered by pkg/cmd/internal/issues

@tbg changed the title from "roachtest: scaledata/distributed-semaphore/nodes=6 failed" to "roachtest: scaledata/distributed-semaphore/nodes=6 failed [unsafe tscache update]" on Feb 25, 2021
@cockroach-teamcity
Member Author

(roachtest).scaledata/distributed-semaphore/nodes=6 failed on master@c7e088826bc079620dfd3b5ae75d1c15cd9cd16d:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/scaledata/distributed-semaphore/nodes=6/run_1
	cluster.go:2688,scaledata.go:126,scaledata.go:56,test_runner.go:767: monitor failure: unexpected node event: 3: dead
		(1) attached stack trace
		  -- stack trace:
		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2676
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2684
		  | main.runSqlapp
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:126
		  | main.registerScaleData.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/scaledata.go:56
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:767
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 3: dead
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1667,context.go:140,cluster.go:1656,test_runner.go:848: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2716822-1614236552-65-n7cpu4 --oneshot --ignore-empty-nodes: exit status 1 7: skipped
		3: dead
		2: 5058
		1: 4841
		4: 5971
		5: 5105
		6: 5167
		Error: UNCLASSIFIED_PROBLEM: 3: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1852
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 3: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError


Artifacts: /scaledata/distributed-semaphore/nodes=6

See this test on roachdash
powered by pkg/cmd/internal/issues

craig bot pushed a commit that referenced this issue Feb 26, 2021
61113: ui: show replica type on the range report page r=aayushshah15 a=aayushshah15

Resolves #59677 

Release justification: observability improvement

Release note (ui change): the range report page on the admin ui will now
also show each replica's type

61128: jobs: introduce jobspb.JobID r=lucy-zhang a=lucy-zhang

This commit introduces a `jobspb.JobID` int64 type and uses it in most
places where we were previously using an int64.
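As an illustration of why a dedicated ID type helps, here is a minimal sketch of a defined type over `int64`. Only the name `JobID` and its `int64` underlying type come from the commit message; the `String` method and `describeJob` helper are assumptions made for the example.

```go
package main

import "fmt"

// JobID is a defined type over int64, so a plain int64 cannot be passed
// where a JobID is expected without an explicit conversion.
type JobID int64

// String is a hypothetical convenience method for this sketch.
func (id JobID) String() string {
	return fmt.Sprintf("job-%d", int64(id))
}

// describeJob is a hypothetical consumer that accepts only JobID values.
func describeJob(id JobID) string {
	return "looking up " + id.String()
}

func main() {
	raw := int64(42)
	// describeJob(raw) would not compile: int64 is not assignable to JobID.
	fmt.Println(describeJob(JobID(raw)))
}
```

The explicit conversion at the call site is the point: mixing up a job ID with some other `int64` becomes a compile-time error rather than a runtime surprise.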

Closes #61121.

Release justification: Low-risk change to existing functionality.

Release note: None

61129: geo/wkt: update parsing of dimensions for empty geometrycollections r=otan,rafiss a=andyyang890

Previously, the data structure used for storing geometry collections
was unable to store a layout, which made it impossible to distinguish
empty geometry collections of different layouts. That issue has since
been fixed and this patch updates the parser accordingly.

Resolves #61035.

Refs: #53091

Release justification: bug fix for new functionality
Release note: None

61130: kv: disable timestamp cache + current clock assertion r=nvanbenschoten a=nvanbenschoten

Closes #60580.
Closes #60736.
Closes #60779.
Closes #61060.

This was added in 218a5a3. The check was more of a sanity check that we have, and
always have had, an understanding of the timestamps that can enter the timestamp
cache. The fact that it's failing is a clear indication that there were issues
in past releases, because a lease transfer used to only be safe if the outgoing
leaseholder's clock was above the time of any read in its timestamp cache. We
now ship a snapshot of the timestamp cache on lease transfers, so that invariant
is less important.

I'd still like to get to the bottom of this, but I'll do so on my own branch,
off of master where it's causing disruption.

Release justification: avoid assertion failures

61155: jobs: make sure we finish spans if canceled before starting job r=ajwerner a=ajwerner

Was seeing:
```
    testcluster.go:135: condition failed to evaluate within 45s: unexpectedly found active spans:
             0.000ms      0.000ms    === operation:job _unfinished:1 intExec:create-stats
        goroutine 84 [running]:
        runtime/debug.Stack(0xc0086b1890, 0x792e940, 0xc009ac37e0)
        	/usr/local/go/src/runtime/debug/stack.go:24 +0xab
```

In roachprod stressrace with a big cluster. This seemed to fix it.

Release justification: bug fixes and low-risk updates to new functionality.

Release note: None

Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
Co-authored-by: Lucy Zhang <lucy@cockroachlabs.com>
Co-authored-by: Andy Yang <ayang@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
@craig craig bot closed this as completed in 54f29fe Feb 26, 2021
craig bot pushed a commit that referenced this issue Mar 31, 2021
60835: kv: bump timestamp cache to Pushee.MinTimestamp on PUSH_ABORT r=nvanbenschoten a=nvanbenschoten

Fixes #60779.
Fixes #60580.

We were only checking that the batch header timestamp was equal to or
greater than this pushee's min timestamp, so this is as far as we can
bump the timestamp cache.
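A minimal sketch of the reasoning above, under stated assumptions: timestamps are reduced to plain `int64` nanoseconds, `abortBump` is a hypothetical helper, and returning an error when the check fails is an assumption of this sketch rather than the actual behavior.

```go
package main

import "fmt"

// abortBump returns the timestamp the timestamp cache can be bumped to after
// a PUSH_ABORT. The only relationship that was checked is
// batchTS >= pusheeMinTS, so pusheeMinTS is the highest timestamp the
// check actually vouches for.
func abortBump(batchTS, pusheeMinTS int64) (int64, error) {
	if batchTS < pusheeMinTS {
		// Assumption for this sketch: a failed check surfaces as an error.
		return 0, fmt.Errorf(
			"batch timestamp %d is below pushee min timestamp %d",
			batchTS, pusheeMinTS)
	}
	return pusheeMinTS, nil
}

func main() {
	bump, err := abortBump(200, 150)
	fmt.Println(bump, err)
}
```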

62832: geo: minor performance improvement for looping over edges r=otan a=andyyang890

This patch slightly improves the performance of many
spatial builtins by storing the number of edges used
in the loop conditions of for loops into a variable.
We discovered this was taking a lot of time when
profiling the point-in-polygon optimization.

Release note: None
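The optimization described above can be sketched as follows. The `ring` type and its methods are hypothetical and not taken from the geo package; `NumEdges` stands in for any call whose cost is paid each time the loop condition is evaluated.

```go
package main

import (
	"fmt"
	"math"
)

type point struct{ X, Y float64 }

// ring is a hypothetical closed polygon ring used only for this sketch.
type ring struct{ pts []point }

// NumEdges stands in for a call that does real work on every invocation.
func (r ring) NumEdges() int { return len(r.pts) }

// Edge returns the endpoints of the i-th edge, wrapping around at the end.
func (r ring) Edge(i int) (point, point) {
	return r.pts[i], r.pts[(i+1)%len(r.pts)]
}

// perimeter calls NumEdges once and reuses the result in the loop condition
// instead of re-evaluating it on every iteration.
func perimeter(r ring) float64 {
	total := 0.0
	numEdges := r.NumEdges() // hoisted out of the loop condition
	for i := 0; i < numEdges; i++ {
		a, b := r.Edge(i)
		total += math.Hypot(b.X-a.X, b.Y-a.Y)
	}
	return total
}

// describe formats a short summary of the ring.
func describe(r ring) string {
	return fmt.Sprintf("%d edges, perimeter %.1f", r.NumEdges(), perimeter(r))
}

func main() {
	square := ring{pts: []point{{0, 0}, {1, 0}, {1, 1}, {0, 1}}}
	fmt.Println(describe(square))
}
```

Whether hoisting pays off depends on how expensive the bound is to compute and whether the compiler can already prove it loop-invariant, so profiling, as the patch author did, is the way to confirm.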

62838: kvserver: purge gc-able, unmigrated replicas during migrations r=irfansharif a=irfansharif

Fixes #58378.
Fixes #62267.

Previously it was possible for us to have replicas in-memory, with
pre-migrated state, even after a migration was finalized. This led to
the kind of badness we were observing in #62267, where it appeared that
a replica was not using the applied state key despite us having migrated
into it (see TruncatedAndRangeAppliedState, introduced in #58088).

---

To see how, consider the following set of events:

- Say r42 starts off on n1, n2, and n3
- n3 flaps and so we place a replica for r42 on n4
- n3's replica, r42/3, is now GC-able, but still un-GC-ed
- We run the applied state migration, first migrating all ranges into it
  and then purging outdated replicas
- Well, we should want to purge r42/3, because it's un-migrated and
  evaluating anything on it (say a lease request) is unsound because
  we've bumped version gates that tell the kvserver to always expect
  post-migration state
- What happens when we try to purge r42/3? Prior to this PR, if it
  didn't have a replica version, we'd skip over it (!)
- Was it possible for r42/3 to not have a replica version? Shouldn't it
  have been accounted for when we migrated all ranges? No, that's precisely
  why the migration infrastructure purges outdated replicas. The migrate
  request only returns once it's applied on all followers; in our example
  that wouldn't include r42/3 since it was no longer one
- The stop-gap in #60429 made it so that we didn't GC r42/3, when we
  should've been doing the opposite. When iterating over a store's
  replicas for purging purposes, an empty replica version is fine and
  expected; we should interpret that as signal that we're dealing with a
  replica that was obviously never migrated (to even start using replica
  versions in the first place). Because it didn't have a valid replica
  version installed, we can infer that it's soon to be GC-ed (else we
  wouldn't have been able to finalize the applied state + replica
  version migration)
- The conditions above made it possible for us to evaluate requests on
  replicas with migration state out-of-date relative to the store's
  version
- Boom
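The purge decision described in the list above can be sketched as follows. The `version` type and `shouldPurge` function are hypothetical names for illustration, and the non-empty branch (purge anything below the target version) is an assumption about what "outdated" means here.

```go
package main

import "fmt"

// version is a simplified, hypothetical stand-in for a replica version.
type version struct{ Major, Minor int32 }

func (v version) isEmpty() bool { return v == (version{}) }

func (v version) less(o version) bool {
	return v.Major < o.Major || (v.Major == o.Major && v.Minor < o.Minor)
}

func (v version) String() string {
	return fmt.Sprintf("%d.%d", v.Major, v.Minor)
}

// shouldPurge reports whether a replica should be purged for a migration to
// the target version. An empty replica version is read as "never migrated"
// and is therefore purgeable, rather than being skipped as before the PR.
func shouldPurge(replica, target version) bool {
	if replica.isEmpty() {
		return true
	}
	// Assumption: a replica below the target version counts as outdated.
	return replica.less(target)
}

func main() {
	target := version{Major: 21, Minor: 1}
	r42n3 := version{} // GC-able replica that never received a version
	fmt.Printf("purge replica at version %s for target %s: %t\n",
		r42n3, target, shouldPurge(r42n3, target))
}
```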

Release note: None


62839: zonepb: make subzone DiffWithZone more accurate r=ajstorm a=otan

* Subzones may be defined in a different order. We did not take this
  into account which can cause bugs when e.g. ADD REGION adds a subzone
  in the end rather than in the old "expected" location in the subzones
  array. This has been fixed by comparing subzones using an unordered
  map.
* The ApplyZoneConfig we previously did overwrote subzone fields on the
  original subzone array element, meaning that if there was a mismatch
  it would not be reported through validation. This is now fixed by
  applying the expected zone config to *zonepb.NewZoneConfig() instead.
* Added logic to only check for zone config matches on subzones with
  active subzone IDs.
* Improve the error messaging when a subzone config is mismatching -
  namely, add index and partitioning information and differentiate
  between missing fields and missing / extraneous zone configs
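A minimal sketch of the order-insensitive comparison from the first bullet, which also separates missing, extraneous, and mismatched entries as the last bullet describes. The `subzone` and `subzoneKey` types and the `diffSubzones` function are hypothetical and much simpler than zonepb's real structures.

```go
package main

import (
	"fmt"
	"sort"
)

// subzoneKey identifies a subzone by index and partition (hypothetical).
type subzoneKey struct {
	IndexID       uint32
	PartitionName string
}

// subzone is a simplified stand-in carrying a single config field.
type subzone struct {
	Key         subzoneKey
	NumReplicas int32
}

// diffSubzones compares two subzone lists without regard to order by
// indexing the actual subzones in a map keyed on (index, partition).
func diffSubzones(expected, actual []subzone) []string {
	actualByKey := make(map[subzoneKey]subzone, len(actual))
	for _, s := range actual {
		actualByKey[s.Key] = s
	}
	var diffs []string
	for _, want := range expected {
		got, ok := actualByKey[want.Key]
		switch {
		case !ok:
			diffs = append(diffs, fmt.Sprintf("missing subzone index=%d partition=%q",
				want.Key.IndexID, want.Key.PartitionName))
		case got != want:
			diffs = append(diffs, fmt.Sprintf("mismatched subzone index=%d partition=%q",
				want.Key.IndexID, want.Key.PartitionName))
		}
		delete(actualByKey, want.Key)
	}
	// Anything left in the map was not expected at all.
	for k := range actualByKey {
		diffs = append(diffs, fmt.Sprintf("extraneous subzone index=%d partition=%q",
			k.IndexID, k.PartitionName))
	}
	sort.Strings(diffs) // deterministic output despite map iteration order
	return diffs
}

func main() {
	a := subzone{Key: subzoneKey{IndexID: 1, PartitionName: "us-east1"}, NumReplicas: 3}
	b := subzone{Key: subzoneKey{IndexID: 1, PartitionName: "us-west1"}, NumReplicas: 3}
	fmt.Println(len(diffSubzones([]subzone{a, b}, []subzone{b, a})))
}
```

Because lookups go through the map, appending a subzone at the end of the array instead of at its old "expected" position no longer produces a spurious mismatch.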

Resolves #62790

Release note (bug fix): Fixed validation bugs during ALTER TABLE ... SET
LOCALITY / crdb_internal.validate_multi_region_zone_config where
validation errors could occur when the database of a REGIONAL BY ROW
table has a new region added. Also fixed a validation bug where partition zone
config mismatches were not caught.

62872: build: use -json for RandomSyntax test r=otan a=rafiss

I'm hoping this will help out with an issue where the test failures seem
to be missing helpful logs.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Andy Yang <ayang@cockroachlabs.com>
Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
Co-authored-by: Oliver Tan <otan@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
3 participants