roachtest: schemaChangeStep is skipped in acceptance/version-upgrade #58489

Closed
cockroach-teamcity opened this issue Jan 6, 2021 · 47 comments · Fixed by #87142 or #98855
Assignees
Labels
branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. skipped-test T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@cockroach-teamcity
Member

cockroach-teamcity commented Jan 6, 2021

(roachtest).acceptance/version-upgrade failed on master@cee475331ca3629b503cd2e7c7919b72c98a5ca5:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:258,versionupgrade.go:114,versionupgrade.go:189,versionupgrade.go:177,acceptance.go:69,acceptance.go:110,test_runner.go:760: write tcp 172.17.0.3:52698->35.192.202.190:26257: write: broken pipe

	cluster.go:1637,context.go:140,cluster.go:1626,test_runner.go:841: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2561359-1609916496-07-n4cpu4 --oneshot --ignore-empty-nodes: exit status 1 3: 5485
		4: 5484
		2: 5508
		1: dead
		Error: UNCLASSIFIED_PROBLEM: 1: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1850
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 1: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError


Artifacts: /acceptance/version-upgrade
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

Jira issue: CRDB-3386

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jan 6, 2021
@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@339275585b7d30b9ee2d49b0c696b9ddb8d51ad4:

		  |  1587.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1587.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1588.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1588.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1589.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1589.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1590.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1590.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |  1591.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1591.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1592.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1592.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1593.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1593.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1594.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1594.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1595.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1595.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1596.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1596.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1597.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1597.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1598.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1598.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1599.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1599.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1600.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1600.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |  1601.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1601.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1602.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1602.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1603.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1603.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1604.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1604.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1605.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1605.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1606.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1606.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1607.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1607.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		Wraps: (4) secondary error attachment
		  | signal: killed
		  | (1) signal: killed
		  | Error types: (1) *exec.ExitError
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *main.withCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString


Artifacts: /acceptance/version-upgrade
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@tbg
Member

tbg commented Jan 12, 2021

The first failure has the same stack trace as the mixed version test #58523

I210106 07:05:35.481781 1 util/log/flags.go:194  stderr capture started
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1d57f92]

goroutine 6945 [running]:
panic(0x4225c00, 0x7684030)
	/usr/local/go/src/runtime/panic.go:1064 +0x545 fp=0xc000547430 sp=0xc000547368 pc=0x4e1b25
runtime.panicmem(...)
	/usr/local/go/src/runtime/panic.go:212
runtime.sigpanic()
	/usr/local/go/src/runtime/signal_unix.go:742 +0x413 fp=0xc000547460 sp=0xc000547430 pc=0x4f8873
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Replica).Version(0xc001899000, 0x0, 0x0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/replica.go:805 +0x72 fp=0xc0005474a0 sp=0xc000547460 pc=0x1d57f92
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).PurgeOutdatedReplicas.func1(0xc001899000, 0xc0013b7140)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store.go:2814 +0x8c fp=0xc000547540 sp=0xc0005474a0 pc=0x1e226ac
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*storeReplicaVisitor).Visit(0xc0013b7140, 0xc000547600)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store.go:396 +0x151 fp=0xc0005475a8 sp=0xc000547540 pc=0x1dd7211
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).VisitReplicas(...)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store.go:2013
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).PurgeOutdatedReplicas(0xc000dca000, 0x55a3500, 0xc0013b7080, 0x200000014, 0xe00000000, 0xc001328f38, 0xc001ac43c0)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/store.go:2813 +0x1bf fp=0xc000547660 sp=0xc0005475a8 pc=0x1de32bf
github.com/cockroachdb/cockroach/pkg/server.(*migrationServer).PurgeOutdatedReplicas.func1(0xc000dca000, 0x4bad45, 0xc000547700)
	/go/src/github.com/cockroachdb/cockroach/pkg/server/migration.go:184 +0x65 fp=0xc0005476a8 sp=0xc000547660 pc=0x3911065
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).VisitStores.func1(0x1, 0xc000dca000, 0xc000547700)
	/go/src/github.com/cockroachdb/cockroach/pkg/kv/kvserver/stores.go:131 +0x38 fp=0xc0005476d8 sp=0xc0005476a8 pc=0x1e26198
github.com/cockroachdb/cockroach/pkg/util/syncutil.(*IntMap).Range(0xc000925840, 0xc000547790)

@tbg tbg assigned irfansharif and unassigned aayushshah15 Jan 12, 2021
@tbg
Member

tbg commented Jan 12, 2021

@irfansharif could you take a look at this?

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@0d6f0ddd0958a134887623be44da33f6726eac85:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:281,versionupgrade.go:434,versionupgrade.go:416,versionupgrade.go:189,versionupgrade.go:177,acceptance.go:72,acceptance.go:113,test_runner.go:760: pq: operation "show cluster setting version" timed out after 2m0s: value differs between local setting ([18 8 8 20 16 2 24 0 32 14]) and KV ([18 8 8 20 16 2 24 0 32 0]); try again later (<nil> after 1m59.095130467s)

	cluster.go:1637,context.go:140,cluster.go:1626,test_runner.go:841: dead node detection: /go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor local --oneshot --ignore-empty-nodes: exit status 1 1: dead
		4: 17195
		3: 17084
		2: 17417
		Error: UNCLASSIFIED_PROBLEM: 1: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1850
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 1: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError


Artifacts: /acceptance/version-upgrade
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@eba887a1c9bd96269cabc22a5b16b041a49699c5:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:281,versionupgrade.go:434,versionupgrade.go:416,versionupgrade.go:189,versionupgrade.go:177,acceptance.go:72,acceptance.go:113,test_runner.go:760: pq: operation "show cluster setting version" timed out after 2m0s: value differs between local setting ([18 8 8 20 16 2 24 0 32 14]) and KV ([18 8 8 20 16 2 24 0 32 0]); try again later (<nil> after 1m58.287310991s)

	cluster.go:1637,context.go:140,cluster.go:1626,test_runner.go:841: dead node detection: /go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor local --oneshot --ignore-empty-nodes: exit status 1 4: dead
		3: 17921
		2: 17699
		1: 17809
		Error: UNCLASSIFIED_PROBLEM: 4: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1850
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:204
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1374
		Wraps: (3) 4: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError


Artifacts: /acceptance/version-upgrade
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@irfansharif
Contributor

Same thing as described in #58523 (comment).

@irfansharif irfansharif assigned tbg and unassigned irfansharif Jan 14, 2021
@RaduBerinde
Member

Another failure here: https://teamcity.cockroachdb.com/viewLog.html?buildId=2585565&buildTypeId=Cockroach_UnitTests

	versionupgrade.go:281,versionupgrade.go:386,retry.go:197,versionupgrade.go:385,versionupgrade.go:189,versionupgrade.go:177,acceptance.go:72,acceptance.go:113,test_runner.go:760: pq: operation "show cluster setting version" timed out after 2m0s: value differs between local setting ([18 8 8 20 16 2 24 0 32 14]) and KV ([18 8 8 20 16 2 24 0 32 0]); try again later (<nil> after 1m59.058255687s)

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@cd180429b80542b8fe0c66899bad21f6be5211af:

		  |  1748.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1748.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnOk
		  |  1748.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1749.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1749.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnOk
		  |  1749.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1750.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1750.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnOk
		  |  1750.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1751.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1751.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnOk
		  |  1751.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1752.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1752.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnOk
		  |  1752.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1753.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1753.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnOk
		  |  1753.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1754.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |  1754.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnOk
		  |  1754.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1755.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1755.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnOk
		  |  1755.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1756.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1756.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnOk
		  |  1756.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1757.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1757.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnOk
		  |  1757.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1758.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1758.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnOk
		  |  1758.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1759.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1759.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnOk
		  |  1759.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1760.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1760.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnOk
		  |  1760.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |  1761.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1761.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnOk
		  |  1761.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		Wraps: (4) secondary error attachment
		  | signal: killed
		  | (1) signal: killed
		  | Error types: (1) *exec.ExitError
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *main.withCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString


Artifacts: /acceptance/version-upgrade
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@RaduBerinde
Member

This test is flaking quite a bit. I am hesitant to skip it since it feels like an important test (and not super sure how to skip roachtests). Do we think we will have a fix soon?

@tbg
Member

tbg commented Jan 15, 2021 via email

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@cd54fb728636a4ec365dcd14ca921f39f067ea69:

		  |  1702.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1702.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1703.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1703.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1704.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1704.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1705.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1705.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1706.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1706.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1707.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1707.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1708.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1708.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1709.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1709.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1710.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1710.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |  1711.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1711.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1712.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1712.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1713.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1713.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1714.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1714.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1715.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1715.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1716.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1716.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1717.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1717.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1718.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1718.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1719.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1719.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1720.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1720.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |  1721.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1721.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1722.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1722.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		Wraps: (4) secondary error attachment
		  | signal: killed
		  | (1) signal: killed
		  | Error types: (1) *exec.ExitError
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *main.withCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString


Artifacts: /acceptance/version-upgrade
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@a51f2e212a71ee356a01f14510238baac65e76e7:

		  |  1756.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1756.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1757.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1757.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1758.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1758.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1759.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1759.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1760.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1760.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |  1761.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1761.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1762.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1762.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1763.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1763.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1764.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1764.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1765.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1765.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1766.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1766.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1767.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1767.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1768.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1768.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1769.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1769.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1770.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1770.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  |  1771.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1771.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1772.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1772.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1773.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1773.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1774.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1774.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1775.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1775.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		  |  1776.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 opOk
		  |  1776.0s        1            0.0            0.0      0.0      0.0      0.0      0.0 txnRbk
		Wraps: (4) secondary error attachment
		  | signal: killed
		  | (1) signal: killed
		  | Error types: (1) *exec.ExitError
		Wraps: (5) context canceled
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *main.withCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString


Artifacts: /acceptance/version-upgrade
Related:

See this test on roachdash
powered by pkg/cmd/internal/issues

@tbg
Member

tbg commented Jan 18, 2021

That last failure has no node crashes. I'll need to investigate it separately.

@tbg
Member

tbg commented Jan 18, 2021

It looks like the "no node crashes" failure is very dominant. I get it 80% of the time (20% of the time it passes - I think the crash failure mode is rarer). The failure is always on

if err := db.QueryRow(`SELECT crdb_internal.node_executable_version();`).Scan(&sv); err != nil {
	t.Fatal(err)
}

Right after a node has been updated. When I look at the cluster after, it is perfectly healthy. Since this is so frequent, and can't have been going on for very long, I will try my hand at a bisection.

Here's what the failure looks like:

versionupgrade.go:258,versionupgrade.go:341,versionupgrade.go:189,versionupgrade.go:177,acceptance.go:72,acceptance.go:113,test_runner.go:760: write tcp 192.168.178.69:45438->35.196.76.21:26257: write: broken pipe

I am mildly worried that some quirk of my local system is at play.

@fqazi
Collaborator

fqazi commented Apr 14, 2022

Re-enabling this isn't trivial: the schemachange workload needs version gates in multiple places. Additionally, even without that, this test is exposing real issues in older releases. The right thing here is probably to drop the GA blocker tag and focus on enabling it again in the next release.

@andreimatei
Contributor

@fqazi you've closed this, but the issue is still being referenced in the upgrade roachtest, and schemaChangeStep seems indeed to be unused. Did we forget to do something?
Reopening.

@andreimatei andreimatei reopened this Nov 21, 2022
fqazi added a commit to fqazi/cockroach that referenced this issue Mar 17, 2023
Previously, due to flakes we disabled schema changes inside
the version update test. This patch re-enables them, since
we are confident that the workload itself is now stable in a
mixed version state.

Fixes: cockroachdb#58489
Release note: None
craig bot pushed a commit that referenced this issue Mar 20, 2023
98792: kvserver: unskip `TestNewTruncateDecision` r=erikgrinaker a=erikgrinaker

Passed after 10k stress runs. Has been skipped since 2019; the issue seems to have been fixed in the meantime.

Resolves #38584.

Epic: none
Release note: None

98855: roachtest: enable schema changes in acceptance/version-upgrade r=fqazi a=fqazi

Previously, due to flakes we disabled schema changes inside the version update test. This patch re-enables them, since we are confident that the workload itself is now stable in a mixed version state.

Fixes: #58489
Release note: None

99023: kv: add log scope to BenchmarkSingleRoundtripWithLatency r=arulajmani a=nvanbenschoten

Informs #98887.

Avoids mixing logs with benchmark results, which breaks benchdiff.

Release note: None

99033: storepool: set last unavailable on gossip dead r=andrewbaptist a=kvoli

Previously, the `LastUnavailable` time was set in most parts of the storepool when a store was considered either `Unavailable`, `Dead`, `Decommissioned` or `Draining`. When `LastUnavailable` is within the last suspect duration (30s default), the node is treated as suspect by other nodes in the cluster.

`LastUnavailable` was not being set when a store was considered dead due to the store not gossiping its store descriptor. This commit updates the `status` storepool function to do just that.

Informs: #98928

Release note: None

99039: pkg/ccl/backupccl: Remove TestBackupRestoreControlJob r=benbardin a=benbardin

This test was marked skipped for flakiness in 2018.

Fixes: #24136

Release note: None

Co-authored-by: Erik Grinaker <grinaker@cockroachlabs.com>
Co-authored-by: Faizan Qazi <faizan@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Austen McClernon <austen@cockroachlabs.com>
Co-authored-by: Ben Bardin <bardin@cockroachlabs.com>
@craig craig bot closed this as completed in a4aa8ef Mar 20, 2023
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023