roachtest: acceptance/version-upgrade failed because distsql is unexpectedly used #50000

Closed
cockroach-teamcity opened this issue Jun 9, 2020 · 24 comments · Fixed by #52624
Labels: branch-master (Failures and bugs on the master branch), C-test-failure (Broken test, automatically or manually discovered), O-roachtest, O-robot (Originated from a bot), release-blocker (Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.)

Comments

@cockroach-teamcity
Member

(roachtest).acceptance/version-upgrade failed on master@3e74b0dd29fe9637fc3ebd941696df916d6a0d34:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:414,versionupgrade.go:391,versionupgrade.go:167,versionupgrade.go:155,acceptance.go:54,acceptance.go:90,test_runner.go:753: pq: no inbound stream connection
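The reports in this thread show two recurring symptoms: "pq: no inbound stream connection", which the retitled issue attributes to DistSQL being used unexpectedly, and node-death errors such as "EOF", "connection reset by peer", and "connection refused". A minimal Go sketch for bucketing failure messages during triage follows. The failureMode labels and the classify helper are hypothetical, the substrings are copied from the reports here, and the mapping from substring to root cause is an assumption rather than a diagnosis.

```go
package main

import (
	"fmt"
	"strings"
)

// failureMode is a hypothetical label for the symptoms seen in this issue.
type failureMode string

const (
	modeDistSQL  failureMode = "distsql-flow-setup"
	modeNodeDied failureMode = "node-died"
	modeUnknown  failureMode = "unknown"
)

// classify buckets a roachtest failure message by substring matching.
// The substrings come from the reports in this thread; treating them as
// evidence of a particular root cause is an assumption for triage only.
func classify(msg string) failureMode {
	switch {
	case strings.Contains(msg, "no inbound stream connection"):
		return modeDistSQL
	case strings.Contains(msg, "connection reset by peer"),
		strings.Contains(msg, "connection refused"),
		strings.HasSuffix(strings.TrimSpace(msg), ": EOF"):
		return modeNodeDied
	default:
		return modeUnknown
	}
}

func main() {
	msgs := []string{
		"versionupgrade.go:414: pq: no inbound stream connection",
		"versionupgrade.go:238: EOF",
		"versionupgrade.go:261: dial tcp 127.0.0.1:26261: connect: connection refused",
	}
	for _, m := range msgs {
		fmt.Printf("%-20s %s\n", classify(m), m)
	}
}
```

Substring matching is deliberately crude; it only separates the two failure modes so that the node-death reports can be investigated apart from the DistSQL ones.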

Artifacts: /acceptance/version-upgrade

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity cockroach-teamcity added the branch-master, C-test-failure, O-roachtest, O-robot, and release-blocker labels Jun 9, 2020
@cockroach-teamcity cockroach-teamcity added this to the 20.2 milestone Jun 9, 2020
@asubiotto
Contributor

#47024 (comment)

@asubiotto asubiotto changed the title roachtest: acceptance/version-upgrade failed roachtest: acceptance/version-upgrade failed because distsql is unexpectedly used Jun 11, 2020
@tbg tbg assigned asubiotto and unassigned andreimatei Jun 16, 2020
@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@3426ece476777be51252f8ea22ea9f99c3027631:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:238,versionupgrade.go:316,versionupgrade.go:167,versionupgrade.go:155,acceptance.go:59,acceptance.go:95,test_runner.go:753: EOF

	cluster.go:1512,context.go:135,cluster.go:1501,test_runner.go:825: dead node detection: /go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor local --oneshot --ignore-empty-nodes: exit status 1 3: 26367
		2: 26357
		4: 26355
		1: dead
		Error: UNCLASSIFIED_PROBLEM: 1: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  | main.glob..func13
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra.(*Command).execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:766
		  | github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:852
		  | github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra.(*Command).Execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:800
		  | main.main
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1789
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1357
		Wraps: (3) 3 safe details enclosed
		Wraps: (4) 1: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString
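The dead node detection output lists each node as "<node>: <pid>" or "<node>: dead" (in these logs the first entry is fused onto the "exit status 1" text, so it has to be split off first). A minimal Go sketch of extracting the dead nodes from such output follows. parseMonitor is a hypothetical helper, not part of roachprod, and the line format is inferred only from the reports in this thread; other roachprod versions may print something different.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseMonitor parses lines shaped like "3: 26367" or "1: dead", the shape
// printed by `roachprod monitor --oneshot` in the reports above (an inferred
// format). It returns the sorted IDs of dead nodes and the PIDs of live ones.
func parseMonitor(out string) (dead []int, live map[int]int, err error) {
	live = make(map[int]int)
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		node, status, ok := strings.Cut(line, ": ")
		if !ok {
			continue // not a "<node>: <status>" line
		}
		id, convErr := strconv.Atoi(node)
		if convErr != nil {
			continue // e.g. "Error: UNCLASSIFIED_PROBLEM: 1: dead"
		}
		if status == "dead" {
			dead = append(dead, id)
			continue
		}
		pid, convErr := strconv.Atoi(status)
		if convErr != nil {
			return nil, nil, fmt.Errorf("node %d: unexpected status %q", id, status)
		}
		live[id] = pid
	}
	sort.Ints(dead)
	return dead, live, sc.Err()
}

func main() {
	out := "3: 26367\n2: 26357\n4: 26355\n1: dead\n"
	dead, live, err := parseMonitor(out)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("dead:", dead, "live:", live)
}
```

strings.Cut requires Go 1.18 or later. Lines from the wrapped error text (for example "Wraps: (4) 1: dead") are skipped because their prefix is not a node number.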

Artifacts: /acceptance/version-upgrade

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@f21f36717cbd2a5dec2ab6abb2c895e32215d605:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:414,versionupgrade.go:391,versionupgrade.go:167,versionupgrade.go:155,acceptance.go:59,acceptance.go:95,test_runner.go:753: pq: no inbound stream connection

Artifacts: /acceptance/version-upgrade

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@52b1686bfd727af48c09795039f1c0e82d58bf2d:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:414,versionupgrade.go:391,versionupgrade.go:167,versionupgrade.go:155,acceptance.go:59,acceptance.go:95,test_runner.go:757: pq: no inbound stream connection

Artifacts: /acceptance/version-upgrade

@andreimatei
Contributor

I ran into this too. I think maybe we should initialize sql.Server.cfg.InternalExecutor to one that disables distribution (just like we do for the executor we pass to the migrations runner), and then override it with a vanilla one after migrations are done.
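The proposal above (start with an internal executor that disables distribution, then swap in a vanilla one once migrations finish) can be sketched in Go as follows. Executor, Server, and MigrationsDone are hypothetical stand-ins, not CockroachDB's sql.Server or sql.InternalExecutor APIs; the sketch only shows one way to make the swap safe for concurrent readers, using an atomic pointer.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Executor is a hypothetical stand-in for an internal SQL executor.
type Executor struct {
	distribute bool // whether queries may be planned with DistSQL
}

// Exec reports how the statement would be planned.
func (e *Executor) Exec(stmt string) string {
	mode := "local"
	if e.distribute {
		mode = "distributed"
	}
	return fmt.Sprintf("%s [%s]", stmt, mode)
}

// Server holds the executor behind an atomic pointer so it can be replaced
// after migrations without racing with concurrent readers.
type Server struct {
	ie atomic.Pointer[Executor]
}

// NewServer starts with distribution disabled, mirroring the executor
// handed to the migrations runner.
func NewServer() *Server {
	s := &Server{}
	s.ie.Store(&Executor{distribute: false})
	return s
}

// MigrationsDone installs a vanilla executor that may distribute queries.
func (s *Server) MigrationsDone() {
	s.ie.Store(&Executor{distribute: true})
}

// InternalExecutor returns the executor currently in effect.
func (s *Server) InternalExecutor() *Executor { return s.ie.Load() }

func main() {
	s := NewServer()
	fmt.Println(s.InternalExecutor().Exec("SELECT 1"))
	s.MigrationsDone()
	fmt.Println(s.InternalExecutor().Exec("SELECT 1"))
}
```

One caveat of this design: any component that captured the executor pointer before MigrationsDone keeps using the non-distributing one, so callers would need to fetch it through InternalExecutor() on each use. atomic.Pointer requires Go 1.19 or later.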

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@0b65365fce6a9bdde6c611c92031877ade88a8b2:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:238,versionupgrade.go:316,versionupgrade.go:167,versionupgrade.go:155,acceptance.go:59,acceptance.go:95,test_runner.go:757: EOF

	cluster.go:1512,context.go:135,cluster.go:1501,test_runner.go:829: dead node detection: /go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor local --oneshot --ignore-empty-nodes: exit status 1 3: dead
		4: 25451
		2: 25612
		1: 25188
		Error: UNCLASSIFIED_PROBLEM: 3: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  | main.glob..func13
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1789
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (3) 3 safe details enclosed
		Wraps: (4) 3: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString

Artifacts: /acceptance/version-upgrade

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@1d2809e1ceb5509b48434f6642acdb014112edcc:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:414,versionupgrade.go:391,versionupgrade.go:167,versionupgrade.go:155,acceptance.go:59,acceptance.go:95,test_runner.go:757: pq: no inbound stream connection

Artifacts: /acceptance/version-upgrade

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@f61d73fe1e4773372f3800b715143305a3342ee7:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:238,versionupgrade.go:316,versionupgrade.go:167,versionupgrade.go:155,acceptance.go:59,acceptance.go:95,test_runner.go:757: read tcp 127.0.0.1:47394->127.0.0.1:26259: read: connection reset by peer

	cluster.go:1512,context.go:135,cluster.go:1501,test_runner.go:829: dead node detection: /go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor local --oneshot --ignore-empty-nodes: exit status 1 2: dead
		4: 24025
		3: 23613
		1: 23904
		Error: UNCLASSIFIED_PROBLEM: 2: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  | main.glob..func13
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1789
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (3) 3 safe details enclosed
		Wraps: (4) 2: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString

Artifacts: /acceptance/version-upgrade

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@91ae9bc70d00868e46c83139c0e5621e4e4971f3:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:238,versionupgrade.go:316,versionupgrade.go:167,versionupgrade.go:155,acceptance.go:66,acceptance.go:102,test_runner.go:757: read tcp 127.0.0.1:37808->127.0.0.1:26263: read: connection reset by peer

	cluster.go:1516,context.go:135,cluster.go:1505,test_runner.go:826: dead node detection: /go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor local --oneshot --ignore-empty-nodes: exit status 1 4: dead
		1: 21176
		2: 21045
		3: 21308
		Error: UNCLASSIFIED_PROBLEM: 4: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  | main.glob..func13
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1789
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (3) 3 safe details enclosed
		Wraps: (4) 4: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString

Artifacts: /acceptance/version-upgrade

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@1b70ae496e0a30ec89fa573ef9f3d1c801e3c5e9:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:261,versionupgrade.go:409,versionupgrade.go:391,versionupgrade.go:167,versionupgrade.go:155,acceptance.go:59,acceptance.go:96,test_runner.go:757: dial tcp 127.0.0.1:26261: connect: connection refused

	cluster.go:1539,context.go:135,cluster.go:1528,test_runner.go:826: dead node detection: /go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor local --oneshot --ignore-empty-nodes: exit status 1 3: dead
		2: 21692
		4: 21334
		1: 21453
		Error: UNCLASSIFIED_PROBLEM: 3: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  | main.glob..func13
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1789
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (3) 3 safe details enclosed
		Wraps: (4) 3: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString

Artifacts: /acceptance/version-upgrade

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@ab32cf173c266987e79554dad0539d03262346fc:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:238,versionupgrade.go:316,versionupgrade.go:167,versionupgrade.go:155,acceptance.go:59,acceptance.go:96,test_runner.go:757: read tcp 127.0.0.1:38202->127.0.0.1:26257: read: connection reset by peer

	cluster.go:1539,context.go:135,cluster.go:1528,test_runner.go:826: dead node detection: /go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor local --oneshot --ignore-empty-nodes: exit status 1 1: dead
		3: 21744
		2: 22005
		4: 21875
		Error: UNCLASSIFIED_PROBLEM: 1: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  | main.glob..func13
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1789
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (3) 3 safe details enclosed
		Wraps: (4) 1: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString

Artifacts: /acceptance/version-upgrade

@tbg
Member

tbg commented Jul 15, 2020

The new failure mode here is opaque: we're not seeing any evidence of why the node died, but it definitely does die. None of the logs mention it.

I ran this 100x on my gceworker last night, and all runs passed. I am starting to think that this is something specific to CI, maybe the OOM killer? But this test doesn't do any heavy lifting whatsoever.

We don't get dmesg here (probably because the cluster is local); I will look into adding that.
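If dmesg output does get collected, checking it for OOM-killer activity is straightforward. A minimal Go sketch follows; findOOMKills is a hypothetical helper, and the regular expression assumes the kernel's usual wording ("Out of memory: Killed process <pid> (<comm>)" on newer kernels, "Kill process" on older ones, with a "Memory cgroup out of memory:" variant). That wording varies by kernel version, so the pattern should be checked against real logs.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// oomRe matches OOM-killer lines such as
// "Out of memory: Killed process 123 (cockroach)". The wording is an
// assumption based on common Linux kernel output and differs across versions.
var oomRe = regexp.MustCompile(`[Oo]ut of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)`)

// oomKill records one process the kernel reported killing.
type oomKill struct {
	PID  int
	Comm string
}

// findOOMKills scans dmesg-style text and returns every OOM kill it finds.
func findOOMKills(dmesg string) []oomKill {
	var kills []oomKill
	sc := bufio.NewScanner(strings.NewReader(dmesg))
	for sc.Scan() {
		m := oomRe.FindStringSubmatch(sc.Text())
		if m == nil {
			continue
		}
		pid, err := strconv.Atoi(m[1])
		if err != nil {
			continue
		}
		kills = append(kills, oomKill{PID: pid, Comm: m[2]})
	}
	return kills
}

func main() {
	// Made-up sample input, not taken from these CI runs.
	sample := "[  123.456789] Out of memory: Killed process 12345 (cockroach) total-vm:1024kB\n" +
		"[  124.000000] eth0: link up\n"
	for _, k := range findOOMKills(sample) {
		fmt.Printf("OOM killed pid %d (%s)\n", k.PID, k.Comm)
	}
}
```

A returned PID could then be compared with the per-node PIDs that roachprod monitor printed before the node was reported dead.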

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@5149293fb2c51d75448b6ae8db89e93a1a53b704:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:414,versionupgrade.go:391,versionupgrade.go:167,versionupgrade.go:155,acceptance.go:59,acceptance.go:96,test_runner.go:757: pq: no inbound stream connection

Artifacts: /acceptance/version-upgrade

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@a0123f1bc050f67b942ff1e36181847f0edb3e10:

		  | main.(*testRunner).runTest.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:757
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (2) 2 safe details enclosed
		Wraps: (3) output in run_004411.065_n1_workload_run_schemachange
		Wraps: (4) /go/src/github.com/cockroachdb/cockroach/bin/roachprod run local:1 -- ./workload run schemachange --verbose=1 --tolerate-errors=true --max-ops 10 --concurrency 2 {pgurl:1-4} returned
		  | stderr:
		  | 20 01:13:09.456086 1 workload/cli/run.go:362  retrying after error while creating load: dial tcp 127.0.0.1:26259: connect: connection refused
		  | I200720 01:13:09.831973 1 workload/cli/run.go:362  retrying after error while creating load: dial tcp 127.0.0.1:26259: connect: connection refused
		  | I200720 01:13:10.228252 1 workload/cli/run.go:362  retrying after error while creating load: dial tcp 127.0.0.1:26259: connect: connection refused
		  | I200720 01:13:10.639270 1 workload/cli/run.go:362  retrying after error while creating load: dial tcp 127.0.0.1:26259: connect: connection refused
		  | I200720 01:13:11.062617 1 workload/cli/run.go:362  retrying after error while creating load: dial tcp 127.0.0.1:26259: connect: connection refused
		  | I200720 01:13:11.480028 1 workload/cli/run.go:362  retrying after error while creating load: dial tcp 127.0.0.1:26259: connect: connection refused
		  | I200720 01:13:11.885756 1 workload/cli/run.go:362  retrying after error while creating load: dial tcp 127.0.0.1:26259: connect: connection refused
		  |
		  | stdout:
		Wraps: (5) secondary error attachment
		  | signal: killed
		  | (1) signal: killed
		  | Error types: (1) *exec.ExitError
		Wraps: (6) context canceled
		Error types: (1) *withstack.withStack (2) *safedetails.withSafeDetails (3) *errutil.withMessage (4) *main.withCommandDetails (5) *secondary.withSecondaryError (6) *errors.errorString

	cluster.go:1539,context.go:135,cluster.go:1528,test_runner.go:826: dead node detection: /go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor local --oneshot --ignore-empty-nodes: exit status 1 1: 26297
		3: 25940
		2: dead
		4: 26059
		Error: UNCLASSIFIED_PROBLEM: 2: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  | main.glob..func13
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1789
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (3) 3 safe details enclosed
		Wraps: (4) 2: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString

Artifacts: /acceptance/version-upgrade

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@89dda79e3b5afd4a3da36ce74512c6cc74af7b3b:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:414,versionupgrade.go:391,versionupgrade.go:167,versionupgrade.go:155,acceptance.go:59,acceptance.go:96,test_runner.go:757: dial tcp 127.0.0.1:26259: connect: connection refused

	cluster.go:1571,context.go:135,cluster.go:1560,test_runner.go:826: dead node detection: /go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor local --oneshot --ignore-empty-nodes: exit status 1 2: dead
		1: 23242
		3: 23002
		4: 23123
		Error: UNCLASSIFIED_PROBLEM: 2: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  | main.glob..func13
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1808
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (3) 3 safe details enclosed
		Wraps: (4) 2: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString

Artifacts: /acceptance/version-upgrade

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@069d328c2d6d7d54d16c9f8ee24dc836851c96c5:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:414,versionupgrade.go:391,versionupgrade.go:167,versionupgrade.go:155,acceptance.go:59,acceptance.go:96,test_runner.go:754: pq: no inbound stream connection

Artifacts: /acceptance/version-upgrade

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@2c6d0d2317767809a3f5139e8dca9709b486779a:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:238,versionupgrade.go:316,versionupgrade.go:167,versionupgrade.go:155,acceptance.go:59,acceptance.go:96,test_runner.go:754: EOF

	cluster.go:1571,context.go:135,cluster.go:1560,test_runner.go:823: dead node detection: /go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor local --oneshot --ignore-empty-nodes: exit status 1 2: dead
		1: 20482
		4: 20603
		3: 20363
		Error: UNCLASSIFIED_PROBLEM: 2: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  | main.glob..func13
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1808
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (3) 3 safe details enclosed
		Wraps: (4) 2: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString

Artifacts: /acceptance/version-upgrade

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@7e36e68a83d92e007b1394eadf6a73bf4519c586:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	versionupgrade.go:261,versionupgrade.go:409,versionupgrade.go:391,versionupgrade.go:167,versionupgrade.go:155,acceptance.go:59,acceptance.go:96,test_runner.go:754: dial tcp 127.0.0.1:26259: connect: connection refused

	cluster.go:1571,context.go:135,cluster.go:1560,test_runner.go:823: dead node detection: /go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor local --oneshot --ignore-empty-nodes: exit status 1 2: dead
		4: 30397
		3: 30158
		1: 30277
		Error: UNCLASSIFIED_PROBLEM: 2: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  | main.glob..func13
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1808
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (3) 3 safe details enclosed
		Wraps: (4) 2: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString

Artifacts: /acceptance/version-upgrade

@jordanlewis
Member

@tbg is this still related to #51053? I'm confused about the state here.

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@c9c2aa454d5428d5327993a696464072d8224507:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	cluster.go:1571,context.go:135,cluster.go:1560,test_runner.go:823: dead node detection: /go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor local --oneshot --ignore-empty-nodes: exit status 1 1: dead
		2: 317
		4: 32545
		3: 436
		Error: UNCLASSIFIED_PROBLEM: 1: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  | main.glob..func13
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1808
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (3) 3 safe details enclosed
		Wraps: (4) 1: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString

Artifacts: /acceptance/version-upgrade

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@c9c2aa454d5428d5327993a696464072d8224507:

		Error types: (1) *withstack.withStack (2) *safedetails.withSafeDetails (3) *errutil.withMessage (4) *main.withCommandDetails (5) *secondary.withSecondaryError (6) *errors.errorString

	cluster.go:1571,context.go:135,cluster.go:1560,test_runner.go:823: dead node detection: /go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor local --oneshot --ignore-empty-nodes: exit status 1 4: 31126
		2: 30887
		3: dead
		1: dead
		Error: UNCLASSIFIED_PROBLEM: 3: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) secondary error attachment
		  | 1: dead
		  | (1) attached stack trace
		  |   | main.glob..func13
		  |   | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  |   | main.wrap.func1
		  |   | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  |   | github.com/spf13/cobra.(*Command).execute
		  |   | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  |   | github.com/spf13/cobra.(*Command).ExecuteC
		  |   | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  |   | github.com/spf13/cobra.(*Command).Execute
		  |   | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  |   | main.main
		  |   | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1808
		  |   | runtime.main
		  |   | 	/usr/local/go/src/runtime/proc.go:203
		  |   | runtime.goexit
		  |   | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		  | Wraps: (2) 3 safe details enclosed
		  | Wraps: (3) 1: dead
		  | Error types: (1) *withstack.withStack (2) *safedetails.withSafeDetails (3) *errors.errorString
		Wraps: (3) attached stack trace
		  | main.glob..func13
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1808
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (4) 3 safe details enclosed
		Wraps: (5) 3: dead
		Error types: (1) errors.Unclassified (2) *secondary.withSecondaryError (3) *withstack.withStack (4) *safedetails.withSafeDetails (5) *errors.errorString

Artifacts: /acceptance/version-upgrade

@cockroach-teamcity
Member Author

(roachtest).acceptance/version-upgrade failed on master@7b7a63106e5fb11574b470929dab2c145ca8fd34:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: artifacts/acceptance/version-upgrade/run_1
	cluster.go:1658,test_runner.go:837: r262 (/Table/38) is inconsistent: RANGE_INCONSISTENT (n1,s1):1: checksum 647c99c621aa4b132b5e25adbf9afed1c05535727a90f3ccb6675b1b60086a8ca834eecce9f21d1c5d277f10c64462000b51347adee09583c631ce6b4bd124e9 [minority]
		(1) attached stack trace
		  | main.(*cluster).CheckReplicaDivergenceOnDB
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:1609
		  | main.(*cluster).FailOnReplicaDivergence.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:1655
		  | github.com/cockroachdb/cockroach/pkg/util/contextutil.RunWithTimeout
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/contextutil/context.go:135
		  | main.(*cluster).FailOnReplicaDivergence
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:1652
		  | main.(*testRunner).runTest
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:837
		  | main.(*testRunner).runWorker
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:464
		  | main.(*testRunner).Run.func2
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:275
		  | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunWorker.func1
		  | 	/go/src/github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:222
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1357
		Wraps: (2) 5 safe details enclosed
		Wraps: (3) r262 (/Table/38) is inconsistent: RANGE_INCONSISTENT (n1,s1):1: checksum 647c99c621aa4b132b5e25adbf9afed1c05535727a90f3ccb6675b1b60086a8ca834eecce9f21d1c5d277f10c64462000b51347adee09583c631ce6b4bd124e9 [minority]
		  | - stats: contains_estimates:0 last_update_nanos:1596842364031079451 intent_age:0 gc_bytes_age:0 live_bytes:0 live_count:0 key_bytes:0 key_count:0 val_bytes:0 val_count:0 intent_bytes:0 intent_count:0 sys_bytes:723 sys_count:7 abort_span_bytes:1 
		  | - stats.Sub(recomputation): last_update_nanos:1596842364031079451 sys_bytes:723 sys_count:7 abort_span_bytes:1 
		  | (n2,s2):4: checksum 59da917862af75013106d0f08b0c6501ebeac2201c8233e20537eb5391569186398aa98a38d66ae025567424d44ab5fb0938bb298b8ddc05b1d1627eecb4e1a0
		  | - stats: contains_estimates:0 last_update_nanos:1596842364031079451 intent_age:0 gc_bytes_age:0 live_bytes:0 live_count:0 key_bytes:0 key_count:0 val_bytes:0 val_count:0 intent_bytes:0 intent_count:0 sys_bytes:723 sys_count:7 abort_span_bytes:0 
		  | - stats.Sub(recomputation): last_update_nanos:1596842364031079451 sys_bytes:723 sys_count:7 
		  | (n4,s4):2: checksum 59da917862af75013106d0f08b0c6501ebeac2201c8233e20537eb5391569186398aa98a38d66ae025567424d44ab5fb0938bb298b8ddc05b1d1627eecb4e1a0
		  | - stats: contains_estimates:0 last_update_nanos:1596842364031079451 intent_age:0 gc_bytes_age:0 live_bytes:0 live_count:0 key_bytes:0 key_count:0 val_bytes:0 val_count:0 intent_bytes:0 intent_count:0 sys_bytes:723 sys_count:7 abort_span_bytes:0 
		  | - stats.Sub(recomputation): last_update_nanos:1596842364031079451 sys_bytes:723 sys_count:7 
		Error types: (1) *withstack.withStack (2) *safedetails.withSafeDetails (3) *errors.errorString


Artifacts: /acceptance/version-upgrade

See this test on roachdash
powered by pkg/cmd/internal/issues

@asubiotto
Contributor

I looked further into this issue's no inbound stream connection failure mode. The initial hypothesis, that distsql was unexpectedly being used during migrations, hit a dead end in #51053 when we saw that distsql was already off there. Reproducing this failure mode under stress showed that we were hitting these no inbound stream connection errors during normal query execution after a restart. An outbox would fail its connection attempt even though the connection appeared healthy (a SetupFlowRequest must already have been sent to that node). The connection would then become available after a couple hundred milliseconds.

In #52624, I added retries for this connection attempt, and I can no longer reproduce no inbound stream connection (although other failure modes remain).

@asubiotto
Contributor

#52624 fixes the no inbound stream connection failure mode. One caveat: this is a mixed-version test, so I think it's likely to fail this way again in the future, because the predecessor version does not include that fix.
