Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: tpccbench/nodes=9/cpu=4/multi-region failed #73675

Closed
cockroach-teamcity opened this issue Dec 10, 2021 · 3 comments
Closed

roachtest: tpccbench/nodes=9/cpu=4/multi-region failed #73675

cockroach-teamcity opened this issue Dec 10, 2021 · 3 comments
Labels
C-test-failure Broken test (automatically or manually discovered). no-test-failure-activity O-roachtest O-robot Originated from a bot. S-1 High impact: many users impacted, serious risk of high unavailability or data loss T-kv KV Team X-stale
Milestone

Comments

@cockroach-teamcity
Copy link
Member

cockroach-teamcity commented Dec 10, 2021

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on release-21.1 @ 39f01904829cae0b9b75593d7fb97903a54d3e81:

		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2676
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2684
		  | main.runTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:781
		  | main.registerTPCCBenchSpec.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:613
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:733
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 7: dead (exit status 137)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1667,context.go:89,cluster.go:1656,test_runner.go:820: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-3882352-1639120481-43-n12cpu4-geo --oneshot --ignore-empty-nodes: exit status 1 4: skipped
		2: 13127
		3: 14072
		1: 13685
		8: skipped
		6: 12925
		12: skipped
		7: dead (exit status 137)
		5: 12417
		11: 12041
		10: 12217
		9: 12219
		Error: UNCLASSIFIED_PROBLEM: 7: dead (exit status 137)
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1889
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:255
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (3) 7: dead (exit status 137)
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError
Reproduce

To reproduce, try:

# From https://go.crdb.dev/p/roachstress, perhaps edited lightly.
caffeinate ./roachstress.sh tpccbench/nodes=9/cpu=4/multi-region

Same failure on other branches

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

Jira issue: CRDB-11687

@cockroach-teamcity cockroach-teamcity added branch-release-21.1 C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Dec 10, 2021
@cockroach-teamcity cockroach-teamcity added this to the 21.1 milestone Dec 10, 2021
@tbg tbg removed the release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. label Jan 3, 2022
@tbg
Copy link
Member

tbg commented Jan 3, 2022

Removing release-blocker label to unblock 21.1 release. It's an oom from the looks of it:

[ 3001.882804] Memory cgroup out of memory: Killed process 13053 (cockroach) total-vm:16835236kB, anon-rss:14531992kB, file-rss:78380kB, shmem-rss:0kB, UID:1000 pgtables:31440kB oom_score_adj:0

@tbg tbg added the S-1 High impact: many users impacted, serious risk of high unavailability or data loss label Feb 1, 2022
@cockroach-teamcity
Copy link
Member Author

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on release-21.1 @ 9930ef174e01cb992b1996f94040b9b6f2bcbfcb:

		  | main.(*monitor).WaitE
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2676
		  | main.(*monitor).Wait
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/cluster.go:2684
		  | main.runTPCCBench
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:781
		  | main.registerTPCCBenchSpec.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tpcc.go:613
		  | main.(*testRunner).runTest.func2
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:733
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (2) monitor failure
		Wraps: (3) unexpected node event: 3: dead (exit status 137)
		Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

	cluster.go:1667,context.go:89,cluster.go:1656,test_runner.go:820: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-4407713-1645167990-42-n12cpu4-geo --oneshot --ignore-empty-nodes: exit status 1 4: skipped
		2: 13607
		1: 14254
		3: dead (exit status 137)
		8: skipped
		12: skipped
		7: 12681
		6: 13220
		5: 11918
		10: 11903
		11: 11914
		9: 11819
		Error: UNCLASSIFIED_PROBLEM: 3: dead (exit status 137)
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  -- stack trace:
		  | main.glob..func14
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1147
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:271
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1889
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:255
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1581
		Wraps: (3) 3: dead (exit status 137)
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError
Reproduce

To reproduce, try:

# From https://go.crdb.dev/p/roachstress, perhaps edited lightly.
caffeinate ./roachstress.sh tpccbench/nodes=9/cpu=4/multi-region

Same failure on other branches

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

@AlexTalks AlexTalks added the T-kv KV Team label Feb 18, 2022
@github-actions
Copy link

github-actions bot commented Apr 4, 2022

We have marked this test failure issue as stale because it has been
inactive for 1 month. If this failure is still relevant, removing the
stale label or adding a comment will keep it active. Otherwise,
we'll close it in 5 days to keep the test failure queue tidy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-test-failure Broken test (automatically or manually discovered). no-test-failure-activity O-roachtest O-robot Originated from a bot. S-1 High impact: many users impacted, serious risk of high unavailability or data loss T-kv KV Team X-stale
Projects
None yet
Development

No branches or pull requests

4 participants