roachtest: scrub/all-checks/tpcc/w=1000 failed #37017

Closed
cockroach-teamcity opened this issue Apr 23, 2019 · 1 comment · Fixed by #37046

@cockroach-teamcity

SHA: https://github.com/cockroachdb/cockroach/commits/46f8608c4fe2d94b771beb37bcee19136040fd74

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scrub/all-checks/tpcc/w=1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1253450&tab=buildLog

The test failed on master:
	scrub.go:83,cluster.go:1667,errgroup.go:57: pq: communication error: rpc error: code = Canceled desc = context canceled
	cluster.go:1329,tpcc.go:168,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1253450-scrub-all-checks-tpcc-w-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		    4129            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		   14m6s     4129            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		   14m6s     4129            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   14m6s     4129            0.0            0.2      0.0      0.0      0.0      0.0 orderStatus
		   14m6s     4129            0.0            0.1      0.0      0.0      0.0      0.0 payment
		   14m6s     4129            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		   14m7s     4129            0.0            0.0      0.0      0.0      0.0      0.0 delivery
		   14m7s     4129            0.0            0.0      0.0      0.0      0.0      0.0 newOrder
		   14m7s     4129            0.0            0.2      0.0      0.0      0.0      0.0 orderStatus
		   14m7s     4129            0.0            0.1      0.0      0.0      0.0      0.0 payment
		   14m7s     4129            0.0            0.0      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1688,tpcc.go:178,scrub.go:58,test.go:1237: Goexit() was called
	test.go:1225: test timed out (11h47m9.981301635s)
	test.go:995,asm_amd64.s:523,panic.go:513,log.go:219,cluster.go:1020,context.go:89,cluster.go:1008,test.go:1191,asm_amd64.s:522,panic.go:397,test.go:785,test.go:771,cluster.go:1688,tpcc.go:178,scrub.go:58,test.go:1237: write /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190422-1253450/scrub/all-checks/tpcc/w=1000/test.log: file already closed

@cockroach-teamcity cockroach-teamcity added this to the 19.1 milestone Apr 23, 2019
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. labels Apr 23, 2019
@tbg

tbg commented Apr 23, 2019

See #35985

No actionable artifacts because roachtest doesn't collect logs on timeout, and the debug zip failed after 5 minutes.

bdarnell added a commit to bdarnell/cockroach that referenced this issue Apr 23, 2019
The scrub roachtest was previously running tpcc-1000 on a cluster of
12 total vcpus, which is not enough (it needs ~double that). This
exposed a lot of interesting issues like cockroachdb#35986, but it's only
incidental to the main purpose of this test (and it's also flaky due
to uninteresting problems associated with overloading).

Switch the test to tpcc-100 so it can be stable; we'll reintroduce a
test dedicated to overload conditions in the future (when we can make
it stable).

Fixes cockroachdb#35985
Fixes cockroachdb#37017

Release note: None
craig bot pushed a commit that referenced this issue Apr 23, 2019
37046: roachtest: Shrink scrub workloads r=lucy-zhang a=bdarnell

The scrub roachtest was previously running tpcc-1000 on a cluster of
12 total vcpus, which is not enough (it needs ~double that). This
exposed a lot of interesting issues like #35986, but it's only
incidental to the main purpose of this test (and it's also flaky due
to uninteresting problems associated with overloading).

Switch the test to tpcc-100 so it can be stable; we'll reintroduce a
test dedicated to overload conditions in the future (when we can make
it stable).

Fixes #35985
Fixes #37017

Release note: None

Co-authored-by: Ben Darnell <ben@bendarnell.com>
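
For readers who want the sizing argument in the commit message spelled out, here is a rough sketch of the arithmetic. It takes only two figures from the message (12 total vCPUs, and tpcc-1000 needing about double that). The linear scaling of CPU demand with warehouse count, and the helper names `estimatedVCPUs` and `overloaded`, are illustrative assumptions and not anything from roachtest or the workload tool.

```go
package main

import "fmt"

// Figures from the commit message: the scrub test ran tpcc-1000 on 12 total
// vCPUs, and that workload needs roughly double that.
const (
	clusterVCPUs     = 12
	vcpusForTPCC1000 = 2 * clusterVCPUs // "~double that"
)

// estimatedVCPUs is a hypothetical helper. It ASSUMES CPU demand scales
// linearly with the number of warehouses, which is a simplification made
// for illustration only, not a measured property of TPC-C on CockroachDB.
func estimatedVCPUs(warehouses int) float64 {
	return float64(vcpusForTPCC1000) * float64(warehouses) / 1000.0
}

// overloaded reports whether the estimated demand exceeds the vCPUs available.
func overloaded(warehouses, vcpus int) bool {
	return estimatedVCPUs(warehouses) > float64(vcpus)
}

func main() {
	// Compare the old workload size (1000) with the new one (100).
	for _, w := range []int{1000, 100} {
		fmt.Printf("tpcc-%d: ~%.1f vCPUs estimated, overloaded on %d vCPUs: %t\n",
			w, estimatedVCPUs(w), clusterVCPUs, overloaded(w, clusterVCPUs))
	}
}
```

Under that assumption, tpcc-1000 exceeds the 12 available vCPUs while tpcc-100 leaves ample headroom, which is consistent with the stated goal of making the test stable.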
@craig craig bot closed this as completed in #37046 Apr 23, 2019