Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: schemachange/indexrollback/tpcc-1000 failed #33335

Closed
cockroach-teamcity opened this issue Dec 22, 2018 · 19 comments · Fixed by #33703 or #34548
Closed

roachtest: schemachange/indexrollback/tpcc-1000 failed #33335

cockroach-teamcity opened this issue Dec 22, 2018 · 19 comments · Fixed by #33703 or #34548
Assignees
Labels
C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Milestone

Comments

@cockroach-teamcity
Copy link
Member

SHA: https://github.com/cockroachdb/cockroach/commits/98ef7abf32784b8e837d18d10173ef083010ad45

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/indexrollback/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1067238&tab=buildLog

The test failed on master:
	test.go:703,cluster.go:1137,tpcc.go:110,cluster.go:1463,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1067238-schemachange-indexrollback-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		l
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		   9m29s       69            2.0            3.6   2952.8   8589.9   8589.9   8589.9 delivery
		   9m29s       69           29.9           32.2   5905.6  85899.3 103079.2 103079.2 newOrder
		   9m29s       69            3.0            3.5    671.1   2415.9   2415.9   2415.9 orderStatus
		   9m29s       69           39.8           27.7  11274.3 103079.2 103079.2 103079.2 payment
		   9m29s       69            5.0            3.5   3758.1   4160.7   4160.7   4160.7 stockLevel
		   9m30s       69            2.0            3.6   4831.8   6979.3   6979.3   6979.3 delivery
		   9m30s       69           77.3           32.3   6979.3 103079.2 103079.2 103079.2 newOrder
		   9m30s       69            4.0            3.5   3892.3   4831.8   4831.8   4831.8 orderStatus
		   9m30s       69           59.2           27.8   5100.3 103079.2 103079.2 103079.2 payment
		   9m30s       69            8.0            3.5    192.9   3489.7   3489.7   3489.7 stockLevel
		: signal: killed
	test.go:703,cluster.go:1484,tpcc.go:120,schemachange.go:375: got non-success terminal status "failed"

@cockroach-teamcity cockroach-teamcity added this to the 2.2 milestone Dec 22, 2018
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Dec 22, 2018
@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/cc2bf701b5acea511860a62a85f9e48fc00c7628

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/indexrollback/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1068402&tab=buildLog

The test failed on master:
	test.go:703,cluster.go:1137,tpcc.go:110,cluster.go:1463,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1068402-schemachange-indexrollback-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		Error: read tcp 10.128.0.15:35704->10.128.0.19:26257: read: connection reset by peer
		Error:  exit status 1
		: exit status 1
	test.go:703,cluster.go:1484,tpcc.go:120,schemachange.go:375: Goexit() was called

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/2194f363cec5f6930306ba87d57bc5a06ae48f23

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/indexrollback/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1069090&tab=buildLog

The test failed on master:
	test.go:703,cluster.go:1137,tpcc.go:110,cluster.go:1463,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1069090-schemachange-indexrollback-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		l
		   4m20s       11            4.0            2.6   8321.5  18253.6  18253.6  18253.6 delivery
		   4m20s       11           24.0           21.9  12884.9 103079.2 103079.2 103079.2 newOrder
		   4m20s       11            3.0            2.8    637.5    939.5    939.5    939.5 orderStatus
		   4m20s       11           50.1           15.4  19327.4 103079.2 103079.2 103079.2 payment
		   4m20s       11            4.0            2.8    453.0  81604.4  81604.4  81604.4 stockLevel
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		   4m21s       11            4.0            2.6   4160.7  11274.3  11274.3  11274.3 delivery
		   4m21s       11           42.9           22.0  22548.6  77309.4 103079.2 103079.2 newOrder
		   4m21s       11            2.0            2.8    260.0    385.9    385.9    385.9 orderStatus
		   4m21s       11           31.9           15.5  36507.2 103079.2 103079.2 103079.2 payment
		   4m21s       11            2.0            2.8    604.0  32212.3  32212.3  32212.3 stockLevel
		: signal: killed
	test.go:703,cluster.go:1484,tpcc.go:120,schemachange.go:375: got non-success terminal status "failed"

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/1da87b86059c282c2b2821d6b5b481f17c89d98e

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/indexrollback/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1069764&tab=buildLog

The test failed on master:
	test.go:1101: test timed out (4h0m0s)
	test.go:703,cluster.go:1137,tpcc.go:122,schemachange.go:375: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1069764-schemachange-indexrollback-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I181226 21:51:25.742709 1 workload/tpcc/tpcc.go:290  check 3.3.2.1 took 1.36226103s
		I181226 21:51:41.288689 1 workload/tpcc/tpcc.go:290  check 3.3.2.2 took 15.545901678s
		I181226 21:51:50.611145 1 workload/tpcc/tpcc.go:290  check 3.3.2.3 took 9.322404174s
		I181226 21:55:29.395606 1 workload/tpcc/tpcc.go:290  check 3.3.2.4 took 3m38.784387582s
		I181226 21:57:41.472865 1 workload/tpcc/tpcc.go:290  check 3.3.2.5 took 2m12.077199542s
		: signal: killed

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/3b0ea90eac2d94aba4d96537dbcc175707ebd259

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/indexrollback/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1070500&tab=buildLog

The test failed on master:
	test.go:703,cluster.go:1484,tpcc.go:120,schemachange.go:375: timeout with current status: "running"

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/3c80c5e06c4aa5ed25e1cc02b78037f6ec121939

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/indexrollback/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1072755&tab=buildLog

The test failed on release-2.1:
	test.go:703,cluster.go:1137,tpcc.go:110,cluster.go:1463,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1072755-schemachange-indexrollback-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		  0.0      0.0      0.0 newOrder
		  14m29s     2513            0.0            0.5      0.0      0.0      0.0      0.0 orderStatus
		  14m29s     2513            0.0            1.4      0.0      0.0      0.0      0.0 payment
		  14m29s     2513            1.0            0.4    104.9    104.9    104.9    104.9 stockLevel
		E181229 13:59:49.665582 1 workload/cli/run.go:402  error in payment: ERROR: context deadline exceeded (SQLSTATE XX000)
		  14m30s     2521            0.0            0.5      0.0      0.0      0.0      0.0 delivery
		  14m30s     2521            2.0            2.9    209.7 103079.2 103079.2 103079.2 newOrder
		  14m30s     2521            0.0            0.5      0.0      0.0      0.0      0.0 orderStatus
		  14m30s     2521            0.0            1.4      0.0      0.0      0.0      0.0 payment
		  14m30s     2521            1.0            0.4     48.2     48.2     48.2     48.2 stockLevel
		E181229 13:59:50.890384 1 workload/cli/run.go:402  error in newOrder: ERROR: context deadline exceeded (SQLSTATE XX000)
		: signal: killed
	test.go:703,cluster.go:1484,tpcc.go:120,schemachange.go:375: timeout with current status: "running"

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/a8afcabd753eff930f2f1fef5e79718439fbf65a

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/indexrollback/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1075204&tab=buildLog

The test failed on master:
	test.go:703,cluster.go:1137,tpcc.go:110,cluster.go:1463,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1075204-schemachange-indexrollback-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		)
		  23m36s      257            9.0            4.7   6174.0  17179.9  17179.9  17179.9 delivery
		  23m36s      257           64.1           44.9   2952.8  90194.3 103079.2 103079.2 newOrder
		  23m36s      257            5.0            4.7    520.1    838.9    838.9    838.9 orderStatus
		  23m36s      257          102.1           43.4   1040.2 103079.2 103079.2 103079.2 payment
		  23m36s      257            4.0            4.6    234.9  16106.1  16106.1  16106.1 stockLevel
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  23m37s      257            5.0            4.7    159.4   3489.7   3489.7   3489.7 delivery
		  23m37s      257           18.0           44.9   4831.8   8589.9  12348.0  12348.0 newOrder
		  23m37s      257            0.0            4.7      0.0      0.0      0.0      0.0 orderStatus
		  23m37s      257           27.0           43.4   4160.7 103079.2 103079.2 103079.2 payment
		  23m37s      257            0.0            4.6      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	test.go:703,cluster.go:1484,tpcc.go:120,schemachange.go:375: expected 'job canceled' error, but got <nil>

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/e873b5faf481817ef0a576b96b627b03baa470ca

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/indexrollback/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1075912&tab=buildLog

The test failed on master:
	test.go:703,cluster.go:1484,tpcc.go:120,schemachange.go:375: timeout with current status: "running"

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/f3cf71db94327abb6a164ad67383c35a696ec7d8

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/indexrollback/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1077687&tab=buildLog

The test failed on release-2.1:
	test.go:696,cluster.go:1137,tpcc.go:110,cluster.go:1463,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1077687-schemachange-indexrollback-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		           0.4     65.0     75.5     75.5     75.5 delivery
		  12m14s     2561            0.0            2.2      0.0      0.0      0.0      0.0 newOrder
		  12m14s     2561            0.0            0.4      0.0      0.0      0.0      0.0 orderStatus
		  12m14s     2561            3.0            0.7 103079.2 103079.2 103079.2 103079.2 payment
		  12m14s     2561            1.0            0.4 103079.2 103079.2 103079.2 103079.2 stockLevel
		E190103 14:33:09.898238 1 workload/cli/run.go:402  error in payment: ERROR: context deadline exceeded (SQLSTATE XX000)
		  12m15s     2568            2.0            0.4     71.3     96.5     96.5     96.5 delivery
		  12m15s     2568            3.0            2.2  60129.5 103079.2 103079.2 103079.2 newOrder
		  12m15s     2568            2.0            0.4      8.1     15.7     15.7     15.7 orderStatus
		  12m15s     2568            1.0            0.7 103079.2 103079.2 103079.2 103079.2 payment
		  12m15s     2568            0.0            0.4      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	test.go:696,cluster.go:1484,tpcc.go:120,schemachange.go:375: timeout with current status: "running"

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/4f701b9355482c4da8fbf599e34f32699279ae38

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/indexrollback/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1077670&tab=buildLog

The test failed on master:
	test.go:696,cluster.go:1137,tpcc.go:110,cluster.go:1463,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1077670-schemachange-indexrollback-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		     769            2.0            4.0     37.7   1744.8   1744.8   1744.8 stockLevel
		  43m19s      769            5.0            3.9    335.5  16643.0  16643.0  16643.0 delivery
		  43m19s      769           59.1           38.4   1342.2  11274.3 103079.2 103079.2 newOrder
		  43m19s      769            3.0            4.0     26.2    385.9    385.9    385.9 orderStatus
		  43m19s      769          118.2           37.8   1006.6  10737.4 103079.2 103079.2 payment
		  43m19s      769            3.0            4.0    872.4 103079.2 103079.2 103079.2 stockLevel
		  43m20s      769            2.0            3.9     58.7     88.1     88.1     88.1 delivery
		  43m20s      769           21.0           38.4    704.6  25769.8 103079.2 103079.2 newOrder
		  43m20s      769            8.0            4.0    838.9   1677.7   1677.7   1677.7 orderStatus
		  43m20s      769           14.0           37.8     33.6    226.5 103079.2 103079.2 payment
		  43m20s      769            5.0            4.0   1140.9   2952.8   2952.8   2952.8 stockLevel
		: signal: killed
	test.go:696,cluster.go:1484,tpcc.go:120,schemachange.go:375: got non-success terminal status "failed"

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/431b1846249fd2d110706ad221504706014e8b70

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/indexrollback/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1079927&tab=buildLog

The test failed on release-2.1:
	test.go:696,cluster.go:1137,tpcc.go:110,cluster.go:1463,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1079927-schemachange-indexrollback-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		    3.0            3.4  66572.0  66572.0  66572.0  66572.0 newOrder
		  16m12s     2914            0.0            0.5      0.0      0.0      0.0      0.0 orderStatus
		  16m12s     2914            1.0            1.4 103079.2 103079.2 103079.2 103079.2 payment
		  16m12s     2914            0.0            0.5      0.0      0.0      0.0      0.0 stockLevel
		E190104 13:50:08.857769 1 workload/cli/run.go:402  error in newOrder: ERROR: context deadline exceeded (SQLSTATE XX000)
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  16m13s     2915            0.0            0.5      0.0      0.0      0.0      0.0 delivery
		  16m13s     2915            0.0            3.4      0.0      0.0      0.0      0.0 newOrder
		  16m13s     2915            0.0            0.5      0.0      0.0      0.0      0.0 orderStatus
		  16m13s     2915            0.0            1.4      0.0      0.0      0.0      0.0 payment
		  16m13s     2915            2.0            0.5    285.2 103079.2 103079.2 103079.2 stockLevel
		: signal: killed
	test.go:696,cluster.go:1484,tpcc.go:120,schemachange.go:375: timeout with current status: "running"

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/99064bf2797cce49e2d56d0324a3344cf2312aca

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/indexrollback/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1082456&tab=buildLog

The test failed on master:
	test.go:696,cluster.go:1164,tpcc.go:110,cluster.go:1490,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1082456-schemachange-indexrollback-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		  0.0 delivery
		   2m53s       46           17.0           18.5  19327.4 103079.2 103079.2 103079.2 newOrder
		   2m53s       46            3.0            2.6   1140.9   3087.0   3087.0   3087.0 orderStatus
		   2m53s       46           11.0           13.3   8321.5  23622.3  64424.5  64424.5 payment
		   2m53s       46            3.0            2.2   1006.6  62277.0  62277.0  62277.0 stockLevel
		   2m54s       46            4.0            2.2  16106.1  16106.1  16106.1  16106.1 delivery
		   2m54s       46           42.0           18.6  14495.5 103079.2 103079.2 103079.2 newOrder
		   2m54s       46            4.0            2.6     71.3    369.1    369.1    369.1 orderStatus
		   2m54s       46           30.0           13.4   2281.7  81604.4  98784.2  98784.2 payment
		   2m54s       46            3.0            2.2    536.9    906.0    906.0    906.0 stockLevel
		E190106 14:02:21.927343 1 workload/cli/run.go:402  error in payment: ERROR: TransactionStatusError: transaction deadline exceeded (REASON_UNKNOWN) (SQLSTATE XX000)
		: signal: killed
	test.go:696,cluster.go:1511,tpcc.go:120,schemachange.go:375: got non-success terminal status "failed"

@vivekmenezes
Copy link
Contributor

I'm looking at this. If I don't make much progress tomorrow I'll disable it since it's a new test that never really got off the ground.

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/dadcaa493bca02996ebd87c0c85106e566893ce2

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/indexrollback/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1083956&tab=buildLog

The test failed on master:
	test.go:696,cluster.go:1511,tpcc.go:120,schemachange.go:375: timeout with current status: "running"

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/12e28159b1d8b63b56d6a48f22ebbb5c75e8ee5c

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/indexrollback/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1085451&tab=buildLog

The test failed on provisional_201901081731_v2.2.0-alpha-20190114:
	test.go:696,cluster.go:1164,tpcc.go:122,schemachange.go:375: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1085451-schemachange-indexrollback-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190109 07:24:50.168329 1 workload/tpcc/tpcc.go:290  check 3.3.2.1 took 4.008374315s
		Error: check failed: 3.3.2.1: 3 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

@vivekmenezes
Copy link
Contributor

fixing this is a priority for me

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/d0e2f78183d2ce0a6b803127ee80143571c9cd4f

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/indexrollback/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1086790&tab=buildLog

The test failed on release-2.1:
	test.go:696,cluster.go:1164,tpcc.go:110,cluster.go:1490,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1086790-schemachange-indexrollback-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		 0.0 delivery
		  18m21s     1658            0.0            4.8      0.0      0.0      0.0      0.0 newOrder
		  18m21s     1658            0.0            0.6      0.0      0.0      0.0      0.0 orderStatus
		  18m21s     1658            5.0            2.9 103079.2 103079.2 103079.2 103079.2 payment
		  18m21s     1658            0.0            0.7      0.0      0.0      0.0      0.0 stockLevel
		E190110 13:36:56.024504 1 workload/cli/run.go:402  error in newOrder: ERROR: TransactionStatusError: transaction deadline exceeded (REASON_UNKNOWN) (SQLSTATE XX000)
		  18m22s     1659            1.0            0.6    637.5    637.5    637.5    637.5 delivery
		  18m22s     1659            3.0            4.8   6979.3  15569.3  15569.3  15569.3 newOrder
		  18m22s     1659            0.0            0.6      0.0      0.0      0.0      0.0 orderStatus
		  18m22s     1659            0.0            2.9      0.0      0.0      0.0      0.0 payment
		  18m22s     1659            0.0            0.7      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	test.go:696,cluster.go:1511,tpcc.go:120,schemachange.go:375: timeout with current status: "running"

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/d0e2f78183d2ce0a6b803127ee80143571c9cd4f

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/indexrollback/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1088865&tab=buildLog

The test failed on release-2.1:
	test.go:696,cluster.go:1164,tpcc.go:120,cluster.go:1490,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1088865-schemachange-indexrollback-tpcc-1000:5 -- ./workload run tpcc --warehouses=1000 --histograms=logs/stats.json --wait=false --tolerate-errors --ramp=5m0s --duration=2h0m0s {pgurl:1-4} returned:
		stderr:
		
		stdout:
		          0.5    604.0    604.0    604.0    604.0 delivery
		  14m55s     3002            4.0            2.3  12884.9 103079.2 103079.2 103079.2 newOrder
		  14m55s     3002            0.0            0.4      0.0      0.0      0.0      0.0 orderStatus
		  14m55s     3002            4.0            0.9 103079.2 103079.2 103079.2 103079.2 payment
		  14m55s     3002            1.0            0.4  36507.2  36507.2  36507.2  36507.2 stockLevel
		E190111 13:20:42.314617 1 workload/cli/run.go:402  error in newOrder: ERROR: context deadline exceeded (SQLSTATE XX000)
		  14m56s     3005            0.0            0.5      0.0      0.0      0.0      0.0 delivery
		  14m56s     3005            1.0            2.3 103079.2 103079.2 103079.2 103079.2 newOrder
		  14m56s     3005            0.0            0.4      0.0      0.0      0.0      0.0 orderStatus
		  14m56s     3005            0.0            0.9      0.0      0.0      0.0      0.0 payment
		  14m56s     3005            0.0            0.4      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	test.go:696,cluster.go:1511,tpcc.go:130,schemachange.go:375: timeout with current status: "running"

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/766a53defc27940d41ae8987792ecd79e76d2b65

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/indexrollback/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1090948&tab=buildLog

The test failed on master:
	test.go:696,cluster.go:1164,tpcc.go:132,schemachange.go:375: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1090948-schemachange-indexrollback-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190113 15:54:31.365333 1 workload/tpcc/tpcc.go:290  check 3.3.2.1 took 5.225766269s
		Error: check failed: 3.3.2.1: 3 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

vivekmenezes added a commit to vivekmenezes/cockroach that referenced this issue Jan 14, 2019
Speed up the test by reducing the fraction of data that is processed
by the backfill before issuing a cancel.

Make the test easier to understand.

Wait enough time for the rollback to complete processing.

fixes cockroachdb#33335

Release note: None
@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/fe6fbbb99f51f414804daaeb704635ee0ff17b28

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=schemachange/indexrollback/tpcc-1000 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1091924&tab=buildLog

The test failed on master:
	test.go:696,cluster.go:1164,tpcc.go:132,schemachange.go:375: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1091924-schemachange-indexrollback-tpcc-1000:5 -- ./workload check tpcc --warehouses=1000 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190114 15:37:36.722977 1 workload/tpcc/tpcc.go:290  check 3.3.2.1 took 6.996058334s
		Error: check failed: 3.3.2.1: 10 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

vivekmenezes added a commit to vivekmenezes/cockroach that referenced this issue Jan 16, 2019
Speed up the test by reducing the fraction of data that is processed
by the backfill before issuing a cancel.

Make the test easier to understand.

Wait enough time for the rollback to complete processing.

fixes cockroachdb#33335

Release note: None
craig bot pushed a commit that referenced this issue Jan 16, 2019
33703:  roachtest: fix schema change cancel test r=vivekmenezes a=vivekmenezes

Speed up the test by reducing the fraction of data that is processed
by the backfill before issuing a cancel.

Make the test easier to understand.

Wait enough time for the rollback to complete processing.

fixes #33335

Co-authored-by: Vivek Menezes <vivek@cockroachlabs.com>
@craig craig bot closed this as completed in #33703 Jan 16, 2019
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 5, 2019
Fixes cockroachdb#34025.
Fixes cockroachdb#33624.
Fixes cockroachdb#33335.
Fixes cockroachdb#33151.
Fixes cockroachdb#33149.
Fixes cockroachdb#34159.
Fixes cockroachdb#34293.
Fixes cockroachdb#32813.
Fixes cockroachdb#30886.
Fixes cockroachdb#34228.
Fixes cockroachdb#34321.

It is rare but possible for a replica to become a leaseholder but not
learn about this until it applies a snapshot. Immediately upon the
snapshot application's `ReplicaState` update, the replica will begin
operating as a standard leaseholder.

Before this change, leases acquired in this way would not trigger
in-memory side-effects to be performed. This could result in a regression
in the new leaseholder's timestamp cache compared to the previous
leaseholder, allowing write-skew like we saw in cockroachdb#34025. This could
presumably result in other anomalies as well, because all of the
steps in `leasePostApply` were skipped.

This PR fixes this bug by detecting lease updates when applying
snapshots and making sure to react correctly to them. It also likely
fixes the referenced issue. The new test demonstrated that without
this fix, the serializable violation speculated about in the issue
was possible.

Release note (bug fix): Fix bug where lease transfers passed through
Snapshots could forget to update in-memory state on the new leaseholder,
allowing write-skew between read-modify-write operations.
craig bot pushed a commit that referenced this issue Feb 5, 2019
34548: storage: apply lease change side-effects on snapshot recipients r=nvanbenschoten a=nvanbenschoten

Fixes #34025.
Fixes #33624.
Fixes #33335.
Fixes #33151.
Fixes #33149.
Fixes #34159.
Fixes #34293.
Fixes #32813.
Fixes #30886.
Fixes #34228.
Fixes #34321.

It is rare but possible for a replica to become a leaseholder but not learn about this until it applies a snapshot. Immediately upon the snapshot application's `ReplicaState` update, the replica will begin operating as a standard leaseholder.

Before this change, leases acquired in this way would not trigger in-memory side-effects to be performed. This could result in a regression in the new leaseholder's timestamp cache compared to the previous leaseholder's cache, allowing write-skew like we saw in #34025. This could presumably result in other anomalies as well, because all of the steps in `leasePostApply` were skipped (as theorized by #34025 (comment)).

This PR fixes this bug by detecting lease updates when applying snapshots and making sure to react correctly to them. It also likely fixes the referenced issue. The new test demonstrates that without this fix, the serializable violation speculated about in the issue was possible.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Projects
None yet
2 participants