Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachtest: tpcc/nodes=3/w=max failed #30886

Closed
cockroach-teamcity opened this issue Oct 2, 2018 · 53 comments · Fixed by #34548
Closed

roachtest: tpcc/nodes=3/w=max failed #30886

cockroach-teamcity opened this issue Oct 2, 2018 · 53 comments · Fixed by #34548
Assignees
Labels
C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Milestone

Comments

@cockroach-teamcity
Copy link
Member

SHA: https://github.com/cockroachdb/cockroach/commits/0f5a289d49105cd317a2713b71546d84c25547f6

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=937459&tab=buildLog

The test failed on master:
	test.go:500,cluster.go:890,tpcc.go:78,cluster.go:1199,errgroup.go:58: /home/agent/work/.go/bin/roachprod run teamcity-937459-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1400 --duration=120m {pgurl:1-3} --histograms=logs/stats.json returned:
		stderr:
		
		stdout:
		ckLevel
		    3m6s        0            0.0           23.5      0.0      0.0      0.0      0.0 delivery
		    3m6s        0            0.0          221.1      0.0      0.0      0.0      0.0 newOrder
		    3m6s        0            0.0           24.3      0.0      0.0      0.0      0.0 orderStatus
		    3m6s        0            0.0          243.9      0.0      0.0      0.0      0.0 payment
		    3m6s        0            0.0           23.8      0.0      0.0      0.0      0.0 stockLevel
		    3m7s        0            0.0           23.4      0.0      0.0      0.0      0.0 delivery
		    3m7s        0            0.0          219.9      0.0      0.0      0.0      0.0 newOrder
		    3m7s        0            0.0           24.2      0.0      0.0      0.0      0.0 orderStatus
		    3m7s        0            0.0          242.6      0.0      0.0      0.0      0.0 payment
		    3m7s        0            0.0           23.7      0.0      0.0      0.0      0.0 stockLevel
		Error: error in delivery: pq: context deadline exceeded
		Error:  exit status 1
		: exit status 1
	test.go:500,cluster.go:1220,tpcc.go:81,tpcc.go:95: Goexit() was called

@cockroach-teamcity cockroach-teamcity added this to the 2.2 milestone Oct 2, 2018
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot. labels Oct 2, 2018
@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/f78cc21bb4d7bd3a79e44b4608c04684edf28179

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=939960&tab=buildLog

The test failed on master:
	test.go:500,cluster.go:889,tpcc.go:78,cluster.go:1198,errgroup.go:58: /home/agent/work/.go/bin/roachprod run teamcity-939960-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1400 --duration=120m {pgurl:1-3} --histograms=logs/stats.json returned:
		stderr:
		
		stdout:
		ockLevel
		   2m11s        0            0.0           17.4      0.0      0.0      0.0      0.0 delivery
		   2m11s        0            6.0          149.4  33286.0  94489.3  94489.3  94489.3 newOrder
		   2m11s        0            0.0           17.6      0.0      0.0      0.0      0.0 orderStatus
		   2m11s        0            0.0          175.2      0.0      0.0      0.0      0.0 payment
		   2m11s        0            0.0           17.6      0.0      0.0      0.0      0.0 stockLevel
		   2m12s        0            0.0           17.2      0.0      0.0      0.0      0.0 delivery
		   2m12s        0            4.0          148.3  38654.7  51539.6  51539.6  51539.6 newOrder
		   2m12s        0            0.0           17.5      0.0      0.0      0.0      0.0 orderStatus
		   2m12s        0            0.0          173.9      0.0      0.0      0.0      0.0 payment
		   2m12s        0            0.0           17.5      0.0      0.0      0.0      0.0 stockLevel
		Error: error in payment: pq: context deadline exceeded
		Error:  exit status 1
		: exit status 1
	test.go:500,cluster.go:1219,tpcc.go:81,tpcc.go:95: Goexit() was called

@tbg tbg modified the milestones: 2.2, 2.1 Oct 3, 2018
@tbg
Copy link
Member

tbg commented Oct 3, 2018

@nvanbenschoten, could you take a look at this one? The error is an opaque "context deadline exceeded". Weird that this would just come up today (or did we rename tests?)

@nvanbenschoten
Copy link
Member

nvanbenschoten commented Oct 3, 2018

Weird that this would just come up today

I think this just shows that #30804 is actually fixed now and that we're hitting a new issue.

@nvanbenschoten
Copy link
Member

One issue I see is that we're not passing a --ramp flag to workload like we do in tpccbench. I'll try adding that and see if it helps.

@nvanbenschoten
Copy link
Member

I can reproduce the failure without the ramp flag but can't with it. I'll send a fix.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Oct 3, 2018
craig bot pushed a commit that referenced this issue Oct 4, 2018
30941: roachtest: add ramp period to tpcc/nodes=3/w=max r=nvanbenschoten a=nvanbenschoten

Fixes #30886.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig craig bot closed this as completed in #30941 Oct 4, 2018
@nvanbenschoten nvanbenschoten reopened this Oct 4, 2018
@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/6a236533824c797948eaddbd6bedf5e12eee1a13

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=942507&tab=buildLog

The test failed on master:
	test.go:500,cluster.go:897,tpcc.go:97,tpcc.go:101,tpcc.go:131: /home/agent/work/.go/bin/roachprod run teamcity-942507-tpcc-nodes-3-w-max:4 -- ./workload fixtures load tpcc --warehouses=2000 {pgurl:1} returned:
		stderr:
		
		stdout:
		60 ccl/workloadccl/fixture.go:454  loaded warehouse (12m3s, 2000 rows, 0 index entries, 106 KiB)
		I181004 15:51:56.127364 114 ccl/workloadccl/fixture.go:454  loaded item (12m22s, 100000 rows, 0 index entries, 7.8 MiB)
		I181004 15:53:36.148543 63 ccl/workloadccl/fixture.go:454  loaded history (14m2s, 60000000 rows, 120000000 index entries, 7.7 GiB)
		I181004 15:53:40.094688 64 ccl/workloadccl/fixture.go:454  loaded order (14m6s, 60000000 rows, 120000000 index entries, 3.5 GiB)
		I181004 16:19:18.338448 62 ccl/workloadccl/fixture.go:454  loaded customer (39m45s, 60000000 rows, 60000000 index entries, 34 GiB)
		I181004 16:48:35.691290 115 ccl/workloadccl/fixture.go:454  loaded stock (1h9m2s, 200000000 rows, 200000000 index entries, 63 GiB)
		I181004 16:51:19.595154 116 ccl/workloadccl/fixture.go:454  loaded order_line (1h11m46s, 600029271 rows, 600029271 index entries, 44 GiB)
		Error: restoring fixture: PostLoad hook: pq: foreign key requires an existing index on columns ("ol_supply_w_id", "ol_i_id")
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/bd9de99ff5569d9c21df027544b4fdb17ffc4a55

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=946966&tab=buildLog

The test failed on master:
	test.go:500,cluster.go:897,tpcc.go:110,cluster.go:1206,errgroup.go:58: /home/agent/work/.go/bin/roachprod run teamcity-946966-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1400 --histograms=logs/stats.json  --ramp=5m0s --duration=2h0m0s {pgurl:1-3} returned:
		stderr:
		
		stdout:
		ckLevel
		    7m3s        0           31.8           29.1    268.4    436.2    872.4    872.4 delivery
		    7m3s        0          279.7          281.1    352.3    738.2   1476.4   1543.5 newOrder
		    7m3s        0           34.8           29.2     83.9    453.0    771.8    771.8 orderStatus
		    7m3s        0          253.8          291.6    226.5    570.4   1006.6   1610.6 payment
		    7m3s        0           38.8           29.2    352.3    671.1    771.8    771.8 stockLevel
		    7m4s        0           33.0           29.1    385.9    536.9    738.2    738.2 delivery
		    7m4s        0          258.3          281.1    352.3    771.8   1140.9   1409.3 newOrder
		    7m4s        0           20.0           29.2    104.9    285.2    385.9    385.9 orderStatus
		    7m4s        0          259.3          291.6    243.3    637.5    906.0   1610.6 payment
		    7m4s        0           28.0           29.2    335.5    805.3    805.3    805.3 stockLevel
		Error: error in delivery: pq: context deadline exceeded
		Error:  exit status 1
		: exit status 1
	test.go:500,cluster.go:1227,tpcc.go:117,tpcc.go:131: Goexit() was called

@tbg
Copy link
Member

tbg commented Oct 8, 2018

@nvanbenschoten something new then?

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/dc0e73c728e533fdb3bec63e53eec174e920ff22

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=952220&tab=buildLog

The test failed on release-2.1:
	test.go:570,cluster.go:975,cluster.go:1038,tpcc.go:55,tpcc.go:131: /home/agent/work/.go/bin/roachprod run teamcity-952220-tpcc-nodes-3-w-max:1-4 -- sudo umount /mnt/data1 ; sudo mount -o discard,defaults,nobarrier /dev/disk/by-id/google-local-ssd-0 /mnt/data1 returned:
		stderr:
		
		stdout:
		teamcity-952220-tpcc-nodes-3-w-max: sudo umount /mnt/data1 ; su.....................................................................................................................................
		   1: 
		   2: ssh: connect to host 104.198.137.57 port 22: Connection timed out
		exit status 255
		   3: 
		   4: 
		Error:  exit status 255
		: exit status 1

@tbg
Copy link
Member

tbg commented Oct 9, 2018

Last one is a fluke, but the "context deadline exceeded" above might be new.

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/2215217e8ee38d28a14eb9fd2fe9af8b0b702e7d

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stressrace instead of stress and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stress TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-stderr=false -maxtime 20m -timeout 10m'

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=958181&tab=buildLog

The test failed on master:
	test.go:575,cluster.go:987,tpcc.go:110,cluster.go:1309,errgroup.go:58: /home/agent/work/.go/bin/roachprod run teamcity-958181-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1400 --histograms=logs/stats.json  --ramp=5m0s --duration=2h0m0s {pgurl:1-3} returned:
		stderr:
		
		stdout:
		ckLevel
		   15m6s        0            0.0           21.3      0.0      0.0      0.0      0.0 delivery
		   15m6s        0            0.0          208.1      0.0      0.0      0.0      0.0 newOrder
		   15m6s        0            0.0           21.2      0.0      0.0      0.0      0.0 orderStatus
		   15m6s        0            0.0          212.7      0.0      0.0      0.0      0.0 payment
		   15m6s        0            0.0           21.3      0.0      0.0      0.0      0.0 stockLevel
		   15m7s        0            0.0           21.2      0.0      0.0      0.0      0.0 delivery
		   15m7s        0            0.0          207.9      0.0      0.0      0.0      0.0 newOrder
		   15m7s        0            0.0           21.1      0.0      0.0      0.0      0.0 orderStatus
		   15m7s        0            0.0          212.5      0.0      0.0      0.0      0.0 payment
		   15m7s        0            0.0           21.3      0.0      0.0      0.0      0.0 stockLevel
		Error: error in delivery: pq: context deadline exceeded
		Error:  exit status 1
		: exit status 1
	test.go:575,cluster.go:1330,tpcc.go:117,tpcc.go:131: Goexit() was called

@jordanlewis
Copy link
Member

I can't remember what the latest with context deadline exceeded is. Do we need more instrumentation to discover what's going on here?

@nvanbenschoten
Copy link
Member

nvanbenschoten commented Oct 11, 2018

I'm hoping it's one of the stalls I'm seeing in #30064. If a write got blocked then eventually everything else would pile on behind it and also block. I'll revisit this issue once that one's fixed.

@jordanlewis
Copy link
Member

Is your hypothesis that the context deadline exceeded is coming from the intent resolver, then? I'll add some instrumentation around that if so. I'm tired of getting these naked errors with no context (no pun intended) - if we're pretty sure these are coming from intents, let's just put it to bed forever by adding instrumentation.

@nvanbenschoten
Copy link
Member

I don't have a real theory about the context deadlines themselves, more about the stall we see right before it. If you feel up to it, looking into the source of the error would be very helpful. Thanks!

@jordanlewis
Copy link
Member

I see this in the logs:

W181011 22:22:48.018862 7883692 storage/intent_resolver.go:745  [n1,s1,r32/1:/Table/56/1/{0/1/0-1999/11}] failed to cleanup transaction intents: failed to resolve intents: the batch experienced mixed success and failure

Suspicious at all?

@jordanlewis
Copy link
Member

There's also this guy:

W181011 22:11:30.612794 2586044 storage/intent_resolver.go:745  [n1,s1] failed to cleanup transaction intents: failed to resolve intents: context deadline exceeded

But I don't think that context deadline exceeded is the same one that gets returned to the user, since it's getting logged by an async task that doesn't return anything.

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/b0a3aada2869f44bac4cf75e0c4b2d09659767f9

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stressrace TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1004113&tab=buildLog

The test failed on master:
	test.go:639,cluster.go:1118,tpcc.go:123,tpcc.go:138: /home/agent/work/.go/bin/roachprod run teamcity-1004113-tpcc-nodes-3-w-max:4 -- ./workload check tpcc --warehouses=1400 {pgurl:1} returned:
		stderr:
		
		stdout:
		I181109 23:03:24.555089 1 workload/tpcc/tpcc.go:291  check 3.3.2.1 took 841.104424ms
		I181109 23:04:12.258550 1 workload/tpcc/tpcc.go:291  check 3.3.2.2 took 47.703329627s
		I181109 23:04:22.439996 1 workload/tpcc/tpcc.go:291  check 3.3.2.3 took 10.181399854s
		: signal: interrupt

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/b01468f592f6b12ba147e1c19ea6d8ad7484ff6e

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stressrace TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1007980&tab=buildLog

The test failed on release-2.1:
	test.go:639,cluster.go:1118,tpcc.go:111,cluster.go:1440,errgroup.go:58: /home/agent/work/.go/bin/roachprod run teamcity-1007980-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1400 --histograms=logs/stats.json  --ramp=5m0s --duration=2h0m0s {pgurl:1-3} returned:
		stderr:
		
		stdout:
		     0            0.0           21.7      0.0      0.0      0.0      0.0 delivery
		  20m58s        0            0.0          214.0      0.0      0.0      0.0      0.0 newOrder
		  20m58s        0            0.0           21.7      0.0      0.0      0.0      0.0 orderStatus
		  20m58s        0            0.0          215.8      0.0      0.0      0.0      0.0 payment
		  20m58s        0            0.0           21.6      0.0      0.0      0.0      0.0 stockLevel
		  20m59s        0            0.0           21.7      0.0      0.0      0.0      0.0 delivery
		  20m59s        0            1.0          213.8  32212.3  32212.3  32212.3  32212.3 newOrder
		  20m59s        0            4.0           21.7  31138.5  31138.5  31138.5  31138.5 orderStatus
		  20m59s        0            4.0          215.6  34359.7  36507.2  36507.2  36507.2 payment
		  20m59s        0            1.0           21.6  31138.5  31138.5  31138.5  31138.5 stockLevel
		Error: error in payment: ERROR: context deadline exceeded (SQLSTATE XX000)
		Error:  exit status 1
		: exit status 1
	test.go:639,cluster.go:1461,tpcc.go:121,tpcc.go:138: Goexit() was called

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/04f39007a9aecf14e0f989400127d8dd16067dbd

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stressrace TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1010528&tab=buildLog

The test failed on release-2.1:
	test.go:639,cluster.go:1159,tpcc.go:111,cluster.go:1481,errgroup.go:58: /home/agent/work/.go/bin/roachprod run teamcity-1010528-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1400 --histograms=logs/stats.json  --ramp=5m0s --duration=2h0m0s {pgurl:1-3} returned:
		stderr:
		
		stdout:
		    0           38.1           26.8   2415.9   7247.8   8053.1   8053.1 delivery
		  16m15s        0          253.4          264.8   2818.6   5100.3  11811.2  23622.3 newOrder
		  16m15s        0           27.0           27.1    402.7   1040.2   1040.2   1040.2 orderStatus
		  16m15s        0          270.4          269.6   3087.0   6710.9  36507.2  42949.7 payment
		  16m15s        0           23.0           27.0    771.8   1140.9   1409.3   1409.3 stockLevel
		  16m16s        0           28.0           26.8   1811.9   5368.7   5905.6   5905.6 delivery
		  16m16s        0          354.3          264.8   2281.7   3892.3   4831.8  34359.7 newOrder
		  16m16s        0           23.0           27.1    503.3    939.5    939.5    939.5 orderStatus
		  16m16s        0          258.2          269.6   2415.9   4160.7   6979.3  27917.3 payment
		  16m16s        0           23.0           27.0    436.2   1140.9   1409.3   1409.3 stockLevel
		Error: error in delivery: ERROR: context deadline exceeded (SQLSTATE XX000)
		Error:  exit status 1
		: exit status 1
	test.go:639,cluster.go:1502,tpcc.go:121,tpcc.go:138: Goexit() was called

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/eb8345b19cf6c15b3e4fcb9c156136357d83cb2d

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stressrace TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1015656&tab=buildLog

The test failed on master:
	test.go:645,cluster.go:1141,tpcc.go:111,cluster.go:1463,errgroup.go:58: /home/agent/work/.go/bin/roachprod run teamcity-1015656-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1400 --histograms=logs/stats.json  --ramp=5m0s --duration=2h0m0s {pgurl:1-3} returned:
		stderr:
		
		stdout:
		 1275.1   4295.0   5637.1   5637.1 delivery
		1h29m20s        0          173.0          294.2   1476.4   3355.4   3623.9   3623.9 newOrder
		1h29m20s        0           27.0           29.5     44.0    218.1    251.7    251.7 orderStatus
		1h29m20s        0          298.0          294.8   1543.5   2818.6   3355.4   4160.7 payment
		1h29m20s        0           26.0           29.5    503.3   2080.4   2684.4   2684.4 stockLevel
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		1h29m21s        0           32.0           29.5   2147.5   6979.3   9126.8   9126.8 delivery
		1h29m21s        0          476.5          294.2   1677.7   3623.9   4563.4   5905.6 newOrder
		1h29m21s        0           10.0           29.5     54.5    268.4    268.4    268.4 orderStatus
		1h29m21s        0          382.6          294.8   1476.4   2818.6   4026.5   5637.1 payment
		1h29m21s        0           15.0           29.5    402.7    973.1   1543.5   1543.5 stockLevel
		Error: error in payment: EOF
		Error:  exit status 1
		: exit status 1
	test.go:645,cluster.go:1484,tpcc.go:121,tpcc.go:138: unexpected node event: 3: dead

@nvanbenschoten
Copy link
Member

panic: inverted range: key lens = [14,9), diff @ index 0

Looks like #32149. Something must have changed recently to trigger this much more easily. I'll try reproducing with #32424.

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/cfd53f62f038a06e34570f11ec02bc74dabd2aab

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stressrace TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1021046&tab=buildLog

The test failed on master:
	test.go:645,cluster.go:1141,tpcc.go:111,cluster.go:1463,errgroup.go:58: /home/agent/work/.go/bin/roachprod run teamcity-1021046-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1400 --histograms=logs/stats.json  --ramp=5m0s --duration=2h0m0s {pgurl:1-3} returned:
		stderr:
		
		stdout:
		2s        0          200.1          293.8    142.6    352.3    402.7    520.1 newOrder
		  28m52s        0           32.0           29.6     28.3     96.5    113.2    113.2 orderStatus
		  28m52s        0          231.1          295.8    100.7    285.2    335.5    385.9 payment
		  28m52s        0           34.0           29.6    201.3    318.8    335.5    335.5 stockLevel
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		  28m53s        0           25.1           29.5    570.4    973.1   1140.9   1140.9 delivery
		  28m53s        0          307.7          293.8    486.5    771.8    906.0   1275.1 newOrder
		  28m53s        0           32.1           29.6     22.0     58.7     83.9     83.9 orderStatus
		  28m53s        0          222.5          295.8    402.7    671.1    805.3    872.4 payment
		  28m53s        0           37.1           29.6    209.7    385.9    453.0    453.0 stockLevel
		Error: error in delivery: ERROR: no inbound stream connection (SQLSTATE XX000)
		Error:  exit status 1
		: exit status 1
	test.go:645,cluster.go:1484,tpcc.go:121,tpcc.go:138: Goexit() was called

@jordanlewis
Copy link
Member

  Error: error in delivery: ERROR: no inbound stream connection (SQLSTATE XX000)

cc @asubiotto

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/96f03c95d5078ebad7167c5cdb145e365978a008

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
make stressrace TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1022695&tab=buildLog

The test failed on master:
	test.go:645,cluster.go:1141,tpcc.go:111,cluster.go:1463,errgroup.go:58: /home/agent/work/.go/bin/roachprod run teamcity-1022695-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1400 --histograms=logs/stats.json  --ramp=5m0s --duration=2h0m0s {pgurl:1-3} returned:
		stderr:
		
		stdout:
		 0           27.9           29.5    536.9    838.9   1342.2   1342.2 delivery
		1h53m23s        0          307.8          294.7    469.8    738.2    805.3    906.0 newOrder
		1h53m23s        0           30.9           29.5     44.0    134.2    151.0    151.0 orderStatus
		1h53m23s        0          308.8          295.3    402.7    637.5    738.2   1140.9 payment
		1h53m23s        0           32.9           29.5    285.2    486.5    570.4    570.4 stockLevel
		1h53m24s        0           19.0           29.5    520.1    805.3    973.1    973.1 delivery
		1h53m24s        0          204.5          294.6    570.4    973.1   1073.7   1140.9 newOrder
		1h53m24s        0           34.1           29.5     26.2    151.0    234.9    234.9 orderStatus
		1h53m24s        0          195.5          295.3    486.5    805.3   1073.7   1476.4 payment
		1h53m24s        0           27.1           29.5    285.2    503.3    520.1    520.1 stockLevel
		Error: error in delivery: ERROR: no inbound stream connection (SQLSTATE XX000)
		Error:  exit status 1
		: exit status 1
	test.go:645,cluster.go:1484,tpcc.go:121,tpcc.go:138: Goexit() was called

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/153fcf347cd6a53517f607ec1e6c7c14193640dd

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1052961&tab=buildLog

The test failed on release-2.1:
	test.go:630,test.go:642: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod create teamcity-1052961-tpcc-nodes-3-w-max -n 4 --gce-machine-type=n1-highcpu-16 --gce-zones=us-central1-b,us-west1-b,europe-west2-b returned:
		stderr:
		
		stdout:
		e: Command: gcloud [compute instances create --subnet default --maintenance-policy MIGRATE --scopes default,storage-rw --image ubuntu-1604-xenial-v20181030 --image-project ubuntu-os-cloud --boot-disk-size 10 --boot-disk-type pd-ssd --service-account 21965078311-compute@developer.gserviceaccount.com --local-ssd interface=SCSI --machine-type n1-highcpu-16 --labels lifetime=12h0m0s --metadata-from-file startup-script=/home/agent/temp/buildTmp/gce-startup-script648566580 --project cockroach-ephemeral]
		Output: WARNING: Some requests generated warnings:
		 - The resource 'projects/ubuntu-os-cloud/global/images/ubuntu-1604-xenial-v20181030' is deprecated. A suggested replacement is 'projects/ubuntu-os-cloud/global/images/ubuntu-1604-xenial-v20181204'.
		
		ERROR: (gcloud.compute.instances.create) Could not fetch resource:
		 - The zone 'projects/cockroach-ephemeral/zones/us-central1-b' does not have enough resources available to fulfill the request.  Try a different zone, or try again later.
		
		: exit status 1
		Cleaning up...
		: exit status 1

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/e7dc507fa0ecc7dc5ed597ca5c6cdeb48086428c

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1057350&tab=buildLog

The test failed on master:
	test.go:628,cluster.go:1139,tpcc.go:110,cluster.go:1465,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1057350-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1400 --histograms=logs/stats.json  --ramp=5m0s --duration=2h0m0s {pgurl:1-3} returned:
		stderr:
		
		stdout:
		 0           41.0           29.0   1140.9   2550.1   3758.1   3758.1 delivery
		 1h0m45s        0          268.0          289.1   1610.6   2818.6   3489.7   4026.5 newOrder
		 1h0m45s        0           36.0           29.0     96.5    318.8    402.7    402.7 orderStatus
		 1h0m45s        0          349.0          290.2   1342.2   2684.4   3355.4   4160.7 payment
		 1h0m45s        0           25.0           29.0    637.5   1073.7   1610.6   1610.6 stockLevel
		 1h0m46s        0           32.0           29.0    906.0   2013.3   2281.7   2281.7 delivery
		 1h0m46s        0          211.2          289.1   1744.8   3087.0   3758.1   4831.8 newOrder
		 1h0m46s        0           24.0           29.0    134.2    671.1    738.2    738.2 orderStatus
		 1h0m46s        0          284.3          290.2   1140.9   2415.9   3489.7   4295.0 payment
		 1h0m46s        0           28.0           29.0    805.3   1140.9   1879.0   1879.0 stockLevel
		Error: error in delivery: ERROR: no inbound stream connection (SQLSTATE XX000)
		Error:  exit status 1
		: exit status 1
	test.go:628,cluster.go:1486,tpcc.go:120,tpcc.go:136: Goexit() was called

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/1da87b86059c282c2b2821d6b5b481f17c89d98e

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1069764&tab=buildLog

The test failed on master:
	test.go:703,cluster.go:1137,tpcc.go:110,cluster.go:1463,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1069764-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1400 --histograms=logs/stats.json  --ramp=5m0s --duration=2h0m0s {pgurl:1-3} returned:
		stderr:
		
		stdout:
		l
		1h50m24s        0           47.0           29.6    436.2    906.0   1677.7   1677.7 delivery
		1h50m24s        0          359.1          295.8    352.3    570.4    704.6   1040.2 newOrder
		1h50m24s        0           26.0           29.6     35.7     71.3     83.9     83.9 orderStatus
		1h50m24s        0          361.1          296.4    285.2    637.5    738.2    973.1 payment
		1h50m24s        0           29.0           29.6    234.9    352.3    436.2    436.2 stockLevel
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		1h50m25s        0           25.9           29.6    352.3    671.1    704.6    704.6 delivery
		1h50m25s        0          261.4          295.8    369.1    671.1    704.6    738.2 newOrder
		1h50m25s        0           34.9           29.7     25.2     67.1     75.5     75.5 orderStatus
		1h50m25s        0          266.4          296.4    260.0    570.4    671.1    805.3 payment
		1h50m25s        0           29.9           29.6    234.9    536.9    570.4    570.4 stockLevel
		: signal: killed
	test.go:703,cluster.go:1484,tpcc.go:120,tpcc.go:136: Goexit() was called

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/3c80c5e06c4aa5ed25e1cc02b78037f6ec121939

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1072755&tab=buildLog

The test failed on release-2.1:
	test.go:703,cluster.go:1137,tpcc.go:110,cluster.go:1463,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1072755-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1400 --histograms=logs/stats.json  --ramp=5m0s --duration=2h0m0s {pgurl:1-3} returned:
		stderr:
		
		stdout:
		    0            0.0           22.6      0.0      0.0      0.0      0.0 delivery
		  18m49s        0            0.0          222.3      0.0      0.0      0.0      0.0 newOrder
		  18m49s        0            0.0           22.6      0.0      0.0      0.0      0.0 orderStatus
		  18m49s        0            0.0          225.0      0.0      0.0      0.0      0.0 payment
		  18m49s        0            0.0           22.6      0.0      0.0      0.0      0.0 stockLevel
		  18m50s        0            0.0           22.6      0.0      0.0      0.0      0.0 delivery
		  18m50s        0            0.0          222.1      0.0      0.0      0.0      0.0 newOrder
		  18m50s        0            0.0           22.5      0.0      0.0      0.0      0.0 orderStatus
		  18m50s        0            0.0          224.8      0.0      0.0      0.0      0.0 payment
		  18m50s        0            0.0           22.6      0.0      0.0      0.0      0.0 stockLevel
		Error: error in newOrder: ERROR: context deadline exceeded (SQLSTATE XX000)
		Error:  exit status 1
		: exit status 1
	test.go:703,cluster.go:1484,tpcc.go:120,tpcc.go:136: Goexit() was called

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/d6adf24cae788d7cd967feadae8e9c0388ce5273

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1079910&tab=buildLog

The test failed on master:
	test.go:696,cluster.go:1137,tpcc.go:110,cluster.go:1463,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1079910-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1400 --histograms=logs/stats.json  --ramp=5m0s --duration=2h0m0s {pgurl:1-3} returned:
		stderr:
		
		stdout:
		           30.0           29.6    402.7    637.5    671.1    671.1 delivery
		 1h3m45s        0          334.3          294.9    352.3    604.0    704.6    738.2 newOrder
		 1h3m45s        0           31.0           29.6     35.7     96.5    167.8    167.8 orderStatus
		 1h3m45s        0          302.3          295.9    318.8    486.5    604.0    738.2 payment
		 1h3m45s        0           22.0           29.6    318.8    604.0    704.6    704.6 stockLevel
		 1h3m46s        0           31.9           29.6    570.4    838.9   1073.7   1073.7 delivery
		 1h3m46s        0          206.6          294.9    570.4    838.9    973.1   1073.7 newOrder
		 1h3m46s        0           43.9           29.6     60.8    130.0    151.0    151.0 orderStatus
		 1h3m46s        0          240.5          295.9    436.2    704.6    872.4    973.1 payment
		 1h3m46s        0           29.9           29.6    503.3    838.9    939.5    939.5 stockLevel
		Error: error in stockLevel: ERROR: no inbound stream connection (SQLSTATE XX000)
		Error:  exit status 1
		: exit status 1
	test.go:696,cluster.go:1484,tpcc.go:120,tpcc.go:136: Goexit() was called

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/99064bf2797cce49e2d56d0324a3344cf2312aca

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1082456&tab=buildLog

The test failed on master:
	test.go:696,cluster.go:1164,tpcc.go:110,cluster.go:1490,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1082456-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1400 --histograms=logs/stats.json  --ramp=5m0s --duration=2h0m0s {pgurl:1-3} returned:
		stderr:
		
		stdout:
		 0           30.1           29.8    369.1    704.6    738.2    738.2 delivery
		   19m3s        0          275.5          294.0    419.4    671.1    805.3    939.5 newOrder
		   19m3s        0           24.0           29.7     26.2     79.7     96.5     96.5 orderStatus
		   19m3s        0          287.5          297.0    352.3    570.4    771.8    939.5 payment
		   19m3s        0           32.1           29.8    192.9    302.0    318.8    318.8 stockLevel
		   19m4s        0           22.0           29.8    486.5    805.3    906.0    906.0 delivery
		   19m4s        0          317.6          294.0    419.4    704.6   1006.6   1208.0 newOrder
		   19m4s        0           35.0           29.7     35.7    104.9    134.2    134.2 orderStatus
		   19m4s        0          306.7          297.0    419.4    738.2    838.9   1140.9 payment
		   19m4s        0           31.0           29.8    184.5    369.1    738.2    738.2 stockLevel
		Error: error in newOrder: ERROR: no inbound stream connection (SQLSTATE XX000)
		Error:  exit status 1
		: exit status 1
	test.go:696,cluster.go:1511,tpcc.go:120,tpcc.go:136: Goexit() was called

@nvanbenschoten
Copy link
Member

Last few failures have been #31361.

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/766a53defc27940d41ae8987792ecd79e76d2b65

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1090948&tab=buildLog

The test failed on master:
	test.go:696,cluster.go:1164,tpcc.go:120,cluster.go:1490,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1090948-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1400 --histograms=logs/stats.json  --ramp=5m0s --duration=2h0m0s {pgurl:1-3} returned:
		stderr:
		
		stdout:
		 0           18.0           29.5    352.3    704.6   2147.5   2147.5 delivery
		  52m35s        0          266.0          294.2    402.7    738.2   1275.1   1677.7 newOrder
		  52m35s        0           23.0           29.5     30.4    100.7    117.4    117.4 orderStatus
		  52m35s        0          256.0          295.3    402.7    771.8    939.5   1006.6 payment
		  52m35s        0           23.0           29.5    302.0    570.4    805.3    805.3 stockLevel
		  52m36s        0           33.9           29.5    302.0    637.5    738.2    738.2 delivery
		  52m36s        0          305.3          294.3    385.9    771.8    906.0   1275.1 newOrder
		  52m36s        0           36.9           29.5     19.9     41.9     50.3     50.3 orderStatus
		  52m36s        0          379.1          295.3    369.1    771.8    939.5   1140.9 payment
		  52m36s        0           32.9           29.5    285.2    520.1    604.0    604.0 stockLevel
		Error: error in delivery: ERROR: no inbound stream connection (SQLSTATE XX000)
		Error:  exit status 1
		: exit status 1
	test.go:696,cluster.go:1511,tpcc.go:130,tpcc.go:146: Goexit() was called

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/3f22b976765a7cc58b97a9fd9866f2f26493c987

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1097785&tab=buildLog

The test failed on release-2.1:
	test.go:713,cluster.go:1164,tpcc.go:120,cluster.go:1490,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1097785-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1400 --histograms=logs/stats.json  --ramp=5m0s --duration=2h0m0s {pgurl:1-3} returned:
		stderr:
		
		stdout:
		     0            0.0           22.3      0.0      0.0      0.0      0.0 delivery
		  19m30s        0            0.0          218.4      0.0      0.0      0.0      0.0 newOrder
		  19m30s        0            0.0           22.3      0.0      0.0      0.0      0.0 orderStatus
		  19m30s        0            0.0          221.6      0.0      0.0      0.0      0.0 payment
		  19m30s        0            0.0           22.2      0.0      0.0      0.0      0.0 stockLevel
		  19m31s        0            0.0           22.3      0.0      0.0      0.0      0.0 delivery
		  19m31s        0            5.0          218.2  22548.6  22548.6  22548.6  22548.6 newOrder
		  19m31s        0            0.0           22.3      0.0      0.0      0.0      0.0 orderStatus
		  19m31s        0           18.1          221.4  22548.6  38654.7  42949.7  42949.7 payment
		  19m31s        0            1.0           22.2  22548.6  22548.6  22548.6  22548.6 stockLevel
		Error: error in payment: ERROR: context deadline exceeded (SQLSTATE XX000)
		Error:  exit status 1
		: exit status 1
	test.go:713,cluster.go:1511,tpcc.go:130,tpcc.go:146: Goexit() was called

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/94dcfdef9e9fff49bf85900c1c73603073cb9fab

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1101244&tab=buildLog

The test failed on release-2.1:
	test.go:727,cluster.go:1203,tpcc.go:118,cluster.go:1541,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1101244-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1400 --histograms=logs/stats.json  --ramp=5m0s --duration=2h0m0s {pgurl:1-3} returned:
		stderr:
		
		stdout:
		  1476.4   2147.5   2147.5 delivery
		1h23m10s        0          244.6          246.9    939.5  13421.8  60129.5 103079.2 newOrder
		1h23m10s        0           27.1           24.7     30.4    838.9    838.9    838.9 orderStatus
		1h23m10s        0          315.8          247.6   1073.7   3623.9  31138.5 103079.2 payment
		1h23m10s        0           19.0           24.8    939.5   1543.5   5637.1   5637.1 stockLevel
		1h23m11s        0           23.0           24.8    503.3   1208.0   1409.3   1409.3 delivery
		1h23m11s        0          198.8          246.9   1677.7  15032.4  47244.6 103079.2 newOrder
		1h23m11s        0           30.0           24.7     39.8    872.4    872.4    872.4 orderStatus
		1h23m11s        0          196.8          247.6   1140.9  13421.8  49392.1 103079.2 payment
		1h23m11s        0           32.0           24.8   1006.6   1946.2  10737.4  10737.4 stockLevel
		Error: error in newOrder: ERROR: TransactionStatusError: transaction deadline exceeded (REASON_UNKNOWN) (SQLSTATE XX000)
		Error:  exit status 1
		: exit status 1
	test.go:727,cluster.go:1562,tpcc.go:128,tpcc.go:144: Goexit() was called

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/9f084ad576e85756c5c5a7e41335d9aa2d3eee30

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1112101&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1195,tpcc.go:118,cluster.go:1533,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1112101-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1400 --histograms=logs/stats.json  --ramp=5m0s --duration=2h0m0s {pgurl:1-3} returned:
		stderr:
		
		stdout:
		 98784.2 103079.2 103079.2 delivery
		  35m10s        0          151.3          271.4   3355.4   6442.5   8589.9  11811.2 newOrder
		  35m10s        0           26.0           27.3    318.8    486.5    503.3    503.3 orderStatus
		  35m10s        0          208.4          272.9   1610.6   6710.9  11274.3  17179.9 payment
		  35m10s        0           25.0           27.4    704.6   3355.4   4026.5   4026.5 stockLevel
		  35m11s        0           13.0           27.0  13958.6 103079.2 103079.2 103079.2 delivery
		  35m11s        0          231.6          271.3   4026.5   6710.9   8589.9  12348.0 newOrder
		  35m11s        0           20.1           27.3    570.4    771.8   1208.0   1208.0 orderStatus
		  35m11s        0          157.4          272.8   1744.8   5100.3   7784.6  30064.8 payment
		  35m11s        0           23.1           27.4    973.1   3355.4   3489.7   3489.7 stockLevel
		Error: error in delivery: ERROR: TransactionStatusError: transaction deadline exceeded (REASON_UNKNOWN) (SQLSTATE XX000)
		Error:  exit status 1
		: exit status 1
	test.go:743,cluster.go:1554,tpcc.go:128,tpcc.go:144: Goexit() was called

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/e10fb557b11b5ff1b8609aa963da23c37a1143c8

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1113854&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1226,tpcc.go:118,cluster.go:1564,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1113854-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1400 --histograms=logs/stats.json  --ramp=5m0s --duration=2h0m0s {pgurl:1-3} returned:
		stderr:
		
		stdout:
		2212.3  34359.7 newOrder
		   14m4s        0           18.3           26.9    369.1    973.1   1006.6   1006.6 orderStatus
		   14m4s        0          160.4          267.8   2281.7  24696.1  40802.2  51539.6 payment
		   14m4s        0           28.4           27.0    771.8   3758.1   7247.8   7247.8 stockLevel
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		   14m5s        0           27.1           27.0   1208.0   2818.6   2952.8   2952.8 delivery
		   14m5s        0          315.3          264.8   1811.9  15032.4  26843.5  38654.7 newOrder
		   14m5s        0           22.1           26.9    151.0    973.1   1006.6   1006.6 orderStatus
		   14m5s        0          306.3          267.8   1811.9  13958.6  40802.2  73014.4 payment
		   14m5s        0           23.1           27.0    637.5   6710.9  11811.2  11811.2 stockLevel
		Error: error in newOrder: ERROR: duplicate key value (o_w_id,o_d_id,o_id)=(295,2,3022) violates unique constraint "primary" (SQLSTATE 23505)
		Error:  exit status 1
		: exit status 1
	test.go:743,cluster.go:1585,tpcc.go:128,tpcc.go:144: Goexit() was called

@nvanbenschoten
Copy link
Member

Last failure was the same as #34025.

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/395d842feb97c5bd8cad2b32b71a5156c03061eb

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1115923&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1226,tpcc.go:118,cluster.go:1564,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1115923-tpcc-nodes-3-w-max:4 -- ./workload run tpcc --warehouses=1400 --histograms=logs/stats.json  --ramp=5m0s --duration=2h0m0s {pgurl:1-3} returned:
		stderr:
		
		stdout:
		348.0  34359.7 newOrder
		1h37m36s        0           26.0           29.2    104.9    285.2    318.8    318.8 orderStatus
		1h37m36s        0          200.1          291.6   1610.6  13421.8  34359.7  47244.6 payment
		1h37m36s        0           35.0           29.2    453.0   1409.3   2281.7   2281.7 stockLevel
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		1h37m37s        0           25.9           29.2   1476.4   6174.0  10200.5  10200.5 delivery
		1h37m37s        0          252.9          291.1   1409.3   4831.8  10737.4  28991.0 newOrder
		1h37m37s        0           23.9           29.2    100.7    285.2    302.0    302.0 orderStatus
		1h37m37s        0          332.6          291.6   1342.2   9126.8  23622.3  36507.2 payment
		1h37m37s        0           21.9           29.2    402.7   1275.1   1342.2   1342.2 stockLevel
		Error: error in newOrder: ERROR: duplicate key value (o_w_id,o_d_id,o_id)=(949,10,3119) violates unique constraint "primary" (SQLSTATE 23505)
		Error:  exit status 1
		: exit status 1
	test.go:743,cluster.go:1585,tpcc.go:128,tpcc.go:144: Goexit() was called

@cockroach-teamcity
Copy link
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/b9bd958fccddc699d47eccbbec80db75c10eab46

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpcc/nodes=3/w=max PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1122796&tab=buildLog

The test failed on master:
	test.go:743,cluster.go:1226,tpcc.go:130,tpcc.go:144: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1122796-tpcc-nodes-3-w-max:4 -- ./workload check tpcc --warehouses=1400 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190204 19:25:13.990792 1 workload/tpcc/tpcc.go:288  check 3.3.2.1 took 659.450086ms
		Error: check failed: 3.3.2.1: 1 rows returned, expected zero
		Error:  exit status 1
		: exit status 1

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Feb 5, 2019
Fixes cockroachdb#34025.
Fixes cockroachdb#33624.
Fixes cockroachdb#33335.
Fixes cockroachdb#33151.
Fixes cockroachdb#33149.
Fixes cockroachdb#34159.
Fixes cockroachdb#34293.
Fixes cockroachdb#32813.
Fixes cockroachdb#30886.
Fixes cockroachdb#34228.
Fixes cockroachdb#34321.

It is rare but possible for a replica to become a leaseholder but not
learn about this until it applies a snapshot. Immediately upon the
snapshot application's `ReplicaState` update, the replica will begin
operating as a standard leaseholder.

Before this change, leases acquired in this way would not trigger
in-memory side-effects to be performed. This could result in a regression
in the new leaseholder's timestamp cache compared to the previous
leaseholder, allowing write-skew like we saw in cockroachdb#34025. This could
presumably result in other anomalies as well, because all of the
steps in `leasePostApply` were skipped.

This PR fixes this bug by detecting lease updates when applying
snapshots and making sure to react correctly to them. It also likely
fixes the referenced issue. The new test demonstrated that without
this fix, the serializable violation speculated about in the issue
was possible.

Release note (bug fix): Fix bug where lease transfers passed through
Snapshots could forget to update in-memory state on the new leaseholder,
allowing write-skew between read-modify-write operations.
craig bot pushed a commit that referenced this issue Feb 5, 2019
34548: storage: apply lease change side-effects on snapshot recipients r=nvanbenschoten a=nvanbenschoten

Fixes #34025.
Fixes #33624.
Fixes #33335.
Fixes #33151.
Fixes #33149.
Fixes #34159.
Fixes #34293.
Fixes #32813.
Fixes #30886.
Fixes #34228.
Fixes #34321.

It is rare but possible for a replica to become a leaseholder but not learn about this until it applies a snapshot. Immediately upon the snapshot application's `ReplicaState` update, the replica will begin operating as a standard leaseholder.

Before this change, leases acquired in this way would not trigger in-memory side-effects to be performed. This could result in a regression in the new leaseholder's timestamp cache compared to the previous leaseholder's cache, allowing write-skew like we saw in #34025. This could presumably result in other anomalies as well, because all of the steps in `leasePostApply` were skipped (as theorized by #34025 (comment)).

This PR fixes this bug by detecting lease updates when applying snapshots and making sure to react correctly to them. It also likely fixes the referenced issue. The new test demonstrates that without this fix, the serializable violation speculated about in the issue was possible.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
@craig craig bot closed this as completed in #34548 Feb 5, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-test-failure Broken test (automatically or manually discovered). O-robot Originated from a bot.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants