roachtest: scaledata/filesystem_simulator/nodes=3 failed #36981

Closed
cockroach-teamcity opened this issue Apr 21, 2019 · 48 comments

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/df200cbf3f407dbf349aa601ff9036b4dff88e83

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1252822&tab=buildLog

The test failed on release-2.1:
	cluster.go:1688,scaledata.go:126,scaledata.go:53,test.go:1237: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1252822-scaledata-filesystem-simulator-nodes-3:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.142.0.5:26257,10.142.0.117:26257,10.142.0.116:26257'  returned:
		stderr:
		6:34 RobustDB.RandomDB chose DB at index 1
		2019/04/21 11:46:34 Created file 0_624 with uuid 683e1ab4-be75-4d75-8406-569692813cc1 and parent /default
		2019/04/21 11:46:34 RobustDB.RandomDB chose DB at index 2
		2019/04/21 11:50:22 ExecuteTx retry attempt 2 failed, started at 2019-04-21 11:44:18.635267946 +0000 UTC m=+210.876186714, now = 2019-04-21 11:50:22.694290749 +0000 UTC m=+574.935209571, took 6m4.059022857s
		2019/04/21 11:50:22 Attempt failed with error driver: bad connection: ... Retrying after sleeping 10ns
		2019/04/21 11:50:22 ExecuteTx retry attempt 2 failed, started at 2019-04-21 11:44:18.635129135 +0000 UTC m=+210.876047904, now = 2019-04-21 11:50:22.694318724 +0000 UTC m=+574.935237582, took 6m4.059189678s
		2019/04/21 11:50:22 Attempt failed with error driver: bad connection: ... Retrying after sleeping 10ns
		2019/04/21 11:50:22 Aborting Retries because retry duration of 300 seconds expired : *errors.errorString : driver: bad connection
		2019/04/21 11:50:22 driver: bad connection
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity cockroach-teamcity added this to the 19.1 milestone Apr 21, 2019
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. labels Apr 21, 2019
@nvanbenschoten
Member

2019/04/21 11:50:22 ExecuteTx retry attempt 2 failed, started at 2019-04-21 11:44:18.635267946 +0000 UTC m=+210.876186714, now = 2019-04-21 11:50:22.694290749 +0000 UTC m=+574.935209571, took 6m4.059022857s
2019/04/21 11:50:22 Attempt failed with error driver: bad connection: ... Retrying after sleeping 10ns
2019/04/21 11:50:22 Aborting Retries because retry duration of 300 seconds expired : *errors.errorString : driver: bad connection

Looks like a transaction got stuck for 6 minutes doing ... something.

@tbg
Member

tbg commented Apr 23, 2019

The debug zip is not super useful because it's 2.1, unfortunately.

@tbg
Member

tbg commented Apr 23, 2019

@nvanbenschoten what do you think we should do here? The txn started at 11:44:18.

From the log file dates we can see that the nodes were (stopped and) restarted as follows:

n1 at 11:44:50, 11:46:51, 11:48:52
n2 at 11:50:26 (down around 11:50:16)
n3 never down

The 6-minute txn couldn't have run against n1 or it would've been forced to retry at 11:46:51 or before. So it was likely stuck on n2 and got unstuck ("bad connection") when n2 was taken down. Needless to say, there's nothing in the logs on n2.

image

So we have some statement that sometimes gets stuck forever. In fact we have precisely two such statements that started at around the same time (right?). Is there some deadlock? Who knows. What can we do to find out?

cc @andreimatei

@nvanbenschoten
Member

The symptoms here look similar to #32204 (see #32204 (comment)).

Interestingly, the stuck request began at 11:44:18.635267946, and the CHAOS log shows that node 1 was killed at the same time: "CHAOS: 11:44:18 chaos.go:94: killing :1". It's likely that we're seeing some bad interaction between n1 being killed and an RPC being addressed to n1.

In #32204 we saw a number of requests get stuck for over 2 minutes. I'm going to drop the timeout on these retries to 2 minutes and see if I can more easily reproduce what we're seeing here.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/84dc682eca4b11e6abaf390fc8883f32afe81fb4

Parameters:

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1283539&tab=buildLog

The test failed on branch=release-19.1, cloud=gce:
	cluster.go:1833,scaledata.go:126,scaledata.go:53,test.go:1251: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1283539-scaledata-filesystem-simulator-nodes-3:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.142.0.88:26257,10.142.0.51:26257,10.142.0.94:26257'  returned:
		stderr:
		 Retrying after sleeping 5ns
		2019/05/10 13:28:46 ExecuteTx retry attempt 1 failed, started at 2019-05-10 13:28:46.064586877 +0000 UTC m=+453.302680388, now = 2019-05-10 13:28:46.697410051 +0000 UTC m=+453.935503563, took 632.823175ms
		2019/05/10 13:28:46 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/05/10 13:28:46 ExecuteTx retry attempt 1 failed, started at 2019-05-10 13:28:44.739367961 +0000 UTC m=+451.977461511, now = 2019-05-10 13:28:46.697489685 +0000 UTC m=+453.935583221, took 1.95812171s
		2019/05/10 13:28:46 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/05/10 13:28:46 ExecuteTx retry attempt 1 failed, started at 2019-05-10 13:28:44.772476247 +0000 UTC m=+452.010569759, now = 2019-05-10 13:28:46.697537666 +0000 UTC m=+453.935631195, took 1.925061436s
		2019/05/10 13:28:46 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/05/10 13:28:46 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/ba5c092a726134b73e789c2047f7ec151be7c1a1

Parameters:

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1288263&tab=buildLog

The test failed on branch=master, cloud=gce:
	cluster.go:1833,scaledata.go:126,scaledata.go:53,test.go:1251: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1288263-scaledata-filesystem-simulator-nodes-3:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.142.0.18:26257,10.142.0.17:26257,10.142.0.19:26257'  returned:
		stderr:
		y 0
		debug3: receive packet: type 96
		debug2: channel 0: rcvd eof
		debug2: channel 0: output open -> drain
		debug2: channel 0: obuf empty
		debug2: channel 0: close_write
		debug2: channel 0: output drain -> closed
		debug3: receive packet: type 97
		debug2: channel 0: rcvd close
		debug3: channel 0: will not send data after close
		debug2: channel 0: almost dead
		debug2: channel 0: gc: notify user
		debug2: channel 0: gc: user detached
		debug2: channel 0: send close
		debug3: send packet: type 97
		debug2: channel 0: is dead
		debug2: channel 0: garbage collecting
		debug1: channel 0: free: client-session, nchannels 1
		debug3: channel 0: status: The following connections are open:
		  #0 client-session (t4 r0 i3/0 o3/0 fd -1/-1 cc -1)
		
		debug3: send packet: type 1
		debug1: fd 0 clearing O_NONBLOCK
		debug1: fd 1 clearing O_NONBLOCK
		debug1: fd 2 clearing O_NONBLOCK
		Transferred: sent 4512, received 8250036 bytes, in 332.8 seconds
		Bytes per second: sent 13.6, received 24788.5
		debug1: Exit status 255
		: exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/d01a95b1ee71dfb36eed374619a8ed30de057ed2

Parameters:

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1312970&tab=buildLog

The test failed on branch=release-2.1, cloud=gce:
	test.go:1237: test timed out (20m0s)
	cluster.go:1875,scaledata.go:126,scaledata.go:53,test.go:1251: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1312970-scaledata-filesystem-simulator-nodes-3:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.142.0.201:26257,10.142.0.199:26257,10.142.1.1:26257'  returned:
		stderr:
		DB.RandomDB chose DB at index 1
		2019/05/29 12:20:59 ExecuteTx retry attempt 1 failed, started at 2019-05-29 12:20:59.264418163 +0000 UTC m=+595.664748625, now = 2019-05-29 12:20:59.264769358 +0000 UTC m=+595.665099864, took 351.239µs
		2019/05/29 12:20:59 Attempt failed with error dial tcp 10.142.0.199:26257: connect: connection refused: ... Retrying after sleeping 5ns
		2019/05/29 12:20:59 RobustDB.RandomDB chose DB at index 1
		2019/05/29 12:20:59 ExecuteTx retry attempt 2 failed, started at 2019-05-29 12:20:59.265132676 +0000 UTC m=+595.665463158, now = 2019-05-29 12:20:59.265661383 +0000 UTC m=+595.665991877, took 528.719µs
		2019/05/29 12:20:59 Attempt failed with error dial tcp 10.142.0.199:26257: connect: connection refused: ... Retrying after sleeping 10ns
		2019/05/29 12:20:59 RobustDB.RandomDB chose DB at index 2
		2019/05/29 12:20:59 Consistency Test 13_310 @ 1559132456723734365.0000000000: sizes :- files - 9001, childRelations - 9000, stripes - 1424
		2019/05/29 12:20:59 RobustDB.RandomDB chose DB at index 0
		
		stdout:
		: signal: killed

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/e6366f3ac39652a763f38948fccf4b2dab363034

Parameters:

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1347608&tab=buildLog

The test failed on branch=release-19.1, cloud=gce:
	cluster.go:1872,scaledata.go:123,scaledata.go:50,test.go:1251: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1347608-scaledata-filesystem-simulator-nodes-3:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.142.0.17:26257,10.142.0.3:26257,10.142.0.5:26257'  returned:
		stderr:
		 1560951724407066756.0000000000: sizes :- files - 11531, childRelations - 11530, stripes - 1807
		2019/06/19 13:42:05 Consistency Test 12_601 @ 1560951724475987370.0000000000: sizes :- files - 11530, childRelations - 11529, stripes - 1806
		2019/06/19 13:42:05 RobustDB.RandomDB chose DB at index 2
		2019/06/19 13:42:05 RobustDB.RandomDB chose DB at index 0
		2019/06/19 13:42:05 RobustDB.RandomDB chose DB at index 2
		2019/06/19 13:42:05 Consistency Test 9_597 @ 1560951724387137665.0000000000: sizes :- files - 11531, childRelations - 11530, stripes - 1807
		2019/06/19 13:42:05 ExecuteTx retry attempt 1 failed, started at 2019-06-19 13:42:04.438913546 +0000 UTC m=+574.264108901, now = 2019-06-19 13:42:05.684488901 +0000 UTC m=+575.509684309, took 1.245575408s
		2019/06/19 13:42:05 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/06/19 13:42:05 unexpected EOF
		Error:  ssh verbose log retained in /root/.roachprod/debug/ssh_34.74.202.225_2019-06-19T13:32:29Z: exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/86154ae6ae36e286883d8a6c9a4111966198201d

Parameters:

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1367379&tab=buildLog

The test failed on branch=master, cloud=gce:
	cluster.go:1870,scaledata.go:121,scaledata.go:48,test.go:1249: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1367379-scaledata-filesystem-simulator-nodes-3:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.142.0.15:26257,10.142.0.8:26257,10.142.0.41:26257'  returned:
		stderr:
		.457194489, took 654.652916ms
		2019/06/30 19:00:32 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/06/30 19:00:32 ExecuteTx retry attempt 1 failed, started at 2019-06-30 19:00:31.880155526 +0000 UTC m=+331.788542347, now = 2019-06-30 19:00:32.548852588 +0000 UTC m=+332.457239415, took 668.697068ms
		2019/06/30 19:00:32 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/06/30 19:00:32 ExecuteTx retry attempt 1 failed, started at 2019-06-30 19:00:31.808302889 +0000 UTC m=+331.716689701, now = 2019-06-30 19:00:32.548904379 +0000 UTC m=+332.457291250, took 740.601549ms
		2019/06/30 19:00:32 RobustDB.RandomDB chose DB at index 0
		2019/06/30 19:00:32 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/06/30 19:00:32 unexpected EOF
		2019/06/30 19:00:32 RobustDB.RandomDB chose DB at index 0
		Error:  ssh verbose log retained in /root/.roachprod/debug/ssh_34.74.0.7_2019-06-30T18:54:59Z: exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/1ad0ecc8cbddf82c9fedb5a5c5e533e72a657ff7

Parameters:

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1399000&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190722-1399000/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2090,scaledata.go:121,scaledata.go:48,test_runner.go:691: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1563776264-06-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.72:26257,10.128.0.62:26257,10.128.0.29:26257'  returned:
		stderr:
		.. Retrying after sleeping 5ns
		2019/07/22 06:50:47 ExecuteTx retry attempt 1 failed, started at 2019-07-22 06:50:47.010759229 +0000 UTC m=+332.335343150, now = 2019-07-22 06:50:47.647704 +0000 UTC m=+332.972287938, took 636.944788ms
		2019/07/22 06:50:47 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/07/22 06:50:47 ExecuteTx retry attempt 1 failed, started at 2019-07-22 06:50:47.498521727 +0000 UTC m=+332.823105663, now = 2019-07-22 06:50:47.648028424 +0000 UTC m=+332.972612381, took 149.506718ms
		2019/07/22 06:50:47 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/07/22 06:50:47 ExecuteTx retry attempt 1 failed, started at 2019-07-22 06:50:45.965662497 +0000 UTC m=+331.290246460, now = 2019-07-22 06:50:47.648282637 +0000 UTC m=+332.972866739, took 1.682620279s
		2019/07/22 06:50:47 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/07/22 06:50:47 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/7111a67b2ea3a19c2f312f8d214b8823f431cac0

Parameters:

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1400942&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190723-1400942/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2090,scaledata.go:121,scaledata.go:48,test_runner.go:691: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1563862417-03-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.17:26257,10.128.0.41:26257,10.128.0.14:26257'  returned:
		stderr:
		on. Original error: pq: restart transaction: TransactionRetryWithProtoRefreshError: WriteTooOldError: write at timestamp 1563863913.106241633,1 too old; wrote at 1563863913.164570119,1.: ... Retrying after sleeping 5ns
		2019/07/23 06:38:33 ExecuteTx retry attempt 1 failed, started at 2019-07-23 06:38:32.008582198 +0000 UTC m=+575.552779285, now = 2019-07-23 06:38:33.328151393 +0000 UTC m=+576.872348488, took 1.319569203s
		2019/07/23 06:38:33 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/07/23 06:38:33 ExecuteTx retry attempt 1 failed, started at 2019-07-23 06:38:31.727445209 +0000 UTC m=+575.271642284, now = 2019-07-23 06:38:33.328166913 +0000 UTC m=+576.872364011, took 1.600721727s
		2019/07/23 06:38:33 RobustDB.RandomDB chose DB at index 0
		2019/07/23 06:38:33 RobustDB.RandomDB chose DB at index 2
		2019/07/23 06:38:33 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/07/23 06:38:33 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/3b9a95bd7eb2cfa6d544fe7217852a85ec3b76f4

Parameters:

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1422703&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190805-1422703/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2090,scaledata.go:121,scaledata.go:48,test_runner.go:691: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1564984076-01-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.72:26257,10.128.0.101:26257,10.128.0.118:26257'  returned:
		stderr:
		omDB chose DB at index 1
		2019/08/05 06:12:19 Consistency Test 0_346 @ 1564985538570874798.0000000000: sizes :- files - 16937, childRelations - 16936, stripes - 2750
		2019/08/05 06:12:19 ExecuteTx retry attempt 2 failed, started at 2019-08-05 06:12:19.51664064 +0000 UTC m=+576.677438922, now = 2019-08-05 06:12:19.520841831 +0000 UTC m=+576.681640135, took 4.201213ms
		2019/08/05 06:12:19 Attempt failed with error dial tcp 10.128.0.72:26257: connect: connection refused: ... Retrying after sleeping 10ns
		2019/08/05 06:12:19 Consistency Test 5_342 @ 1564985538547010775.0000000000: sizes :- files - 16937, childRelations - 16936, stripes - 2750
		2019/08/05 06:12:19 ExecuteTx retry attempt 1 failed, started at 2019-08-05 06:12:18.591647382 +0000 UTC m=+575.752445673, now = 2019-08-05 06:12:19.52688076 +0000 UTC m=+576.687679102, took 935.233429ms
		2019/08/05 06:12:19 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/08/05 06:12:19 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/98d6832e9f9edb7e554aaa90d9d4296bb00af16e

Parameters:

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1433695&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190810-1433695/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2099,scaledata.go:121,scaledata.go:48,test_runner.go:691: unexpected node event: 1: dead

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/40f8f0eb00f4b3bf5bac11fb5ae132e33a492713

Parameters:

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1452154&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190824-1452154/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2104,scaledata.go:121,scaledata.go:48,test_runner.go:673: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1566627477-04-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.60:26257,10.128.0.59:26257,10.128.0.14:26257'  returned:
		stderr:
		. Retrying after sleeping 5ns
		2019/08/24 06:48:09 ExecuteTx retry attempt 1 failed, started at 2019-08-24 06:48:07.329403959 +0000 UTC m=+452.667644449, now = 2019-08-24 06:48:09.561763712 +0000 UTC m=+454.900004229, took 2.23235978s
		2019/08/24 06:48:09 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/08/24 06:48:09 ExecuteTx retry attempt 1 failed, started at 2019-08-24 06:48:09.375255534 +0000 UTC m=+454.713496010, now = 2019-08-24 06:48:09.559975321 +0000 UTC m=+454.898215823, took 184.719813ms
		2019/08/24 06:48:09 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/08/24 06:48:09 ExecuteTx retry attempt 1 failed, started at 2019-08-24 06:48:07.338730521 +0000 UTC m=+452.676970997, now = 2019-08-24 06:48:09.559922079 +0000 UTC m=+454.898162587, took 2.22119159s
		2019/08/24 06:48:09 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/08/24 06:48:09 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/474795f33383e562e500ad71a774ff7ba92ae3c8

Parameters:

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1485061&tab=buildLog

The test failed on branch=release-2.1, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190913-1485061/scaledata/filesystem_simulator/nodes=3/run_1
	test_runner.go:703: test timed out (20m0s)

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/6b14c0aa3ed1b4ba6d5f937e9352c5383afe1c37

Parameters:

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1502387&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=provisional_201909231358_v19.1.5, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190923-1502387/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2114,scaledata.go:121,scaledata.go:48,test_runner.go:689: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1569266905-01-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.37:26257,10.128.0.88:26257,10.128.0.91:26257'  returned:
		stderr:
		ions - 11597, stripes - 1786
		2019/09/23 19:40:14 ExecuteTx retry attempt 1 failed, started at 2019-09-23 19:40:12.824115688 +0000 UTC m=+453.685858725, now = 2019-09-23 19:40:14.006443618 +0000 UTC m=+454.868186696, took 1.182327971s
		2019/09/23 19:40:14 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/09/23 19:40:14 ExecuteTx retry attempt 1 failed, started at 2019-09-23 19:40:12.91118358 +0000 UTC m=+453.772926620, now = 2019-09-23 19:40:14.006774179 +0000 UTC m=+454.868517248, took 1.095590628s
		2019/09/23 19:40:14 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/09/23 19:40:14 ExecuteTx retry attempt 1 failed, started at 2019-09-23 19:40:12.321869423 +0000 UTC m=+453.183612461, now = 2019-09-23 19:40:14.006963946 +0000 UTC m=+454.868706992, took 1.685094531s
		2019/09/23 19:40:14 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/09/23 19:40:14 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/77f26d185efb436aaac88243de19a27caa5da9b6

Parameters:

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1509340&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190926-1509340/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2143,scaledata.go:121,scaledata.go:48,test_runner.go:689: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1569512843-10-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.41:26257,10.128.0.33:26257,10.128.0.129:26257'  returned:
		stderr:
		235, now = 2019-09-26 16:01:00.089626807 +0000 UTC m=+576.572012901, took 1.793554666s
		2019/09/26 16:01:00 Attempt failed with error restarting txn failed. ROLLBACK TO SAVEPOINT encountered error: driver: bad connection. Original error: pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1569513658.314828851,0 encountered previous write with future timestamp 1569513658.341324679,0 within uncertainty interval `t <= 1569513658.814828851,0`; observed timestamps: [{1 1569513658.314828851,0} {2 1569513660.060280462,0} {3 1569513658.319108619,0}].: ... Retrying after sleeping 5ns
		2019/09/26 16:01:00 ExecuteTx retry attempt 1 failed, started at 2019-09-26 16:00:59.163764562 +0000 UTC m=+575.646150626, now = 2019-09-26 16:01:00.089615935 +0000 UTC m=+576.572002033, took 925.851407ms
		2019/09/26 16:01:00 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/09/26 16:01:00 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

@jordanlewis
Member

  2019/09/26 16:01:00 Attempt failed with error restarting txn failed. ROLLBACK TO SAVEPOINT encountered error: driver: bad connection. Original error: pq: restart transaction: TransactionRetryWithProtoRefreshError: ReadWithinUncertaintyIntervalError: read at time 1569513658.314828851,0 encountered previous write with future timestamp 1569513658.341324679,0 within uncertainty interval `t <= 1569513658.814828851,0`; observed timestamps: [{1 1569513658.314828851,0} {2 1569513660.060280462,0} {3 1569513658.319108619,0}].: ... Retrying after sleeping 5ns

@ajwerner ajwerner mentioned this issue Sep 26, 2019
@tbg
Member

tbg commented Sep 27, 2019

I think this is the real error here: 2019/09/26 16:01:00 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF

@tbg
Member

tbg commented Sep 27, 2019

BTW, the logs also have this:

E190926 16:01:00.089905 251199 sql/sqltelemetry/report.go:56  [n3,client=10.128.0.49:50036,user=root] encountered internal error:
assertion failure
  - error with attached stack trace:
    github.com/cockroachdb/cockroach/pkg/sql/colexec/execerror.CatchVectorizedRuntimeError.func1
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/execerror/error.go:70
    runtime.gopanic
    	/usr/local/go/src/runtime/panic.go:522
    github.com/cockroachdb/cockroach/pkg/sql/colexec/execerror.VectorizedInternalPanic
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/execerror/error.go:155
    github.com/cockroachdb/cockroach/pkg/sql/colflow/colrpc.(*Inbox).Next.func2
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/colflow/colrpc/inbox.go:283
    runtime.gopanic
    	/usr/local/go/src/runtime/panic.go:522
    github.com/cockroachdb/cockroach/pkg/sql/colexec/execerror.VectorizedInternalPanic
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/execerror/error.go:155
    github.com/cockroachdb/cockroach/pkg/sql/colflow/colrpc.(*Inbox).Next
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/colflow/colrpc/inbox.go:316
    github.com/cockroachdb/cockroach/pkg/sql/colexec.(*UnorderedSynchronizer).init.func1.1
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/unorderedsynchronizer.go:138
    github.com/cockroachdb/cockroach/pkg/sql/colexec/execerror.CatchVectorizedRuntimeError
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/execerror/error.go:91
    github.com/cockroachdb/cockroach/pkg/sql/colexec.(*UnorderedSynchronizer).init.func2
    	/go/src/github.com/cockroachdb/cockroach/pkg/sql/colexec/unorderedsynchronizer.go:158
    runtime.goexit
    	/usr/local/go/src/runtime/asm_amd64.s:1337
  - error with embedded safe details: unexpected error from the vectorized runtime: %+v
    -- arg 1: <*status.statusError>
  - unexpected error from the vectorized runtime: rpc error: code = Canceled desc = context canceled
E190926 16:01:00.089955 251199 sql/sqltelemetry/report.go:56  encountered internal error:
assertion failure

There is also log spam that, at the very least, needs to be throttled:

E190926 16:00:30.231536 211004 sql/colflow/colrpc/outbox.go:284  [n3,streamID=0] Outbox Recv connection error: rpc error: code = Unknown desc = context canceled: readerCtx in Inbox stream handler (local reader canceled)
E190926 16:00:30.525474 210991 sql/colflow/colrpc/outbox.go:284  [n3,streamID=0] Outbox Recv connection error: rpc error: code = Unknown desc = context canceled: readerCtx in Inbox stream handler (local reader canceled)
E190926 16:00:30.536460 211009 sql/colflow/colrpc/outbox.go:284  [n3,streamID=0] Outbox Recv connection error: rpc error: code = Unknown desc = context canceled: readerCtx in Inbox stream handler (local reader canceled)
E190926 16:00:30.543502 210983 sql/colflow/colrpc/outbox.go:284  [n3,streamID=0] Outbox Recv connection error: rpc error: code = Unknown desc = context canceled: readerCtx in Inbox stream handler (local reader canceled)
E190926 16:00:30.559556 211055 sql/colflow/colrpc/outbox.go:284  [n3,streamID=0] Outbox Recv connection error: rpc error: code = Unknown desc = context canceled: readerCtx in Inbox stream handler (local reader canceled)

@tbg
Member

tbg commented Sep 27, 2019

Regarding the "unexpected EOF" error: this test runs under chaos, so that error is expected to occur (duh), and we also see that a node got killed just before the error occurred. Maybe the driver used in this test isn't properly handling this error when it occurs during a ROLLBACK?

@jordanlewis
Member

@asubiotto ping on this logging stuff. We should probably make these messages look less scary if they're really expected.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/4dcfd7d8899d90dd1816153bf59b5df647c9bbd3

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1516560&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191002-1516560/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2143,scaledata.go:121,scaledata.go:48,test_runner.go:689: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1569993086-10-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.19:26257,10.128.0.24:26257,10.128.0.23:26257'  returned:
		stderr:
		 Retrying after sleeping 5ns
		2019/10/02 05:23:05 ExecuteTx retry attempt 1 failed, started at 2019-10-02 05:23:04.377800284 +0000 UTC m=+209.532654638, now = 2019-10-02 05:23:05.839953734 +0000 UTC m=+210.994808097, took 1.462153459s
		2019/10/02 05:23:05 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/10/02 05:23:05 ExecuteTx retry attempt 1 failed, started at 2019-10-02 05:23:04.400899637 +0000 UTC m=+209.555753997, now = 2019-10-02 05:23:05.839956224 +0000 UTC m=+210.994810589, took 1.439056592s
		2019/10/02 05:23:05 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/10/02 05:23:05 ExecuteTx retry attempt 1 failed, started at 2019-10-02 05:23:05.25676013 +0000 UTC m=+210.411614481, now = 2019-10-02 05:23:05.840111776 +0000 UTC m=+210.994966133, took 583.351652ms
		2019/10/02 05:23:05 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/10/02 05:23:05 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/1c99165c39c3714f1ce9986bff75ce517f977630

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1522970&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191005-1522970/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2143,scaledata.go:121,scaledata.go:48,test_runner.go:689: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1570252270-02-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.58:26257,10.128.0.51:26257,10.128.0.50:26257'  returned:
		stderr:
		:22:47 RobustDB.RandomDB chose DB at index 1
		2019/10/05 05:22:47 Consistency Test 9_286 @ 1570252964651617212.0000000000: sizes :- files - 15070, childRelations - 15069, stripes - 2377
		2019/10/05 05:22:47 RobustDB.RandomDB chose DB at index 1
		2019/10/05 05:22:47 Removing &{bc9eea1c-93f3-4cf7-8d92-f0eb7644fe08 1 0 209 default}
		2019/10/05 05:22:47 RobustDB.RandomDB chose DB at index 0
		2019/10/05 05:22:47 RobustDB.RandomDB chose DB at index 1
		2019/10/05 05:22:47 Consistency Test 5_283 @ 1570252964517257508.0000000000: sizes :- files - 15070, childRelations - 15069, stripes - 2377
		2019/10/05 05:22:47 RobustDB.RandomDB chose DB at index 2
		2019/10/05 05:22:47 ExecuteTx retry attempt 1 failed, started at 2019-10-05 05:22:46.092050238 +0000 UTC m=+453.425985670, now = 2019-10-05 05:22:47.578814804 +0000 UTC m=+454.912750261, took 1.486764591s
		2019/10/05 05:22:47 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/10/05 05:22:47 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/b519bf7f81795d360fcf458b5d5e031b3fc43e9e

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1531436&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=release-19.1, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191010-1531436/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2146,scaledata.go:121,scaledata.go:48,test_runner.go:689: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1570684301-09-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.231:26257,10.128.1.3:26257,10.128.0.241:26257'  returned:
		stderr:
		 retry attempt 1 failed, started at 2019-10-10 05:23:34.212534673 +0000 UTC m=+452.926320847, now = 2019-10-10 05:23:36.071969361 +0000 UTC m=+454.785755546, took 1.859434699s
		2019/10/10 05:23:36 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/10/10 05:23:36 ExecuteTx retry attempt 1 failed, started at 2019-10-10 05:23:36.013136151 +0000 UTC m=+454.726922309, now = 2019-10-10 05:23:36.071883283 +0000 UTC m=+454.785669466, took 58.747157ms
		2019/10/10 05:23:36 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/10/10 05:23:36 RobustDB.RandomDB chose DB at index 2
		2019/10/10 05:23:36 ExecuteTx retry attempt 1 failed, started at 2019-10-10 05:23:34.221713517 +0000 UTC m=+452.935499675, now = 2019-10-10 05:23:36.072091823 +0000 UTC m=+454.785878012, took 1.850378337s
		2019/10/10 05:23:36 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/10/10 05:23:36 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/41bac2fd99a91d92cb3ff6426789f7e64dd6b14a

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1534498&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191011-1534498/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2146,scaledata.go:121,scaledata.go:48,test_runner.go:689: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1570770659-11-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.65:26257,10.128.0.69:26257,10.128.0.36:26257'  returned:
		stderr:
		d-198d3a8a20cc
		2019/10/11 05:15:18 Removing &{4e9c1ad4-333e-4da5-b834-ccc16dc06a04 1 1 49 default}
		2019/10/11 05:15:18 Deleted child_relations for uuid 98b13c50-fb0b-413e-922d-198d3a8a20cc
		2019/10/11 05:15:18 Deleted stripes for uuid 495bef8e-2237-4706-b4bc-b8d1e31212c5
		2019/10/11 05:15:18 Removing &{e304d3dd-028c-419b-92cc-0b84cb153f6e 1 1 63 default}
		2019/10/11 05:15:18 Deleted &{98b13c50-fb0b-413e-922d-198d3a8a20cc 1 1 5 default}
		2019/10/11 05:15:18 Created file 1_120 with uuid 2f468e35-b336-4c08-85a6-a96f5a21f7f0 and parent /default
		2019/10/11 05:15:18 Deleted child_relations for uuid 6d842973-0e98-4835-91d3-322f025d30b7
		2019/10/11 05:15:18 Deleted stripes for uuid 4e9c1ad4-333e-4da5-b834-ccc16dc06a04
		2019/10/11 05:15:18 Writing new stripe 0
		2019/10/11 05:15:18 &{5e73c7ea-b502-4a2e-a267-61419b374f43 0 default}
		2019/10/11 05:15:18 Deleted child_relations for uuid 495bef8e-2237-4706-b4bc-b8d1e31212c5
		2019/10/11 05:15:18 Removing &{a5971e99-6970-45d9-b700-61c17acc4739 1 3 18 default}
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/a40cefbb5bd5e9d34f5db930334a385d0934d2d0

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1538863&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191015-1538863/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2146,scaledata.go:121,scaledata.go:48,test_runner.go:689: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1571116260-01-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.125:26257,10.128.0.124:26257,10.128.0.144:26257'  returned:
		stderr:
		tions - 3200, stripes - 505
		2019/10/15 05:16:32 RobustDB.RandomDB chose DB at index 0
		2019/10/15 05:16:32 RobustDB.RandomDB chose DB at index 0
		2019/10/15 05:16:32 Consistency Test 4_116 @ 1571116591491488154.0000000000: sizes :- files - 3199, childRelations - 3198, stripes - 502
		2019/10/15 05:16:32 Consistency Test 7_122 @ 1571116591510536520.0000000000: sizes :- files - 3201, childRelations - 3200, stripes - 502
		2019/10/15 05:16:32 RobustDB.RandomDB chose DB at index 1
		2019/10/15 05:16:32 Consistency Test 10_120 @ 1571116591504568025.0000000000: sizes :- files - 3201, childRelations - 3200, stripes - 502
		2019/10/15 05:16:32 RobustDB.RandomDB chose DB at index 2
		2019/10/15 05:16:32 RobustDB.RandomDB chose DB at index 0
		2019/10/15 05:16:32 Created file 5_519 with uuid f0bb1b2f-dfc3-41e8-9d55-715197882109 and parent /default
		2019/10/15 05:16:32 RobustDB.RandomDB chose DB at index 0
		2019/10/15 05:16:32 Created file 4_395 with uuid c76bc2f3-e7ca-4c76-9182-407c81e38392 and parent /default
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/a57647381a4714b48f6ec6dec0bf766eaa6746dd

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1561660&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191029-1561660/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1561660-1572330618-12-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.155:26257,10.128.0.157:26257,10.128.0.191:26257'  returned:
		stderr:
		e4-429a-89be-d4adc7b39fef
		2019/10/29 06:35:49 Deleted &{448a3d68-4de4-429a-89be-d4adc7b39fef 1 1 238 default}
		2019/10/29 06:35:49 RobustDB.RandomDB chose DB at index 0
		2019/10/29 06:35:49 Created file 1_427 with uuid d6b8632d-db4f-45b5-8f48-7b359f94b248 and parent /default
		2019/10/29 06:35:49 RobustDB.RandomDB chose DB at index 1
		2019/10/29 06:35:49 ExecuteTx retry attempt 1 failed, started at 2019-10-29 06:35:40.956881227 +0000 UTC m=+88.744286578, now = 2019-10-29 06:35:49.97334267 +0000 UTC m=+97.760748060, took 9.016461482s
		2019/10/29 06:35:49 pq error - Error code : 58C01, Error class : 58
		2019/10/29 06:35:49 pq error - Error code : 58C01, Error class : 58
		2019/10/29 06:35:49 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/10/29 06:35:49 postgres error code is 58C01 and class is 58
		2019/10/29 06:35:49 pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/a4d88c2c5ab6131878d2b4552446d94fd93b1553

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1563612&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=release-19.1, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191030-1563612/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1563612-1572415545-15-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.15.219:26257,10.128.0.149:26257,10.128.15.214:26257'  returned:
		stderr:
		10/30 06:19:19 ExecuteTx retry attempt 1 failed, started at 2019-10-30 06:19:19.292571554 +0000 UTC m=+576.628238544, now = 2019-10-30 06:19:19.554581873 +0000 UTC m=+576.890248882, took 262.010338ms
		2019/10/30 06:19:19 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/10/30 06:19:19 ExecuteTx retry attempt 1 failed, started at 2019-10-30 06:19:19.550422736 +0000 UTC m=+576.886089741, now = 2019-10-30 06:19:19.567285412 +0000 UTC m=+576.902952428, took 16.862687ms
		2019/10/30 06:19:19 Attempt failed with error dial tcp 10.128.0.149:26257: connect: connection refused: ... Retrying after sleeping 5ns
		2019/10/30 06:19:19 ExecuteTx retry attempt 1 failed, started at 2019-10-30 06:19:18.281410842 +0000 UTC m=+575.617077847, now = 2019-10-30 06:19:19.567371933 +0000 UTC m=+576.903039008, took 1.285961161s
		2019/10/30 06:19:19 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/10/30 06:19:19 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/262e6f2499e34eb4373d0450fa9f6a820a609b2c

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1565222&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=provisional_201910301435_v19.2.0-rc.3, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191030-1565222/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1565222-1572457193-11-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.155:26257,10.128.0.151:26257,10.128.0.191:26257'  returned:
		stderr:
		. Retrying after sleeping 5ns
		2019/10/30 17:53:48 ExecuteTx retry attempt 1 failed, started at 2019-10-30 17:53:48.496258402 +0000 UTC m=+576.361123323, now = 2019-10-30 17:53:48.924699946 +0000 UTC m=+576.789564903, took 428.44158ms
		2019/10/30 17:53:48 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/10/30 17:53:48 ExecuteTx retry attempt 1 failed, started at 2019-10-30 17:53:48.76433401 +0000 UTC m=+576.629198946, now = 2019-10-30 17:53:48.925599438 +0000 UTC m=+576.790464397, took 161.265451ms
		2019/10/30 17:53:48 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/10/30 17:53:48 ExecuteTx retry attempt 1 failed, started at 2019-10-30 17:53:46.018576247 +0000 UTC m=+573.883441170, now = 2019-10-30 17:53:48.926518051 +0000 UTC m=+576.791383004, took 2.907941834s
		2019/10/30 17:53:48 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/10/30 17:53:48 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

@solongordon solongordon mentioned this issue Oct 31, 2019
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/4c52530a33367c58e434111324fcda8c3d73582a

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1568299&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191101-1568299/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1568299-1572589630-05-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.239:26257,10.128.0.237:26257,10.128.0.218:26257'  returned:
		stderr:
		.. Retrying after sleeping 5ns
		2019/11/01 06:40:58 ExecuteTx retry attempt 1 failed, started at 2019-11-01 06:40:57.01561249 +0000 UTC m=+576.837860531, now = 2019-11-01 06:40:58.02328009 +0000 UTC m=+577.845528148, took 1.007667617s
		2019/11/01 06:40:58 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/11/01 06:40:58 ExecuteTx retry attempt 1 failed, started at 2019-11-01 06:40:57.346405568 +0000 UTC m=+577.168653606, now = 2019-11-01 06:40:58.02382747 +0000 UTC m=+577.846075545, took 677.421939ms
		2019/11/01 06:40:58 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/11/01 06:40:58 ExecuteTx retry attempt 1 failed, started at 2019-11-01 06:40:54.708651563 +0000 UTC m=+574.530899611, now = 2019-11-01 06:40:58.023617819 +0000 UTC m=+577.845865879, took 3.314966268s
		2019/11/01 06:40:58 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/11/01 06:40:58 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/62801ce77d9055c00b0e30010f5998ea2cd86686

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1569533&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=provisional_201911010137_v19.2.0-rc.3, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191101-1569533/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1569533-1572613020-15-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.33:26257,10.128.0.100:26257,10.128.0.212:26257'  returned:
		stderr:
		Retrying after sleeping 5ns
		2019/11/01 13:09:05 ExecuteTx retry attempt 1 failed, started at 2019-11-01 13:09:05.627118044 +0000 UTC m=+454.554174304, now = 2019-11-01 13:09:05.792931542 +0000 UTC m=+454.719987851, took 165.813547ms
		2019/11/01 13:09:05 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/11/01 13:09:05 ExecuteTx retry attempt 1 failed, started at 2019-11-01 13:09:04.313796737 +0000 UTC m=+453.240853007, now = 2019-11-01 13:09:05.793212156 +0000 UTC m=+454.720268454, took 1.479415447s
		2019/11/01 13:09:05 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/11/01 13:09:05 ExecuteTx retry attempt 1 failed, started at 2019-11-01 13:09:05.146337381 +0000 UTC m=+454.073393641, now = 2019-11-01 13:09:05.793476035 +0000 UTC m=+454.720532338, took 647.138697ms
		2019/11/01 13:09:05 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/11/01 13:09:05 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/8b9f54761adc58eb9aecbf9b26f1a7987d8a01e5

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1573251&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191105-1573251/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1573251-1572938926-02-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.81:26257,10.128.0.112:26257,10.128.0.148:26257'  returned:
		stderr:
		 at 2019-11-05 07:42:21.688549409 +0000 UTC m=+576.392671890, now = 2019-11-05 07:42:22.017617731 +0000 UTC m=+576.721740245, took 329.068355ms
		2019/11/05 07:42:22 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/11/05 07:42:22 RobustDB.RandomDB chose DB at index 2
		2019/11/05 07:42:22 ExecuteTx retry attempt 1 failed, started at 2019-11-05 07:42:21.520374574 +0000 UTC m=+576.224497047, now = 2019-11-05 07:42:22.018319431 +0000 UTC m=+576.722441924, took 497.944877ms
		2019/11/05 07:42:22 Aborting Retries because this error of type *crdb.AmbiguousCommitError is not retryable : driver: bad connection
		2019/11/05 07:42:22 ExecuteTx retry attempt 1 failed, started at 2019-11-05 07:42:19.974169711 +0000 UTC m=+574.678292187, now = 2019-11-05 07:42:22.020209842 +0000 UTC m=+576.724332354, took 2.046040167s
		2019/11/05 07:42:22 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/11/05 07:42:22 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/ff13356a005e446f3f11bd37cc9772f568d8f41f

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1578961&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=release-19.2, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191107-1578961/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1578961-1573111891-10-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.231:26257,10.128.0.251:26257,10.128.1.4:26257'  returned:
		stderr:
		Retrying after sleeping 5ns
		2019/11/07 07:44:15 ExecuteTx retry attempt 1 failed, started at 2019-11-07 07:44:14.603472845 +0000 UTC m=+454.206177854, now = 2019-11-07 07:44:15.329397645 +0000 UTC m=+454.932102697, took 725.924843ms
		2019/11/07 07:44:15 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/11/07 07:44:15 ExecuteTx retry attempt 1 failed, started at 2019-11-07 07:44:13.801791804 +0000 UTC m=+453.404496819, now = 2019-11-07 07:44:15.331268277 +0000 UTC m=+454.933973341, took 1.529476522s
		2019/11/07 07:44:15 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/11/07 07:44:15 ExecuteTx retry attempt 1 failed, started at 2019-11-07 07:44:13.065927854 +0000 UTC m=+452.668632866, now = 2019-11-07 07:44:15.330682733 +0000 UTC m=+454.933387794, took 2.264754928s
		2019/11/07 07:44:15 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/11/07 07:44:15 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/e26c2f26ca95796f84e7396b832b80f5d53605ae

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1589114&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=release-19.2, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191113-1589114/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1589114-1573630028-11-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.15.206:26257,10.128.0.253:26257,10.128.15.213:26257'  returned:
		stderr:
		727698 +0000 UTC m=+453.981671181, now = 2019-11-13 07:39:29.073786142 +0000 UTC m=+455.082687549, took 1.101016368s
		2019/11/13 07:39:29 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/11/13 07:39:29 RobustDB.RandomDB chose DB at index 0
		2019/11/13 07:39:29 ExecuteTx retry attempt 1 failed, started at 2019-11-13 07:39:26.312855291 +0000 UTC m=+452.321756670, now = 2019-11-13 07:39:29.073496355 +0000 UTC m=+455.082397767, took 2.760641097s
		2019/11/13 07:39:29 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/11/13 07:39:29 RobustDB.RandomDB chose DB at index 2
		2019/11/13 07:39:29 ExecuteTx retry attempt 1 failed, started at 2019-11-13 07:39:27.980946735 +0000 UTC m=+453.989848111, now = 2019-11-13 07:39:29.073502458 +0000 UTC m=+455.082403875, took 1.092555764s
		2019/11/13 07:39:29 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/11/13 07:39:29 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/549d5bb865eaf9f5233cf8a068034637587b4373

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1616574&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=release-19.1, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191128-1616574/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1616574-1574924525-11-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.113:26257,10.128.0.76:26257,10.128.0.120:26257'  returned:
		stderr:
		andomDB chose DB at index 2
		2019/11/28 07:15:27 ExecuteTx retry attempt 1 failed, started at 2019-11-28 07:15:26.430184659 +0000 UTC m=+575.136869097, now = 2019-11-28 07:15:27.947006096 +0000 UTC m=+576.653690544, took 1.516821447s
		2019/11/28 07:15:27 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/11/28 07:15:27 ExecuteTx retry attempt 1 failed, started at 2019-11-28 07:15:27.293606009 +0000 UTC m=+576.000290434, now = 2019-11-28 07:15:27.947759563 +0000 UTC m=+576.654444008, took 654.153574ms
		2019/11/28 07:15:27 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/11/28 07:15:27 ExecuteTx retry attempt 1 failed, started at 2019-11-28 07:15:26.506870171 +0000 UTC m=+575.213554595, now = 2019-11-28 07:15:27.948423119 +0000 UTC m=+576.655107567, took 1.441552972s
		2019/11/28 07:15:27 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/11/28 07:15:27 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/39bf64d28ee5a0ab79081b8b0b29230749ea0fff

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1616610&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=release-2.1, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191128-1616610/scaledata/filesystem_simulator/nodes=3/run_1
	test_runner.go:712: test timed out (20m0s)

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/a60c4680e82b390ab058634a58626f56c80b27ab

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1616556&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191128-1616556/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1616556-1574926349-09-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.15.211:26257,10.128.15.217:26257,10.128.15.204:26257'  returned:
		stderr:
		-11-28 07:40:10.827013142 +0000 UTC m=+104.021469059, now = 2019-11-28 07:40:10.827442949 +0000 UTC m=+104.021898896, took 429.837µs
		2019/11/28 07:40:10 Attempt failed with error dial tcp 10.128.15.217:26257: connect: connection refused: ... Retrying after sleeping 5ns
		2019/11/28 07:40:10 RobustDB.RandomDB chose DB at index 2
		2019/11/28 07:40:10 ExecuteTx retry attempt 1 failed, started at 2019-11-28 07:39:54.731236144 +0000 UTC m=+87.925692066, now = 2019-11-28 07:40:10.836267758 +0000 UTC m=+104.030723737, took 16.105031671s
		2019/11/28 07:40:10 pq error - Error code : 58C01, Error class : 58
		2019/11/28 07:40:10 pq error - Error code : 58C01, Error class : 58
		2019/11/28 07:40:10 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/11/28 07:40:10 postgres error code is 58C01 and class is 58
		2019/11/28 07:40:10 pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/ed717cbaf741e3a32c76db25b16a59dc2a8221d7

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1624103&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191204-1624103/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1624103-1575445328-05-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.1.11:26257,10.128.1.23:26257,10.128.1.27:26257'  returned:
		stderr:
		omDB chose DB at index 0
		2019/12/04 07:50:24 Deleted &{d9d8deb4-5a0f-41ce-ab9d-a3458e873b75 1 1 118 default}
		2019/12/04 07:50:24 Writing new stripe 0
		2019/12/04 07:50:24 &{5a854c48-17ef-45f4-b9a4-da50d8d38a46 0 default}
		2019/12/04 07:50:24 Writing new stripe 0
		2019/12/04 07:50:24 &{7c0c329a-0b41-42c0-9414-e77521911bd3 0 default}
		2019/12/04 07:50:24 ExecuteTx retry attempt 1 failed, started at 2019-12-04 07:50:18.314273719 +0000 UTC m=+88.973511286, now = 2019-12-04 07:50:24.678877311 +0000 UTC m=+95.338114894, took 6.364603608s
		2019/12/04 07:50:24 pq error - Error code : 58C01, Error class : 58
		2019/12/04 07:50:24 pq error - Error code : 58C01, Error class : 58
		2019/12/04 07:50:24 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/12/04 07:50:24 postgres error code is 58C01 and class is 58
		2019/12/04 07:50:24 pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/9ad9eb5fb8806e4b74546910ca8bda66786d4288

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1626352&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=release-2.1, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191205-1626352/scaledata/filesystem_simulator/nodes=3/run_1
	test_runner.go:712: test timed out (20m0s)

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/e4fa0b8b8e674d19c7957f03ca3a2d1f716f1f1d

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1629591&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=release-19.1, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191206-1629591/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1629591-1575616044-04-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.119:26257,10.128.0.197:26257,10.128.0.185:26257'  returned:
		stderr:
		7:20:58 RobustDB.RandomDB chose DB at index 0
		2019/12/06 07:20:58 Deleted child_relations for uuid 3dbe7708-3d34-44d8-b7ba-7c09235010ff
		2019/12/06 07:20:58 Consistency Test 15_339 @ 1575616855966044925.0000000000: sizes :- files - 16171, childRelations - 16170, stripes - 2309
		2019/12/06 07:20:58 RobustDB.RandomDB chose DB at index 1
		2019/12/06 07:20:58 ExecuteTx retry attempt 1 failed, started at 2019-12-06 07:20:56.093837754 +0000 UTC m=+574.678500358, now = 2019-12-06 07:20:58.190328701 +0000 UTC m=+576.774991383, took 2.096491025s
		2019/12/06 07:20:58 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/12/06 07:20:58 ExecuteTx retry attempt 1 failed, started at 2019-12-06 07:20:55.818327365 +0000 UTC m=+574.402989987, now = 2019-12-06 07:20:58.191295338 +0000 UTC m=+576.775957990, took 2.372968003s
		2019/12/06 07:20:58 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/12/06 07:20:58 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/60412eb85271ecd1539971fcc6ea3bf11f1ca7a6

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1629609&tab=artifacts#/scaledata/filesystem_simulator/nodes=3

The test failed on branch=release-19.2, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191206-1629609/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:697: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1629609-1575618094-12-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.1.100:26257,10.128.1.116:26257,10.128.1.64:26257'  returned:
		stderr:
		ons - 18995, stripes - 3072
		2019/12/06 07:55:39 ExecuteTx retry attempt 1 failed, started at 2019-12-06 07:55:39.392197581 +0000 UTC m=+576.084070692, now = 2019-12-06 07:55:39.874036944 +0000 UTC m=+576.565910094, took 481.839402ms
		2019/12/06 07:55:39 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/12/06 07:55:39 ExecuteTx retry attempt 1 failed, started at 2019-12-06 07:55:39.426466214 +0000 UTC m=+576.118339324, now = 2019-12-06 07:55:39.875024898 +0000 UTC m=+576.566898041, took 448.558717ms
		2019/12/06 07:55:39 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/12/06 07:55:39 ExecuteTx retry attempt 1 failed, started at 2019-12-06 07:55:37.719144404 +0000 UTC m=+574.411017530, now = 2019-12-06 07:55:39.876037855 +0000 UTC m=+576.567911004, took 2.156893474s
		2019/12/06 07:55:39 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/12/06 07:55:39 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

@cockroach-teamcity
Member Author

(roachtest).scaledata/filesystem_simulator/nodes=3 failed on master@734e357412dadafcd6084b8eab8e251e44e86b4a:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191210-1634867/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:700: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1634867-1575963949-07-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.1.123:26257,10.128.1.126:26257,10.128.1.121:26257'  returned:
		stderr:
		2019-12-10 07:51:44.792568686 +0000 UTC m=+89.259075069, now = 2019-12-10 07:51:44.792769678 +0000 UTC m=+89.259276071, took 201.002µs
		2019/12/10 07:51:44 Attempt failed with error dial tcp 10.128.1.126:26257: connect: connection refused: ... Retrying after sleeping 10ns
		2019/12/10 07:51:44 RobustDB.RandomDB chose DB at index 1
		2019/12/10 07:51:44 ExecuteTx retry attempt 1 failed, started at 2019-12-10 07:51:44.307051496 +0000 UTC m=+88.773557875, now = 2019-12-10 07:51:44.792900884 +0000 UTC m=+89.259407276, took 485.849401ms
		2019/12/10 07:51:44 pq error - Error code : 58C01, Error class : 58
		2019/12/10 07:51:44 pq error - Error code : 58C01, Error class : 58
		2019/12/10 07:51:44 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/12/10 07:51:44 postgres error code is 58C01 and class is 58
		2019/12/10 07:51:44 pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 255
		
		stdout:
		: exit status 1

details

Artifacts: /scaledata/filesystem_simulator/nodes=3

make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=./pkg/roachtest TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1

powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).scaledata/filesystem_simulator/nodes=3 failed on release-19.1@4fa6d707fb3b52693638b1438ba7dc46227684f5:

The test failed on branch=release-19.1, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191211-1637611/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:700: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1637611-1576047943-05-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.1.185:26257,10.128.1.163:26257,10.128.1.153:26257'  returned:
		stderr:
		 at 2019-12-11 07:20:25.76414488 +0000 UTC m=+451.553350758, now = 2019-12-11 07:20:29.129577004 +0000 UTC m=+454.918782918, took 3.36543216s
		2019/12/11 07:20:29 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/12/11 07:20:29 ExecuteTx retry attempt 1 failed, started at 2019-12-11 07:20:28.424651771 +0000 UTC m=+454.213857655, now = 2019-12-11 07:20:29.129360905 +0000 UTC m=+454.918566818, took 704.709163ms
		2019/12/11 07:20:29 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/12/11 07:20:29 unexpected EOF
		2019/12/11 07:20:29 ExecuteTx retry attempt 1 failed, started at 2019-12-11 07:20:29.108134177 +0000 UTC m=+454.897340057, now = 2019-12-11 07:20:29.129775832 +0000 UTC m=+454.918981732, took 21.641675ms
		2019/12/11 07:20:29 Attempt failed with error dial tcp 10.128.1.163:26257: connect: connection refused: ... Retrying after sleeping 5ns
		2019/12/11 07:20:29 RobustDB.RandomDB chose DB at index 0
		Error:  exit status 255
		
		stdout:
		: exit status 1

details

Artifacts: /scaledata/filesystem_simulator/nodes=3

make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=./pkg/roachtest TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1


@cockroach-teamcity
Member Author

(roachtest).scaledata/filesystem_simulator/nodes=3 failed on release-19.2@3cbd05602d4aeaebbccea18d66ad0fdf8db482a5:

The test failed on branch=release-19.2, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191211-1637629/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:700: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1637629-1576049754-04-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.15.231:26257,10.128.1.247:26257,10.128.1.223:26257'  returned:
		stderr:
		5 RobustDB.RandomDB chose DB at index 1
		2019/12/11 07:45:05 Deleted stripes for uuid 0aced0c6-2e5f-4487-a090-34d11956e7f6
		2019/12/11 07:45:05 Deleted stripes for uuid 8e6b269b-d864-4633-aa48-fed06f08cc9e
		2019/12/11 07:45:05 Deleted child_relations for uuid 61f78f7e-4691-4d41-9dae-f4a47d6c549c
		2019/12/11 07:45:05 RobustDB.RandomDB chose DB at index 2
		2019/12/11 07:45:05 RobustDB.RandomDB chose DB at index 1
		2019/12/11 07:45:05 RobustDB.RandomDB chose DB at index 1
		2019/12/11 07:45:05 RobustDB.RandomDB chose DB at index 1
		2019/12/11 07:45:05 RobustDB.RandomDB chose DB at index 1
		2019/12/11 07:45:05 RobustDB.RandomDB chose DB at index 0
		2019/12/11 07:45:05 ExecuteTx retry attempt 1 failed, started at 2019-12-11 07:45:03.414352124 +0000 UTC m=+209.154405698, now = 2019-12-11 07:45:05.429279427 +0000 UTC m=+211.169333106, took 2.014927408s
		2019/12/11 07:45:05 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/12/11 07:45:05 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1

details

Artifacts: /scaledata/filesystem_simulator/nodes=3

make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=./pkg/roachtest TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1


@cockroach-teamcity
Member Author

(roachtest).scaledata/filesystem_simulator/nodes=3 failed on provisional_201912132008_v20.1.0-alpha20191216@e9e2a80361a25fd9f9b179f84be4c5c3d7e7d8cb:

The test failed on branch=provisional_201912132008_v20.1.0-alpha20191216, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191213-1642926/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:700: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1642926-1576273385-08-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.75:26257,10.128.0.101:26257,10.128.0.114:26257'  returned:
		stderr:
		r from the vectorized runtime: rpc error: code = Canceled desc = context canceled: ... Retrying after sleeping 5ns
		2019/12/13 21:55:17 ExecuteTx retry attempt 1 failed, started at 2019-12-13 21:55:17.297677443 +0000 UTC m=+454.194471731, now = 2019-12-13 21:55:17.967668852 +0000 UTC m=+454.864463156, took 669.991425ms
		2019/12/13 21:55:17 pq error - Error code : XX000, Error class : XX
		2019/12/13 21:55:17 Attempt failed with error pq: internal error: unexpected error from the vectorized runtime: rpc error: code = Canceled desc = context canceled: ... Retrying after sleeping 5ns
		2019/12/13 21:55:17 RobustDB.RandomDB chose DB at index 0
		2019/12/13 21:55:17 ExecuteTx retry attempt 1 failed, started at 2019-12-13 21:55:16.809187112 +0000 UTC m=+453.705981413, now = 2019-12-13 21:55:17.967136742 +0000 UTC m=+454.863931047, took 1.157949634s
		2019/12/13 21:55:17 Aborting Retries because this error of type *errors.errorString is not retryable : unexpected EOF
		2019/12/13 21:55:17 unexpected EOF
		Error:  exit status 255
		
		stdout:
		: exit status 1
Repro

Artifacts: /scaledata/filesystem_simulator/nodes=3

make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=./pkg/roachtest TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1


@cockroach-teamcity
Member Author

(roachtest).scaledata/filesystem_simulator/nodes=3 failed on master@beb69e089b6eadc0fde6c92eb533b08d248938fe:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191216-1644933/scaledata/filesystem_simulator/nodes=3/run_1
	cluster.go:2163,scaledata.go:121,scaledata.go:48,test_runner.go:700: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1644933-1576481346-05-n4cpu4:4 -- ./filesystem_simulator  --duration_secs=600 --num_workers=16 --cockroach_ip_addresses_csv='10.128.0.123:26257,10.128.0.90:26257,10.128.0.149:26257'  returned:
		stderr:
		3576 +0000 UTC m=+88.154104786, now = 2019-12-16 07:34:54.208130017 +0000 UTC m=+89.233141270, took 1.079036484s
		2019/12/16 07:34:54 Attempt failed with error driver: bad connection: ... Retrying after sleeping 5ns
		2019/12/16 07:34:54 RobustDB.RandomDB chose DB at index 2
		2019/12/16 07:34:54 RobustDB.RandomDB chose DB at index 1
		2019/12/16 07:34:54 ExecuteTx retry attempt 1 failed, started at 2019-12-16 07:34:52.980552007 +0000 UTC m=+88.005563231, now = 2019-12-16 07:34:54.209621555 +0000 UTC m=+89.234632788, took 1.229069557s
		2019/12/16 07:34:54 pq error - Error code : 58C01, Error class : 58
		2019/12/16 07:34:54 pq error - Error code : 58C01, Error class : 58
		2019/12/16 07:34:54 Aborting Retries because this error of type *pq.Error is not retryable : pq: communication error: rpc error: code = Canceled desc = context canceled
		2019/12/16 07:34:54 postgres error code is 58C01 and class is 58
		2019/12/16 07:34:54 pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 255
		
		stdout:
		: exit status 1
Repro

Artifacts: /scaledata/filesystem_simulator/nodes=3

make stressrace TESTS=scaledata/filesystem_simulator/nodes=3 PKG=./pkg/roachtest TESTTIMEOUT=5m STRESSFLAGS='-timeout 5m' 2>&1


nvanbenschoten added a commit to nvanbenschoten/rksql that referenced this issue Dec 17, 2019
Fixes cockroachdb/cockroach#36981.
Fixes cockroachdb/cockroach#39618.
Fixes cockroachdb/cockroach#40552.
Fixes cockroachdb/cockroach#41735.

cockroachdb/cockroach#41451 switched two forms
of errors that can be thrown during chaos events over to a new error code
class - 58, internal system errors. This commit updates `pqConnectionError`
to consider this error code class as retry-worthy.
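
The idea in this commit message can be pictured with a small sketch. This is not the rksql code: `isRetryableSQLState`, `sqlStateClass`, and `retryableClasses` are hypothetical names, and the real `pqConnectionError` presumably inspects a `*pq.Error` rather than a bare string. The only facts taken from this thread are that the failing errors carried code 58C01 (class 58) and that the fix treats that class as retry-worthy. Listing class 08 (the standard connection-exception class) is an assumption about what the helper already covered.

```go
package main

import "fmt"

// retryableClasses lists the SQLSTATE classes this sketch treats as transient.
// The set is illustrative, not copied from rksql.
var retryableClasses = map[string]bool{
	"08": true, // connection exception (assumed to be handled before the fix)
	"58": true, // class of the 58C01 errors reported in the logs above
}

// sqlStateClass returns the two-character class prefix of a five-character
// SQLSTATE code, or "" if the code is malformed.
func sqlStateClass(code string) string {
	if len(code) != 5 {
		return ""
	}
	return code[:2]
}

// isRetryableSQLState reports whether an error with the given SQLSTATE code
// should be retried rather than aborting the workload.
func isRetryableSQLState(code string) bool {
	return retryableClasses[sqlStateClass(code)]
}

func main() {
	for _, code := range []string{"58C01", "08006", "23505"} {
		fmt.Printf("%s retryable=%v\n", code, isRetryableSQLState(code))
	}
}
```

Keying on the class prefix rather than on the full code means any other class 58 error raised during a chaos event would be retried as well, which matches the wording "this error code class" in the commit message.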
@nvanbenschoten
Member
