roachtest: backup2TB/n10cpu4 failed #38794

Closed
cockroach-teamcity opened this issue Jul 10, 2019 · 33 comments

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/813b0146e763e8f44c4cb45e498e68e6ace9bc2d

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1380768&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190710-1380768/backup2TB/n10cpu4/run_1
	test_runner.go:693: test timed out (10h0m0s)
	cluster.go:1725,backup.go:47,test_runner.go:678: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1562739424-09-n10cpu4:1 -- ./workload fixtures import bank --db=bank --payload-bytes=10240 --ranges=0 --csv-server http://localhost:8081 --rows=65104166 --seed=1 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190710 07:30:35.455439 1 ccl/workloadccl/cliccl/fixtures.go:324  starting import of 1 tables
		: signal: killed

@cockroach-teamcity cockroach-teamcity added this to the 19.2 milestone Jul 10, 2019
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. labels Jul 10, 2019
@andreimatei andreimatei removed their assignment Jul 11, 2019
@andreimatei
Contributor

The test timed out (10h) in workload fixtures import. Test logs and artifacts are available.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/7111a67b2ea3a19c2f312f8d214b8823f431cac0

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1400942&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190723-1400942/backup2TB/n10cpu4/run_1
	cluster.go:1726,backup.go:47,test_runner.go:691: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1563862417-09-n10cpu4:1 -- ./workload fixtures import bank --db=bank --payload-bytes=10240 --ranges=0 --csv-server http://localhost:8081 --rows=65104166 --seed=1 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190723 06:43:00.380554 1 ccl/workloadccl/cliccl/fixtures.go:324  starting import of 1 tables
		Error: importing fixture: importing table bank: pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/26edea51118a0e16b61748c08068bfa6f76543ca

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1404886&tab=buildLog

The test failed on branch=provisional_201907241708_v19.2.0-alpha.20190729, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190725-1404886/backup2TB/n10cpu4/run_1
	cluster.go:1726,backup.go:47,test_runner.go:691: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1564034590-09-n10cpu4:1 -- ./workload fixtures import bank --db=bank --payload-bytes=10240 --ranges=0 --csv-server http://localhost:8081 --rows=65104166 --seed=1 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190725 07:16:28.510262 1 ccl/workloadccl/fixture.go:316  starting import of 1 tables
		Error: importing fixture: importing table bank: pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/92fef12128c997233d985d1c19e11faac005073f

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1413388&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190731-1413388/backup2TB/n10cpu4/run_1
	cluster.go:1726,backup.go:47,test_runner.go:691: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1564553448-09-n10cpu4:1 -- ./workload fixtures import bank --db=bank --payload-bytes=10240 --ranges=0 --csv-server http://localhost:8081 --rows=65104166 --seed=1 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190731 07:24:45.211898 1 ccl/workloadccl/fixture.go:316  starting import of 1 tables
		Error: importing fixture: importing table bank: pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/175c5ada040fd0cbbf178636b1c551d5c2229ec4

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1417597&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190802-1417597/backup2TB/n10cpu4/run_1
	cluster.go:1726,backup.go:47,test_runner.go:691: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1564726582-09-n10cpu4:1 -- ./workload fixtures import bank --db=bank --payload-bytes=10240 --ranges=0 --csv-server http://localhost:8081 --rows=65104166 --seed=1 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190802 06:23:03.386869 1 ccl/workloadccl/fixture.go:316  starting import of 1 tables
		Error: importing fixture: importing table bank: pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/ca8fa726de54a0feea9f33ad000e883a4168ef39

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1442766&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190817-1442766/backup2TB/n10cpu4/run_1
	cluster.go:1735,backup.go:47,test_runner.go:674: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1566023009-09-n10cpu4:1 -- ./workload fixtures import bank --db=bank --payload-bytes=10240 --ranges=0 --csv-server http://localhost:8081 --rows=65104166 --seed=1 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190817 06:35:29.527798 1 ccl/workloadccl/fixture.go:316  starting import of 1 tables
		Error: importing fixture: importing table bank: dial tcp 10.128.0.62:26257: connect: connection refused
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/8ebcdac113118ae5fbcaddeecd269f59399aea8c

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1443904&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190819-1443904/backup2TB/n10cpu4/run_1
	cluster.go:1735,backup.go:47,test_runner.go:674: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1566195768-09-n10cpu4:1 -- ./workload fixtures import bank --db=bank --payload-bytes=10240 --ranges=0 --csv-server http://localhost:8081 --rows=65104166 --seed=1 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190819 07:07:59.727877 1 ccl/workloadccl/fixture.go:316  starting import of 1 tables
		Error: importing fixture: importing table bank: dial tcp 10.128.0.142:26257: connect: connection refused
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/01ee0704865391599abef3bbc89f462117f8007a

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1445527&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190820-1445527/backup2TB/n10cpu4/run_1
	cluster.go:1735,backup.go:47,test_runner.go:673: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1566282291-09-n10cpu4:1 -- ./workload fixtures import bank --db=bank --payload-bytes=10240 --ranges=0 --csv-server http://localhost:8081 --rows=65104166 --seed=1 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190820 07:00:56.688188 1 ccl/workloadccl/fixture.go:316  starting import of 1 tables
		Error: importing fixture: importing table bank: dial tcp 10.128.0.71:26257: connect: connection refused
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/9a982e902638e116ed6a76f4fa635a0a1445d88a

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1447054&tab=buildLog

The test failed on branch=release-19.1, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190821-1447054/backup2TB/n10cpu4/run_1
	cluster.go:1735,backup.go:47,test_runner.go:673: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1566367544-09-n10cpu4:1 -- ./workload fixtures import bank --db=bank --payload-bytes=10240 --ranges=0 --csv-server http://localhost:8081 --rows=65104166 --seed=1 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190821 06:15:39.956270 1 ccl/workloadccl/fixture.go:316  starting import of 1 tables
		Error: importing fixture: importing table bank: pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/bd27eb358f558bb7598945318240335ebcfcdf13

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1447014&tab=buildLog

The test failed on branch=provisional_201908202216_v19.2.0-beta.20190826, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190821-1447014/backup2TB/n10cpu4/run_1
	cluster.go:1735,backup.go:47,test_runner.go:673: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1566364342-09-n10cpu4:1 -- ./workload fixtures import bank --db=bank --payload-bytes=10240 --ranges=0 --csv-server http://localhost:8081 --rows=65104166 --seed=1 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190821 05:20:23.092534 1 ccl/workloadccl/fixture.go:316  starting import of 1 tables
		Error: importing fixture: importing table bank: dial tcp 10.128.0.186:26257: connect: connection refused
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/bd27eb358f558bb7598945318240335ebcfcdf13

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1446993&tab=buildLog

The test failed on branch=provisional_201908202216_v19.2.0-beta.20190826, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190821-1446993/backup2TB/n10cpu4/run_1
	cluster.go:1735,backup.go:47,test_runner.go:673: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1566364282-09-n10cpu4:1 -- ./workload fixtures import bank --db=bank --payload-bytes=10240 --ranges=0 --csv-server http://localhost:8081 --rows=65104166 --seed=1 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190821 05:19:31.944989 1 ccl/workloadccl/fixture.go:316  starting import of 1 tables
		Error: importing fixture: importing table bank: dial tcp 10.128.0.81:26257: connect: connection refused
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/93860e69f96aa3a86bd8bb42f310fb2629d53f39

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1447036&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190821-1447036/backup2TB/n10cpu4/run_1
	cluster.go:1735,backup.go:47,test_runner.go:673: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1566368490-09-n10cpu4:1 -- ./workload fixtures import bank --db=bank --payload-bytes=10240 --ranges=0 --csv-server http://localhost:8081 --rows=65104166 --seed=1 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190821 07:32:02.237526 1 ccl/workloadccl/fixture.go:316  starting import of 1 tables
		Error: importing fixture: importing table bank: dial tcp 10.128.0.32:26257: connect: connection refused
		Error:  exit status 1
		: exit status 1

@dt
Member

dt commented Aug 21, 2019

Looks like there is a panic somewhere in the distsql execution during the IMPORT before the test gets to BACKUP.

panic: send on closed channel

goroutine 3924 [running]:
panic(0x3b69720, 0x48325c0)
    /usr/local/go/src/runtime/panic.go:565 +0x2c5 fp=0xc00323fce8 sp=0xc00323fc58 pc=0x78d4a5
runtime.chansend(0xc0086830e0, 0xc00323fdb8, 0x1, 0x2827956, 0x0)
    /usr/local/go/src/runtime/chan.go:187 +0x5f5 fp=0xc00323fd68 sp=0xc00323fce8 pc=0x765245
runtime.chansend1(0xc0086830e0, 0xc00323fdb8)
    /usr/local/go/src/runtime/chan.go:127 +0x35 fp=0xc00323fda0 sp=0xc00323fd68 pc=0x764c45
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun.(*RowChannel).Push(0xc007d1b140, 0x0, 0x0, 0x0, 0xc0089d4690, 0x0)
    /go/src/github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/base.go:409 +0x76 fp=0xc00323fde8 sp=0xc00323fda0 pc=0x2827956
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun.processProducerMessage(0x48ed580, 0xc007e92690, 0x49334e0, 0xc0039f1290, 0x48c8c00, 0xc007d1b140, 0xc007302e00, 0xc0002871b8, 0xc006797d40, 0x0, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/inbound.go:236 +0x1b5 fp=0xc00323fec0 sp=0xc00323fde8 pc=0x2849ec5
github.com/cockroachdb/cockroach/pkg/sql/distsqlrun.processInboundStreamHelper.func2(0xc007167560, 0x49334e0, 0xc0039f1290, 0xc008fe7940, 0xc0086d36e0, 0x48ed580, 0xc007e92690, 0x48c8c00, 0xc007d1b140, 0xc007302e00, ...)
    /go/src/github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/inbound.go:164 +0x122 fp=0xc00323ff88 sp=0xc00323fec0 pc=0x289b382
runtime.goexit()
    /usr/local/go/src/runtime/asm_amd64.s:1337 +0x1 fp=0xc00323ff90 sp=0xc00323ff88 pc=0x7bd241
created by github.com/cockroachdb/cockroach/pkg/sql/distsqlrun.processInboundStreamHelper
    /go/src/github.com/cockroachdb/cockroach/pkg/sql/distsqlrun/inbound.go:146 +0x1b3

It looks like it has to do with metadata response handling but I don't really know much more about what's going on here:

	case DrainRequested:
		// If we're draining, only forward metadata.
		if meta != nil {
			rc.dataChan <- RowChannelMsg{Meta: meta}
		}

Note that IMPORT recently started sending remote producer meta back for progress tracking, so I'm guessing that is somehow related, but it looks like this panic is something internal to the distsql flow?

@dt dt mentioned this issue Aug 21, 2019
14 tasks
@asubiotto
Contributor

This panic essentially signals that the flow is in a bad state. A Push should never happen after ProducerDone, so this makes me think that the IMPORT is probably misbehaving. A RowChannel is initialized with a certain number of senders; each time ProducerDone is called, the number of senders is atomically decremented, and when it reaches 0, the dataChan is closed.
I would check:

  1. How many senders are we initializing the row channel with? Is that the expected number?
  2. The ProducerDone stack trace to see who's calling it in what case.

By the way, there's a ProcessorBase struct that any processor implementation can embed that ensures that they behave "properly". Do the processors involved in an IMPORT use it?
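
To make the sender-count mechanism described above concrete, here is a minimal sketch of how a mismatch between the configured number of senders and the actual number of producers could lead to "send on closed channel". This is not the distsqlrun code: rowChannel, newRowChannel and tryPush are hypothetical, simplified stand-ins, and the message type is reduced to a string.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// rowChannel is a hypothetical, simplified stand-in for a RowChannel:
// a data channel plus a count of senders that still have to finish.
type rowChannel struct {
	dataChan   chan string
	numSenders int32
}

// newRowChannel initializes the channel for the given number of senders.
// The buffer size of 16 is arbitrary and only keeps the demo from blocking.
func newRowChannel(numSenders int32) *rowChannel {
	return &rowChannel{dataChan: make(chan string, 16), numSenders: numSenders}
}

// Push sends a message to the consumer. It panics if dataChan was closed.
func (rc *rowChannel) Push(msg string) {
	rc.dataChan <- msg
}

// ProducerDone atomically decrements the sender count and closes dataChan
// once the last expected sender has finished.
func (rc *rowChannel) ProducerDone() {
	if atomic.AddInt32(&rc.numSenders, -1) == 0 {
		close(rc.dataChan)
	}
}

// tryPush calls Push and converts a panic into an error so that the
// failure can be inspected instead of crashing the process.
func tryPush(rc *rowChannel, msg string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("push failed: %v", r)
		}
	}()
	rc.Push(msg)
	return nil
}

func main() {
	// The channel is set up for one sender, but two producers end up using it.
	rc := newRowChannel(1)

	fmt.Println(tryPush(rc, "row from producer A"))
	rc.ProducerDone() // Producer A finishes; the count hits 0 and dataChan closes.

	// Producer B still forwards metadata after the close.
	fmt.Println(tryPush(rc, "metadata from producer B"))
}
```

In this sketch the first push succeeds and the second one reports the same runtime error as the stack trace above. Whether the real failure comes from a wrong sender count or from a producer pushing after its own ProducerDone is exactly what the two checks in the list are meant to determine.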

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/7ca0a86b8595c097fd8f27581b1509c47f17e8a3

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1450654&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190823-1450654/backup2TB/n10cpu4/run_1
	cluster.go:1735,backup.go:47,test_runner.go:673: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1566541739-09-n10cpu4:1 -- ./workload fixtures import bank --db=bank --payload-bytes=10240 --ranges=0 --csv-server http://localhost:8081 --rows=65104166 --seed=1 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190823 07:07:18.144488 1 ccl/workloadccl/fixture.go:316  starting import of 1 tables
		Error: importing fixture: importing table bank: pq: http://localhost:8081/csv/bank/bank?payload-bytes=10240&ranges=0&row-end=26040&row-start=19530&rows=65104166&seed=1&version=1.0.0: row 1000: reading CSV record: unexpected EOF
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/40f8f0eb00f4b3bf5bac11fb5ae132e33a492713

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1452154&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190824-1452154/backup2TB/n10cpu4/run_1
	cluster.go:1735,backup.go:47,test_runner.go:673: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1566627477-09-n10cpu4:1 -- ./workload fixtures import bank --db=bank --payload-bytes=10240 --ranges=0 --csv-server http://localhost:8081 --rows=65104166 --seed=1 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190824 06:36:53.301714 1 ccl/workloadccl/fixture.go:316  starting import of 1 tables
		Error: importing fixture: importing table bank: pq: http://localhost:8081/csv/bank/bank?payload-bytes=10240&ranges=0&row-end=65100&row-start=58590&rows=65104166&seed=1&version=1.0.0: row 1000: reading CSV record: unexpected EOF
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/497167b1c596eda2b70bed91c51ebf39b4356c33

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1453099&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190825-1453099/backup2TB/n10cpu4/run_1
	cluster.go:1735,backup.go:47,test_runner.go:673: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1566714671-09-n10cpu4:1 -- ./workload fixtures import bank --db=bank --payload-bytes=10240 --ranges=0 --csv-server http://localhost:8081 --rows=65104166 --seed=1 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190825 07:19:50.836929 1 ccl/workloadccl/fixture.go:316  starting import of 1 tables
		Error: importing fixture: importing table bank: pq: http://localhost:8081/csv/bank/bank?payload-bytes=10240&ranges=0&row-end=58590&row-start=52080&rows=65104166&seed=1&version=1.0.0: row 1000: reading CSV record: unexpected EOF
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/7be1e524888cebeafda94858c073cd796b13b429

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1457444&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190828-1457444/backup2TB/n10cpu4/run_1
	test_runner.go:688: test timed out (10h0m0s)

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/66bd279c9aa682c2b7adcec87ec0c639b8039a33

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1461635&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190830-1461635/backup2TB/n10cpu4/run_1
	cluster.go:1735,backup.go:47,test_runner.go:673: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1567146353-17-n10cpu4:1 -- ./workload fixtures import bank --db=bank --payload-bytes=10240 --ranges=0 --csv-server http://localhost:8081 --rows=65104166 --seed=1 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190830 06:39:52.369477 1 ccl/workloadccl/fixture.go:316  starting import of 1 tables
		Error: importing fixture: importing table bank: pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 1
		: exit status 1

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/e8faca611a902766154ed82581d6d3a7483ad231

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1460982&tab=buildLog

The test failed on branch=provisional_201908291837_v19.2.0-beta.20190903, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190829-1460982/backup2TB/n10cpu4/run_1
	test_runner.go:688: test timed out (10h0m0s)

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/e8faca611a902766154ed82581d6d3a7483ad231

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1462518&tab=buildLog

The test failed on branch=provisional_201908291837_v19.2.0-beta.20190903, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190830-1462518/backup2TB/n10cpu4/run_1
	test_runner.go:688: test timed out (10h0m0s)

@jordanlewis
Member

@dt did you manage to find a resolution to this? Do you need things from SQL Execution?

@dt
Member

dt commented Sep 4, 2019

I just ran roachtest run backup2TB and it seemed to be doing OK: the IMPORT succeeded in 1h and the backup was making progress when I cancelled it, so I think whatever is hanging or flaking isn't deterministic. There have been a couple of core bugs around splits recently, so maybe this will sort itself out? I'll try running it again tomorrow to see if I can catch it in the act.

@cockroach-teamcity
Member Author

cockroach-teamcity commented Sep 5, 2019

NOTE: Branch provisional_201909042143_v2.1.9

SHA: https://github.com/cockroachdb/cockroach/commits/179f29b066c266d14cfeac33ce29b2d18ba86c63

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1468506&tab=buildLog

The test failed on branch=provisional_201909042143_v2.1.9, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190905-1468506/backup2TB/n10cpu4/run_1
	cluster.go:1735,backup.go:70,cluster.go:2091,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1567656826-13-n10cpu4:1 -- ./cockroach sql --insecure -e "
						BACKUP bank.bank TO 'gs://cockroachdb-backup-testing/teamcity-1567656826-13-n10cpu4'" returned:
		stderr:
		
		stdout:
		: signal: killed

@dt
Member

dt commented Sep 5, 2019

^ That one looks like it ran on 2.1, so it isn't really relevant to deciding whether the test is currently flaky.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/404cbf9085a55c9d05455ac3dd2ada1719833150

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1484994&tab=buildLog

The test failed on branch=provisional_201909101743_v19.2.0-beta.20190913, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190913-1484994/backup2TB/n10cpu4/run_1
	test_runner.go:703: test timed out (10h0m0s)

@andy-kimball
Contributor

@dt this test is still flaking out, including on our latest beta release Nightlies: https://teamcity.cockroachdb.com/viewLog.html?buildId=1484996&. Should this be added to the Release Blockers list, given that we're seeing occasional panics during import?

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/e8faca611a902766154ed82581d6d3a7483ad231

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1484919&tab=buildLog

The test failed on branch=provisional_201908291837_v19.2.0-beta.20190903, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190913-1484919/backup2TB/n10cpu4/run_1
	test_runner.go:703: test timed out (10h0m0s)

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/42d307e191ff6787a45e058be164fa452c47f368

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1493878&tab=buildLog

The test failed on branch=provisional_201909171729_v2.1.9, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190917-1493878/backup2TB/n10cpu4/run_1
	cluster.go:1735,backup.go:70,cluster.go:2091,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1568751784-16-n10cpu4:1 -- ./cockroach sql --insecure -e "
						BACKUP bank.bank TO 'gs://cockroachdb-backup-testing/teamcity-1568751784-16-n10cpu4'" returned:
		stderr:
		
		stdout:
		: signal: killed

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/e4bd7a41c2b76dfe8dc08607865a3a424962aa2b

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1507769&tab=artifacts#/backup2TB/n10cpu4

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190926-1507769/backup2TB/n10cpu4/run_1
	cluster.go:1764,backup.go:47,test_runner.go:689: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1569474589-17-n10cpu4:1 -- ./workload fixtures import bank --db=bank --payload-bytes=10240 --ranges=0 --csv-server http://localhost:8081 --rows=65104166 --seed=1 {pgurl:1} returned:
		stderr:
		
		stdout:
		I190926 05:22:53.814036 1 ccl/workloadccl/fixture.go:317  starting import of 1 tables
		Error: importing fixture: importing table bank: pq: communication error: rpc error: code = Canceled desc = context canceled
		Error:  exit status 1
		: exit status 1

@jordanlewis assigned dt and unassigned jordanlewis Sep 26, 2019
@jordanlewis
Member

Assigning back to @dt.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/09d51e9f6265ed70caf49385be905606ebf722c7

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=backup2TB/n10cpu4 PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1515124&tab=artifacts#/backup2TB/n10cpu4

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191001-1515124/backup2TB/n10cpu4/run_1
	test_runner.go:704: test timed out (10h0m0s)

@dt
Member

dt commented Oct 3, 2019

Optimistically closing as fixed by #41263.

@dt closed this as completed Oct 3, 2019