roachtest: cdc/crdb-chaos/rangefeed=true failed [skipped] #37716

Closed
cockroach-teamcity opened this issue May 22, 2019 · 31 comments · Fixed by #54201
Labels: branch-master, C-test-failure, E-starter, O-roachtest, O-robot

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/7009f8750d5c3af32d5c43011869048ea7a311ae

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/crdb-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1300930&tab=buildLog

The test failed on branch=master, cloud=gce:
	cdc.go:876,cdc.go:225,cdc.go:542,test.go:1251: max latency was more than allowed: 16m32.009570382s vs 15m0s

@cockroach-teamcity cockroach-teamcity added this to the 19.2 milestone May 22, 2019
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. labels May 22, 2019
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/630a6e9cb3771912cd138f9aa3bea1f0ca9fa7c9

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1306250&tab=buildLog

The test failed on branch=master, cloud=gce:
	cdc.go:878,cdc.go:225,cdc.go:542,test.go:1251: max latency was more than allowed: 17m37.675566453s vs 15m0s

@nvanbenschoten
Member

max latency was more than allowed: 16m32.009570382s vs 15m0s

That's close enough that it may not indicate that anything went seriously wrong. @danhhz can I assign this to you to make a decision on what to do?

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/db98d5fb943e0a45b3878bdf042838408e9aee40

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1308281&tab=buildLog

The test failed on branch=release-19.1, cloud=gce:
	cdc.go:176,cluster.go:1854,errgroup.go:57: read tcp 172.17.0.2:41978->35.231.168.246:26257: read: connection reset by peer
	cluster.go:1516,cdc.go:747,cdc.go:135,cluster.go:1854,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1308281-cdc-crdb-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		: signal: killed
	cluster.go:1875,cdc.go:223,cdc.go:542,test.go:1251: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/fc7e48295cd05f94fd2883498d96d91ad538e559

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1308263&tab=buildLog

The test failed on branch=master, cloud=gce:
	cdc.go:878,cdc.go:225,cdc.go:542,test.go:1251: max latency was more than allowed: 16m8.349398897s vs 15m0s

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/c280de40c2bcab93c41fe82bef8353a5ecd95ac4

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1311970&tab=buildLog

The test failed on branch=master, cloud=gce:
	cdc.go:176,cluster.go:1854,errgroup.go:57: read tcp 172.17.0.2:43360->34.74.90.185:26257: read: connection reset by peer
	cluster.go:1516,cdc.go:747,cdc.go:135,cluster.go:1854,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1311970-cdc-crdb-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		: signal: killed
	cluster.go:1875,cdc.go:223,cdc.go:542,test.go:1251: Goexit() was called

@danhhz
Contributor

danhhz commented May 28, 2019

This is related to #36879. The threshold really should be more like 2-3m, but we don't have predictable behavior around crdb chaos. The previous failures all seemed to fall between 11m and 12m, so I was hoping there was a 10m timeout somewhere that bumping the threshold to 15m would cover. At this point, I think we need to investigate what's going on here. This would actually be a good starter/intermediate issue for someone. Possibly me?

It could also be you or tobi, to spread the changefeed debugging skills around. But the last time I looked into it, it seemed like closed timestamps were taking a while to unstick after node chaos, so I don't know if it's really a "changefeed" issue.

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/61715f0f96f519d599eec6541bbee7394d63209a

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1312952&tab=buildLog

The test failed on branch=release-19.1, cloud=gce:
	cdc.go:176,cluster.go:1854,errgroup.go:57: read tcp 172.17.0.2:38994->35.243.199.17:26257: read: connection reset by peer
	cluster.go:1516,cdc.go:747,cdc.go:135,cluster.go:1854,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1312952-cdc-crdb-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		: signal: killed
	cluster.go:1875,cdc.go:223,cdc.go:542,test.go:1251: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/8f42e0d9948256af8b3e1994d514314ba1718c48

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1315162&tab=buildLog

The test failed on branch=master, cloud=gce:
	cdc.go:176,cluster.go:1854,errgroup.go:57: read tcp 172.17.0.2:54242->34.73.92.72:26257: read: connection reset by peer
	cluster.go:1516,cdc.go:747,cdc.go:135,cluster.go:1854,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1315162-cdc-crdb-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		: signal: killed
	cluster.go:1875,cdc.go:223,cdc.go:542,test.go:1251: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/83e62d69214aaa0f7b976f764b97b0e21a41cde3

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1318703&tab=buildLog

The test failed on branch=release-19.1, cloud=gce:
	cdc.go:176,cluster.go:1854,errgroup.go:57: read tcp 172.17.0.2:50816->34.74.124.199:26257: read: connection reset by peer
	cluster.go:1516,cdc.go:747,cdc.go:135,cluster.go:1854,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1318703-cdc-crdb-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		: signal: killed
	cluster.go:1875,cdc.go:223,cdc.go:542,test.go:1251: Goexit() was called

@danhhz danhhz added the E-starter Might be suitable for a starter project for new employees or team members. label Jun 4, 2019
@danhhz
Contributor

danhhz commented Jun 4, 2019

Unassigning myself so someone else can get some cdc exposure.

@danhhz danhhz removed their assignment Jun 4, 2019
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/5a88de2233e1405c0553f2d5380fd24218fac3d2

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1324169&tab=buildLog

The test failed on branch=release-19.1, cloud=gce:
	cdc.go:173,cluster.go:1851,errgroup.go:57: read tcp 172.17.0.2:47910->35.229.50.181:26257: read: connection reset by peer
	cluster.go:1513,cdc.go:744,cdc.go:132,cluster.go:1851,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1324169-cdc-crdb-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		: signal: killed
	cluster.go:1872,cdc.go:220,cdc.go:539,test.go:1248: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/8e7ef35a8e4169ec63dc5a4df963d8b31a3d5b61

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1324151&tab=buildLog

The test failed on branch=master, cloud=gce:
	cdc.go:173,cluster.go:1851,errgroup.go:57: read tcp 172.17.0.2:35912->35.196.156.131:26257: read: connection reset by peer
	cluster.go:1513,cdc.go:744,cdc.go:132,cluster.go:1851,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1324151-cdc-crdb-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		: signal: killed
	cluster.go:1872,cdc.go:220,cdc.go:539,test.go:1248: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/8892e379d84a36b29003420189edd1e10db41d71

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1328407&tab=buildLog

The test failed on branch=release-19.1, cloud=gce:
	cdc.go:173,cluster.go:1851,errgroup.go:57: read tcp 172.17.0.2:35508->35.237.238.212:26257: read: connection reset by peer
	cluster.go:1513,cdc.go:744,cdc.go:132,cluster.go:1851,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1328407-cdc-crdb-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		: signal: killed
	cluster.go:1872,cdc.go:220,cdc.go:539,test.go:1248: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/db6d4425d65bdb027624972ccb19d7aad0bc57cc

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1339372&tab=buildLog

The test failed on branch=master, cloud=gce:
	cdc.go:173,cluster.go:1851,errgroup.go:57: read tcp 172.17.0.2:44490->35.196.36.92:26257: read: connection reset by peer
	cluster.go:1513,cdc.go:744,cdc.go:132,cluster.go:1851,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1339372-cdc-crdb-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		: signal: killed
	cluster.go:1872,cdc.go:220,cdc.go:539,test.go:1248: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/0854bf6d9dd30b4893c19a6c0c3a08809c3748c8

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1351925&tab=buildLog

The test failed on branch=release-19.1, cloud=gce:
	cdc.go:173,cluster.go:1851,errgroup.go:57: read tcp 172.17.0.2:35338->34.73.228.172:26257: read: connection reset by peer
	cluster.go:1513,cdc.go:744,cdc.go:132,cluster.go:1851,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1351925-cdc-crdb-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		: signal: killed
	cluster.go:1872,cdc.go:220,cdc.go:539,test.go:1251: Goexit() was called

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/da56c792e968574b8f1d9ef3fdb45d56a530221a

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1415578&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190801-1415578/cdc/crdb-chaos/rangefeed=true/run_1
	cdc.go:879,cdc.go:226,cdc.go:543,test_runner.go:691: max latency was more than allowed: 19m54.756832575s vs 15m0s

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/98d6832e9f9edb7e554aaa90d9d4296bb00af16e

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1433695&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190810-1433695/cdc/crdb-chaos/rangefeed=true/run_1
	cdc.go:879,cdc.go:226,cdc.go:543,test_runner.go:691: max latency was more than allowed: 20m8.642977252s vs 15m0s

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/8ebcdac113118ae5fbcaddeecd269f59399aea8c

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1443904&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190819-1443904/cdc/crdb-chaos/rangefeed=true/run_1
	cdc.go:879,cdc.go:226,cdc.go:543,test_runner.go:674: max latency was more than allowed: 16m42.317238966s vs 15m0s

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/e8faca611a902766154ed82581d6d3a7483ad231

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1462518&tab=buildLog

The test failed on branch=provisional_201908291837_v19.2.0-beta.20190903, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190830-1462518/cdc/crdb-chaos/rangefeed=true/run_1
	cdc.go:879,cdc.go:226,cdc.go:543,test_runner.go:673: max latency was more than allowed: 24m36.93044168s vs 15m0s

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/96b1500e20575ee5c609a00857c78c918078c99b

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1465459&tab=buildLog

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190904-1465459/cdc/crdb-chaos/rangefeed=true/run_1
	cdc.go:879,cdc.go:226,cdc.go:543,test_runner.go:688: max latency was more than allowed: 15m57.293137143s vs 15m0s

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/4784fe3c51545db5fb5d411937ec1db2ef2b9761

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1472753&tab=buildLog

The test failed on branch=provisional_201909060000_v19.2.0-beta.20190910, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190906-1472753/cdc/crdb-chaos/rangefeed=true/run_1
	cdc.go:879,cdc.go:226,cdc.go:543,test_runner.go:688: max latency was more than allowed: 15m25.236366971s vs 15m0s

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/62b1678f652461bbc1aaf6bc2c0dd03105ce0ebe

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1488785&tab=buildLog

The test failed on branch=40765, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190914-1488785/cdc/crdb-chaos/rangefeed=true/run_1
	cdc.go:879,cdc.go:226,cdc.go:543,test_runner.go:688: max latency was more than allowed: 18m39.273633827s vs 15m0s

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/c6342c90a7fa4ceb1b674faa47a95e1726d05e79

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1496387&tab=artifacts#/cdc/crdb-chaos/rangefeed=true

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20190919-1496387/cdc/crdb-chaos/rangefeed=true/run_1
	cdc.go:879,cdc.go:226,cdc.go:543,test_runner.go:689: max latency was more than allowed: 15m20.176892433s vs 15m0s

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/1ed03c9811f01fef31950a5cb75a7b591af6fc26

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1518416&tab=artifacts#/cdc/crdb-chaos/rangefeed=true

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/20191003-1518416/cdc/crdb-chaos/rangefeed=true/run_1
	cdc.go:879,cdc.go:226,cdc.go:543,test_runner.go:689: max latency was more than allowed: 15m56.096234132s vs 15m0s

@danhhz
Contributor

danhhz commented Oct 3, 2019

Okay, so this max latency failure is definitely interesting and worth looking into, but not now. It does seem to always get itself started back up, just taking much longer than we'd expect (this assertion really should be around 2m, but we've upped it to try and reduce flakes). I don't think we get a lot of value from running this every night, especially while cdc is not under active development. I'm going to skip this in the interest of reducing noise.

@danhhz danhhz changed the title from "roachtest: cdc/crdb-chaos/rangefeed=true failed" to "[skipped] roachtest: cdc/crdb-chaos/rangefeed=true failed" Oct 3, 2019
@tbg tbg changed the title from "[skipped] roachtest: cdc/crdb-chaos/rangefeed=true failed" to "roachtest: cdc/crdb-chaos/rangefeed=true failed [skipped]" Oct 4, 2019
@tbg
Member

tbg commented Oct 4, 2019

(Fixed the issue name - I think putting [skipped] at the front probably isn't going to work well with our tooling that uses the prefix as a scope)

@danhhz
Contributor

danhhz commented Oct 4, 2019

Whoops, for some reason I specifically remember you telling me that it worked as long as it was contained in the title (not just a prefix). Anyway, thanks for fixing it! Is this written down anywhere?

@tbg
Member

tbg commented Oct 4, 2019 via email

@tbg tbg added the branch-master Failures and bugs on the master branch. label Jan 22, 2020
@nvanbenschoten
Member

@danhhz is there anything we should do here for the upcoming release?

@danhhz
Contributor

danhhz commented Mar 23, 2020

With infinite bandwidth, yes, I'd still love for someone to dig into this. Practically... no.

@irfansharif
Contributor

We discussed this in the team meeting: it could have been resolved when we addressed #48553. Given that we found it difficult to repro, I'd be happy to unskip it going forward.

craig bot pushed a commit that referenced this issue Feb 3, 2021
54201: roachtest: unskip cdc/crdb-chaos r=aayushshah15 a=aayushshah15

I ran this test a total of 15 times in parallel and wasn't able to
reproduce the failure. Since it's been skipped for 2+ releases, it's hard
to know what fixed it, but a good guess is #48561.

Release note: None

Fixes #37716 

Informs #36879

Release justification: testing only

57170: util/log: new experimental integration with Fluentd  r=itsbilal a=knz

Release note (cli change): It is now possible to redirect logging to
[Fluentd](https://www.fluentd.org)-compatible network collectors. See
the documentation for details. This is an alpha-quality feature.


59741: opt: fix panic in GenerateLookupJoin r=mgartner a=mgartner

#### opt: fix panic in GenerateLookupJoin

In #57690 a new code path was introduced to `findConstantFilterCols`
from `GenerateLookupJoins`. This new code path made it possible for the
filters passed to `findConstantFilterCols` to contain columns that are
not part of the given table. This violated the assumption that the
filter only contains columns in the given table and caused a panic. This
commit fixes the issue by neglecting constant filters for columns not in
the given table.

Fixes #59738

Release note (bug fix): A bug has been fixed that caused errors when
joining two tables when one of the tables had a computed column. This
bug was present since version 21.1.0-alpha.2 and not present in any
production releases.

#### opt: move findConstantFilterCols to general_funcs.go

Release note: None


59779: flowinfra: deflake a test r=yuzefovich a=yuzefovich

Previously, a unit test could fail in rare circumstances when relocating
a range to a remote node, and now we will use SucceedsSoon to avoid
that. Also unskip the vectorized option.

Fixes: #59712

Release note: None

Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
@craig craig bot closed this as completed in 51d31c1 Feb 3, 2021