roachtest: cdc/sink-chaos/rangefeed=true failed [skipped] #36019

Closed
cockroach-teamcity opened this issue Mar 21, 2019 · 7 comments · Fixed by #36852

@cockroach-teamcity
Member

SHA: https://github.com/cockroachdb/cockroach/commits/dfa23c01e4ea39b19ca8b2e5c8a4e7cf9b9445f4

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/sink-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1189954&tab=buildLog

The test failed on master:
	cluster.go:1267,cdc.go:625,cdc.go:125,cluster.go:1605,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1189954-cdc-sink-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		l
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		   4m33s        0            1.0            1.5  40802.2  40802.2  40802.2  40802.2 delivery
		   4m33s        0            5.0           16.4  49392.1  51539.6  51539.6  51539.6 newOrder
		   4m33s        0            1.0            1.6  38654.7  38654.7  38654.7  38654.7 orderStatus
		   4m33s        0            3.0           16.6  47244.6  51539.6  51539.6  51539.6 payment
		   4m33s        0            0.0            1.1      0.0      0.0      0.0      0.0 stockLevel
		   4m34s        0            0.0            1.5      0.0      0.0      0.0      0.0 delivery
		   4m34s        0            8.0           16.3  49392.1  53687.1  53687.1  53687.1 newOrder
		   4m34s        0            0.0            1.6      0.0      0.0      0.0      0.0 orderStatus
		   4m34s        0            8.0           16.5  47244.6  51539.6  51539.6  51539.6 payment
		   4m34s        0            0.0            1.1      0.0      0.0      0.0      0.0 stockLevel
		: signal: killed
	cluster.go:1626,cdc.go:213,cdc.go:417,test.go:1214: unexpected status: failed

@cockroach-teamcity cockroach-teamcity added this to the 19.1 milestone Mar 21, 2019
@cockroach-teamcity cockroach-teamcity added C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. labels Mar 21, 2019
@danhhz danhhz self-assigned this Mar 22, 2019
danhhz added a commit to danhhz/cockroach that referenced this issue Mar 25, 2019
…list

In the roachtests for crdb-chaos and sink-chaos we're seeing changefeeds
fail with surprising errors:

    [NotLeaseHolderError] r681: replica (n1,s1):1 not lease holder; replica (n2,s2):2 is

    descriptor not found

We'd like to avoid failing a changefeed unnecessarily, so when an error
bubbles up to the top level, we'd like to retry the distributed flow if
possible. We initially tried to whitelist which errors should cause the
changefeed to retry, but this turns out to be brittle, so this commit
switches to a blacklist. Any error that is expected to be permanent is
now marked with `MarkTerminalError` by the time it comes out of
`distChangefeedFlow`. Everything else should be logged loudly and
retried.

Touches cockroachdb#35974
Touches cockroachdb#36019

Release note: None
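
The blacklist approach described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the actual `changefeedccl` code: `terminalError`, `markTerminal`, `isTerminal`, `runWithRetry`, and both example errors are hypothetical names chosen for the sketch, and the attempt cap stands in for whatever safety net the real retry loop uses.

```go
package main

import (
	"errors"
	"fmt"
)

// Example errors for the sketch (hypothetical, not taken from CockroachDB).
var (
	errNotLeaseHolder = errors.New("replica not lease holder")
	errBadConfig      = errors.New("unsupported sink configuration")
)

// terminalError wraps an error that is expected to be permanent.
type terminalError struct{ wrapped error }

func (e *terminalError) Error() string { return "terminal error: " + e.wrapped.Error() }
func (e *terminalError) Unwrap() error { return e.wrapped }

// markTerminal tags err as permanent so the retry loop gives up on it.
func markTerminal(err error) error { return &terminalError{wrapped: err} }

// isTerminal reports whether err, or any error it wraps, was marked terminal.
func isTerminal(err error) bool {
	var t *terminalError
	return errors.As(err, &t)
}

// runWithRetry calls flow until it succeeds, returns a terminal error, or
// maxAttempts is exhausted. Every unmarked error is logged and retried.
func runWithRetry(flow func() error, maxAttempts int) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = flow()
		if err == nil || isTerminal(err) {
			return attempt, err
		}
		fmt.Printf("attempt %d: retrying after non-terminal error: %v\n", attempt, err)
	}
	return maxAttempts, err
}

func main() {
	calls := 0
	transient := func() error {
		calls++
		if calls < 3 {
			return errNotLeaseHolder
		}
		return nil
	}
	attempts, err := runWithRetry(transient, 5)
	fmt.Println("transient:", attempts, err)

	permanent := func() error { return markTerminal(errBadConfig) }
	attempts, err = runWithRetry(permanent, 5)
	fmt.Println("permanent:", attempts, err)
}
```

The trade-off in this design is that any error nobody thought to mark is retried, so a missed permanent error loops until the cap is reached.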
danhhz added a commit to danhhz/cockroach that referenced this issue Mar 26, 2019
danhhz added a commit to danhhz/cockroach that referenced this issue Mar 26, 2019
craig bot pushed a commit that referenced this issue Mar 27, 2019
36132: changefeedccl: switch high-level retry marker from whitelist to blacklist r=nvanbenschoten a=danhhz

In the roachtests for crdb-chaos and sink-chaos we're seeing changefeeds
fail with surprising errors:

    [NotLeaseHolderError] r681: replica (n1,s1):1 not lease holder; replica (n2,s2):2 is

    descriptor not found

We'd like to avoid failing a changefeed unnecessarily, so when an error
bubbles up to the top level, we'd like to retry the distributed flow if
possible. We initially tried to whitelist which errors should cause the
changefeed to retry, but this turns out to be brittle, so this commit
switches to a blacklist. Any error that is expected to be permanent is
now marked with `MarkTerminalError` by the time it comes out of
`distChangefeedFlow`. Everything else should be logged loudly and
retried.

Touches #35974
Touches #36019

Release note: None

Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/17565100d1e7c66341e6db3e39bb66202958cb81

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/sink-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1204567&tab=buildLog

The test failed on master:
	cluster.go:1267,cdc.go:633,cdc.go:133,cluster.go:1605,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1204567-cdc-sink-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		l
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		   4m13s        0            4.0            2.2     39.8     52.4     52.4     52.4 delivery
		   4m13s        0           24.0           21.4     28.3     35.7     39.8     39.8 newOrder
		   4m13s        0            1.0            2.1      7.1      7.1      7.1      7.1 orderStatus
		   4m13s        0           21.0           21.9     14.2     22.0     22.0     22.0 payment
		   4m13s        0            3.0            2.1     11.5     12.1     12.1     12.1 stockLevel
		   4m14s        0            2.0            2.2     48.2     52.4     52.4     52.4 delivery
		   4m14s        0           28.0           21.4     31.5     39.8     41.9     41.9 newOrder
		   4m14s        0            3.0            2.1      6.0      7.1      7.1      7.1 orderStatus
		   4m14s        0           25.0           21.9     15.2     18.9     19.9     19.9 payment
		   4m14s        0            3.0            2.1     14.7     16.8     16.8     16.8 stockLevel
		: signal: killed
	cluster.go:1626,cdc.go:221,cdc.go:425,test.go:1216: unexpected status: failed

@danhhz
Contributor

danhhz commented Mar 28, 2019

Looks like my constants in #36132 need tuning

W190328 06:36:11.735780 10194 ccl/changefeedccl/changefeed_stmt.go:486  [n1] CHANGEFEED job 438056760563924993 saw the same non-terminal error 5 times in 3.357061917s: ccl/changefeedccl/sink.go:337 in makeKafkaSink(): (57P03) connecting to kafka: 10.142.0.186:9092: kafka: client has run out of available brokers to talk to (Is your cluster reachable?)
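
To illustrate why such constants matter, here is a rough sketch of a "same error N times" safety net combined with exponential backoff. Everything in it is an assumption made for illustration: `repeatTracker`, `simulate`, the threshold of 5, and the 100ms initial backoff are hypothetical and are not the values or code used in `changefeed_stmt.go`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Illustrative constants only; the real values are not given in this thread.
const (
	initialBackoff  = 100 * time.Millisecond
	repeatThreshold = 5
)

var errBroker = errors.New("kafka: client has run out of available brokers to talk to")

// repeatTracker counts consecutive occurrences of the same error message.
type repeatTracker struct {
	threshold int
	lastMsg   string
	count     int
	firstSeen time.Time
}

// observe records err at time now. It reports whether the same error has been
// seen threshold times in a row, and the time elapsed since the first of them.
func (t *repeatTracker) observe(err error, now time.Time) (bool, time.Duration) {
	if msg := err.Error(); t.count == 0 || msg != t.lastMsg {
		t.lastMsg, t.count, t.firstSeen = msg, 0, now
	}
	t.count++
	return t.count >= t.threshold, now.Sub(t.firstSeen)
}

// simulate replays attempts that all fail with errBroker, doubling the backoff
// between attempts, and returns whether the tracker gave up and after how long.
func simulate(threshold int, backoff time.Duration, attempts int) (bool, time.Duration) {
	t := &repeatTracker{threshold: threshold}
	now := time.Unix(0, 0)
	var span time.Duration
	for i := 0; i < attempts; i++ {
		var giveUp bool
		giveUp, span = t.observe(errBroker, now)
		if giveUp {
			return true, span
		}
		now = now.Add(backoff)
		backoff *= 2
	}
	return false, span
}

func main() {
	gaveUp, span := simulate(repeatThreshold, initialBackoff, 10)
	fmt.Println(gaveUp, span)
}
```

With a small threshold and a short initial backoff, the tracker gives up within a couple of seconds, which is shorter than a sink restart is likely to take in a chaos test.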

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/d03a34e92d2ee558fb6aedb0709b733a1fab97f4

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/sink-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1207666&tab=buildLog

The test failed on master:
	cluster.go:1293,cdc.go:629,cdc.go:133,cluster.go:1631,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1207666-cdc-sink-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		l
		   4m24s        0            4.0            2.1     50.3     54.5     54.5     54.5 delivery
		   4m24s        0           17.0           21.6     28.3     33.6     33.6     33.6 newOrder
		   4m24s        0            1.0            2.1      6.3      6.3      6.3      6.3 orderStatus
		   4m24s        0           10.0           22.3     15.7     17.8     17.8     17.8 payment
		   4m24s        0            0.0            2.2      0.0      0.0      0.0      0.0 stockLevel
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		   4m25s        0            3.0            2.1     37.7     52.4     52.4     52.4 delivery
		   4m25s        0           17.0           21.6     32.5     39.8     41.9     41.9 newOrder
		   4m25s        0            1.0            2.1      6.8      6.8      6.8      6.8 orderStatus
		   4m25s        0           16.0           22.2     14.7     17.8     21.0     21.0 payment
		   4m25s        0            3.0            2.2     12.6     29.4     29.4     29.4 stockLevel
		: signal: killed
	cluster.go:1293,cdc.go:539,cdc.go:570,cdc.go:205,cluster.go:1631,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1207666-cdc-sink-chaos-rangefeed-true:4 -- CONFLUENT_CURRENT=/mnt/data1/confluent /mnt/data1/confluent/confluent-4.0.0/bin/confluent start kafka returned:
		stderr:
		
		stdout:
		Starting zookeeper
		
		zookeeper is [UP]
		Starting kafka
		
		: signal: killed
	cluster.go:1652,cdc.go:221,cdc.go:425,test.go:1223: unexpected status: failed

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/a6b3c540b696002b2ed07036a657612995d6d1ab

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/sink-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1211353&tab=buildLog

The test failed on master:
	cluster.go:1293,cdc.go:629,cdc.go:133,cluster.go:1631,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1211353-cdc-sink-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		l
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		   2m13s        0            3.0            2.3     46.1     50.3     50.3     50.3 delivery
		   2m13s        0           28.0           21.1     31.5     39.8     44.0     44.0 newOrder
		   2m13s        0            3.0            2.3      7.1      7.3      7.3      7.3 orderStatus
		   2m13s        0           20.0           23.3     15.7     19.9     21.0     21.0 payment
		   2m13s        0            3.0            2.4     12.6     15.7     15.7     15.7 stockLevel
		   2m14s        0            2.0            2.3     48.2     52.4     52.4     52.4 delivery
		   2m14s        0           27.0           21.1     30.4     44.0     48.2     48.2 newOrder
		   2m14s        0            3.0            2.3      6.3      6.6      6.6      6.6 orderStatus
		   2m14s        0           30.0           23.3     15.2     18.9     23.1     23.1 payment
		   2m14s        0            3.0            2.4     12.6     12.6     12.6     12.6 stockLevel
		: signal: killed
	cluster.go:1652,cdc.go:221,cdc.go:425,test.go:1223: unexpected status: failed

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/668162cc99e4f3198b663b1abfa51858eeb3ccb8

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/sink-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1212251&tab=buildLog

The test failed on master:
	cluster.go:1293,cdc.go:629,cdc.go:133,cluster.go:1631,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1212251-cdc-sink-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		l
		   2m16s        0            3.0            2.5     44.0     44.0     44.0     44.0 delivery
		   2m16s        0           30.0           21.4     31.5     37.7     39.8     39.8 newOrder
		   2m16s        0            3.0            2.2      6.3      7.3      7.3      7.3 orderStatus
		   2m16s        0           19.0           22.5     15.7     19.9     30.4     30.4 payment
		   2m16s        0            2.0            2.3     12.6     14.7     14.7     14.7 stockLevel
		_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
		   2m17s        0            0.0            2.4      0.0      0.0      0.0      0.0 delivery
		   2m17s        0           18.0           21.4     29.4     37.7     39.8     39.8 newOrder
		   2m17s        0            0.0            2.2      0.0      0.0      0.0      0.0 orderStatus
		   2m17s        0           25.0           22.5     15.7     19.9     19.9     19.9 payment
		   2m17s        0            1.0            2.3     12.1     12.1     12.1     12.1 stockLevel
		: signal: killed
	cluster.go:1652,cdc.go:221,cdc.go:425,test.go:1223: unexpected status: failed

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/2851c7d56ee4966109691b5c48c73ec8d4cc9847

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/sink-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1215354&tab=buildLog

The test failed on master:
	cluster.go:1329,cdc.go:738,cdc.go:134,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1215354-cdc-sink-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		       0            1.0            2.1     13.1     13.1     13.1     13.1 stockLevel
		   2m14s        0            2.0            2.1     37.7     41.9     41.9     41.9 delivery
		   2m14s        0           23.0           21.3     28.3     33.6     35.7     35.7 newOrder
		   2m14s        0            3.0            2.5      6.6      6.6      6.6      6.6 orderStatus
		   2m14s        0           20.0           22.6     13.6     18.9     19.9     19.9 payment
		   2m14s        0            1.0            2.1     19.9     19.9     19.9     19.9 stockLevel
		   2m15s        0            2.0            2.1     52.4     52.4     52.4     52.4 delivery
		   2m15s        0           30.0           21.3     32.5     41.9     62.9     62.9 newOrder
		   2m15s        0            1.0            2.5      6.8      6.8      6.8      6.8 orderStatus
		   2m15s        0           21.0           22.6     15.2     22.0     23.1     23.1 payment
		   2m15s        0            1.0            2.1     11.0     11.0     11.0     11.0 stockLevel
		: signal: killed
	cluster.go:1688,cdc.go:222,cdc.go:522,test.go:1226: unexpected status: failed

@cockroach-teamcity
Member Author

SHA: https://github.com/cockroachdb/cockroach/commits/c6df752eefe4609b8a5bbada0955f79a2cfb790e

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=cdc/sink-chaos/rangefeed=true PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1217763&tab=buildLog

The test failed on master:
	cluster.go:1329,cdc.go:652,cdc.go:683,cdc.go:207,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1217763-cdc-sink-chaos-rangefeed-true:4 -- CONFLUENT_CURRENT=/mnt/data1/confluent /mnt/data1/confluent/confluent-4.0.0/bin/confluent start schema-registry returned:
		stderr:
		
		stdout:
		Starting zookeeper
		
		zookeeper is [UP]
		Starting kafka
		
		: signal: killed
	cluster.go:1329,cdc.go:746,cdc.go:135,cluster.go:1667,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-1217763-cdc-sink-chaos-rangefeed-true:4 -- ./workload run tpcc --warehouses=100 --duration=30m --tolerate-errors {pgurl:1-3}  returned:
		stderr:
		
		stdout:
		       0            1.0            2.2     13.6     13.6     13.6     13.6 stockLevel
		   4m22s        0            2.0            2.3     46.1     46.1     46.1     46.1 delivery
		   4m22s        0           24.0           21.4     27.3     31.5     35.7     35.7 newOrder
		   4m22s        0            2.0            2.1      5.5      5.8      5.8      5.8 orderStatus
		   4m22s        0           18.0           21.8     13.1     18.9     19.9     19.9 payment
		   4m22s        0            2.0            2.2     13.1     13.6     13.6     13.6 stockLevel
		   4m23s        0            1.0            2.3     46.1     46.1     46.1     46.1 delivery
		   4m23s        0           26.0           21.5     30.4     35.7     37.7     37.7 newOrder
		   4m23s        0            5.0            2.2      5.5      5.8      5.8      5.8 orderStatus
		   4m23s        0           14.0           21.7     14.2     16.8     16.8     16.8 payment
		   4m23s        0            2.0            2.2      9.4     11.0     11.0     11.0 stockLevel
		: signal: killed
	cluster.go:1688,cdc.go:223,cdc.go:530,test.go:1228: unexpected status: failed

@danhhz danhhz changed the title roachtest: cdc/sink-chaos/rangefeed=true failed roachtest: cdc/sink-chaos/rangefeed=true failed [skipped] Apr 3, 2019
danhhz added a commit to danhhz/cockroach that referenced this issue Apr 15, 2019
For a while, the cdc/crdb-chaos and cdc/sink-chaos roachtests have been
failing because an error that should be marked as retryable wasn't. As a
result of the discussion in cockroachdb#35974, I tried switching from a whitelist
(retryable error) to a blacklist (terminal error) in cockroachdb#36132, but on
reflection this doesn't seem like a great idea. We added a safety net to
prevent false negatives from retrying indefinitely but it was
immediately apparent that this meant we needed to tune the retry loop
parameters. It is better to do the due diligence of investigating which
errors should be retried and retrying them.

The commit is intended for backport into 19.1 once it's baked for a bit.

Closes cockroachdb#35974
Closes cockroachdb#36018
Closes cockroachdb#36019
Closes cockroachdb#36432

Release note (bug fix): `CHANGEFEED`s now retry instead of erroring in
more situations.
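
The whitelist approach this commit returns to can be sketched as below. It is an illustration under stated assumptions, not the actual `changefeedccl` code: `retryableError`, `markRetryable`, `isRetryable`, `connectToSink`, `runWithRetry`, and the example errors are hypothetical names. Only errors explicitly marked at an investigated call site are retried; everything else fails the job immediately.

```go
package main

import (
	"errors"
	"fmt"
)

// Example errors for the sketch (hypothetical, not taken from CockroachDB).
var (
	errBrokerUnavailable = errors.New("client has run out of available brokers")
	errUnknown           = errors.New("some error nobody has investigated yet")
)

// retryableError wraps an error that is known to be transient.
type retryableError struct{ wrapped error }

func (e *retryableError) Error() string { return "retryable: " + e.wrapped.Error() }
func (e *retryableError) Unwrap() error { return e.wrapped }

// markRetryable tags err as transient at a call site that has been investigated.
func markRetryable(err error) error { return &retryableError{wrapped: err} }

// isRetryable reports whether err, or any error it wraps, was marked retryable.
func isRetryable(err error) bool {
	var r *retryableError
	return errors.As(err, &r)
}

// connectToSink stands in for a sink connection attempt. The mark survives
// further wrapping with %w, so callers can add context without losing it.
func connectToSink(up bool) error {
	if !up {
		return fmt.Errorf("connecting to kafka: %w", markRetryable(errBrokerUnavailable))
	}
	return nil
}

// runWithRetry retries flow only while it returns a marked retryable error.
func runWithRetry(flow func() error, maxAttempts int) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = flow(); err == nil || !isRetryable(err) {
			return attempt, err
		}
	}
	return maxAttempts, err
}

func main() {
	calls := 0
	attempts, err := runWithRetry(func() error {
		calls++
		return connectToSink(calls >= 3) // the sink comes back on the third try
	}, 5)
	fmt.Println("sink outage:", attempts, err)

	attempts, err = runWithRetry(func() error { return errUnknown }, 5)
	fmt.Println("unmarked error:", attempts, err)
}
```

Compared with marking terminal errors, this fails fast on anything unexpected, at the cost of having to find and mark each transient error by hand.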
craig bot pushed a commit that referenced this issue Apr 16, 2019
36804: sql/sem/pretty: use left alignment for column names in CREATE r=knz a=knz

Before:

```
CREATE TABLE t (
    name STRING,
    id INT8
       NOT NULL
       PRIMARY KEY
)
```

After:

```
CREATE TABLE t (
    name STRING,
    id   INT8
         NOT NULL
         PRIMARY KEY
)
```


36852: changefeedccl: switch retryable errors back to a whitelist r=nvanbenschoten a=danhhz

For a while, the cdc/crdb-chaos and cdc/sink-chaos roachtests have been
failing because an error that should be marked as retryable wasn't. As a
result of the discussion in #35974, I tried switching from a whitelist
(retryable error) to a blacklist (terminal error) in #36132, but on
reflection this doesn't seem like a great idea. We added a safety net to
prevent false negatives from retrying indefinitely but it was
immediately apparent that this meant we needed to tune the retry loop
parameters. It is better to do the due diligence of investigating which
errors should be retried and retrying them.

The commit is intended for backport into 19.1 once it's baked for a bit.

Closes #35974
Closes #36018
Closes #36019
Closes #36432

Release note (bug fix): `CHANGEFEED`s now retry instead of erroring in
more situations.

36872: coldata: fix Slice when slicing up to batch.Length() r=yuzefovich a=asubiotto

A panic occurred because we weren't treating the end slice index as
exclusive, resulting in an out of bounds panic when attempting to slice
the nulls slice.

Release note: None

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
Co-authored-by: Alfonso Subiotto Marqués <alfonso@cockroachlabs.com>
@craig craig bot closed this as completed in #36852 Apr 16, 2019
danhhz added a commit to danhhz/cockroach that referenced this issue Apr 24, 2019