changefeedccl: improve retryable error handling #36432

danhhz · 2019-04-02T18:20:02Z

For a while, the cdc/crdb-chaos and cdc/sink-chaos roachtests have been failing because an error that should have been marked as retryable wasn't. As a result of the discussion in #35974, I tried switching from a whitelist (retryable errors) to a blacklist (terminal errors) in #36132, but on reflection this doesn't seem like a great idea. We added a safety net to prevent false negatives from retrying indefinitely, but it was immediately apparent that this meant we needed to tune the retry loop parameters. Better is to just do the due diligence of investigating the errors that should be retried and retrying them. I'm about to revert the error blacklist PR and (since we're not getting any new information from them) skip the two chaos roachtests until I can get to fixing up the whitelist (likely next week, given that I'm on support rotation this week). This is a placeholder issue to summarize what's going on.

As cockroachdb#36432 mentions, we know what's wrong. We're not getting any new data from running these, so stop spamming failures until we get a chance to fix them.

Release note: None

danhhz · 2019-04-02T19:50:24Z

Tried a revert and a) there are some conflicts and b) I realized there were some nice cleanups in here that I don't super want to revert. We intentionally never backported this into 19.1, so I'm now considering just leaving it as-is with the roachtests skipped until I get a chance to do another pass (hopefully next week at the latest).

36438: roachtest: skip cdc/crdb-chaos and cdc/sink-chaos r=tbg a=danhhz

As #36432 mentions, we know what's wrong. We're not getting any new data from running these, so stop spamming failures until we get a chance to fix them.

Release note: None

Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>

For a while, the cdc/crdb-chaos and cdc/sink-chaos roachtests have been failing because an error that should have been marked as retryable wasn't. As a result of the discussion in cockroachdb#35974, I tried switching from a whitelist (retryable errors) to a blacklist (terminal errors) in cockroachdb#36132, but on reflection this doesn't seem like a great idea. We added a safety net to prevent false negatives from retrying indefinitely, but it was immediately apparent that this meant we needed to tune the retry loop parameters. Better is to just do the due diligence of investigating the errors that should be retried and retrying them.

The commit is intended for backport into 19.1 once it's baked for a bit.

Closes cockroachdb#35974
Closes cockroachdb#36018
Closes cockroachdb#36019
Closes cockroachdb#36432

Release note (bug fix): `CHANGEFEED`s now retry instead of erroring in more situations

36804: sql/sem/pretty: use left alignment for column names in CREATE r=knz a=knz

Before:

```
CREATE TABLE t ( name STRING, id INT8 NOT NULL PRIMARY KEY )
```

After:

```
CREATE TABLE t ( name STRING, id INT8 NOT NULL PRIMARY KEY )
```

36852: changefeedccl: switch retryable errors back to a whitelist r=nvanbenschoten a=danhhz

For a while, the cdc/crdb-chaos and cdc/sink-chaos roachtests have been failing because an error that should have been marked as retryable wasn't. As a result of the discussion in #35974, I tried switching from a whitelist (retryable errors) to a blacklist (terminal errors) in #36132, but on reflection this doesn't seem like a great idea. We added a safety net to prevent false negatives from retrying indefinitely, but it was immediately apparent that this meant we needed to tune the retry loop parameters. Better is to just do the due diligence of investigating the errors that should be retried and retrying them.

The commit is intended for backport into 19.1 once it's baked for a bit.

Closes #35974
Closes #36018
Closes #36019
Closes #36432

Release note (bug fix): `CHANGEFEED`s now retry instead of erroring in more situations

36872: coldata: fix Slice when slicing up to batch.Length() r=yuzefovich a=asubiotto

A panic occurred because we weren't treating the end slice index as exclusive, resulting in an out-of-bounds panic when attempting to slice the nulls slice.

Release note: None

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
Co-authored-by: Alfonso Subiotto Marqués <alfonso@cockroachlabs.com>


danhhz added the C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.) and A-cdc (Change Data Capture) labels Apr 2, 2019

danhhz self-assigned this Apr 2, 2019

This was referenced Apr 2, 2019

roachtest: skip cdc/crdb-chaos and cdc/sink-chaos #36438

Merged

ccl/changefeedccl: TestChangefeedDataTTL failed under stress #36369

Closed

danhhz mentioned this issue Apr 15, 2019

changefeedccl: switch retryable errors back to a whitelist #36852

Merged

craig bot closed this as completed in #36852 Apr 16, 2019
