changefeedccl: improve retryable error handling #36432

Closed
danhhz opened this issue Apr 2, 2019 · 1 comment · Fixed by #36852
Assignees: danhhz
Labels: A-cdc (Change Data Capture), C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.)

Comments

danhhz commented Apr 2, 2019

For a while, the cdc/crdb-chaos and cdc/sink-chaos roachtests have been failing because an error that should have been marked as retryable wasn't. As a result of the discussion in #35974, I tried switching from a whitelist (retryable errors) to a blacklist (terminal errors) in #36132, but on reflection this doesn't seem like a great idea. We added a safety net to keep false negatives (errors that are actually terminal but aren't on the blacklist) from retrying indefinitely, but it was immediately apparent that this meant we'd need to tune the retry loop parameters. It's better to just do the due diligence of investigating which errors should be retried and retrying them.

I'm about to revert the error blacklist PR and, since we're not getting any new information from them, skip the two chaos roachtests until I can get to fixing up the whitelist (likely next week, given that I'm on support rotation this week). This is a placeholder issue to summarize what's going on.

danhhz added the C-bug and A-cdc labels Apr 2, 2019
danhhz self-assigned this Apr 2, 2019
danhhz added a commit to danhhz/cockroach that referenced this issue Apr 2, 2019
As cockroachdb#36432 mentions, we know what's wrong. We're not getting any new data
from running these, so stop spamming failures until we get a chance to
fix them.

Release note: None
danhhz commented Apr 2, 2019

Tried a revert and a) there are some conflicts and b) I realized there were some nice cleanups in here that I'd rather not revert. We intentionally never backported this into 19.1, so I'm now considering just leaving it as-is with the roachtests skipped until I get a chance to do another pass (hopefully next week at the latest).

craig bot pushed a commit that referenced this issue Apr 3, 2019
36438: roachtest: skip cdc/crdb-chaos and cdc/sink-chaos r=tbg a=danhhz

Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
danhhz added a commit to danhhz/cockroach that referenced this issue Apr 15, 2019
For a while, the cdc/crdb-chaos and cdc/sink-chaos roachtests have been
failing because an error that should be marked as retryable wasn't. As a
result of the discussion in cockroachdb#35974, I tried switching from a whitelist
(retryable error) to a blacklist (terminal error) in cockroachdb#36132, but on
reflection this doesn't seem like a great idea. We added a safety net to
prevent false negatives from retrying indefinitely but it was
immediately apparent that this meant we needed to tune the retry loop
parameters. Better is to just do the due diligence of investigating the
errors that should be retried and retrying them.

The commit is intended for backport into 19.1 once it's baked for a bit.

Closes cockroachdb#35974
Closes cockroachdb#36018
Closes cockroachdb#36019
Closes cockroachdb#36432

Release note (bug fix): `CHANGEFEED`s now retry instead of erroring in
more situations.
craig bot pushed a commit that referenced this issue Apr 16, 2019
36804: sql/sem/pretty: use left alignment for column names in CREATE r=knz a=knz

Before:

```
CREATE TABLE t (
    name STRING,
    id INT8
       NOT NULL
       PRIMARY KEY
)
```

After:

```
CREATE TABLE t (
    name STRING,
    id   INT8
         NOT NULL
         PRIMARY KEY
)
```
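The alignment rule in the "After" output (pad each column name to the longest name, then indent constraints under the type) can be illustrated with a small Go sketch. This is a hypothetical, string-based simplification, not how `sql/sem/pretty` actually builds its output; `column` and `formatCreateTable` are invented for illustration, and names are assumed to be ASCII because widths are measured in bytes.

```go
package main

import (
	"fmt"
	"strings"
)

// column is a hypothetical, simplified column definition.
type column struct {
	name, typ   string
	constraints []string
}

// formatCreateTable left-aligns column names to the widest name so the types
// line up, and indents each constraint to start under its column's type.
// Widths are measured in bytes, so this assumes ASCII names.
func formatCreateTable(table string, cols []column) string {
	width := 0
	for _, c := range cols {
		if len(c.name) > width {
			width = len(c.name)
		}
	}
	const indent = "    "
	constraintIndent := indent + strings.Repeat(" ", width+1)

	lines := []string{"CREATE TABLE " + table + " ("}
	for i, c := range cols {
		colLines := []string{fmt.Sprintf("%s%-*s %s", indent, width, c.name, c.typ)}
		for _, k := range c.constraints {
			colLines = append(colLines, constraintIndent+k)
		}
		if i < len(cols)-1 {
			// The separating comma goes after the column's last line.
			colLines[len(colLines)-1] += ","
		}
		lines = append(lines, colLines...)
	}
	lines = append(lines, ")")
	return strings.Join(lines, "\n")
}

func main() {
	fmt.Println(formatCreateTable("t", []column{
		{name: "name", typ: "STRING"},
		{name: "id", typ: "INT8", constraints: []string{"NOT NULL", "PRIMARY KEY"}},
	}))
}
```

Running it prints the same layout as the "After" example.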


36852: changefeedccl: switch retryable errors back to a whitelist r=nvanbenschoten a=danhhz

36872: coldata: fix Slice when slicing up to batch.Length() r=yuzefovich a=asubiotto

A panic occurred because we weren't treating the end slice index as
exclusive, resulting in an out-of-bounds panic when attempting to slice
the nulls slice.

Release note: None

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
Co-authored-by: Alfonso Subiotto Marqués <alfonso@cockroachlabs.com>
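The off-by-one fixed in 36872 comes down to treating `end` as exclusive, as Go's own slice expressions do. A minimal sketch, assuming a simplified null representation (one `bool` per row) rather than the real `coldata` bitmap; `nullBitmap` and its `slice` method are hypothetical.

```go
package main

import "fmt"

// nullBitmap is a hypothetical, simplified stand-in for a column's null
// tracking: one bool per row.
type nullBitmap []bool

// slice returns rows [start, end). end is exclusive, matching Go's own slice
// semantics, so slice(0, len) is valid; treating end as inclusive (reading
// up to end+1) is the kind of off-by-one that panics when end == len.
func (n nullBitmap) slice(start, end int) nullBitmap {
	if start < 0 || end > len(n) || start > end {
		panic(fmt.Sprintf("slice [%d:%d] out of range for length %d", start, end, len(n)))
	}
	return n[start:end]
}

func main() {
	nulls := nullBitmap{false, true, false, true}
	fmt.Println(len(nulls.slice(0, len(nulls)))) // 4
	fmt.Println(nulls.slice(1, 3))               // [true false]
}
```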
craig bot closed this as completed in #36852 Apr 16, 2019
danhhz added a commit to danhhz/cockroach that referenced this issue Apr 24, 2019