changefeedccl: switch retryable errors back to a whitelist #36852

Merged
merged 2 commits into from
Apr 16, 2019

Conversation

danhhz
Contributor

@danhhz danhhz commented Apr 15, 2019

For a while, the cdc/crdb-chaos and cdc/sink-chaos roachtests have been
failing because an error that should be marked as retryable wasn't. As a
result of the discussion in #35974, I tried switching from a whitelist
(retryable error) to a blacklist (terminal error) in #36132, but on
reflection this doesn't seem like a great idea. We added a safety net to
prevent false negatives from retrying indefinitely but it was
immediately apparent that this meant we needed to tune the retry loop
parameters. Better is to just do the due diligence of investigating the
errors that should be retried and retrying them.

The commit is intended for backport into 19.1 once it's baked for a bit.

Closes #35974
Closes #36018
Closes #36019
Closes #36432

Release note (bug fix): CHANGEFEEDs now retry instead of erroring in
more situations.

…to blacklist"

This reverts commit 200567d.

Release note: None
@danhhz danhhz requested review from tbg, nvanbenschoten and a team April 15, 2019 21:48
@cockroach-teamcity
Member

This change is Reviewable

Member

@nvanbenschoten nvanbenschoten left a comment

:lgtm:

Reviewed 12 of 12 files at r1, 10 of 10 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @tbg)

@danhhz
Contributor Author

danhhz commented Apr 16, 2019

Thanks for the review! I ran extended versions of these roachtests today and didn't see any of the original whitelisting issues pop up. I did see crdb-chaos sometimes take a surprisingly long time to recover, so I filed #36879 to follow up.

bors r=nvanbenschoten

craig bot pushed a commit that referenced this pull request Apr 16, 2019
36804: sql/sem/pretty: use left alignment for column names in CREATE r=knz a=knz

Before:

```
CREATE TABLE t (
    name STRING,
    id INT8
       NOT NULL
       PRIMARY KEY
)
```

After:

```
CREATE TABLE t (
    name STRING,
    id   INT8
         NOT NULL
         PRIMARY KEY
)
```


36852: changefeedccl: switch retryable errors back to a whitelist r=nvanbenschoten a=danhhz

For a while, the cdc/crdb-chaos and cdc/sink-chaos roachtests have been
failing because an error that should be marked as retryable wasn't. As a
result of the discussion in #35974, I tried switching from a whitelist
(retryable error) to a blacklist (terminal error) in #36132, but on
reflection this doesn't seem like a great idea. We added a safety net to
prevent false negatives from retrying indefinitely but it was
immediately apparent that this meant we needed to tune the retry loop
parameters. Better is to just do the due diligence of investigating the
errors that should be retried and retrying them.

The commit is intended for backport into 19.1 once it's baked for a bit.

Closes #35974
Closes #36018
Closes #36019
Closes #36432

Release note (bug fix): `CHANGEFEED`s now retry instead of erroring in
more situations.

36872: coldata: fix Slice when slicing up to batch.Length() r=yuzefovich a=asubiotto

A panic occurred because we weren't treating the end slice index as
exclusive, resulting in an out-of-bounds panic when attempting to slice
the nulls slice.

Release note: None

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
Co-authored-by: Alfonso Subiotto Marqués <alfonso@cockroachlabs.com>
@craig
Contributor

craig bot commented Apr 16, 2019

Build succeeded

@craig craig bot merged commit 04f4b54 into cockroachdb:master Apr 16, 2019
@danhhz danhhz deleted the cdc_errors branch April 17, 2019 18:47
danhhz added a commit to danhhz/cockroach that referenced this pull request Apr 22, 2019
They're known to be flaky until cockroachdb#36852 gets backported but that won't
happen for another week.

Release note: None
craig bot pushed a commit that referenced this pull request Apr 23, 2019
37011: roachtest: skip crdb/{crdb,sink}-chaos on 2.1 and 19.1 r=tbg a=danhhz

They're known to be flaky until #36852 gets backported but that won't
happen for another week.

Release note: None

Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>