changefeedccl: improve retryable error handling #36432
Comments
danhhz
added
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
A-cdc
Change Data Capture
labels
Apr 2, 2019
danhhz
added a commit
to danhhz/cockroach
that referenced
this issue
Apr 2, 2019
As cockroachdb#36432 mentions, we know what's wrong. We're not getting any new data from running these, so stop spamming failures until we get a chance to fix them. Release note: None
This was referenced Apr 2, 2019
I tried a revert, but (a) there are some conflicts and (b) I realized there are some nice cleanups in here that I'd rather not revert. We intentionally never backported this into 19.1, so I'm now considering leaving it as-is, with the roachtests skipped, until I get a chance to do another pass (hopefully next week at the latest).
craig bot
pushed a commit
that referenced
this issue
Apr 3, 2019
36438: roachtest: skip cdc/crdb-chaos and cdc/sink-chaos r=tbg a=danhhz

As #36432 mentions, we know what's wrong. We're not getting any new data from running these, so stop spamming failures until we get a chance to fix them.

Release note: None

Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
danhhz
added a commit
to danhhz/cockroach
that referenced
this issue
Apr 15, 2019
For a while, the cdc/crdb-chaos and cdc/sink-chaos roachtests have been failing because an error that should be marked as retryable wasn't. As a result of the discussion in cockroachdb#35974, I tried switching from a whitelist (retryable error) to a blacklist (terminal error) in cockroachdb#36132, but on reflection this doesn't seem like a great idea. We added a safety net to prevent false negatives from retrying indefinitely, but it was immediately apparent that this meant we needed to tune the retry loop parameters. It's better to just do the due diligence of investigating the errors that should be retried and retrying them.

The commit is intended for backport into 19.1 once it's baked for a bit.

Closes cockroachdb#35974
Closes cockroachdb#36018
Closes cockroachdb#36019
Closes cockroachdb#36432

Release note (bug fix): `CHANGEFEED`s now retry instead of erroring in more situations
craig bot
pushed a commit
that referenced
this issue
Apr 16, 2019
36804: sql/sem/pretty: use left alignment for column names in CREATE r=knz a=knz

Before: ``` CREATE TABLE t ( name STRING, id INT8 NOT NULL PRIMARY KEY ) ```

After: ``` CREATE TABLE t ( name STRING, id INT8 NOT NULL PRIMARY KEY ) ```

36852: changefeedccl: switch retryable errors back to a whitelist r=nvanbenschoten a=danhhz

For a while, the cdc/crdb-chaos and cdc/sink-chaos roachtests have been failing because an error that should be marked as retryable wasn't. As a result of the discussion in #35974, I tried switching from a whitelist (retryable error) to a blacklist (terminal error) in #36132, but on reflection this doesn't seem like a great idea. We added a safety net to prevent false negatives from retrying indefinitely, but it was immediately apparent that this meant we needed to tune the retry loop parameters. It's better to just do the due diligence of investigating the errors that should be retried and retrying them.

The commit is intended for backport into 19.1 once it's baked for a bit.

Closes #35974
Closes #36018
Closes #36019
Closes #36432

Release note (bug fix): `CHANGEFEED`s now retry instead of erroring in more situations

36872: coldata: fix Slice when slicing up to batch.Length() r=yuzefovich a=asubiotto

A panic occurred because we weren't treating the end slice index as exclusive, resulting in an out-of-bounds panic when attempting to slice the nulls slice.

Release note: None

Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
Co-authored-by: Alfonso Subiotto Marqués <alfonso@cockroachlabs.com>
danhhz
added a commit
to danhhz/cockroach
that referenced
this issue
Apr 24, 2019
For a while, the cdc/crdb-chaos and cdc/sink-chaos roachtests have been failing because an error that should be marked as retryable wasn't. As a result of the discussion in cockroachdb#35974, I tried switching from a whitelist (retryable error) to a blacklist (terminal error) in cockroachdb#36132, but on reflection this doesn't seem like a great idea. We added a safety net to prevent false negatives from retrying indefinitely, but it was immediately apparent that this meant we needed to tune the retry loop parameters. It's better to just do the due diligence of investigating the errors that should be retried and retrying them.

The commit is intended for backport into 19.1 once it's baked for a bit.

Closes cockroachdb#35974
Closes cockroachdb#36018
Closes cockroachdb#36019
Closes cockroachdb#36432

Release note (bug fix): `CHANGEFEED`s now retry instead of erroring in more situations
For a while, the cdc/crdb-chaos and cdc/sink-chaos roachtests have been failing because an error that should be marked as retryable wasn't. As a result of the discussion in #35974, I tried switching from a whitelist (retryable error) to a blacklist (terminal error) in #36132, but on reflection this doesn't seem like a great idea. We added a safety net to prevent false negatives from retrying indefinitely, but it was immediately apparent that this meant we needed to tune the retry loop parameters. It's better to just do the due diligence of investigating the errors that should be retried and retrying them.

I'm about to revert the error blacklist PR and (since we're not getting any new information from them) skip the two chaos roachtests until I can get to fixing up the whitelist (likely next week, given that I'm on support rotation this week). This is a placeholder issue to summarize what's going on.