CORE-5749 retry on dns failure #22327
During a rolling upgrade in cloud, it was observed that RP's Kafka client would attempt to connect to the 'reserve' node after it was decommissioned. This was because the error code (C-Ares ENOTFOUND) was not treated as a retriable error. This change checks for that error code when attempting to connect to a broker and, if it is encountered, treats it as a retriable error.
Signed-off-by: Michael Boquard <michael@redpanda.com>
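The classification described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `dns_errc`, `dns_category()`, `make_dns_error()` and `is_retriable()` are hypothetical names, and the numeric values are placeholders rather than the resolver library's real codes.

```cpp
#include <cassert>
#include <string>
#include <system_error>

// Hypothetical stand-in for a resolver's error codes. The real values and
// category come from whatever DNS library is in use.
enum class dns_errc { not_found = 1, bad_query = 2 };

class dns_category_impl final : public std::error_category {
public:
    const char* name() const noexcept override { return "dns-sketch"; }
    std::string message(int ev) const override {
        switch (static_cast<dns_errc>(ev)) {
        case dns_errc::not_found: return "domain name not found";
        case dns_errc::bad_query: return "malformed query";
        }
        return "unknown dns error";
    }
};

// Single shared category instance; categories are compared by address.
inline const std::error_category& dns_category() {
    static const dns_category_impl instance;
    return instance;
}

inline std::error_code make_dns_error(dns_errc e) {
    return {static_cast<int>(e), dns_category()};
}

// Assumed policy: a failed name lookup is worth retrying, because the
// broker list may have changed while the client was connecting.
inline bool is_retriable(const std::error_code& ec) {
    if (ec.category() == dns_category()) {
        return ec.value() == static_cast<int>(dns_errc::not_found);
    }
    return ec == std::errc::connection_refused
        || ec == std::errc::timed_out;
}
```

With this shape, a caller's reconnect loop only needs to ask `is_retriable(e.code())` and does not have to know which library produced the error.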
e.code().category() == std::system_category()
|| e.code().category() == std::generic_category()) {
i've been reading the cppreference pages, and i still don't quite understand what the difference is between these... oh well!
The "correct" way is not to inspect the category, but to compare with the error enum:
if (e.code().category() == ss::tls::error_category()) {
    return absl::c_any_of(
      ss_tls_reconnect_errors, [v](int ec) { return v == ec; });
} else if (
  e.code() == std::errc::connection_refused
  || e.code() == std::errc::host_unreachable
  || e.code() == std::errc::timed_out
  ...
operator== provides a mapping between std::system_category and std::generic_category. I can't remember which is which, but on the platforms we care about, one is a subset of the other, and that subset has the same numbers.
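To illustrate the point about comparing against the enum: the sketch below assumes a POSIX platform with a recent standard library, where `std::system_category()` maps known errno values onto the generic conditions. The helper name `is_transient_connect_error` is hypothetical and not part of this PR.

```cpp
#include <cassert>
#include <cerrno>
#include <system_error>

// Hypothetical helper: true if `ec` denotes a transient connection failure.
// Comparing an error_code with a std::errc value goes through the
// category's default_error_condition(), so the caller does not need to
// inspect which standard category the code carries.
inline bool is_transient_connect_error(const std::error_code& ec) {
    return ec == std::errc::connection_refused
        || ec == std::errc::host_unreachable
        || ec == std::errc::timed_out;
}
```

Under those assumptions, `std::error_code(ECONNREFUSED, std::system_category())` and `std::error_code(ECONNREFUSED, std::generic_category())` both satisfy the helper, which is why the explicit category check becomes unnecessary for the standard categories.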
cool thx
new failures in https://buildkite.com/redpanda/redpanda/builds/52044#0190eaee-0738-4c73-b8e7-0cd4b5123a4b:
new failures in https://buildkite.com/redpanda/redpanda/builds/52044#0190eaed-614b-4046-a895-eb78caf7c03d:
new failures in https://buildkite.com/redpanda/redpanda/builds/52044#0190eb44-8f4b-45f1-8718-935118a82900:
ducktape was retried in https://buildkite.com/redpanda/redpanda/builds/52044#0190eb44-8f4b-45f1-8718-935118a82900
CI Failure:
/backport v24.2.x
/backport v24.1.x
/backport v23.3.x
Failed to create a backport PR to v24.1.x branch. I tried:
Failed to create a backport PR to v23.3.x branch. I tried:
Backports Required
Release Notes
Bug Fixes