-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
router: retry overloaded requests #10847
router: retry overloaded requests #10847
Conversation
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@snowp @mattklein123 PTAL |
@tonya11en can you take a look at the linked issue and do a first pass over this? I want to make sure we are all in agreement here on the best overall approach. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good change and should occur, but we should be aware that this will almost certainly result in a noticeable increase in retry overflows. Retry budgets were implemented to mitigate this problem, but it still might raise an eyebrow for anyone monitoring their systems during CB load shedding events.
Just a few small comments around wording of the release note.
Note that retries may be disabled depending on the contents of the :ref:`x-envoy-overloaded | ||
<config_http_filters_router_x-envoy-overloaded_consumed>`. | ||
Note that Envoy retries requests when :ref:`x-envoy-overloaded | ||
<config_http_filters_router_x-envoy-overloaded_set>` is present. It is recommended to configure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already set to a default value of 3, so folks should already be protected. I also would mention retry budgets here instead of the max_retries CB threshold. Especially since the max_retries circuit breaker already is enabled by default to a low value. This change will certainly result in a dramatic increase in retry overflows when circuit breaking occurs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an aside, I really wonder if we should at some point turn on the budgets by default vs. the hard limit. It's a much better default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tonya11en I will adjust the doc.
@mattklein123 Agreed. Retry budgets are better defaults than hard limits. We should consider doing that
@@ -12,6 +12,7 @@ Changes | |||
Disabled by default and can be enabled via :ref:`enable_upstream_stats <envoy_v3_api_field_extensions.filters.http.grpc_stats.v3.FilterConfig.enable_upstream_stats>`. | |||
* http: fixed a bug where the upgrade header was not cleared on responses to non-upgrade requests. | |||
Can be reverted temporarily by setting runtime feature `envoy.reloadable_features.fix_upgrade_response` to false. | |||
* router: added support for retrying requests when upstream responds with :ref:`x-envoy-overloaded <config_http_filters_router_x-envoy-overloaded_set>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would reword this to be more clear about what this change actually does. Maybe say something like:
router: allow retries by default when encountering x-envoy-overloaded header
As it reads now, it can imply that the old behavior is still intact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be good to explicitly mention circuit breakers, readers might not be aware of this header
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the doc and mentioned about using circuit breaker as a recommendation in the retry section. Do you think we should add it in the Version History as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few drive by comments on top of Tony's. Thank you!
/wait
Note that retries may be disabled depending on the contents of the :ref:`x-envoy-overloaded | ||
<config_http_filters_router_x-envoy-overloaded_consumed>`. | ||
Note that Envoy retries requests when :ref:`x-envoy-overloaded | ||
<config_http_filters_router_x-envoy-overloaded_set>` is present. It is recommended to configure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an aside, I really wonder if we should at some point turn on the budgets by default vs. the hard limit. It's a much better default?
@@ -274,14 +274,6 @@ for the next health check interval. The host can become healthy again via standa | |||
checks. See the :ref:`health checking overview <arch_overview_health_checking>` for more | |||
information. | |||
|
|||
.. _config_http_filters_router_x-envoy-overloaded_consumed: | |||
|
|||
x-envoy-overloaded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this header documented anywhere else? I think we still should document it. I think it does still make sense to send it as downstream may want to track it, but we should document somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked earlier- it's explained further down in the same file.
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
/retest |
🐴 hold your horses - no failures detected, yet. |
/retest |
🔨 rebuilding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with small comments, thanks!
/wait
Note that Envoy retries requests when :ref:`x-envoy-overloaded | ||
<config_http_filters_router_x-envoy-overloaded_set>` is present. It is recommended to either configure | ||
:ref:`retry budgets (preferred) <envoy_api_field_cluster.CircuitBreakers.Thresholds.retry_budget>` or set | ||
:ref:`maximum active retries circuit breaker <arch_overview_circuit_break>` to an appropriate value avoid retry storms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:ref:`maximum active retries circuit breaker <arch_overview_circuit_break>` to an appropriate value avoid retry storms. | |
:ref:`maximum active retries circuit breaker <arch_overview_circuit_break>` to an appropriate value to avoid retry storms. |
@@ -281,10 +281,6 @@ RetryStatus RetryStateImpl::shouldHedgeRetryPerTryTimeout(DoRetryCallback callba | |||
} | |||
|
|||
bool RetryStateImpl::wouldRetryFromHeaders(const Http::ResponseHeaderMap& response_headers) { | |||
if (response_headers.EnvoyOverloaded() != nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this from the O(1) header map? I think we don't need this anymore as the header can be added by string reference where it is added currently?
Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Signed-off-by: Rama Chavali <rama.rao@salesforce.com> Signed-off-by: pengg <pengg@google.com>
Signed-off-by: Spencer Lewis <slewis@squareup.com> * master: (46 commits) allow specifying the API version of bootstrap from the command line (envoyproxy#10803) config: adding connect matcher (unused) (envoyproxy#10894) Add missing dependency on `assert.h` (envoyproxy#10918) Lower heap and disk space used by kafka tests (envoyproxy#10915) [tools] handle commits merged without PR in deprecated script (envoyproxy#10723) tools: including working tree in modified_since_last_github_commit.sh diff. (envoyproxy#10911) rocketmq_proxy: implement rocketmq proxy [docs] PR template to include commit message (envoyproxy#10900) docs: breaking long word to stop content overflow. (envoyproxy#10880) Delete legacy connection pool code. (envoyproxy#10881) wasm: clarify how configuration is passed (envoyproxy#10782) issue template: clarify security/crash reporting (envoyproxy#10885) api/faq: add entry on incremental xDS. (envoyproxy#10876) router: retry overloaded requests (envoyproxy#10847) Remove inclusion of pthread.h, not needed for linux compilation (envoyproxy#10895) request_id: Add option to always set request id in response (envoyproxy#10808) xray: Use correct types for segment document output (envoyproxy#10834) router: fixing a watermark bug for streaming retries (envoyproxy#10866) http: auditing Path() calls for safety with Pathless CONNECT (envoyproxy#10851) Remove hardcoded type urls Part.2 (envoyproxy#10848) ...
Description: Retries overloaded requests
Risk Level: Low
Testing: Added
Docs Changes: Updated
Release Notes: Added
Fixes #10706