-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fault injection: add support for setting gRPC status #10841
Changes from 4 commits
3990560
5a77205
7db1fb3
0b369ea
160a6fa
535fa00
e6ff272
4576e19
8d19697
0ff968f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -42,6 +42,12 @@ x-envoy-fault-abort-request | |
In order for the header to work, :ref:`header_abort | ||
<envoy_api_field_config.filter.http.fault.v2.FaultAbort.header_abort>` needs to be set. | ||
|
||
x-envoy-fault-abort-grpc-request | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should we have a % header here also? cc @Augustyniak There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I think that we should add a support for Together with @buildbreaker @murki we discussed potential scenarios where having There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry I missed one thing when I posted my initial comment here. Initially, I thought that GRPC aborts would be applied on top of http aborts meaning something like:
and for this reason I thought that we needed the addition of Now that I know that the implementation looks more like apply "either HTTP or GRPC abort" (only one of them) I don't see any benefits of us having I really feel bad asking you to do this @vijayrajput1 but could we get rid of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you @Augustyniak , I actually had this (using one header) approach in mind earlier but then I kinda convinced myself having a matching header would make things more clear. Meaning use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah agreed on using 1 header. Sorry, I didn't look very deeply at this before punting review over to @Augustyniak. Apologies also! |
||
gRPC status code to abort a request with. The header value should be an integer that specifies | ||
the gRPC status code to return in response to a request and must be in the range [0, 16]. | ||
When this header is set, the HTTP response status code will be set to 200. In order for the header to work, | ||
:ref:`header_abort <envoy_api_field_config.filter.http.fault.v2.FaultAbort.header_abort>` needs to be set. | ||
|
||
x-envoy-fault-abort-request-percentage | ||
The percentage of requests that should be failed with a status code that's defined | ||
by the value of *x-envoy-fault-abort-request* HTTP header. The header value should be an integer | ||
|
@@ -144,6 +150,14 @@ fault.http.abort.http_status | |
available regardless of whether the filter is :ref:`configured for abort | ||
<envoy_api_field_config.filter.http.fault.v2.HTTPFault.abort>`. | ||
|
||
fault.http.abort.grpc_status | ||
gRPC status code that will be used as the response status code of requests that will be | ||
aborted if the headers match. Defaults to the gRPC status code specified | ||
in the config. If this field is missing from both the runtime and the config, | ||
gRPC status code in the response will be derived from *fault.http.abort.http_status* field. | ||
This runtime key is only available when the filter is :ref:`configured for abort | ||
<envoy_api_field_config.filter.http.fault.v2.HTTPFault.abort>`. | ||
|
||
fault.http.delay.fixed_delay_percent | ||
% of requests that will be delayed if the headers match. Defaults to the | ||
*delay_percent* specified in the config or 0 otherwise. This runtime key is only available when | ||
|
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -34,7 +34,14 @@ FaultAbortConfig::FaultAbortConfig( | |
switch (abort_config.error_type_case()) { | ||
case envoy::extensions::filters::http::fault::v3::FaultAbort::ErrorTypeCase::kHttpStatus: | ||
provider_ = | ||
std::make_unique<FixedAbortProvider>(abort_config.http_status(), abort_config.percentage()); | ||
std::make_unique<FixedAbortProvider>(static_cast<Http::Code>(abort_config.http_status()), | ||
absl::nullopt, abort_config.percentage()); | ||
break; | ||
case envoy::extensions::filters::http::fault::v3::FaultAbort::ErrorTypeCase::kGrpcStatus: | ||
// If gRPC status code is set, then HTTP will be set to 200 | ||
provider_ = std::make_unique<FixedAbortProvider>( | ||
Http::Code::OK, static_cast<Grpc::Status::GrpcStatus>(abort_config.grpc_status()), | ||
abort_config.percentage()); | ||
break; | ||
case envoy::extensions::filters::http::fault::v3::FaultAbort::ErrorTypeCase::kHeaderAbort: | ||
provider_ = std::make_unique<HeaderAbortProvider>(abort_config.percentage()); | ||
|
@@ -44,10 +51,9 @@ FaultAbortConfig::FaultAbortConfig( | |
} | ||
} | ||
|
||
absl::optional<Http::Code> FaultAbortConfig::HeaderAbortProvider::statusCode( | ||
const Http::RequestHeaderMap* request_headers) const { | ||
absl::optional<Http::Code> ret; | ||
const auto header = request_headers->get(HeaderNames::get().AbortRequest); | ||
absl::optional<Http::Code> | ||
FaultAbortConfig::HeaderAbortProvider::httpStatusCode(const Http::HeaderEntry* header) const { | ||
absl::optional<Http::Code> ret = absl::nullopt; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think that we need There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep its not required, I put that in there to make it apparent. I had to read There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's fine either way. I wouldn't worry about it unless we need another push. |
||
if (header == nullptr) { | ||
return ret; | ||
} | ||
|
@@ -64,6 +70,25 @@ absl::optional<Http::Code> FaultAbortConfig::HeaderAbortProvider::statusCode( | |
return ret; | ||
} | ||
|
||
absl::optional<Grpc::Status::GrpcStatus> | ||
FaultAbortConfig::HeaderAbortProvider::grpcStatusCode(const Http::HeaderEntry* header) const { | ||
absl::optional<Grpc::Status::GrpcStatus> ret = absl::nullopt; | ||
if (header == nullptr) { | ||
return ret; | ||
} | ||
|
||
uint64_t code; | ||
if (!absl::SimpleAtoi(header->value().getStringView(), &code)) { | ||
return ret; | ||
} | ||
|
||
if (code <= 16) { | ||
ret = static_cast<Grpc::Status::GrpcStatus>(code); | ||
} | ||
|
||
return ret; | ||
} | ||
|
||
FaultDelayConfig::FaultDelayConfig( | ||
const envoy::extensions::filters::common::fault::v3::FaultDelay& delay_config) { | ||
switch (delay_config.fault_delay_secifier_case()) { | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,6 +2,7 @@ | |
|
||
#include "envoy/extensions/filters/common/fault/v3/fault.pb.h" | ||
#include "envoy/extensions/filters/http/fault/v3/fault.pb.h" | ||
#include "envoy/grpc/status.h" | ||
#include "envoy/http/header_map.h" | ||
#include "envoy/type/v3/percent.pb.h" | ||
|
||
|
@@ -20,6 +21,7 @@ class HeaderNameValues { | |
const char* prefix() { return ThreadSafeSingleton<Http::PrefixValue>::get().prefix(); } | ||
|
||
const Http::LowerCaseString AbortRequest{absl::StrCat(prefix(), "-fault-abort-request")}; | ||
const Http::LowerCaseString AbortGrpcRequest{absl::StrCat(prefix(), "-fault-abort-grpc-request")}; | ||
const Http::LowerCaseString AbortRequestPercentage{ | ||
absl::StrCat(prefix(), "-fault-abort-request-percentage")}; | ||
const Http::LowerCaseString DelayRequest{absl::StrCat(prefix(), "-fault-delay-request")}; | ||
|
@@ -53,8 +55,11 @@ class FaultAbortConfig { | |
public: | ||
FaultAbortConfig(const envoy::extensions::filters::http::fault::v3::FaultAbort& abort_config); | ||
|
||
absl::optional<Http::Code> statusCode(const Http::RequestHeaderMap* request_headers) const { | ||
return provider_->statusCode(request_headers); | ||
absl::optional<Http::Code> httpStatusCode(const Http::HeaderEntry* header) const { | ||
return provider_->httpStatusCode(header); | ||
} | ||
absl::optional<Grpc::Status::GrpcStatus> grpcStatusCode(const Http::HeaderEntry* header) const { | ||
return provider_->grpcStatusCode(header); | ||
} | ||
|
||
envoy::type::v3::FractionalPercent | ||
|
@@ -70,31 +75,44 @@ class FaultAbortConfig { | |
|
||
// Return the HTTP status code to use. Optionally passed HTTP headers that may contain the | ||
// HTTP status code depending on the provider implementation. | ||
virtual absl::optional<Http::Code> | ||
statusCode(const Http::RequestHeaderMap* request_headers) const PURE; | ||
virtual absl::optional<Http::Code> httpStatusCode(const Http::HeaderEntry* header) const PURE; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why not continue to pass the full header map here and below? It seems like it would be more clear if the header was referenced closer to where it is used? |
||
|
||
// Return the gRPC status code to use. Optionally passed an HTTP header that may contain the | ||
// gRPC status code depending on the provider implementation. | ||
virtual absl::optional<Grpc::Status::GrpcStatus> | ||
grpcStatusCode(const Http::HeaderEntry* header) const PURE; | ||
|
||
// Return what percentage of requests abort faults should be applied to. Optionally passed | ||
// HTTP headers that may contain the percentage depending on the provider implementation. | ||
virtual envoy::type::v3::FractionalPercent | ||
percentage(const Http::RequestHeaderMap* request_headers) const PURE; | ||
}; | ||
|
||
// Delay provider that uses a fixed abort status code. | ||
// Abort provider that uses a fixed abort status code. | ||
class FixedAbortProvider : public AbortProvider { | ||
public: | ||
FixedAbortProvider(uint64_t status_code, const envoy::type::v3::FractionalPercent& percentage) | ||
: status_code_(status_code), percentage_(percentage) {} | ||
FixedAbortProvider(Http::Code http_status_code, | ||
absl::optional<Grpc::Status::GrpcStatus> grpc_status_code, | ||
const envoy::type::v3::FractionalPercent& percentage) | ||
: http_status_code_(http_status_code), grpc_status_code_(grpc_status_code), | ||
percentage_(percentage) {} | ||
|
||
// AbortProvider | ||
absl::optional<Http::Code> statusCode(const Http::RequestHeaderMap*) const override { | ||
return static_cast<Http::Code>(status_code_); | ||
absl::optional<Http::Code> httpStatusCode(const Http::HeaderEntry*) const override { | ||
return http_status_code_; | ||
} | ||
|
||
absl::optional<Grpc::Status::GrpcStatus> | ||
grpcStatusCode(const Http::HeaderEntry*) const override { | ||
return grpc_status_code_; | ||
} | ||
|
||
envoy::type::v3::FractionalPercent percentage(const Http::RequestHeaderMap*) const override { | ||
return percentage_; | ||
} | ||
|
||
private: | ||
const uint64_t status_code_; | ||
const absl::optional<Http::Code> http_status_code_; | ||
const absl::optional<Grpc::Status::GrpcStatus> grpc_status_code_; | ||
const envoy::type::v3::FractionalPercent percentage_; | ||
}; | ||
|
||
|
@@ -103,9 +121,11 @@ class FaultAbortConfig { | |
public: | ||
HeaderAbortProvider(const envoy::type::v3::FractionalPercent& percentage) | ||
: HeaderPercentageProvider(HeaderNames::get().AbortRequestPercentage, percentage) {} | ||
// AbortProvider | ||
absl::optional<Http::Code> | ||
statusCode(const Http::RequestHeaderMap* request_headers) const override; | ||
|
||
absl::optional<Http::Code> httpStatusCode(const Http::HeaderEntry* header) const override; | ||
|
||
absl::optional<Grpc::Status::GrpcStatus> | ||
grpcStatusCode(const Http::HeaderEntry*) const override; | ||
|
||
envoy::type::v3::FractionalPercent | ||
percentage(const Http::RequestHeaderMap* request_headers) const override { | ||
|
@@ -180,6 +200,7 @@ class FaultDelayConfig { | |
public: | ||
HeaderDelayProvider(const envoy::type::v3::FractionalPercent& percentage) | ||
: HeaderPercentageProvider(HeaderNames::get().DelayRequestPercentage, percentage) {} | ||
|
||
// DelayProvider | ||
absl::optional<std::chrono::milliseconds> | ||
duration(const Http::RequestHeaderMap* request_headers) const override; | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -43,6 +43,8 @@ FaultSettings::FaultSettings(const envoy::extensions::filters::http::fault::v3:: | |
RuntimeKeys::get().DelayDurationKey)), | ||
abort_http_status_runtime_(PROTOBUF_GET_STRING_OR_DEFAULT( | ||
fault, abort_http_status_runtime, RuntimeKeys::get().AbortHttpStatusKey)), | ||
abort_grpc_status_runtime_(PROTOBUF_GET_STRING_OR_DEFAULT( | ||
fault, abort_grpc_status_runtime, RuntimeKeys::get().AbortGrpcStatusKey)), | ||
max_active_faults_runtime_(PROTOBUF_GET_STRING_OR_DEFAULT( | ||
fault, max_active_faults_runtime, RuntimeKeys::get().MaxActiveFaultsKey)), | ||
response_rate_limit_percent_runtime_( | ||
|
@@ -164,9 +166,12 @@ Http::FilterHeadersStatus FaultFilter::decodeHeaders(Http::RequestHeaderMap& hea | |
return Http::FilterHeadersStatus::StopIteration; | ||
} | ||
|
||
const auto abort_code = abortHttpStatus(headers); | ||
if (abort_code.has_value()) { | ||
abortWithHTTPStatus(abort_code.value()); | ||
absl::optional<Http::Code> http_status; | ||
absl::optional<Grpc::Status::GrpcStatus> grpc_status; | ||
std::tie(http_status, grpc_status) = abortStatus(headers); | ||
|
||
if (http_status.has_value()) { | ||
abortWithStatus(http_status.value(), grpc_status); | ||
return Http::FilterHeadersStatus::StopIteration; | ||
} | ||
|
||
|
@@ -284,29 +289,57 @@ FaultFilter::delayDuration(const Http::RequestHeaderMap& request_headers) { | |
return ret; | ||
} | ||
|
||
absl::optional<Http::Code> | ||
FaultFilter::abortHttpStatus(const Http::RequestHeaderMap& request_headers) { | ||
AbortHttpAndGrpcStatus FaultFilter::abortStatus(const Http::RequestHeaderMap& request_headers) { | ||
if (!isAbortEnabled(request_headers)) { | ||
return absl::nullopt; | ||
return std::make_pair(absl::nullopt, absl::nullopt); | ||
} | ||
|
||
auto grpc_status = abortGrpcStatus(request_headers); | ||
|
||
// If gRPC status code is set, then HTTP will be set to 200 | ||
return grpc_status.has_value() ? std::make_pair(Http::Code::OK, grpc_status) | ||
: std::make_pair(abortHttpStatus(request_headers), grpc_status); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we pass |
||
} | ||
|
||
absl::optional<Http::Code> | ||
FaultFilter::abortHttpStatus(const Http::RequestHeaderMap& request_headers) { | ||
// See if the configured abort provider has a default status code, if not there is no abort status | ||
// code (e.g., header configuration and no/invalid header). | ||
const auto config_abort = fault_settings_->requestAbort()->statusCode(&request_headers); | ||
if (!config_abort.has_value()) { | ||
auto http_status = fault_settings_->requestAbort()->httpStatusCode( | ||
request_headers.get(Filters::Common::Fault::HeaderNames::get().AbortRequest)); | ||
if (!http_status.has_value()) { | ||
return absl::nullopt; | ||
} | ||
|
||
auto status_code = static_cast<uint64_t>(config_abort.value()); | ||
auto code = static_cast<Http::Code>(config_->runtime().snapshot().getInteger( | ||
fault_settings_->abortHttpStatusRuntime(), status_code)); | ||
auto default_http_status_code = static_cast<uint64_t>(http_status.value()); | ||
auto runtime_http_status_code = static_cast<Http::Code>(config_->runtime().snapshot().getInteger( | ||
fault_settings_->abortHttpStatusRuntime(), default_http_status_code)); | ||
|
||
if (!downstream_cluster_abort_http_status_key_.empty()) { | ||
code = static_cast<Http::Code>(config_->runtime().snapshot().getInteger( | ||
downstream_cluster_abort_http_status_key_, status_code)); | ||
runtime_http_status_code = static_cast<Http::Code>(config_->runtime().snapshot().getInteger( | ||
downstream_cluster_abort_http_status_key_, default_http_status_code)); | ||
} | ||
|
||
return code; | ||
return runtime_http_status_code; | ||
} | ||
|
||
absl::optional<Grpc::Status::GrpcStatus> | ||
FaultFilter::abortGrpcStatus(const Http::RequestHeaderMap& request_headers) { | ||
auto grpc_status = fault_settings_->requestAbort()->grpcStatusCode( | ||
request_headers.get(Filters::Common::Fault::HeaderNames::get().AbortGrpcRequest)); | ||
|
||
auto default_grpc_status_code = grpc_status.has_value() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. shouldn't we return an There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. intention here is to support setting gRPC status even through just runtime config. For example, even if abort fault filter configuration and header doesn't have grpc_status but if the abort_grpc_status_runtime is set we still want to set the grpc_status. Does this make sense ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The use case you want to support definitely makes sense. 👍 I'm just a little bit worried that GRPC aborts will end up using proto config in a way that's different from the way other faults use their protos. "Fixed" fault providers - including HTTP status code, response rate limit and delay request - providers, all require proto config values to be present. For the user of the Envoy, it may be a little bit confusing if for GRPC aborts we follow a different convention. That being said, if runtime-only config is the thing you want to support I'm going to leave it up to more Envoy experienced folks to decide whether it's ok or no to have this fault provider not require full proto config (no GRPC status code required in proto config). //EDIT There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree with @Augustyniak that we should be consistent here. I'm going to defer to @Augustyniak for initial review of this PR and I will do a pass after. :) |
||
? static_cast<uint64_t>(grpc_status.value()) | ||
: std::numeric_limits<uint64_t>::max(); | ||
|
||
auto runtime_grpc_status_code = config_->runtime().snapshot().getInteger( | ||
fault_settings_->abortGrpcStatusRuntime(), default_grpc_status_code); | ||
|
||
if (runtime_grpc_status_code == std::numeric_limits<uint64_t>::max()) { | ||
return absl::nullopt; | ||
} | ||
|
||
return static_cast<Grpc::Status::GrpcStatus>(runtime_grpc_status_code); | ||
} | ||
|
||
void FaultFilter::recordDelaysInjectedStats() { | ||
|
@@ -375,19 +408,23 @@ void FaultFilter::postDelayInjection(const Http::RequestHeaderMap& request_heade | |
resetTimerState(); | ||
|
||
// Delays can be followed by aborts | ||
const auto abort_code = abortHttpStatus(request_headers); | ||
if (abort_code.has_value()) { | ||
abortWithHTTPStatus(abort_code.value()); | ||
absl::optional<Http::Code> http_status; | ||
absl::optional<Grpc::Status::GrpcStatus> grpc_status; | ||
std::tie(http_status, grpc_status) = abortStatus(request_headers); | ||
|
||
if (http_status.has_value()) { | ||
abortWithStatus(http_status.value(), grpc_status); | ||
} else { | ||
// Continue request processing. | ||
decoder_callbacks_->continueDecoding(); | ||
} | ||
} | ||
|
||
void FaultFilter::abortWithHTTPStatus(Http::Code abort_code) { | ||
void FaultFilter::abortWithStatus(Http::Code http_status_code, | ||
absl::optional<Grpc::Status::GrpcStatus> grpc_status_code) { | ||
decoder_callbacks_->streamInfo().setResponseFlag(StreamInfo::ResponseFlag::FaultInjected); | ||
decoder_callbacks_->sendLocalReply(abort_code, "fault filter abort", nullptr, absl::nullopt, | ||
RcDetails::get().FaultAbort); | ||
decoder_callbacks_->sendLocalReply(http_status_code, "fault filter abort", nullptr, | ||
grpc_status_code, RcDetails::get().FaultAbort); | ||
recordAbortsInjectedStats(); | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should drop the "lte" restriction here so that we can test unknown gRPC codes above those well defined. This is a really common source of bugs (including in envoy itself) so it will be nice to be able to test this. I might clarify in the docs that we allow testing any numeric code, even those that are not well defined. (docs need updating also)