Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP health checker: handle GOAWAY from HTTP2 upstreams #13599

Merged
merged 51 commits into from
Jan 12, 2021
Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
Show all changes
51 commits
Select commit Hold shift + click to select a range
ce00ea3
HTTP health checker: handle GOAWAY from HTTP2 upstreams
mpuncel Oct 14, 2020
9f83533
fix format of version history to be compliant with check_format.py
mpuncel Oct 15, 2020
736d0ca
fix spelling mistake in comment
mpuncel Oct 15, 2020
f3e9da2
fix incorrect testcomment
mpuncel Oct 16, 2020
b76c46e
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Oct 17, 2020
01fa365
fix alphabetical order in version.rst
mpuncel Oct 19, 2020
cd06c28
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Oct 21, 2020
f899f0a
Add runtime disable option to go back to old GOAWAY behavior
mpuncel Oct 21, 2020
8f21af4
follow runtime naming convention, make new flag default to true, remo…
mpuncel Oct 22, 2020
c23a9a5
document runtime guard in release notes
mpuncel Oct 22, 2020
447d07c
group common mock expectations into helper methods
mpuncel Oct 22, 2020
04458ab
rename received_no_error_goaway_ to reuse_connection_, and consolidat…
mpuncel Oct 22, 2020
0d57547
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Oct 29, 2020
15d4dcd
use runtimeFeatureEnabled() and don't close() twice after reset stream
mpuncel Oct 30, 2020
00503eb
add some integration tests covering GOAWAY
mpuncel Oct 30, 2020
bbd6e08
fix incorrect comments
mpuncel Oct 30, 2020
bb0cc29
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Oct 30, 2020
2b00126
address PR feedback
mpuncel Nov 10, 2020
2352719
attempt, no goaway in fake upstream
mpuncel Nov 11, 2020
ed7a0f1
Merge branch 'master' into HEAD
mpuncel Nov 18, 2020
38cc1bf
various fixes
mpuncel Nov 18, 2020
0bd20e6
make integration test send actual GOAWAY frames
mpuncel Nov 18, 2020
38c4fb8
undo change to workflow
mpuncel Nov 18, 2020
b4496a9
fix github workflow
mpuncel Nov 18, 2020
6dd4400
remove commented line
mpuncel Nov 18, 2020
41a330b
address PR feedback
mpuncel Nov 18, 2020
bbb5b8e
wip amend
mpuncel Dec 1, 2020
8cd1560
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Dec 2, 2020
ec669f2
move protocolErrorForTest to the non-legacy http2 codec
mpuncel Dec 2, 2020
455034a
fix compilation error in health_check_fuzz_test_utils
mpuncel Dec 2, 2020
e75b30a
fix compilation errors in health checker unit test
mpuncel Dec 2, 2020
df8f734
add protocolErrorForTest to the legacy http2 impl
mpuncel Dec 2, 2020
56efbce
undo changes to nghttp2 user callback
mpuncel Dec 8, 2020
2b0d930
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Dec 8, 2020
9e339ee
address PR feedback
mpuncel Dec 9, 2020
0345dba
add unit test for protocolErrorForTest() on the non-legacy codec impl
mpuncel Dec 9, 2020
06652ed
fix protocolErrorForTest() unit test when legacy codec impl is used
mpuncel Dec 10, 2020
322e564
remove some unintended newlines
mpuncel Dec 10, 2020
deb3893
add TODO comment to clean up fake upstream when legacy h2 codec is re…
mpuncel Dec 10, 2020
3ff4d9f
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Dec 10, 2020
bcb7346
add goaway_error stat too health checker, wait on it in integration t…
mpuncel Dec 11, 2020
f74ed68
fix wording of goaway_error doc
mpuncel Dec 11, 2020
1c287c1
commit stat increment line
mpuncel Dec 11, 2020
49e689d
remove unused lambda capture variable
mpuncel Dec 11, 2020
a7cfa59
PR feedback: remove new stat, use real sleep in integration test
mpuncel Dec 16, 2020
d15bbf3
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Dec 16, 2020
31c0641
drop changes accommodating legacy h2 codec, it was removed
mpuncel Dec 16, 2020
5e8b981
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Dec 22, 2020
6a3ad8c
switch to real time
mpuncel Dec 22, 2020
eacd1f4
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Jan 8, 2021
64cc217
Merge branch 'master' into mpuncel/http2-hc-goaway
mpuncel Jan 12, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions docs/root/version_history/current.rst
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,8 @@ Bug Fixes
---------
*Changes expected to improve the state of the world and are unlikely to have negative effects*

* active http health checks: properly handles HTTP/2 GOAWAY frames from the upstream. Previously a GOAWAY frame due to a graceful listener drain could cause improper failed health checks due to streams being refused by the upstream on a connection that is going away. To revert to old GOAWAY handling behavior, set the runtime feature `envoy.reloadable_features.health_check.graceful_goaway_handling` to false.

* http: sending CONNECT_ERROR for HTTP/2 where appropriate during CONNECT requests.

Removed Config or Runtime
Expand Down
1 change: 1 addition & 0 deletions source/common/runtime/runtime_features.cc
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,7 @@ constexpr const char* runtime_features[] = {
"envoy.reloadable_features.fix_wildcard_matching",
"envoy.reloadable_features.fixed_connection_close",
"envoy.reloadable_features.hcm_stream_error_on_invalid_message",
"envoy.reloadable_features.health_check.graceful_goaway_handling",
"envoy.reloadable_features.http_default_alpn",
"envoy.reloadable_features.http_match_on_all_headers",
"envoy.reloadable_features.http_set_copy_replace_all_headers",
Expand Down
48 changes: 45 additions & 3 deletions source/common/upstream/health_checker_impl.cc
Original file line number Diff line number Diff line change
Expand Up @@ -256,11 +256,14 @@ void HttpHealthCheckerImpl::HttpActiveHealthCheckSession::onInterval() {
parent_.transportSocketMatchMetadata().get());
client_.reset(parent_.createCodecClient(conn));
client_->addConnectionCallbacks(connection_callback_impl_);
client_->setCodecConnectionCallbacks(http_connection_callback_impl_);
expect_reset_ = false;
reuse_connection_ = parent_.reuse_connection_;
}

Http::RequestEncoder* request_encoder = &client_->newStream(*this);
request_encoder->getStream().addCallbacks(*this);
request_in_flight_ = true;

const auto request_headers = Http::createHeaderMap<Http::RequestHeaderMapImpl>(
{{Http::Headers::get().Method, "GET"},
Expand All @@ -279,15 +282,51 @@ void HttpHealthCheckerImpl::HttpActiveHealthCheckSession::onInterval() {

void HttpHealthCheckerImpl::HttpActiveHealthCheckSession::onResetStream(Http::StreamResetReason,
absl::string_view) {
request_in_flight_ = false;
ENVOY_CONN_LOG(debug, "connection/stream error health_flags={}", *client_,
HostUtility::healthFlagsToString(*host_));
if (client_ && !reuse_connection_) {
client_->close();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alyssawilk this is the one place it might be weird to just use a reuse_connection_ bool, because in the old behavior it looks like it would not close the connection if the stream was reset? That means i'd have to change the behavior a bit, but it also might be a small bug that it wasn't closing the connection before?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll push the commit, we can always revert it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, closing if we can't reuse and the stream is reset seems fine, and it's behind the guard so even if someone gets surprised they can always revert the behavior

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some of this behavior change is not guarded. It looks like the same case came up when implementing GOAWAY handling in gRPC.

There is a slight behavior change if you have connection reuse disabled, and get a reset stream. Previously the connection would be reused! which seems wrong, but is an unguarded behavior change.

Reference PR: #11325

}

if (expect_reset_) {
return;
}

ENVOY_CONN_LOG(debug, "connection/stream error health_flags={}", *client_,
HostUtility::healthFlagsToString(*host_));
handleFailure(envoy::data::core::v3::NETWORK);
}

void HttpHealthCheckerImpl::HttpActiveHealthCheckSession::onGoAway(
Http::GoAwayErrorCode error_code) {
ENVOY_CONN_LOG(debug, "connection going away health_flags={}", *client_,
HostUtility::healthFlagsToString(*host_));

// Runtime guard around graceful handling of NO_ERROR GOAWAY handling. The old behavior is to
// ignore GOAWAY completely.
if (!parent_.runtime_.snapshot().featureEnabled(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runtimeFeatureEnabled() ?

"envoy.reloadable_features.health_check.graceful_goaway_handling", 100UL)) {
return;
}

if (request_in_flight_ && error_code == Http::GoAwayErrorCode::NoError) {
// The server is starting a graceful shutdown. Allow the in flight request
// to finish without treating this as a health check error, and then
// reconnect.
reuse_connection_ = false;
return;
}

if (request_in_flight_) {
// Record this as a failed health check.
handleFailure(envoy::data::core::v3::NETWORK);
}

if (client_) {
expect_reset_ = true;
mpuncel marked this conversation as resolved.
Show resolved Hide resolved
client_->close();
}
}

HttpHealthCheckerImpl::HttpActiveHealthCheckSession::HealthCheckResult
HttpHealthCheckerImpl::HttpActiveHealthCheckSession::healthCheckResult() {
uint64_t response_code = Http::Utility::getResponseStatus(*response_headers_);
Expand Down Expand Up @@ -318,6 +357,8 @@ HttpHealthCheckerImpl::HttpActiveHealthCheckSession::healthCheckResult() {
}

void HttpHealthCheckerImpl::HttpActiveHealthCheckSession::onResponseComplete() {
request_in_flight_ = false;

switch (healthCheckResult()) {
case HealthCheckResult::Succeeded:
handleSuccess(false);
Expand Down Expand Up @@ -345,7 +386,7 @@ bool HttpHealthCheckerImpl::HttpActiveHealthCheckSession::shouldClose() const {
return false;
}

if (!parent_.reuse_connection_) {
if (!reuse_connection_) {
return true;
}

Expand Down Expand Up @@ -375,6 +416,7 @@ bool HttpHealthCheckerImpl::HttpActiveHealthCheckSession::shouldClose() const {
}

void HttpHealthCheckerImpl::HttpActiveHealthCheckSession::onTimeout() {
request_in_flight_ = false;
if (client_) {
host_->setActiveHealthFailureType(Host::ActiveHealthFailureType::TIMEOUT);
ENVOY_CONN_LOG(debug, "connection/stream timeout health_flags={}", *client_,
Expand Down
14 changes: 14 additions & 0 deletions source/common/upstream/health_checker_impl.h
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,7 @@ class HttpHealthCheckerImpl : public HealthCheckerImplBase {
void onBelowWriteBufferLowWatermark() override {}

void onEvent(Network::ConnectionEvent event);
void onGoAway(Http::GoAwayErrorCode error_code);

class ConnectionCallbackImpl : public Network::ConnectionCallbacks {
public:
Expand All @@ -118,14 +119,27 @@ class HttpHealthCheckerImpl : public HealthCheckerImplBase {
HttpActiveHealthCheckSession& parent_;
};

class HttpConnectionCallbackImpl : public Http::ConnectionCallbacks {
public:
HttpConnectionCallbackImpl(HttpActiveHealthCheckSession& parent) : parent_(parent) {}
// Http::ConnectionCallbacks
void onGoAway(Http::GoAwayErrorCode error_code) override { parent_.onGoAway(error_code); }

private:
HttpActiveHealthCheckSession& parent_;
};

ConnectionCallbackImpl connection_callback_impl_{*this};
HttpConnectionCallbackImpl http_connection_callback_impl_{*this};
HttpHealthCheckerImpl& parent_;
Http::CodecClientPtr client_;
Http::ResponseHeaderMapPtr response_headers_;
const std::string& hostname_;
const Http::Protocol protocol_;
Network::Address::InstanceConstSharedPtr local_address_;
bool expect_reset_{};
bool reuse_connection_ = false;
bool request_in_flight_ = false;
};

using HttpActiveHealthCheckSessionPtr = std::unique_ptr<HttpActiveHealthCheckSession>;
Expand Down
Loading