-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
upstream: only prefetching if the upstream is healthy #12758
Conversation
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@@ -34,6 +34,10 @@ void ConnPoolImplBase::destructAllConnections() { | |||
} | |||
|
|||
bool ConnPoolImplBase::shouldCreateNewConnection() const { | |||
// If the host is unhealthy, don't make it do extra work. | |||
if (host_->health() != Upstream::Host::Health::Healthy) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this include Degraded
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @snowp
ggreenway: nice observation.
The comment in the degraded enum definition suggests that they should not participate in prefetching.
envoy/include/envoy/upstream/upstream.h
Line 143 in bd2b989
* Host is healthy, but degraded. It is able to serve traffic, but hosts that aren't degraded |
That said, prefetching seems like a newer operation. #5202 introduced this comment long ago. I think that same-host prefetching may make sense for degraded hosts. When implemented, look ahead prefetching for round-robin style load balancers may want to avoid degraded hosts; but it would naturally avoid degraded hosts since it would do the look ahead by looking at the contents of the non-degraded list of hosts.
I think we may want to update the comments and include degraded hosts in prefetching.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could go either way, but I intentionally did this for !Healthy rather than Unhealthy for a couple of reasons. Late connection fetching could result in up to date health information if there were a change in health status and if lookahead prefetching will avoid degraded hosts, I think prefetching may be overenthusiastic given they're likely to be avoided. Basically I think when hosts are not healthy, the LB is less predictable so I lean towards taking the latency hit over a CPU hit but it's easy enough to go the other way if either of your care.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest a TODO to consider ideal prefetch behavior for degraded endpoints. Limiting this version to healthy endpoints seems fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable. Maybe add some of that explanation as a code comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd lean against a TODO as I only know of one user of degraded upstream and don't know if they need prefetching. I'll note it could be extended and comment why we don't by default. SG?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good enough for me. @antoniovicente ?
@@ -34,6 +34,10 @@ void ConnPoolImplBase::destructAllConnections() { | |||
} | |||
|
|||
bool ConnPoolImplBase::shouldCreateNewConnection() const { | |||
// If the host is unhealthy, don't make it do extra work. | |||
if (host_->health() != Upstream::Host::Health::Healthy) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @snowp
ggreenway: nice observation.
The comment in the degraded enum definition suggests that they should not participate in prefetching.
envoy/include/envoy/upstream/upstream.h
Line 143 in bd2b989
* Host is healthy, but degraded. It is able to serve traffic, but hosts that aren't degraded |
That said, prefetching seems like a newer operation. #5202 introduced this comment long ago. I think that same-host prefetching may make sense for degraded hosts. When implemented, look ahead prefetching for round-robin style load balancers may want to avoid degraded hosts; but it would naturally avoid degraded hosts since it would do the look ahead by looking at the contents of the non-degraded list of hosts.
I think we may want to update the comments and include degraded hosts in prefetching.
tryCreateNewConnections(); | ||
// All requests will be purged. newStream may create new ones, at which | ||
// point prefetching will happen there. | ||
ASSERT(!shouldCreateNewConnection()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to self: ASSERT looks like a safe sanity check. Nothing unlikely to go terribly wrong if ASSERT fails.
pool_.destructAllConnections(); | ||
} | ||
|
||
TEST_F(ConnPoolImplBaseTest, NoPrefetchIfUnhealthy) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to have a test that covers the degraded case.
EXPECT_CALL(pool_, instantiateActiveClient).Times(1); | ||
EXPECT_CALL(pool_, onPoolFailure).WillOnce(InvokeWithoutArgs([&]() -> void { | ||
pool_.newStream(context_); | ||
})); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the expected ordering of the two calls above? Seems to me that the execution of the newStream on onPoolFailure will trigger the instantiateActiveClient. It may be worth adding a "testing::InSequence s;" to this test.
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
/lgtm api |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks for the improvement to the prefetch implementation.
@@ -119,5 +138,6 @@ TEST_F(ConnPoolImplBaseTest, NoPrefetchIfUnhealthy) { | |||
pool_.destructAllConnections(); | |||
} | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: extra newline.
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Risk Level: low (only affects hidden prefetching)
Testing: new unit tests
Docs Changes: n/a
Release Notes: n/a