upstream: only prefetching if the upstream is healthy #12758

Merged: 7 commits into envoyproxy:master on Sep 2, 2020

Conversation

alyssawilk (Contributor):

Risk Level: low (only affects hidden prefetching)
Testing: new unit tests
Docs Changes: n/a
Release Notes: n/a

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@@ -34,6 +34,10 @@ void ConnPoolImplBase::destructAllConnections() {
}

bool ConnPoolImplBase::shouldCreateNewConnection() const {
// If the host is unhealthy, don't make it do extra work.
if (host_->health() != Upstream::Host::Health::Healthy) {
Contributor:

Should this include Degraded?

Contributor:

cc @snowp

ggreenway: nice observation.

The comment in the degraded enum definition suggests that they should not participate in prefetching.

* Host is healthy, but degraded. It is able to serve traffic, but hosts that aren't degraded

That said, prefetching is a newer operation; #5202 introduced that comment long ago. I think same-host prefetching may make sense for degraded hosts. When implemented, look-ahead prefetching for round-robin style load balancers may want to avoid degraded hosts, but it would do so naturally, since the look-ahead would work from the non-degraded host list.

I think we may want to update the comments and include degraded hosts in prefetching.
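For context, a paraphrased sketch of the Upstream::Host::Health enum being discussed (reconstructed from the quote above; the exact wording, ordering, and location in the upstream host interface may differ):

enum class Health {
  // Host is actively failing health checks; it should not receive new traffic.
  Unhealthy,
  // Host is healthy, but degraded. It is able to serve traffic, but hosts that
  // aren't degraded should be preferred.
  Degraded,
  // Host is healthy and can serve traffic normally.
  Healthy,
};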

alyssawilk (Contributor Author):

I could go either way, but I intentionally checked !Healthy rather than Unhealthy for a couple of reasons. Creating connections late means we act on more up-to-date health information if the health status changes, and if look-ahead prefetching is going to avoid degraded hosts, prefetching for them seems overenthusiastic given they're likely to be avoided anyway. Basically, when hosts are not healthy the LB is less predictable, so I lean towards taking the latency hit over the CPU hit, but it's easy enough to go the other way if either of you cares.

Contributor:

I would suggest a TODO to consider ideal prefetch behavior for degraded endpoints. Limiting this version to healthy endpoints seems fine.

Contributor:

Sounds reasonable. Maybe add some of that explanation as a code comment?

alyssawilk (Contributor Author):

I'd lean against a TODO, as I only know of one user of degraded upstreams and don't know whether they need prefetching. I'll note in a comment that it could be extended and explain why we don't do it by default. SG?

Contributor:

Good enough for me. @antoniovicente ?
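To make the resolution concrete, a sketch of what the guard plus the agreed explanatory comment might look like (the comment wording here is illustrative, not the exact text that was merged):

bool ConnPoolImplBase::shouldCreateNewConnection() const {
  // If the host is not healthy, don't make it do extra work. This intentionally
  // skips degraded hosts as well as unhealthy ones: the load balancer is less
  // predictable for non-healthy hosts, so we accept the latency of late
  // connection creation rather than prefetching connections that may go unused.
  // Prefetching for degraded hosts could be added later if a use case appears.
  if (host_->health() != Upstream::Host::Health::Healthy) {
    return false;
  }
  // ... existing prefetch logic ...
}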

tryCreateNewConnections();
// All requests will be purged. newStream may create new ones, at which
// point prefetching will happen there.
ASSERT(!shouldCreateNewConnection());
Contributor:

Note to self: the ASSERT looks like a safe sanity check. Nothing is likely to go terribly wrong if it fails.

pool_.destructAllConnections();
}

TEST_F(ConnPoolImplBaseTest, NoPrefetchIfUnhealthy) {
Contributor:

Would be good to have a test that covers the degraded case.
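A hypothetical companion test, mirroring NoPrefetchIfUnhealthy; host_, pool_, and context_ follow the visible fixture, but stubbing the reported health via ON_CALL on health() is an assumption about the mock host, not necessarily the fixture's real API:

TEST_F(ConnPoolImplBaseTest, NoPrefetchIfDegraded) {
  ON_CALL(*host_, health())
      .WillByDefault(testing::Return(Upstream::Host::Health::Degraded));

  // With prefetching limited to healthy hosts, only the connection needed for
  // the stream itself should be created; no extra prefetched connections.
  EXPECT_CALL(pool_, instantiateActiveClient).Times(1);
  pool_.newStream(context_);

  pool_.destructAllConnections();
}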

EXPECT_CALL(pool_, instantiateActiveClient).Times(1);
EXPECT_CALL(pool_, onPoolFailure).WillOnce(InvokeWithoutArgs([&]() -> void {
pool_.newStream(context_);
}));
Contributor:

What's the expected ordering of the two calls above? It seems to me that executing newStream from the onPoolFailure callback will trigger instantiateActiveClient. It may be worth adding a "testing::InSequence s;" to this test.
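A sketch of that suggestion applied to the snippet above (abbreviated; whichever call is expected first should be declared first, since InSequence makes GMock enforce the declaration order as the call order):

testing::InSequence s;
// Expected first: the pool failure, whose callback re-enters newStream ...
EXPECT_CALL(pool_, onPoolFailure).WillOnce(InvokeWithoutArgs([&]() -> void {
  pool_.newStream(context_);
}));
// ... which in turn is expected to instantiate the new active client.
EXPECT_CALL(pool_, instantiateActiveClient).Times(1);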

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
@repokitteh-read-only:

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #12758 was synchronize by alyssawilk.

see: more, trace.

htuch (Member) commented Aug 26, 2020:

/lgtm api

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
antoniovicente (Contributor) previously approved these changes Aug 26, 2020 and left a comment:

Looks great, thanks for the improvement to the prefetch implementation.

@@ -119,5 +138,6 @@ TEST_F(ConnPoolImplBaseTest, NoPrefetchIfUnhealthy) {
pool_.destructAllConnections();
}


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extra newline.

Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
Signed-off-by: Alyssa Wilk <alyssar@chromium.org>
mattklein123 (Member) left a comment:

Nice!

@alyssawilk alyssawilk merged commit f4fd39d into envoyproxy:master Sep 2, 2020
@alyssawilk alyssawilk deleted the not_unhealthy branch December 10, 2020 19:16