Failures in External services configured in envoy should be visible to the application developer #2325

Closed
youngnick opened this issue Mar 5, 2020 · 5 comments
Labels: blocked/needs-design, kind/design, lifecycle/stale

Comments

@youngnick
Member

We currently have a few requests to allow configuring external services of various kinds in Envoy: #432, #1691, and #370.

The thing that all of these have in common is that with our current Contour design, there is no way for Contour to do anything other than pass the config to Envoy.

This means that, for example, there is no way for someone configuring external auth for a particular HTTPProxy to know if that external auth is working correctly without having access to the external auth service.

Another example is that there is no way for someone using a service with a rate-limiter to know if their service has tripped the rate limit.

When something does go wrong with one of these services, there needs to be a way for someone using them indirectly to know where the problem is.

I think that this problem requires two things:

Contour should be able to health check clusters in Envoy

All the external services must be configured as a cluster in Envoy. So, if Contour has a way to check the state of a cluster in Envoy (whether it has healthy endpoints, or some other information about it), then we have the information we need to pass to the application developer.

Contour should be able to expose external service health info

Contour should be able to expose external service health info in the relevant place, whether that is a status field on an object like an HTTPProxy, a log line in Contour, a metric, or some combination of these.

Obviously the first is a requirement for the second.

I'm not sure of the best way to check Envoy clusters from Contour, whether it's some gRPC thing, checking the stats by fetching them, or something else.
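
To make the "fetching the stats" option concrete, here is a minimal sketch. It assumes Contour can already obtain the plain-text output of Envoy's admin `/stats` endpoint (lines of the form `name: value`) and that cluster stats are named `cluster.<cluster name>.<stat>`. The function name `parseClusterStats`, the `ClusterHealth` type, and the sample cluster name are hypothetical and are not part of Contour.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// ClusterHealth holds the membership gauges for one Envoy cluster.
// Hypothetical type, for illustration only.
type ClusterHealth struct {
	MembershipHealthy uint64
	MembershipTotal   uint64
}

// parseClusterStats scans "name: value" lines and collects the
// membership_healthy and membership_total gauges per cluster.
// Lines that are not numeric cluster stats are ignored.
func parseClusterStats(stats string) map[string]ClusterHealth {
	out := make(map[string]ClusterHealth)
	sc := bufio.NewScanner(strings.NewReader(stats))
	for sc.Scan() {
		name, raw, ok := strings.Cut(sc.Text(), ": ")
		if !ok || !strings.HasPrefix(name, "cluster.") {
			continue
		}
		value, err := strconv.ParseUint(strings.TrimSpace(raw), 10, 64)
		if err != nil {
			continue // not a plain integer stat
		}
		name = strings.TrimPrefix(name, "cluster.")
		idx := strings.LastIndex(name, ".")
		if idx < 0 {
			continue
		}
		cluster, stat := name[:idx], name[idx+1:]
		h := out[cluster]
		switch stat {
		case "membership_healthy":
			h.MembershipHealthy = value
		case "membership_total":
			h.MembershipTotal = value
		default:
			continue // a stat this sketch does not track
		}
		out[cluster] = h
	}
	return out
}

func main() {
	// Sample input with a hypothetical cluster name.
	sample := "cluster.extension/auth/server.membership_healthy: 0\n" +
		"cluster.extension/auth/server.membership_total: 2\n" +
		"server.uptime: 5\n"
	for cluster, h := range parseClusterStats(sample) {
		fmt.Printf("%s: %d of %d endpoints healthy\n", cluster, h.MembershipHealthy, h.MembershipTotal)
	}
}
```

Whether polling the admin endpoint is acceptable (versus some gRPC mechanism) is exactly the open question above; the sketch only shows that the data needed is small and easy to extract once fetched.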

This issue is to cover:

  • if this is a good idea
  • doing the two steps if it is.
youngnick added the kind/design and blocked/needs-design labels Mar 5, 2020
@jpeach
Contributor

jpeach commented Mar 11, 2020

In principle, I think that exposing envoy's information about clusters is a good idea. It's useful for regular services and also for special infrastructure services.

@jpeach
Contributor

jpeach commented Apr 20, 2020

Note to self:

General envoy cluster metrics that could be used to support this feature:

| Name | Type | Description |
| --- | --- | --- |
| membership_healthy | Gauge | Current cluster healthy total (inclusive of both health checking and outlier detection) |
| membership_degraded | Gauge | Current cluster degraded total |
| membership_total | Gauge | Current cluster membership total |
| upstream_cx_none_healthy | Counter | Total times connection not established due to no healthy hosts |

upstream_cx_none_healthy is pretty interesting if we can use it to create a level-based signal. Otherwise need some more research around the membership metrics.
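
Since `upstream_cx_none_healthy` is a counter, it only ever increases, so one way to get a level-style signal is to compare successive samples. The following is a rough sketch under that assumption; the `levelTracker` type, the sampling scheme, and the handling of counter resets (for example after an Envoy restart) are illustrative choices, not anything Contour or Envoy provides.

```go
package main

import "fmt"

// levelTracker turns a monotonically increasing counter into a
// per-interval boolean: "did the counter advance since the last sample?"
// Hypothetical helper, for illustration only.
type levelTracker struct {
	last   uint64
	primed bool
}

// Observe records a new sample and reports whether the counter
// advanced since the previous sample. The first sample only primes
// the tracker and always reports false.
func (t *levelTracker) Observe(sample uint64) bool {
	if !t.primed {
		t.last, t.primed = sample, true
		return false
	}
	var advanced bool
	if sample < t.last {
		// The counter went backwards, so assume it was reset and
		// treat any non-zero value as new events.
		advanced = sample > 0
	} else {
		advanced = sample > t.last
	}
	t.last = sample
	return advanced
}

func main() {
	var t levelTracker
	// Hypothetical samples of upstream_cx_none_healthy over time.
	for _, s := range []uint64{3, 3, 5, 5, 1, 1} {
		fmt.Printf("sample=%d noneHealthySinceLast=%v\n", s, t.Observe(s))
	}
}
```

One caveat with this approach: the counter only moves when a connection is attempted, so an idle cluster with no healthy hosts would look fine. That is part of why the membership gauges probably still need to be considered alongside it.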

jpeach added commits to jpeach/contour that referenced this issue on Jul 7, 14, 16, 17, 20, 21, and 22, 2020, each with the message:
This updates projectcontour#432.
This updates projectcontour#2459.
This updates projectcontour#2325.

Signed-off-by: James Peach <jpeach@vmware.com>
jpeach added a commit that referenced this issue Jul 22, 2020
This updates #432.
This updates #2459.
This updates #2325.

Signed-off-by: James Peach <jpeach@vmware.com>
@youngnick
Member Author

With the addition of ExtensionService, which includes a Conditions block, Contour now has a place to set a Ready condition on an ExtensionService. This condition would indicate that the service is up and able to receive traffic. I'm still not sure of the best way to get this information, however. It could be that the best way is to look up the Endpoints associated with the Service behind the ExtensionService and update the status that way (on the assumption that if there are Kubernetes Endpoints, then Envoy will be able to send traffic there).
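
As a rough illustration of the Endpoints-based idea, the sketch below derives a Ready condition from the number of ready endpoint addresses. It is not Contour code: the `Condition` struct is a simplified stand-in for a Kubernetes-style status condition, and `readyCondition` and the reason strings `EndpointsAvailable` and `NoEndpoints` are hypothetical names. Counting the ready addresses from the Endpoints object is assumed to happen elsewhere.

```go
package main

import "fmt"

// Condition is a simplified stand-in for a Kubernetes-style status
// condition. Hypothetical type, for illustration only.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// readyCondition maps the number of ready endpoint addresses behind a
// Service to a Ready condition. It assumes that any ready address
// means Envoy can send traffic to the service.
func readyCondition(service string, readyAddresses int) Condition {
	if readyAddresses > 0 {
		return Condition{
			Type:    "Ready",
			Status:  "True",
			Reason:  "EndpointsAvailable",
			Message: fmt.Sprintf("Service %q has %d ready endpoint address(es)", service, readyAddresses),
		}
	}
	return Condition{
		Type:    "Ready",
		Status:  "False",
		Reason:  "NoEndpoints",
		Message: fmt.Sprintf("Service %q has no ready endpoint addresses", service),
	}
}

func main() {
	// Hypothetical service names.
	fmt.Printf("%+v\n", readyCondition("auth/server", 2))
	fmt.Printf("%+v\n", readyCondition("auth/ratelimit", 0))
}
```

This only reflects what Kubernetes knows, not what Envoy observes, so it would miss cases where endpoints exist but Envoy's own health checking or outlier detection has marked them unhealthy.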


The Contour project currently lacks enough contributors to adequately respond to all Issues.

This bot triages Issues according to the following rules:

  • After 60d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the Issue is closed

You can:

  • Mark this Issue as fresh by commenting
  • Close this Issue
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

github-actions bot added the lifecycle/stale label Feb 28, 2024
github-actions bot closed this as not planned Mar 29, 2024