Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GEP: Client Certificate Verification for Gateway Listeners #91

Open
jpeach opened this issue Feb 14, 2020 · 34 comments · Fixed by #2080
Open

GEP: Client Certificate Verification for Gateway Listeners #91

jpeach opened this issue Feb 14, 2020 · 34 comments · Fixed by #2080
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. kind/gep PRs related to Gateway Enhancement Proposal(GEP) kind/user-story Categorizes an issue as capturing a user story priority/backlog Higher priority than priority/awaiting-more-evidence. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@jpeach
Copy link
Contributor

jpeach commented Feb 14, 2020

What would you like to be added:

The ability for a HTTPS (or TLS generally) endpoint to require that the client present a certificate that can be validated according to some configurable policy.

Why is this needed:

As an application developer, I want to restrict access to my application to a certain audience of clients. The audience is defined by one or more of

  • a collection of specific TLS certificates (maybe by hash)
  • a collection of subject names in certificates
  • a collection of certificates issued by a specific (unique) CA

I want the infrastructure to guarantee that I only receive client traffic that originates from this audience.

/kind user-story

@jpeach jpeach added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 14, 2020
@k8s-ci-robot k8s-ci-robot added the kind/user-story Categorizes an issue as capturing a user story label Feb 14, 2020
@bowei
Copy link
Contributor

bowei commented Feb 14, 2020

Is this mTLS? Can you add to the issue title, as that is a commonly known term around the K8S community.

@youngnick
Copy link
Contributor

It is mutual TLS in the sense that the client and the server perform mutual TLS, but 'mTLS' also has some connotations about speed and automation of certificate rotation that we probably don't want to bring in here.

I say this because when we had an issue to add 'mTLS' to Contour, people started asking straight away if Contour was now a service mesh. Which it is not. But the term is overloaded.

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 17, 2020
@bowei bowei removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 19, 2020
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 17, 2020
@fejta-bot
Copy link

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 16, 2020
@hbagdi
Copy link
Contributor

hbagdi commented Sep 16, 2020

/remove-lifecycle rotten

#268 should close this.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Sep 16, 2020
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 15, 2020
@hbagdi
Copy link
Contributor

hbagdi commented Dec 15, 2020

/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 15, 2020
@hbagdi hbagdi added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Dec 15, 2020
@alanchrt
Copy link

alanchrt commented Mar 4, 2021

Adding my comments from Slack regarding some configuration options that would be nice:

either deny or don't deny requests from clients who don't present valid certs (useful when you want to show a "logged out state" based on application logic in your upstream)

an option to pass through the subject/sans from the certificate to your upstream (the identity of the client), maybe in a header? not sure if there's a standard there. this could probably be the default.. not sure if it would matter if you still passed through the client identity and the upstream didn't care.

this could let you configure things like "i'll take requests in my upstream even if they don't present a valid cert, but if they do present a valid cert, i want to know that the identity was alan so i can decide his access level. and if there's no cert, i want to show them logged out state."

The ingress-nginx annotations seem like a really good reference point for which configuration options might be nice, as well as the names and contents of headers to pass back to the upstream:

https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/#client-certificate-authentication

I think this one largely gets at some of the use cases I was describing above:

nginx.ingress.kubernetes.io/auth-tls-verify-client: optional_no_ca

@youngnick
Copy link
Contributor

It's also important to remember that, if the following are true:

  • you have a single proxy routing multiple SNIs
  • the multiple SNI routes can have different security profiles (as in, some can have client certificates and some can't)
  • You are using a protocol that has a higher-level routing construct (like HTTP's Host header)

Then you can connect to insecure.foo.com without a client cert, and set your Host header to secure.foo.com, and end up at the secure site, with no client cert.

This is roughly the same thing as domain fronting.

@alanchrt
Copy link

alanchrt commented Mar 5, 2021

I wonder if sane defaults plus an "I know I'm doing a dangerous thing" option could help with domain fronting.

That is, if your listener is configured in Terminate mode, client certificate authentication can be configured with no special options. In addition to verifying the certificate, requests by default won't be routed upstream unless the SNI host matches the HTTP Host.

But, if you're in Passthrough mode, client authentication is only allowed with disableHostVerify: true. The docs can indicate that any upstream routers will need to be sure not to route based on Host with the expectation that it is the actually hostname that the client authenticated with. The disableHostVerify option could also be available for Terminate mode if there was a use case that needed it.

@mmalone
Copy link

mmalone commented Mar 5, 2021

@alanchrt you're talking about two modes: one that enforces that Host matches SNI and the other that doesn't, right? We're not terminating TLS vs. passing it through for a backend to terminate? If that's correct, I like the idea but would probably look for different terminology ;).

I'm not convinced the non-enforcing mode needs to exist at all. What's the use case for upstreams actually needing to route based on Host header? I'd consider setting the Host header for the upstream request to force it to match SNI. The host header is ignored all the time (e.g., most non-proxy web applications will ignore the host header). In a case like this, I can't think of a way it would ever break non-malicious clients.

I don't have as much experience with http/2 so I have no comment on whether this might affect those clients.

@howardjohn
Copy link
Contributor

I don't have as much experience with http/2 so I have no comment on whether this might affect those clients.

I think per the HTTP2 spec servers are expected to return a 421 here in some scenarios - this allows the browser to automatically retry when its optimizations cause it to not match
envoyproxy/envoy#6767 and istio/istio#13589 have a lot of info here

@alanchrt
Copy link

alanchrt commented Mar 5, 2021

@mmalone Actually, I wasn't trying to suggest Terminate and Passthrough as names for host verification, but rather trying to describe some possible default behaviors for those (existing) TLS mode options, which are defined in the TLS config.

Enforcement of SNI host and HTTP Host match wouldn't be possible in a Passthrough scenario, which is why I was suggesting you'd have to explicitly disable the enforcement to be able to enable client auth. Then, people know what they're getting themselves into.

As an example of an upstream routing based on Host, imagine it wasn't even a router usptream, but rather a monolith web app that's multi-tenant. It could serve customer-a.mycompany.com and customer-b.mycompany.com.

This is totally contrived, but imagine they decided they wanted to have cient cert auth on Gateway to prevent Customer A from even being able to load Customer B's app at all. They already have all the wiring to terminate TLS and manage certs and keys for their upstream app (for the customer domains), so they don't want to mess with that. They just verify the client cert, then passthrough TLS traffic to the app upstream. But, the app uses the HTTP Host to determine which customer's app to display.

Again, contrived, but I could see scenarios where someone might want it. In the above scenario, they could set disableHostVerify: true, and they could check the SNI host (passed in a header) to make sure it matches the HTTP Host in the upstream app itself.

@arkodg
Copy link
Contributor

arkodg commented Sep 15, 2023

/reopen

@k8s-ci-robot
Copy link
Contributor

@arkodg: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Sep 15, 2023
@robscott
Copy link
Member

robscott commented Dec 1, 2023

/kind gep

@k8s-ci-robot k8s-ci-robot added the kind/gep PRs related to Gateway Enhancement Proposal(GEP) label Dec 1, 2023
@shaneutt shaneutt changed the title Client Certificate Verification for Gateway Listeners GEP: Client Certificate Verification for Gateway Listeners Dec 4, 2023
@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 3, 2024
@youngnick
Copy link
Contributor

/remove-lifecycle stale
/lifecycle frozen

Since the GEP is still in motion.

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 4, 2024
@howardjohn
Copy link
Contributor

@arkodg are there conformance tests planned?

@arkodg
Copy link
Contributor

arkodg commented Apr 15, 2024

thanks for the reminder, let me create a issue to track it

@shaneutt
Copy link
Member

Doesn't seem like we've had motion on this recently?

/remove-lifecycle frozen

@arkodg just wanted to check in, did you have any thoughts or a summary update for this one?

@k8s-ci-robot k8s-ci-robot removed the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Aug 16, 2024
@arkodg
Copy link
Contributor

arkodg commented Aug 16, 2024

next steps here would be

  1. have some implementations, implement this
  2. build out conformance tests
  3. make sure tests pass for at least 3 implementations
  4. move to standard

we're still on 1., afaik, no one has implemented this yet

@shaneutt
Copy link
Member

shaneutt commented Sep 18, 2024

Arko if I'm understanding correctly you're actively/planning-on moving this one forward:

/priority important-longterm

Can you please help us to better understand a rough idea about what you imagine you'll be able to deliver and roughly when you'd like to try for that? Not looking for a promise just trying to get a feel for your capacity and what you're thinking, so we can look at organizing this along the upcoming milestones.

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Sep 18, 2024
@shaneutt shaneutt moved this to FROZEN in Post-GA Refinement Sep 18, 2024
@shaneutt shaneutt removed the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Oct 2, 2024
@kflynn kflynn moved this from Experimental to Provisional in Gateway API Enhancement Proposals (GEPs) Oct 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. kind/gep PRs related to Gateway Enhancement Proposal(GEP) kind/user-story Categorizes an issue as capturing a user story priority/backlog Higher priority than priority/awaiting-more-evidence. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete.
Projects
Status: Provisional
Status: Blocked/Stalled
Development

Successfully merging a pull request may close this issue.