SSL Certs from Vault #70

Closed
far-blue opened this issue Mar 29, 2016 · 15 comments

@far-blue

I love the concept of fabio where you cut out all the middle layers and simply route according to the service records. I'd love to see TCP routing for ssh and mysql services but that's a different issue ;)

What I'd like to suggest here is that SSL certs are fetched from Vault. This would allow services to have auto-generated certs based on Vault's PKI support, which has improved greatly in the last couple of releases. I believe the REST API for Vault is very simple if you are just requesting certs, and then it's just a case of tracking expiry, which can be done in memory because after restarting fabio you can simply request fresh certs. You could even fetch certs lazily on the first routing request.

@magiconair
Contributor

Yeah, that has been on my wish list for a while (see #27)...

@jefferai

Vault guy here! Got pointed this way :-)

This would be super cool -- you are probably aware of this, but there is a function that lets you get a certificate based on a client-supplied host name (GetCertificate at https://golang.org/pkg/crypto/tls/#Config) so you could definitely fetch-on-demand.
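A minimal sketch of the fetch-on-demand idea, assuming a hypothetical `fetchFunc` hook in place of a real Vault read. `certCache`, `lookup` and `demoSource` are illustrative names and not part of fabio or Vault; only `tls.Config.GetCertificate` and `tls.ClientHelloInfo.ServerName` come from the Go standard library.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
	"sync"
)

// fetchFunc loads the certificate for one host name. It is a hypothetical
// hook standing in for a Vault read; it is not a fabio or Vault API.
type fetchFunc func(serverName string) (*tls.Certificate, error)

// certCache memoizes fetched certificates by host name.
type certCache struct {
	mu    sync.RWMutex
	certs map[string]*tls.Certificate
	fetch fetchFunc
}

func newCertCache(fetch fetchFunc) *certCache {
	return &certCache{certs: make(map[string]*tls.Certificate), fetch: fetch}
}

// lookup returns the cached certificate for name, fetching it on first use.
// Two concurrent first requests may both fetch; the later result wins.
func (c *certCache) lookup(name string) (*tls.Certificate, error) {
	c.mu.RLock()
	cert, ok := c.certs[name]
	c.mu.RUnlock()
	if ok {
		return cert, nil
	}
	cert, err := c.fetch(name)
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.certs[name] = cert
	c.mu.Unlock()
	return cert, nil
}

// GetCertificate matches the tls.Config.GetCertificate callback signature.
func (c *certCache) GetCertificate(hello *tls.ClientHelloInfo) (*tls.Certificate, error) {
	if hello.ServerName == "" {
		return nil, errors.New("client sent no SNI server name")
	}
	return c.lookup(hello.ServerName)
}

// demoSource is a stand-in for a real certificate store; it only counts calls.
type demoSource struct{ calls int }

func (s *demoSource) fetch(name string) (*tls.Certificate, error) {
	s.calls++
	return &tls.Certificate{}, nil // a real fetch would parse PEM data
}

func main() {
	src := &demoSource{}
	cache := newCertCache(src.fetch)
	cfg := &tls.Config{GetCertificate: cache.GetCertificate}

	hello := &tls.ClientHelloInfo{ServerName: "a.com"}
	for i := 0; i < 3; i++ {
		if _, err := cfg.GetCertificate(hello); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println("fetches for a.com:", src.calls)
}
```

After the first handshake for a name, later handshakes are served from memory. The sketch has no expiry handling, so a real version would also need a way to replace certificates before they run out.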

Looked through the other ticket -- there's definitely room for both LE and Vault, as they really tackle different use-cases. LE is designed to provide certs in an automated way to the Internet infrastructure, but doesn't work well within an organization. Vault's PKI support is designed to provide certs in an automated way within an organization, where you don't need to issue certs acceptable to the wider Internet but you need to issue a large number from a root trusted internally. So for software like fabio, if Internet-facing, you can definitely imagine fetching certs from LE for the front end, and your backend services fetching certs from Vault. If fabio isn't Internet-facing in your setup, it could also fetch from Vault.

magiconair added a commit that referenced this issue Jun 3, 2016
* Issue #27: change certificates via API
* Issue #28: refactor listener config
* Issue #70: support Vault
* Issue #85: SNI support
@magiconair
Contributor

@jefferai I've got a question regarding the Vault integration. Right now I'm polling Vault every couple of seconds (by default every 3s, and never more often than once per second) to get the list of certificates stored under a certain path. I am also renewing the token on every refresh.

I assume the path structure in Vault looks like this:

secret/fabio/certs
secret/fabio/certs/a.com cert=---BEGIN CERTIFICATE --- key=--- BEGIN RSA PRIVATE KEY ---
secret/fabio/certs/b.com cert=---BEGIN CERTIFICATE --- key=--- BEGIN RSA PRIVATE KEY ---
...

I don't care about the leases since I'm always replacing the certificates with whatever I get from Vault and I couldn't see how Vault would tell me when things have changed like Consul does.

Is this in line with how Vault should be used?
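For illustration, assuming the layout above lives in Vault's generic secret backend and is read over the HTTP API, the list and read responses could be decoded roughly like this. The struct and function names are hypothetical, and the `cert` and `key` field names are taken from the layout above; this is a sketch and not fabio's actual implementation.

```go
package main

import (
	"crypto/tls"
	"encoding/json"
	"fmt"
)

// listResponse models the part of a Vault list response used here:
// {"data": {"keys": ["a.com", "b.com"]}}
type listResponse struct {
	Data struct {
		Keys []string `json:"keys"`
	} `json:"data"`
}

// secretResponse models a read of one secret whose fields follow the
// assumed layout: a PEM certificate in "cert" and a PEM key in "key".
type secretResponse struct {
	Data struct {
		Cert string `json:"cert"`
		Key  string `json:"key"`
	} `json:"data"`
}

// parseKeys extracts the entry names from a list response body.
func parseKeys(body []byte) ([]string, error) {
	var r listResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	return r.Data.Keys, nil
}

// parseCert turns one secret read into a certificate usable by crypto/tls.
func parseCert(body []byte) (*tls.Certificate, error) {
	var r secretResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	cert, err := tls.X509KeyPair([]byte(r.Data.Cert), []byte(r.Data.Key))
	if err != nil {
		return nil, err
	}
	return &cert, nil
}

func main() {
	keys, err := parseKeys([]byte(`{"data":{"keys":["a.com","b.com"]}}`))
	fmt.Println(keys, err) // prints: [a.com b.com] <nil>
}
```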

@jefferai

jefferai commented Jun 9, 2016

Hi @magiconair

That sounds like much, much more traffic to Vault than should be needed (and many more token refreshes). Can you explain the design a bit more? Is there any reason not to simply key off refreshing from Vault based on the certificate's expected lifetime? (e.g. start checking at halfway until expiration, increasing frequency as you get closer to expiration)
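One way to read that suggestion, as a sketch: do nothing until the halfway point of the certificate's lifetime, then wait half of the remaining time before each further check. `nextCheck`, `nextCheckFor` and the `floor` parameter are made-up names for illustration.

```go
package main

import (
	"crypto/x509"
	"fmt"
	"time"
)

// nextCheck returns how long to wait before looking for a replacement.
// lifetime is NotAfter minus NotBefore, remaining is NotAfter minus now.
// Before the halfway point it waits until halfway; after that it waits half
// of the remaining time, so checks get more frequent towards expiration.
// floor keeps the interval from shrinking to zero.
func nextCheck(lifetime, remaining, floor time.Duration) time.Duration {
	if half := lifetime / 2; remaining > half {
		return remaining - half
	}
	if wait := remaining / 2; wait > floor {
		return wait
	}
	return floor
}

// nextCheckFor applies nextCheck to a parsed certificate.
func nextCheckFor(cert *x509.Certificate, now time.Time, floor time.Duration) time.Duration {
	return nextCheck(cert.NotAfter.Sub(cert.NotBefore), cert.NotAfter.Sub(now), floor)
}

func main() {
	now := time.Now()
	// A 24h certificate that was issued 4h ago.
	cert := &x509.Certificate{
		NotBefore: now.Add(-4 * time.Hour),
		NotAfter:  now.Add(20 * time.Hour),
	}
	fmt.Println(nextCheckFor(cert, now, time.Second)) // prints: 8h0m0s
}
```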

@magiconair
Contributor

@jefferai The problem isn't about cert expiration but about detecting when another cert has been added or removed and how quickly fabio can pick this up without being restarted.

Think about how this would work in Consul. You add a cert to the KV store and all watchers are notified that something has changed. Then fabio can load the new list of certificates and replace the old one. Since Vault does not have such a mechanism for watching for changes, I have to fall back to polling. Am I missing something or does that make sense to you?

@magiconair
Contributor

Hi @jefferai

the design is as follows: a background process fetches the available certificates from a source (file, path, http, consul, vault), checks if there is a difference to the previous value and only then updates them.

This decouples the fetching of the certs from the serving, i.e. fetching certs cannot block the main proxy.
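A rough sketch of that background loop, assuming a hypothetical `source` function that returns PEM data keyed by host name. `store`, `refresh` and `watch` are illustrative names and not fabio's actual types.

```go
package main

import (
	"fmt"
	"log"
	"reflect"
	"sync/atomic"
	"time"
)

// source is a hypothetical loader (file, path, http, consul, vault) that
// returns PEM data keyed by host name.
type source func() (map[string]string, error)

// store holds the current certificate set; readers never wait for a fetch.
type store struct {
	current atomic.Value // always holds a map[string]string
}

// get returns the current set, or nil if nothing has been loaded yet.
func (s *store) get() map[string]string {
	m, _ := s.current.Load().(map[string]string)
	return m
}

// refresh loads from src and swaps the set in only if it differs from the
// previous value. It reports whether an update happened.
func (s *store) refresh(src source) (bool, error) {
	next, err := src()
	if err != nil {
		return false, err
	}
	if reflect.DeepEqual(s.get(), next) {
		return false, nil
	}
	s.current.Store(next)
	return true, nil
}

// watch calls refresh on every tick until stop is closed.
func (s *store) watch(src source, interval time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			changed, err := s.refresh(src)
			if err != nil {
				log.Printf("cert refresh failed: %v", err)
			} else if changed {
				log.Printf("certificates updated")
			}
		case <-stop:
			return
		}
	}
}

func main() {
	certs := map[string]string{"a.com": "pem-a"}
	src := func() (map[string]string, error) { return certs, nil }

	s := &store{}
	stop := make(chan struct{})
	go s.watch(src, 10*time.Millisecond, stop)
	time.Sleep(50 * time.Millisecond)
	close(stop)
	fmt.Println("known hosts:", len(s.get()))
}
```

Because the serving path only calls `get`, a slow or failing source delays updates but never blocks requests.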

If I fetched the certs on the first request, I would have to deal with a stampeding herd on startup, where thousands of requests would all try to fetch the cert at the same time. I could still funnel this through a lock, but this has the potential of blocking the proxy.

@magiconair
Contributor

@jefferai your comment got me thinking. I should be able to keep this decoupled while at the same time fetching certificates only on demand.

@jefferai

jefferai commented Jun 9, 2016

@magiconair In case it got lost, another recommendation for the GetCertificate function in https://golang.org/pkg/crypto/tls/#Config. You could use this to fetch certificates on demand and then simply memoize them.

In fact, this is exactly how we implemented certificate reloading in Vault. That function fetches the certs from disk and stores the parsed objects in memory; when a connection comes in and that function is called it simply returns the cert. However, when a SIGHUP comes in, it forgets that cert and re-parses the file on disk, then memoizes the new value.
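The pattern described above might look roughly like this in Go. It is a sketch, not Vault's actual code: `reloader`, `fileLoader` and the file names are assumptions made for the example.

```go
package main

import (
	"crypto/tls"
	"errors"
	"log"
	"os"
	"os/signal"
	"sync"
	"syscall"
)

// loadFunc produces a freshly parsed certificate.
type loadFunc func() (*tls.Certificate, error)

// fileLoader returns a loadFunc that parses a PEM key pair from disk.
func fileLoader(certFile, keyFile string) loadFunc {
	return func() (*tls.Certificate, error) {
		cert, err := tls.LoadX509KeyPair(certFile, keyFile)
		if err != nil {
			return nil, err
		}
		return &cert, nil
	}
}

// reloader memoizes one parsed certificate and replaces it on demand.
type reloader struct {
	mu   sync.RWMutex
	cert *tls.Certificate
	load loadFunc
}

// reload parses the certificate again and memoizes the new value.
// On failure the previously memoized certificate stays in place.
func (r *reloader) reload() error {
	cert, err := r.load()
	if err != nil {
		return err
	}
	r.mu.Lock()
	r.cert = cert
	r.mu.Unlock()
	return nil
}

// GetCertificate returns the memoized certificate without touching disk.
func (r *reloader) GetCertificate(*tls.ClientHelloInfo) (*tls.Certificate, error) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	if r.cert == nil {
		return nil, errors.New("no certificate loaded")
	}
	return r.cert, nil
}

// reloadOnSIGHUP re-parses the certificate whenever the process gets SIGHUP.
func (r *reloader) reloadOnSIGHUP() {
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGHUP)
	go func() {
		for range ch {
			if err := r.reload(); err != nil {
				log.Printf("reload failed, keeping old certificate: %v", err)
			}
		}
	}()
}

func main() {
	r := &reloader{load: fileLoader("cert.pem", "key.pem")} // hypothetical paths
	if err := r.reload(); err != nil {
		log.Printf("initial load failed: %v", err)
		return
	}
	r.reloadOnSIGHUP()
	cfg := &tls.Config{GetCertificate: r.GetCertificate}
	_ = cfg // pass cfg to tls.Listen or http.Server.TLSConfig
}
```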

This way you don't need to keep hitting Vault looking for new certs -- you can simply return the ones you already have, and maybe check now and again to see if new versions are available.

@magiconair
Contributor

magiconair commented Jun 9, 2016

@jefferai no, that didn't get lost. That is the function I'm using for serving the certs, and the certs are cached in memory until they change.

The problem is with fetching and with when and how to trigger the reload. You rely on a SIGHUP, which has to be triggered by someone or something. Also, if you run more than one fabio instance, they'd all have to receive the signal more or less at the same time on different machines unless you build a coordination mechanism into Vault. If that isn't there, then this is a process that someone has to build and maintain, which I want to avoid.

Consul offers the option to wait (long poll, waitIndex) for a change. That allows me to update the proxy routing table of all connected fabio instances at the same time without the need for external coordination. I'd like to achieve the same thing with the cert sources, but since only Consul offers the wait-for-change feature I've fallen back to polling where necessary.
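For reference, a blocking query against Consul's KV HTTP API can be sketched with the standard library alone. The `index` and `wait` query parameters and the `X-Consul-Index` response header are part of Consul's HTTP API; `waitForChange`, the `fabio/certs` key and the fake server in `demo` are assumptions for the example.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strconv"
)

// waitForChange issues one blocking query for a KV key. The request returns
// when the key's index moves past lastIndex or the wait time elapses, and
// the new index is read from the X-Consul-Index header.
func waitForChange(client *http.Client, addr, key string, lastIndex uint64) ([]byte, uint64, error) {
	q := url.Values{}
	q.Set("index", strconv.FormatUint(lastIndex, 10))
	q.Set("wait", "30s")
	q.Set("raw", "true") // return the stored value instead of JSON metadata

	resp, err := client.Get(addr + "/v1/kv/" + key + "?" + q.Encode())
	if err != nil {
		return nil, lastIndex, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, lastIndex, fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, lastIndex, err
	}
	index, err := strconv.ParseUint(resp.Header.Get("X-Consul-Index"), 10, 64)
	if err != nil {
		return nil, lastIndex, err
	}
	return body, index, nil
}

// demo runs waitForChange against a local fake server. A real Consul agent
// would hold the request open until the index changes; the fake answers
// immediately with a newer index.
func demo() (string, uint64, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Query().Get("index") != "42" {
			http.Error(w, "missing index", http.StatusBadRequest)
			return
		}
		w.Header().Set("X-Consul-Index", "43")
		io.WriteString(w, "cert-list-v2")
	}))
	defer srv.Close()

	body, index, err := waitForChange(srv.Client(), srv.URL, "fabio/certs", 42)
	return string(body), index, err
}

func main() {
	fmt.Println(demo())
}
```

A watcher would call `waitForChange` in a loop, feeding the returned index into the next call, so every instance sees a change at roughly the same time without polling on a fixed interval.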

@jefferai

jefferai commented Jun 9, 2016

You know when certs expire -- why not just fetch based on time, unless someone manually sends a signal to Fabio, at which point you could remove all certs from memory and treat all as fresh?

I honestly don't see any reason for polling here.

@magiconair
Contributor

Everything else in fabio is automatic. There are no signals to be sent and nothing to be configured. That's the design goal of it. Therefore, certificates should be available to fabio as soon as they are added to the store and they should be available to all fabio instances that make up a cluster more or less at the same time - ideally immediately.

So I either try to fetch the cert for an unknown domain on the first request, or I tell fabio to reload the certs manually or fabio checks whether something has changed periodically.

The first option requires some refactoring and has the potential for blocking fabio while the certificates are being fetched. What if I get lots of requests for domains I don't have a cert for? That might kill the cert store.

The second option requires either some manual intervention or some glue code the user has to provide. Both are not in line with fabio's design goals.

The third option is how fabio works now but it requires a database which notifies fabio when something has changed (i.e. consul) or I have to poll for changes.

@jefferai

jefferai commented Jun 9, 2016

> Everything else in fabio is automatic. There are no signals to be sent and nothing to be configured.

Then don't use signals. I just suggested that if you wanted a way for an operator to explicitly tell fabio to reload.

> The first option requires some refactoring and has the potential for blocking fabio while the certificates are being fetched. What if I get lots of requests for domains I don't have a cert for? That might kill the cert store.

It can block fabio while certs are being fetched, but after the first fetch it'll be memoized. Besides, it should only be blocking that single goroutine. You can memoize negative results with a retry timer for certificates that aren't available.

> The second option requires either some manual intervention or some glue code the user has to provide. Both are not in line with fabio's design goals.

I don't see why allowing an administrator to manually expire certificates is a bad thing.

> The third option is how fabio works now but it requires a database which notifies fabio when something has changed (i.e. consul) or I have to poll for changes.

I think this option is fine as long as you poll reasonably. Polling the entire cert store every three seconds is completely wasteful. You're better off using a timer per certificate to control when you next poll, based on certificate lifetime. But that ends up basically looking like option number one.

My strong suggestion is option number one (with flavors of option three). Store a backoff time value, rwmutex, and certificate information in a struct; use a thread-safe data type to look up the appropriate struct for a name, do a read lock, and return the info if valid.

Separately, have a management thread that checks each certificate; if the certificate will be expiring soon, or does not exist, get a write lock and attempt a read from the certificate store. In either case set the backoff time to half of the remaining time until expiration. If a certificate doesn't yet exist, or if it has expired without being refreshed, get a write lock and do a read from the certificate store...if nothing comes back set the backoff time to some near value (say, 3 seconds) and try again later.
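The two paragraphs above could be sketched as follows. All names (`entry`, `manager`, `fetchFunc`, `check`) are hypothetical, and the sketch deviates in one detail: it reads from the store before taking the write lock, so that readers are not held up by a slow fetch.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
	"sync"
	"time"
)

// fetchFunc reads one certificate and its expiry from the certificate store.
// It is a hypothetical hook, not a fabio or Vault API.
type fetchFunc func(name string) (*tls.Certificate, time.Time, error)

// entry holds the backoff time, lock and certificate information for a name.
type entry struct {
	mu        sync.RWMutex
	cert      *tls.Certificate
	notAfter  time.Time
	nextCheck time.Time // earliest time for the next store read
}

type manager struct {
	entries sync.Map // host name -> *entry
	fetch   fetchFunc
	retry   time.Duration // short backoff while no valid certificate exists
}

// get is the serving path: it takes a read lock and never hits the store.
// Unknown names are registered so the management pass can try to fetch them;
// a real version would need to bound how many names it accepts.
func (m *manager) get(name string, now time.Time) (*tls.Certificate, error) {
	v, _ := m.entries.LoadOrStore(name, &entry{})
	e := v.(*entry)
	e.mu.RLock()
	defer e.mu.RUnlock()
	if e.cert == nil || !now.Before(e.notAfter) {
		return nil, errors.New("no valid certificate for " + name)
	}
	return e.cert, nil
}

// check is the management step for one entry.
func (m *manager) check(e *entry, name string, now time.Time) {
	e.mu.RLock()
	due := !now.Before(e.nextCheck)
	e.mu.RUnlock()
	if !due {
		return
	}
	cert, notAfter, err := m.fetch(name)

	e.mu.Lock()
	defer e.mu.Unlock()
	if err == nil && now.Before(notAfter) {
		e.cert, e.notAfter = cert, notAfter
	}
	if e.cert != nil && now.Before(e.notAfter) {
		e.nextCheck = now.Add(e.notAfter.Sub(now) / 2) // half the remaining lifetime
	} else {
		e.nextCheck = now.Add(m.retry) // missing or expired: try again soon
	}
}

// checkAll runs one management pass over every known name.
func (m *manager) checkAll(now time.Time) {
	m.entries.Range(func(k, v interface{}) bool {
		m.check(v.(*entry), k.(string), now)
		return true
	})
}

// demo walks through a few points in time with a fake one-hour certificate.
func demo() (fetches int, served bool) {
	start := time.Now()
	m := &manager{retry: 3 * time.Second}
	m.fetch = func(name string) (*tls.Certificate, time.Time, error) {
		fetches++
		return &tls.Certificate{}, start.Add(time.Hour), nil
	}
	m.get("a.com", start)                   // miss; registers the name
	m.checkAll(start)                       // first fetch; next check at +30m
	m.checkAll(start.Add(10 * time.Minute)) // not due yet, no fetch
	m.checkAll(start.Add(31 * time.Minute)) // due, second fetch
	_, err := m.get("a.com", start.Add(32*time.Minute))
	return fetches, err == nil
}

func main() {
	fmt.Println(demo()) // prints: 2 true
}
```

In a running proxy, `checkAll` would be driven by a ticker in its own goroutine, and `get` would be called from the `GetCertificate` callback.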

@jefferai

jefferai commented Jun 9, 2016

@magiconair BTW, I'll be in Amsterdam next week for HashiConf EU. You should join us at http://www.meetup.com/Software-Circus/events/228747162/ !

@magiconair
Contributor

Hi @jefferai

Unfortunately, I'll still be on vacation until Tuesday but we can meet on Wed, 15 Jun since I'm presenting in the afternoon. I'll be at the venue in the morning.

Admin interaction is something I specifically don't want. I'll explain why during the presentation :)

I'll think about this a bit more.

@jefferai

Wow...I'm embarrassed -- I totally missed that you were talking at HC EU! Blame it on me being busy with releasing, blog posts, talks, training...

Looking forward to talking to you then. BTW -- I have zero issue with you wanting admin interaction. I truly only brought it up as an alternate method on top of automatic, because direct control gives people warm fuzzies. But as per my post above I think we can get a good automatic solution that isn't resorting to constant polling.

@magiconair magiconair modified the milestones: 1.1.2, 1.2 Jun 21, 2016
magiconair added a commit that referenced this issue Jul 16, 2016
* Issue #27: Add/remove certificates using API
* Issue #28: Refactor listener config
* Issue #70: SSL Certs from Vault
* Issue #79: Refactor config loading to use flag sets
* Issue #85: SNI Support