Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/x509: Convenient, efficient way to reload a certificate pool after on-disk changes #35887

Open
wking opened this issue Nov 28, 2019 · 9 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@wking
Copy link
Contributor

wking commented Nov 28, 2019

From #24254:

I wonder if there is any real use case for reloading the system cert pool.

One use case is long-running processes that now have to jump through hoops to update their view of the system cert pool (e.g. openshift/cloud-credential-operator#113). On Unix, the loading logic is expensive, traversing multiple directories. But for processes who know they can load certs from a single file, it would be nice to have a way to reload if the backing file had changed but not otherwise. For example, something like:

// NewCertPoolsFromFile parses a series of PEM encoded certificates from the file at the
// given path and records the Stat ModTime of the loaded file.  When the pool is used to
// verify a certificate, it has been more than a minute since the last Stat, and a fresh
// Stat gives a ModTime newer than the cached value, the file is reloaded before being
// used to perform the verification.
func NewCertPoolFromFile(path string) *CertPool

Obviously the "don't bother Stating again" time could be configurable if that seemed important. Or maybe checking the current time is about as expensive as running the Stat, so we should just call Stat on every contains. Or maybe there would be no auto-refresh in contains, but CertPool would become an interface:

type CertPool interface {
  BySubjectKeyID(key string) []*Certificate
  ByName(name string) []*Certificate
}

In which case there could be a from-file CertPool implementation with a Refresh method to trigger the Stat check and possible reload. Or something ;). But having something in the stdlib that could be dropped into tls.Config.RootCAs to get efficient reloads without the caller having to babysit the CertPool would be great. Thoughts?

@dmitshur
Copy link
Contributor

dmitshur commented Dec 2, 2019

CC @golang/security.

@andrewpmartinez
Copy link

andrewpmartinez commented Oct 14, 2021

Have a scenario where I am running with a set of internal CAs that can change (updates, removes, etc). These are not system CAs rather instance-specific CAs. GetClientCertificate and GetCertificate are great for updating client/server certs on the fly w/o starting/stopping listeners. However, there is no equivalent for RootCAs.

GetConfigForClient is the closest thing but at that point access to the original tls.Config isn't provided and a whole-cloth tls.Config needs to be returned. The way the code I am dealing with works, the tls.Config may have been modified or configured and I can not lose those changes.

Adding a function field that can be set (GetRootCas) would be very nice.

@FiloSottile
Copy link
Contributor

GetConfigForClient is the closest thing but at that point access to the original tls.Config isn't provided and a whole-cloth tls.Config needs to be returned. The way the code I am dealing with works, the tls.Config may have been modified or configured and I can not lose those changes.

Remember that you can't modify most of tls.Config after it's in use.

Anyway, you can make GetConfigForClient a closure that has access to the lexical scope in which the tls.Config is defined, so it can access it and Clone it.

Is the performance motivation for not fully reloading the CertPool still there since we redid the certificate parser? It should be significantly faster now.

@andrewpmartinez
Copy link

andrewpmartinez commented Oct 14, 2021

Anyway, you can make GetConfigForClient a closure that has access to the lexical scope in which the tls.Config is defined, so it can access it and Clone it.

A closure would work for accessing the tls.Config .

Is the performance motivation for not fully reloading the CertPool still there since we redid the certificate parser? It should be significantly faster now.

Performance isn't motivating me. My motivation is to avoid disconnecting clients that have persistent connections. As far as I knew, the only way to provide a new x509.CertPool was to cycle the listener.

@stokito
Copy link

stokito commented Jun 27, 2023

I faced a similar problem but not about CA but for server certificates.
When the LE certbot renews a certificate I need to reload it without restarting.
This happens once in 60 days so I just going to make a timer job that will run once per day and check if the cert file was changed. So I don't even need any locks for this. I also don't need for file watchers like inotify.

I can't just update the TlsConfig.Certificates simply because internally it's cloned on a server start and any changes are ignored. In the ServeTLS() function in src/net/http/server.go:3113 we have cloneTLSConfig() call. I didn't get why it's needed.

I just wrote an util function ListenAndServeTLS that works same as http.ListenAndServeTLS() but doesn't make internally the cloneTLSConfig():

func ListenAndServeTLS(srv *http.Server, certFile, keyFile string) error {
	cert, certErr := tls.LoadX509KeyPair(certFile, keyFile)
	if certErr != nil {
		return certErr
	}

	tlsCfg := &tls.Config{
		NextProtos:   []string{"h2", "http/1.1"},
		Certificates: []tls.Certificate{cert},
	}

	addr := srv.Addr
	if addr == "" {
		addr = ":https"
	}

	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}

	defer ln.Close()

	tlsListener := tls.NewListener(ln, tlsCfg)
	servErr := srv.Serve(tlsListener)
	return servErr
}

I tested and added new certs on the fly and it worked well. I had to omit some logic from the original ListenAndServeTLS like ability to disable the h2 protocol but that's not critical.

The current recommended workaround is to use TlsConfig.GetCertificate hook but I really just will reimplement the same logic that already works inside with all parsing X509 (quite a heavy operation BTW) and matching by all DNSNames.

So while I have a working and simple workaround it looks like I violated some contract and can't be sure that it will continue to work in future.

So could the cloneTLSConfig() be removed? And clearly allowed to be modified during execution.
Then we can add Certificate, RootCAs and maybe modify other fields.
Otherwise it looks like may be needed many of other hooks.

@github-anurag
Copy link

@andrewpmartinez
I am facing the same issue. If possible can you share a github gist on how you resolved your issue?

@andrewpmartinez
Copy link

andrewpmartinez commented Aug 8, 2023

@andrewpmartinez I am facing the same issue. If possible can you share a github gist on how you resolved your issue?

Here is an example. This isn't meant to run as is, the ReloadServerCert() function doesn't attempt to load anything. However, you should be able to take it from here.

In my solution, we have a backing library for loading x509 certificates/private keys from file/inline PEM/hardware (HSMs, etc.) that wraps this logic up nicely and doesn't use globals. Instead, we have an instance of a struct that we can call .Reload() or be configured to watch for file system changes. I stripped this example down to make it clear how this works.

var serverCerts []*tls.Certificate
var certLock sync.RWMutex

func ReloadServerCert() {
	certLock.Lock()
	defer certLock.Unlock()

	var newCerts []*tls.Certificate

	//create tls.Cert's here, like loading them from a file system

	serverCerts = newCerts
}

func newTlsConfig() *tls.Config {
	return &tls.Config{
		GetCertificate: GetServerCertificate,
		//do not set "Certificates" field so that GetServerCertificate is always called
	}
}

// GetServerCertificate allows us to support SNI if we want and to return any reloaded server certificates
func GetServerCertificate(hello *tls.ClientHelloInfo) (*tls.Certificate, error) {
	certLock.RLock()
	defer certLock.RUnlock()

	// this block can simply always return a single tls.Certificate if SNI isn't needed
	for _, cert := range serverCerts {
		if err := hello.SupportsCertificate(cert); err == nil {
			return cert, nil
		}
	}
	
	return nil, errors.New("no server certificate found")
}

@github-anurag
Copy link

@andrewpmartinez
Thanks for replying promptly. Apologies for not clarifying earlier, the code gist I was looking for was for the issue you mentioned here: #35887 (comment)

I have a client where the ca bundle are changing in background and I need to trust the new RootCAs on client that is already running. The motivation is similar to your which is to avoid disconnecting clients that have persistent connections.

@p0lyn0mial
Copy link

@FiloSottile Sorry if I'm abusing this issue but i think it is also related to #22836

It looks like Kubernetes client-go's library has hit this limitation as well. Just checking in if something has changed since your last comment that would be in favour of adding a callback function for reloading tls.Config.RootCAs on the client side? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

7 participants