tls: Restructure and improve certificate management #2015

mholt · 2018-02-04T08:09:27Z

Please help test and review this PR!

1. What does this change do, exactly?

Most of the changes are internal to the code, but there are bug fixes and a couple minor breaking changes for server type plugins that use the caddytls package.

Expose the list of Caddy instances through caddy.Instances()
Added arbitrary storage to caddy.Instance
The cache of loaded certificates is no longer global; now scoped
per-instance, meaning upon reload (like SIGUSR1) the old cert cache
will be discarded entirely, whereas before, aggressively reloading
config that added and removed lots of sites would cause unnecessary
build-up in the cache over time.
Key certificates in the cache by their SHA-256 hash instead of
by their names. This means certificates will not be duplicated in
memory (within each instance), making Caddy much more memory-efficient
for large-scale deployments with thousands of sites sharing certs.
Perform name-to-certificate lookups scoped per caddytls.Config instead
of a single global lookup. This prevents certificates from stepping on
each other when they overlap in their names.
Do not allow TLS configurations keyed by the same hostname to be
different; this now throws an error.
Updated relevant tests, with a stark awareness that more tests are
needed.
Change the NewContext function signature to include an *Instance.
Strongly recommend (basically require) use of caddytls.NewConfig()
to create a new *caddytls.Config, to ensure pointers to the instance
certificate cache are initialized properly.
Update the TLS-SNI challenge solver (even though TLS-SNI is disabled
currently on the CA side). Store temporary challenge cert in instance
cache, but do so directly by the ACME challenge name, not the hash.
Modified the getCertificate function to check the cache directly for
a name match if one isn't found otherwise. This will allow any
caddytls.Config to be able to help solve a TLS-SNI challenge, with one
extra side-effect that might actually be kind of interesting (and
useless): clients could send a certificate's hash as the SNI and
Caddy would be able to serve that certificate for the handshake.
Do not attempt to match a "default" (random) certificate when SNI
is present but unrecognized; return no certificate so a TLS alert
happens instead.
Store an Instance in the list of instances even while the instance
is still starting up (this allows access to the cert cache for
performing renewals at startup, etc). Will be removed from list again
if instance startup fails.
Laid groundwork for ACMEv2 and Let's Encrypt wildcard support.

Server type plugins will need to be updated slightly to accommodate
minor adjustments to their API (like passing in an Instance). This
commit includes the changes for the HTTP server.

Certain Caddyfile configurations might error out with this change, if
they configured different TLS settings for the same hostname.

This change trades some complexity for other complexity, but ultimately
this new complexity is more correct and robust than earlier logic.

2. Please link to the relevant issues.

Fixes #1991
Fixes #1994
Fixes #1303

3. Which documentation changes (if any) need to be made because of this PR?

The tls directive page will definitely need updating. TLS configurations for equal hostnames cannot be different. Any description about how certificates are stored internally might need to change.

OH, and something I forgot on the original commit message: Caddy can now better support being used in conjunction with other Caddy instances that share the same certificate storage. In other words, if you have multiple instances all using the same $CADDYPATH, that's been a bad idea because they will all try to manage the certificates themselves (keep them renewed, etc, which runs you into rate limits fast). Now, Caddy will attempt to check if the certificate in storage still needs to be renewed just before it fires off a renewal. So if you stagger Caddy deployments by more time than it takes to complete an ACME challenge (usually just a few seconds, but if you use DNS challenge maybe minutes or more), they might play better together in an acceptable way, for small-medium scale use cases. I wouldn't rely on this for high-volume deployments (like, hundreds or thousands of instances sharing the same certificate storage and all trying to manage them) but I bet it'll be better for dozens of Caddy instances!

** UPDATE **: The latest changes I pushed to this branch now eliminate the need to stagger the timing of running each Caddy instance. If multiple Caddy instances (even thousands) share the same certificate storage on disk, they will coordinate renewals and not step on each others' toes. It just requires using the DNS challenge when behind a load balancer. However, testing this is a little tricky, so we'll have to see how it goes. It's an advanced use case, so I'm not expecting many people to be benefitted by it, but I hope they will let us know how it goes!

4. Checklist

I have written tests and verified that they fail without my change
I have squashed any insignificant commits
This change has comments for package types, values, functions, and non-obvious lines of code
I am willing to help maintain this change if there are issues with it later

/cc @pieterlouw and @miekg -- I'll be getting in touch with you soon with more details for what this means about your server type plugins. I can probably submit a PR to caddy-net, and I'm not sure that CoreDNS will have to change much except a line where NewContext() is invoked.

- Expose the list of Caddy instances through caddy.Instances() - Added arbitrary storage to caddy.Instance - The cache of loaded certificates is no longer global; now scoped per-instance, meaning upon reload (like SIGUSR1) the old cert cache will be discarded entirely, whereas before, aggressively reloading config that added and removed lots of sites would cause unnecessary build-up in the cache over time. - Key certificates in the cache by their SHA-256 hash instead of by their names. This means certificates will not be duplicated in memory (within each instance), making Caddy much more memory-efficient for large-scale deployments with thousands of sites sharing certs. - Perform name-to-certificate lookups scoped per caddytls.Config instead of a single global lookup. This prevents certificates from stepping on each other when they overlap in their names. - Do not allow TLS configurations keyed by the same hostname to be different; this now throws an error. - Updated relevant tests, with a stark awareness that more tests are needed. - Change the NewContext function signature to include an *Instance. - Strongly recommend (basically require) use of caddytls.NewConfig() to create a new *caddytls.Config, to ensure pointers to the instance certificate cache are initialized properly. - Update the TLS-SNI challenge solver (even though TLS-SNI is disabled currently on the CA side). Store temporary challenge cert in instance cache, but do so directly by the ACME challenge name, not the hash. Modified the getCertificate function to check the cache directly for a name match if one isn't found otherwise. This will allow any caddytls.Config to be able to help solve a TLS-SNI challenge, with one extra side-effect that might actually be kind of interesting (and useless): clients could send a certificate's hash as the SNI and Caddy would be able to serve that certificate for the handshake. - Do not attempt to match a "default" (random) certificate when SNI is present but unrecognized; return no certificate so a TLS alert happens instead. - Store an Instance in the list of instances even while the instance is still starting up (this allows access to the cert cache for performing renewals at startup, etc). Will be removed from list again if instance startup fails. - Laid groundwork for ACMEv2 and Let's Encrypt wildcard support. Server type plugins will need to be updated slightly to accommodate minor adjustments to their API (like passing in an Instance). This commit includes the changes for the HTTP server. Certain Caddyfile configurations might error out with this change, if they configured different TLS settings for the same hostname. This change trades some complexity for other complexity, but ultimately this new complexity is more correct and robust than earlier logic. Fixes #1991 Fixes #1994 Fixes #1303

francislavoie · 2018-02-04T18:01:17Z

I had a few questions/ideas as to what this means for Caddy in the future. I'm not really reviewing the code, so I'm sorry if I'm asking things that are painfully obvious by looking at the code.

Added arbitrary storage to caddy.Instance

This means we'll now have support for plugins adding their own storage drivers, correct? E.g. Consul (see #1918) - for situations where Caddy is a load balancer, there needs to be a way to ensure instances don't conflict with eachother when attempting to renew. Which brings me to my next question:

Caddy can now better support being used in conjunction with other Caddy instances that share the same certificate storage. In other words, if you have multiple instances all using the same $CADDYPATH, that's been a bad idea because they will all try to manage the certificates themselves (keep them renewed, etc, which runs you into rate limits fast). Now, Caddy will attempt to check if the certificate in storage still needs to be renewed just before it fires off a renewal. So if you stagger Caddy deployments by more time than it takes to complete an ACME challenge (usually just a few seconds, but if you use DNS challenge maybe minutes or more), they might play better together in an acceptable way, for small-medium scale use cases. I wouldn't rely on this for high-volume deployments (like, hundreds or thousands of instances sharing the same certificate storage and all trying to manage them) but I bet it'll be better for dozens of Caddy instances!

So this is kinda part two of the requirements in #1918, locking. This PR doesn't implement locking for renewals, based on your description, right? For that usecase you described, it sounds like a lock file would be ideal (that could contain a timestamp as well to indicate the last time renewal was attempted) to alert other instances that renewal is already happening, and those other instances could give up. Maybe that's out of scope for this PR though. A lock interface would help for implementing Consul or other storage drivers.

The cache of loaded certificates is no longer global; now scoped
per-instance

Maybe I'm getting confused by the terminology here, but does instance here mean individual running instances of Caddy or does it mean something smaller/internal (like an instance per site)? If the former, my understanding is that the old behaviour was a single cache for every instance running on the same machine? If the later, then was the old behaviour that each instance had its own cache (this seems like the ideal goal, so I'm leaning towards the former being the meaning)?

I ask that, because:

Expose the list of Caddy instances through caddy.Instances()

So does each running instance now know about the other instances on the same machine? If so, neat!

The tls directive page will definitely need updating. TLS configurations for equal hostnames cannot be different. Any description about how certificates are stored internally might need to change.

This sounds like a pretty good case for recommending using the new snippet syntax (#1921) to avoid duplicated TLS blocks.

mholt · 2018-02-04T18:18:18Z

@francislavoie Great thoughts, let me respond to each one:

This means we'll now have support for plugins adding their own storage drivers, correct? E.g. Consul (see #1918) - for situations where Caddy is a load balancer, there needs to be a way to ensure instances don't conflict with eachother when attempting to renew

No, sorry for the confusion. This storage is simply a map[interface{}]interface{} that allows plugins to store things scoped to an Instance, instead of globally. This makes for better correctness and efficiency with regards to reloads (SIGUSR1, etc). Our certificate cache is stored in Instance storage instead of globals now. This is not the same as TLS storage, which I'm still thinking of how to develop it fully.

So this is kinda part two of the requirements in #1918, locking. This PR doesn't implement locking for renewals, based on your description, right? For that usecase you described, it sounds like a lock file would be ideal (that could contain a timestamp as well to indicate the last time renewal was attempted) to alert other instances that renewal is already happening, and those other instances could give up. Maybe that's out of scope for this PR though. A lock interface would help for implementing Consul or other storage drivers.

Right: this PR does not totally solve the issue of shared, managed certificates among multiple Caddy instances. However, it should make any problem virtually go away as long as the instances are staggered in their execution by a few seconds to a few minutes, and as long as there aren't too many of them (hundreds or more might be too many to be safe). The TLS storage stuff I am working on in my head should introduce locking and do it properly, but those will be plugins. This solution is baked right into Caddy and should "just work" for the vast majority of people who need something like it.

Maybe I'm getting confused by the terminology here, but does instance here mean individual running instances of Caddy or does it mean something smaller/internal (like an instance per site)? If the former, my understanding is that the old behaviour was a single cache for every instance running on the same machine? If the later, then was the old behaviour that each instance had its own cache (this seems like the ideal goal, so I'm leaning towards the former being the meaning)?

Each Caddy process instantiates a literal *Instance value which holds all the "servers" it is running. A single Caddy process can run more than 1 instance, for example momentarily during graceful reloads, or if being used as a library. This is uncommon but still possible, and scoping things to an Instance helps us avoid global variables and keep things more correct and efficient. Before, there was a single certificate cache for the whole PROCESS, meaning reloading the config and changing sites didn't always flush the certificate cache properly. Now, each Instance value has its own cache, so a reload will just dispose of the old cache.

So does each running instance now know about the other instances on the same machine? If so, neat!

In the same process, yes. If they have to know about them, they can. The updated certificate maintenance routines run one-per-process and have to iterate the list of Instances in order to access their individual certificate caches. Then they iterate the list of certificates and perform renewals, etc.

This sounds like a pretty good case for recommending using the new snippet syntax (#1921) to avoid duplicated TLS blocks.

Good point.

Thanks for your comments! I hope this makes a little more sense!

francislavoie · 2018-02-04T18:28:37Z

Ah, yeah that helps. I got a little excited that this would cover #1918 because there's a few scenarios I ran into that it would really help with, i.e. load balancing your load balancers, or having certs managed by the underlying instances that are being load balanced. That's currently not possible with Caddy because they don't share certs - if one renews, the other won't know about it unless they share the same filesystem.

elcore

Without further ado, I will approve this PR.

It is well written and documented!!

Have a wonderful day :)

nodesocket · 2018-02-13T06:45:53Z

a lock file would be ideal (that could contain a timestamp as well to indicate the last time renewal was attempted) to alert other instances that renewal is already happening, and those other instances could give up.

👍

That also means that incoming requests would be blocked until the TLS renew lock is removed right?

mholt · 2018-02-13T06:55:44Z

@nodesocket

a lock file would be ideal (that could contain a timestamp as well to indicate the last time renewal was attempted) to alert other instances that renewal is already happening, and those other instances could give up.

👍
That also means that incoming requests would be blocked until the TLS renew lock is removed right?

I wouldn't go too far with the quoted suggestion (no offense Francis, ha) -- what we'd rather have is a truly atomic "create-file-if-not-exists" operation, provided by the OS. Right now, Caddy does not utilize any such features, which is why cert management, even with this PR, is not truly synchronized.

And no, incoming requests are NOT blocked while certificate maintenance happens. Maintenance happens in the background and is entirely non-blocking after Caddy starts up.

Also introduce caddy.OnProcessExit which is a list of functions that run before exiting the process cleanly; these do not count as shutdown callbacks, so they do not return errors and must execute quickly.

# Conflicts: # sigtrap_posix.go

mholt · 2018-02-13T20:29:25Z

@nodesocket I just pushed some changes which make Caddy synchronize its renewals using the file system, so multiple instances can run at the same time and safely coordinate renewals. You won't have to stagger the execution of Caddy processes anymore, and this should scale well to thousands of instances. Would you please test this and report back?

(NOTE: Use the DNS challenge when behind a load balancer.)

nodesocket · 2018-02-13T20:33:17Z

@mholt thanks for this. Awesome. Can you give a brief overview of how it works? I assume still need to use a shared file system like NFS (Amazon Elastic File System) correct? This is not doing any rsync magic is it?

mholt · 2018-02-13T20:37:52Z

@nodesocket It should "just work" - yes, it'll use the file system. No rsync; the .caddy folder (well, $CADDYPATH) must be locally mounted. This design may not be 100% perfect -- I'm guessing more like 99.5% or thereabouts, assuming there are no implementation bugs (ha!) -- but it doesn't need to be perfect, since rate limits have a little bit of tolerance.

By the way, if your company is relying on Caddy, I highly suggest ordering extended support. If you encounter any bugs, it'll ensure a quicker resolution for your business. 👍

nodesocket · 2018-02-13T23:54:24Z

@mholt can you provide instructions on how to build this? I assume I need to compile myself?

mholt · 2018-02-14T00:06:22Z

@nodesocket

can you provide instructions on how to build this? I assume I need to compile myself?

Yes :) Just like you've been building it up to now, except on this branch instead of master. (If you haven't been building from source, then you should get a commercial license if you haven't already.)

The build instructions are in the README -- just switch to this branch.

dschwar · 2018-02-14T18:19:14Z

Did some testing between this branch and master, with the following caddyfile

    subdomain.host.com:443 {
      proxy / web:8000 {
        transparent
        header_upstream X-Request-Start "t={when_unix}000"
      }
      tls /star-subdomain-host-cert/tls.crt /star-subdomain-host-cert/tls.key
    }
    *.subdomain.host.com:443 {
      proxy / web:8000 {
        transparent
        header_upstream X-Request-Start "t={when_unix}000"
      }
      tls /star-subdomain-host-cert/tls.crt /star-subdomain-host-cert/tls.key
    }
    :443 {
      proxy / web:8000 {
        transparent
        header_upstream X-Request-Start "t={when_unix}000"
      }
      tls {
        max_certs 10
        storage mysql
      }
    }
    :80 {
      proxy / web:8000 {
        transparent
        header_upstream X-Request-Start "t={when_unix}000"
      }
      tls {
        max_certs 10
        storage mysql
      }
    }

I find that when I visit subdomain.host.com I get cert a let's encrypt certificate. On master this serves the certificate that I've provided.

mholt · 2018-02-14T18:25:09Z

@dschwar Thanks for trying it out! What's the storage mysql thing you have going on? That seems custom. Please try again without modifications so I can work on reproducing the issue. That will be helpful so we can be sure we're working with the same code base!

dschwar · 2018-02-14T18:59:30Z

After removing the mysql plugin, I tested master vs your branch. On master it loads the provided cert of subdomain.host.com, on this branch. it looks like it's attempting to get a LE cert, but fails. I probably don't have the filesystem configured properly though.

2018/02/14 18:52:18 [INFO] Obtaining new certificate for subdomain.host.com
2018/02/14 18:52:19 [INFO] acme: Registering account for email@host.com
2018/02/14 18:52:20 http: TLS handshake error from 10.111.0.2:15721: open /root/.caddy/acme/acme-v01.api.letsencrypt.org/sites/subdomain.host.com/subdomain.host.com.crt.lock: no such file or directory

The weird thing is, if I browse a site like secondary.subdomain.host.com, then subsequent requests to subdomain.host.com will be served using the provided cert.

mholt · 2018-02-14T19:08:03Z

Thanks for giving it another go! We're getting closer.

First thing -- and you should do this immediately -- use the Let's Encrypt staging endpoint for testing! Set it with the -ca flag. Or you may hit rate limits.

Are you running caddy as the same user as who owns that directory, etc? Also, is this subdomain new (i.e. does the containing folder listed in the error message already exist on disk)?

dschwar · 2018-02-14T19:18:06Z

Are you running caddy as the same user as who owns that directory, etc?

Yup

~/.caddy/acme/acme-v01.api.letsencrypt.org # ls -la
total 12
drwx------    3 root     root          4096 Feb 14 18:52 .
drwx------    3 root     root          4096 Feb 14 18:52 ..
drwx------    3 root     root          4096 Feb 14 18:52 users
~/.caddy/acme/acme-v01.api.letsencrypt.org # whoami
root
~/.caddy/acme/acme-v01.api.letsencrypt.org # ps aux
PID   USER     TIME   COMMAND
    1 root       0:00 /usr/bin/caddy --conf /caddy/Caddyfile --log stdout --email email@host.com --host subdomain.host.com

Also, is this subdomain new (i.e. does the containing folder listed in the error message already exist on disk)?

The containing folder doesn't exist (see output pasted above). Should caddy be trying to provision a LE cert for subdomain.host.com since that cert is part of the tls /star-subdomain-host-cert/tls.crt /star-subdomain-host-cert/tls.key?

mholt · 2018-02-14T19:40:17Z

Now we're getting somewhere. So there's 2 things going on: opening a lock file where the parent dir doesn't exist, and the subdomain thing. I'm looking into it...

mholt · 2018-02-14T20:14:47Z

@dschwar PS. If you want to help dive into the code and help debug it (since I have to work to reproduce it) -- specifically help with the problem of serving the wrong certificate -- I would really appreciate it. And it would help us get it fixed sooner. You'd want to look in caddytls/handshake.go -- probably putting some prints in some of the getConfig or getCertificate functions should help!

nodesocket · 2018-02-15T05:10:54Z

@mholt Let's Encrypt staging is still timing-out for me. I can't believe others are not encountering this after 4+ hours. Just curious, are you testing from AWS EC2 instances? I am.

dschwar · 2018-02-15T05:12:57Z

@dschwar Thanks; I'm wondering, though, how [WARNING] TLS disabled for http://subdomain.host.com made its way into there from your Caddyfile. And also that there is a TLSConfig for "subdomain.host.com" at the same time. This isn't a behavior I'm observing...

This seems to happen because I have the command line -host arg set to subdomain.host.com.

I tried removing the -host command line arg, which gives me the log
[WARNING] TLS disabled for http://
and then subdomain.host.com is served using the appropriate provided certificate, but Let's Encrypt no longer provisions.

So I tried changing the -host arg to https://subdomain.host.com and now I get
WARNING] TLS disabled for http://https://subdomin.host.com

mholt · 2018-02-15T05:22:59Z

@nodesocket

Let's Encrypt staging is still timing-out for me. I can't believe others are not encountering this after 4+ hours. Just curious, are you testing from AWS EC2 instances? I am.

Nope, not using EC2. Thanks for commenting on their forums -- hopefully they'll find it soon. Not many people use staging except client developers I think.

@dschwar -- okay, I think what you'll need to do is simplify your config down to bare-bones. Run caddy without any extra command line args (except to set the staging endpoint!) and then run Caddy again with a simple Caddyfile. I can help debug it but only if I have all the information: how Caddy was run, what env variables are set, the Caddyfile, log output, requests that are made, etc -- all unredacted and unmodified. Otherwise, I can only give you tips to help you debug it on your end.

(The -host flag, per the docs specifies a default host to listen on; so if the hostname defined in the Caddyfile is empty, it will be filled in with -host. And https://... is not a host, of course. Remember, Caddy is very easy to use, but has enough power that you can easily shoot yourself in the foot. Unlike other web servers, with Caddy, you'll want to trim back the config when things start getting complicated.)

dschwar · 2018-02-15T05:45:50Z

@mholt I can test this tomorrow but my current hypothesis is that if the ‘-host’ flag is supplied and there is a Caddyfile with a supplied tls cert for that host and a wildcard on-demand config, then when you visit the host, it will choose to generate an on demand cert.

…

On Feb 15, 2018, at 12:23 AM, Matt Holt ***@***.***> wrote: @nodesocket Let's Encrypt staging is still timing-out for me. I can't believe others are not encountering this after 4+ hours. Just curious, are you testing from AWS EC2 instances? I am. Nope, not using EC2. Thanks for commenting on their forums -- hopefully they'll find it soon. Not many people use staging except client developers I think. @dschwar -- okay, I think what you'll need to do is simplify your config down to bare-bones. Run caddy without any extra command line args (except to set the staging endpoint!) and then run Caddy again with a simple Caddyfile. I can help debug it but only if I have all the information: how Caddy was run, what env variables are set, the Caddyfile, log output, requests that are made, etc -- all unredacted and unmodified. Otherwise, I can only give you tips to help you debug it on your end. (The -host flag, per the docs specifies a default host to listen on; so if the hostname defined in the Caddyfile is empty, it will be filled in with -host. And https://... is not a host, of course. Remember, Caddy is very easy to use, but has enough power that you can easily shoot yourself in the foot. Unlike other web servers, with Caddy, you'll want to trim back the config when things start getting complicated.) — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

mholt · 2018-02-15T05:51:40Z

@dschwar Ahhhh, wait, it's only generating the certificate on-demand (i.e. not at startup)? Is subdomain.host.com that you visit in your browser the exact same one as the first site in your Caddyfile, subdomain.host.com:443? If so, then what you found may not be a bug with the TLS changes. If you set -host subdomain.host.com and then define a site in the Caddyfile with an empty hostname (:443 for example) then that site is assigned a default hostname of subdomain.host.com according to your flag. Thus there is a duplicate site name and that should be caught at startup. Let me know what you think.

dschwar · 2018-02-15T06:05:50Z

@dschwar Ahhhh, wait, it's only generating the certificate on-demand (i.e. not at startup)?

Yup, only on demand. Should have mentioned that earlier, sorry.

Is subdomain.host.com that you visit in your browser the exact same one as the first site in your Caddyfile, subdomain.host.com:443?

Yup. When redacting, I made sure everything matches up and is in the same form.

hostname (:443 for example)

I thought a :443 would give me a wildcard host? So that I can generate certs for any domain that points at my server (hence why I redact, don’t want spam)

…

On Feb 15, 2018, at 12:51 AM, Matt Holt ***@***.***> wrote: @dschwar Ahhhh, wait, it's only generating the certificate on-demand (i.e. not at startup)? Is subdomain.host.com that you visit in your browser the exact same one as the first site in your Caddyfile, subdomain.host.com:443? If so, then what you found may not be a bug with the TLS changes. If you set -host subdomain.host.com and then define a site in the Caddyfile with an empty hostname (:443 for example) then that site is assigned a default hostname of subdomain.host.com according to your flag. Thus there is a duplicate site name and that should be caught at startup. Let me know what you think. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

mholt · 2018-02-15T06:18:49Z

@dschwar Thanks for clarifying.

I thought a :443 would give me a wildcard host? So that I can generate certs for any domain that points at my server

Well, the -host flag sets the host when it is missing for site configurations. Generally, -host is used without a Caddyfile to even more quickly get a simple site up and running. The -host flag can be used in conjunction with one in some cases for convenience, but your case is not one of them.

(hence why I redact, don’t want spam)

Won't matter -- all hostnames with Let's Encrypt certificates go to public CT logs anyway. In fact, soon all certificates will need to be submitted to CT.

Anyway, I'm thinking of making your situation a "collision" at startup, because otherwise it's not obvious. In other words, using -host subdomain.host.com and with sites subdomain.host.com and :443 in your Caddyfile should be an error because the second site is indistinguishable from the first. Sound good?

See discussion on #2015 for how this situation was discovered. For a Caddyfile like this: localhost { ... } :2015 { ... } Running Caddy like this: caddy -host localhost Produces two sites both defined as `localhost:2015` because the flag changes the default host value to be `localhost`. This should be an error since the sites are not distinct and it is confusing. It can also cause issues with TLS handshakes loading the wrong cert, as the linked discussion shows.

mholt · 2018-02-15T07:05:55Z

@dschwar I've pushed that fix to this branch; feel free to try it out and let me know. You should now get an error when you try to start Caddy with the configuration you had. The reason you were getting the wrong certificate is because Caddy couldn't disambiguate between the two site configurations because of your -host flag: the sites weren't distinct, even though they had different configurations. So now we prevent this case, by raising an error at startup.

dschwar · 2018-02-15T15:09:09Z

@dschwar I've pushed that fix to this branch; feel free to try it out and let me know. You should now get an error when you try to start Caddy with the configuration you had. The reason you were getting the wrong certificate is because Caddy couldn't disambiguate between the two site configurations because of your -host flag: the sites weren't distinct, even though they had different configurations. So now we prevent this case, by raising an error at startup.

I think I'm missing something now. I've removed the -host subdomain.host.com arg and now when I visit www.davesdomain.com which CNAME's to subdomain.host.com, Chrome gives me a This site can’t provide a secure connection

and the logs show

2018/02/15 14:49:29 [INFO] Successfully loaded TLS assets from /star-subdomain-hostcom-cert/tls.crt and /star-subdomain-hostcom-cert/tls.key
2018/02/15 14:49:29 [INFO] Successfully loaded TLS assets from /star-subdomain-hostcom-cert/tls.crt and /star-subdomain-hostcom-cert/tls.key
Activating privacy features... done.
2018/02/15 14:49:29 [WARNING] TLS disabled for http://
https://subdomain.host.com
2018/02/15 14:49:29 https://subdomain.host.com
https://*.subdomain.host.com
2018/02/15 14:49:29 https://*.subdomain.host.com
https://
2018/02/15 14:49:29 https://
http://
2018/02/15 14:49:29 http://
2018/02/15 14:52:30 [INFO] getCertificate for name: www.davesdomain.com
(map[string]string) (len=2) {
 (string) (len=32) "*.subdomain.host.com": (string) (len=64) "41z9d420f85af211afdsfsdgdsfhfghfgdhjghjghfkhjkjhkjhgkjhgkjhgc122b7",
 (string) (len=30) "subdomain.host.com": (string) (len=64) "41z9d420f85af211afdsfsdgdsfhfghfgdhjghjghfkhjkjhkjhgkjhgkjhgc122b7"
}
2018/02/15 14:52:30 [INFO] onDemand: false, loadIfNecessary: true, obtainIfNecessary: true
2018/02/15 14:52:30 http: TLS handshake error from 10.234.0.25:23064: no certificate available for www.davesdomain.com

mholt · 2018-02-15T15:18:47Z

@dschwar Thanks, I think I'm finally onto a fix. Just a hunch, can you verify that replacing the site named :443 in your Caddyfile with *:443 acts different?

dschwar · 2018-02-15T15:38:58Z

Just a hunch, can you verify that replacing the site named :443 in your Caddyfile with *:443 acts different?

Seems to do the same thing

See discussion on #2015; the initial change had removed this check, and I can't remember why I removed it or if it was accidental. Anyway, it's back now.

mholt · 2018-02-15T15:49:23Z

@dschwar Hmm well that's discouraging. Are you sure? If you could pull the latest and try again -- with and without the *, that would be good. I was able to reproduce what I think is the issue, in that it was getting the wrong config for the name. If you could try it after pulling my latest commit, then let me know, that would be great!

dschwar · 2018-02-15T16:05:42Z

It works!

without the *

Caddyfile

    subdomain.host.com:443 {
      proxy / web:8000 {
        transparent
        header_upstream X-Request-Start-Edge {>X-Request-Start}
        header_upstream X-Request-Start "t={when_unix}000"
      }
      tls /star-subdomain-hostcom-cert/tls.crt /star-subdomain-hostcom-cert/tls.key
    }
    *.subdomain.host.com:443 {
      proxy / web:8000 {
        transparent
        header_upstream X-Request-Start-Edge {>X-Request-Start}
        header_upstream X-Request-Start "t={when_unix}000"
      }
      tls /star-subdomain-hostcom-cert/tls.crt /star-subdomain-hostcom-cert/tls.key
    }
    :443 {
      proxy / web:8000 {
        transparent
        header_upstream X-Request-Start-Edge {>X-Request-Start}
        header_upstream X-Request-Start "t={when_unix}000"
      }
      tls {
        max_certs 5000
      }
    }
    :80 {
      proxy / web:8000 {
        transparent
        header_upstream X-Request-Start-Edge {>X-Request-Start}
        header_upstream X-Request-Start "t={when_unix}000"
      }
      tls {
        max_certs 5000
      }
    }

Thanks so much 🎉

mholt · 2018-02-15T16:07:43Z

@dschwar Great!! It should work with the * as well.

I'll merge this PR as Go 1.10 is released, unless someone speaks up with another bug report before then.

dschwar · 2018-02-15T16:18:52Z

It should work with the * as well.

Testing that (for fun), when you say it should with *, do you mean

    *:443 {
      proxy / web:8000 {
        transparent
        header_upstream X-Request-Start-Edge {>X-Request-Start}
        header_upstream X-Request-Start "t={when_unix}000"
      }
      tls {
        max_certs 5000
      }
    }

or

    * {
      proxy / web:8000 {
        transparent
        header_upstream X-Request-Start-Edge {>X-Request-Start}
        header_upstream X-Request-Start "t={when_unix}000"
      }
      tls {
        max_certs 5000
      }
    }

I just attempted the first one and it doesn't work for me.

mholt · 2018-02-15T16:44:19Z

The first one. That should work, (it works for me), so if you want to investigate feel free!

dschwar · 2018-02-15T20:52:46Z

The first one. That should work, (it works for me), so if you want to investigate feel free!

It seems to work if I remove the subdomain.host.com:443 and *.subdomain.host.com:443 from my Caddyfile, the resulting Caddyfile becomes

*:443 {
  proxy / web:8000 {
    transparent
    header_upstream X-Request-Start-Edge {>X-Request-Start}
    header_upstream X-Request-Start "t={when_unix}000"
  }
  tls {
    max_certs 5000
  }
}
*:80 {
  proxy / web:8000 {
    transparent
    header_upstream X-Request-Start-Edge {>X-Request-Start}
    header_upstream X-Request-Start "t={when_unix}000"
  }
  tls {
    max_certs 5000
  }
}

with the command line args
["--conf", "/caddy/Caddyfile", "--log", "stdout", "--email", "myemail@host.com", "--ca", "https://acme-staging.api.letsencrypt.org/directory"]

Only strip the port from the Location URL value if the port is NOT the HTTPSPort (before, we compared against DefaultHTTPSPort instead of HTTPSPort). The HTTPSPort can be changed, but is done so for port forwarding, since in reality you can't 'change' the standard HTTPS port, you can only forward it.

mholt · 2018-02-16T19:37:03Z

@dschwar Ah, you know what, I'm sorry -- that is correct and expected behavior; I was wrong before. Wildcard characters (*) only match a single domain label (e.g. sub.domain.com cannot be matched by * but or even *.* but can be matched by *.*.*), as per the RFC related to wildcard SSL certificates.

So actually I think that remaining "bug" is a non-issue. (PS. You can't serve tls on port 80 with Caddy, so you can remove that from your configuration.) However, the investigation led me to fix what I think was some logical inconsistencies in the code, so it was all good. I'm gonna merge this real soon. Probably today.

mholt added the under review 🧐 Review is pending before merging label Feb 4, 2018

mholt added this to the 0.10.11 milestone Feb 4, 2018

mholt requested review from FiloSottile and elcore February 4, 2018 08:09

mholt mentioned this pull request Feb 4, 2018

Upcoming change in Caddy will require a change in caddy-net pieterlouw/caddy-net#5

Closed

elcore approved these changes Feb 9, 2018

View reviewed changes

mholt removed the under review 🧐 Review is pending before merging label Feb 10, 2018

mholt mentioned this pull request Feb 13, 2018

Feature request: randomize TLS Let's Encrypt renewal logic #2022

Closed

mholt added 3 commits February 13, 2018 13:23

tls: Synchronize renewals between Caddy instances sharing file storage

0802871

Also introduce caddy.OnProcessExit which is a list of functions that run before exiting the process cleanly; these do not count as shutdown callbacks, so they do not return errors and must execute quickly.

Merge branch 'master' into cert-cache

f26447e

# Conflicts: # sigtrap_posix.go

sigtrap: Ensure cleanup actions happen before too many things go wrong

4b2e222

tls: Ensure parent dir exists before creating lock file

ef585ed

pieterlouw mentioned this pull request Feb 15, 2018

TLS support is broken pieterlouw/caddy-net#6

Closed

tls: Try empty name if no matches for getting config during handshake

896dc6b

See discussion on #2015; the initial change had removed this check, and I can't remember why I removed it or if it was accidental. Anyway, it's back now.

Merge branch 'master' into cert-cache

4704a56

mholt added 2 commits February 16, 2018 12:05

tls: Fix HTTP->HTTPS redirects and HTTP challenge when using custom port

8db80c4

mholt mentioned this pull request Feb 16, 2018

Consul KV and Vault for TLS certificate storage #1918

Closed

mholt merged commit 986d4ff into master Feb 16, 2018

mholt deleted the cert-cache branch February 16, 2018 19:49

mholt mentioned this pull request Feb 22, 2018

onevent bug: need to deregister previous hooks after graceful restart #2044

Closed

mholt mentioned this pull request Nov 12, 2018

caddytls: Raise TLS alert if no certificate matches SAN (closes #1303) #2339

Merged

tatianab mentioned this pull request Nov 8, 2023

x/vulndb: potential Go vuln in github.com/mholt/caddy: CVE-2018-19148 golang/vulndb#2217

Closed

tls: Restructure and improve certificate management #2015

tls: Restructure and improve certificate management #2015

Conversation

mholt commented Feb 4, 2018 • edited Loading

1. What does this change do, exactly?

2. Please link to the relevant issues.

3. Which documentation changes (if any) need to be made because of this PR?

4. Checklist

francislavoie commented Feb 4, 2018

mholt commented Feb 4, 2018

francislavoie commented Feb 4, 2018

elcore left a comment

Choose a reason for hiding this comment

nodesocket commented Feb 13, 2018 • edited Loading

mholt commented Feb 13, 2018 • edited Loading

mholt commented Feb 13, 2018 • edited Loading

nodesocket commented Feb 13, 2018 • edited Loading

mholt commented Feb 13, 2018 • edited Loading

nodesocket commented Feb 13, 2018

mholt commented Feb 14, 2018

dschwar commented Feb 14, 2018 • edited Loading

mholt commented Feb 14, 2018 • edited Loading

dschwar commented Feb 14, 2018

mholt commented Feb 14, 2018

dschwar commented Feb 14, 2018

mholt commented Feb 14, 2018

mholt commented Feb 14, 2018

nodesocket commented Feb 15, 2018

dschwar commented Feb 15, 2018

mholt commented Feb 15, 2018

dschwar commented Feb 15, 2018 via email

mholt commented Feb 15, 2018

dschwar commented Feb 15, 2018 via email

mholt commented Feb 15, 2018 • edited Loading

mholt commented Feb 15, 2018

dschwar commented Feb 15, 2018

mholt commented Feb 15, 2018

dschwar commented Feb 15, 2018

mholt commented Feb 15, 2018

dschwar commented Feb 15, 2018 • edited Loading

mholt commented Feb 15, 2018

dschwar commented Feb 15, 2018

mholt commented Feb 15, 2018

dschwar commented Feb 15, 2018

mholt commented Feb 16, 2018 • edited Loading

mholt commented Feb 4, 2018 •

edited

Loading

nodesocket commented Feb 13, 2018 •

edited

Loading

mholt commented Feb 13, 2018 •

edited

Loading

mholt commented Feb 13, 2018 •

edited

Loading

nodesocket commented Feb 13, 2018 •

edited

Loading

mholt commented Feb 13, 2018 •

edited

Loading

dschwar commented Feb 14, 2018 •

edited

Loading

mholt commented Feb 14, 2018 •

edited

Loading

mholt commented Feb 15, 2018 •

edited

Loading

dschwar commented Feb 15, 2018 •

edited

Loading

mholt commented Feb 16, 2018 •

edited

Loading