Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tls: Restructure and improve certificate management #2015

Merged
merged 10 commits into from
Feb 16, 2018
Merged

tls: Restructure and improve certificate management #2015

merged 10 commits into from
Feb 16, 2018

Conversation

mholt
Copy link
Member

@mholt mholt commented Feb 4, 2018

Please help test and review this PR!

1. What does this change do, exactly?

Most of the changes are internal to the code, but there are bug fixes and a couple minor breaking changes for server type plugins that use the caddytls package.

  • Expose the list of Caddy instances through caddy.Instances()

  • Added arbitrary storage to caddy.Instance

  • The cache of loaded certificates is no longer global; now scoped
    per-instance, meaning upon reload (like SIGUSR1) the old cert cache
    will be discarded entirely, whereas before, aggressively reloading
    config that added and removed lots of sites would cause unnecessary
    build-up in the cache over time.

  • Key certificates in the cache by their SHA-256 hash instead of
    by their names. This means certificates will not be duplicated in
    memory (within each instance), making Caddy much more memory-efficient
    for large-scale deployments with thousands of sites sharing certs.

  • Perform name-to-certificate lookups scoped per caddytls.Config instead
    of a single global lookup. This prevents certificates from stepping on
    each other when they overlap in their names.

  • Do not allow TLS configurations keyed by the same hostname to be
    different; this now throws an error.

  • Updated relevant tests, with a stark awareness that more tests are
    needed.

  • Change the NewContext function signature to include an *Instance.

  • Strongly recommend (basically require) use of caddytls.NewConfig()
    to create a new *caddytls.Config, to ensure pointers to the instance
    certificate cache are initialized properly.

  • Update the TLS-SNI challenge solver (even though TLS-SNI is disabled
    currently on the CA side). Store temporary challenge cert in instance
    cache, but do so directly by the ACME challenge name, not the hash.
    Modified the getCertificate function to check the cache directly for
    a name match if one isn't found otherwise. This will allow any
    caddytls.Config to be able to help solve a TLS-SNI challenge, with one
    extra side-effect that might actually be kind of interesting (and
    useless): clients could send a certificate's hash as the SNI and
    Caddy would be able to serve that certificate for the handshake.

  • Do not attempt to match a "default" (random) certificate when SNI
    is present but unrecognized; return no certificate so a TLS alert
    happens instead.

  • Store an Instance in the list of instances even while the instance
    is still starting up (this allows access to the cert cache for
    performing renewals at startup, etc). Will be removed from list again
    if instance startup fails.

  • Laid groundwork for ACMEv2 and Let's Encrypt wildcard support.

Server type plugins will need to be updated slightly to accommodate
minor adjustments to their API (like passing in an Instance). This
commit includes the changes for the HTTP server.

Certain Caddyfile configurations might error out with this change, if
they configured different TLS settings for the same hostname.

This change trades some complexity for other complexity, but ultimately
this new complexity is more correct and robust than earlier logic.

2. Please link to the relevant issues.

Fixes #1991
Fixes #1994
Fixes #1303

3. Which documentation changes (if any) need to be made because of this PR?

The tls directive page will definitely need updating. TLS configurations for equal hostnames cannot be different. Any description about how certificates are stored internally might need to change.

OH, and something I forgot on the original commit message: Caddy can now better support being used in conjunction with other Caddy instances that share the same certificate storage. In other words, if you have multiple instances all using the same $CADDYPATH, that's been a bad idea because they will all try to manage the certificates themselves (keep them renewed, etc, which runs you into rate limits fast). Now, Caddy will attempt to check if the certificate in storage still needs to be renewed just before it fires off a renewal. So if you stagger Caddy deployments by more time than it takes to complete an ACME challenge (usually just a few seconds, but if you use DNS challenge maybe minutes or more), they might play better together in an acceptable way, for small-medium scale use cases. I wouldn't rely on this for high-volume deployments (like, hundreds or thousands of instances sharing the same certificate storage and all trying to manage them) but I bet it'll be better for dozens of Caddy instances!

** UPDATE **: The latest changes I pushed to this branch now eliminate the need to stagger the timing of running each Caddy instance. If multiple Caddy instances (even thousands) share the same certificate storage on disk, they will coordinate renewals and not step on each others' toes. It just requires using the DNS challenge when behind a load balancer. However, testing this is a little tricky, so we'll have to see how it goes. It's an advanced use case, so I'm not expecting many people to be benefitted by it, but I hope they will let us know how it goes!

4. Checklist

  • I have written tests and verified that they fail without my change
  • I have squashed any insignificant commits
  • This change has comments for package types, values, functions, and non-obvious lines of code
  • I am willing to help maintain this change if there are issues with it later

/cc @pieterlouw and @miekg -- I'll be getting in touch with you soon with more details for what this means about your server type plugins. I can probably submit a PR to caddy-net, and I'm not sure that CoreDNS will have to change much except a line where NewContext() is invoked.

- Expose the list of Caddy instances through caddy.Instances()

- Added arbitrary storage to caddy.Instance

- The cache of loaded certificates is no longer global; now scoped
  per-instance, meaning upon reload (like SIGUSR1) the old cert cache
  will be discarded entirely, whereas before, aggressively reloading
  config that added and removed lots of sites would cause unnecessary
  build-up in the cache over time.

- Key certificates in the cache by their SHA-256 hash instead of
  by their names. This means certificates will not be duplicated in
  memory (within each instance), making Caddy much more memory-efficient
  for large-scale deployments with thousands of sites sharing certs.

- Perform name-to-certificate lookups scoped per caddytls.Config instead
  of a single global lookup. This prevents certificates from stepping on
  each other when they overlap in their names.

- Do not allow TLS configurations keyed by the same hostname to be
  different; this now throws an error.

- Updated relevant tests, with a stark awareness that more tests are
  needed.

- Change the NewContext function signature to include an *Instance.

- Strongly recommend (basically require) use of caddytls.NewConfig()
  to create a new *caddytls.Config, to ensure pointers to the instance
  certificate cache are initialized properly.

- Update the TLS-SNI challenge solver (even though TLS-SNI is disabled
  currently on the CA side). Store temporary challenge cert in instance
  cache, but do so directly by the ACME challenge name, not the hash.
  Modified the getCertificate function to check the cache directly for
  a name match if one isn't found otherwise. This will allow any
  caddytls.Config to be able to help solve a TLS-SNI challenge, with one
  extra side-effect that might actually be kind of interesting (and
  useless): clients could send a certificate's hash as the SNI and
  Caddy would be able to serve that certificate for the handshake.

- Do not attempt to match a "default" (random) certificate when SNI
  is present but unrecognized; return no certificate so a TLS alert
  happens instead.

- Store an Instance in the list of instances even while the instance
  is still starting up (this allows access to the cert cache for
  performing renewals at startup, etc). Will be removed from list again
  if instance startup fails.

- Laid groundwork for ACMEv2 and Let's Encrypt wildcard support.

Server type plugins will need to be updated slightly to accommodate
minor adjustments to their API (like passing in an Instance). This
commit includes the changes for the HTTP server.

Certain Caddyfile configurations might error out with this change, if
they configured different TLS settings for the same hostname.

This change trades some complexity for other complexity, but ultimately
this new complexity is more correct and robust than earlier logic.

Fixes #1991
Fixes #1994
Fixes #1303
@mholt mholt added the under review 🧐 Review is pending before merging label Feb 4, 2018
@mholt mholt added this to the 0.10.11 milestone Feb 4, 2018
@mholt mholt requested review from FiloSottile and elcore February 4, 2018 08:09
@francislavoie
Copy link
Member

I had a few questions/ideas as to what this means for Caddy in the future. I'm not really reviewing the code, so I'm sorry if I'm asking things that are painfully obvious by looking at the code.

  • Added arbitrary storage to caddy.Instance

This means we'll now have support for plugins adding their own storage drivers, correct? E.g. Consul (see #1918) - for situations where Caddy is a load balancer, there needs to be a way to ensure instances don't conflict with eachother when attempting to renew. Which brings me to my next question:

Caddy can now better support being used in conjunction with other Caddy instances that share the same certificate storage. In other words, if you have multiple instances all using the same $CADDYPATH, that's been a bad idea because they will all try to manage the certificates themselves (keep them renewed, etc, which runs you into rate limits fast). Now, Caddy will attempt to check if the certificate in storage still needs to be renewed just before it fires off a renewal. So if you stagger Caddy deployments by more time than it takes to complete an ACME challenge (usually just a few seconds, but if you use DNS challenge maybe minutes or more), they might play better together in an acceptable way, for small-medium scale use cases. I wouldn't rely on this for high-volume deployments (like, hundreds or thousands of instances sharing the same certificate storage and all trying to manage them) but I bet it'll be better for dozens of Caddy instances!

So this is kinda part two of the requirements in #1918, locking. This PR doesn't implement locking for renewals, based on your description, right? For that usecase you described, it sounds like a lock file would be ideal (that could contain a timestamp as well to indicate the last time renewal was attempted) to alert other instances that renewal is already happening, and those other instances could give up. Maybe that's out of scope for this PR though. A lock interface would help for implementing Consul or other storage drivers.

  • The cache of loaded certificates is no longer global; now scoped
    per-instance

Maybe I'm getting confused by the terminology here, but does instance here mean individual running instances of Caddy or does it mean something smaller/internal (like an instance per site)? If the former, my understanding is that the old behaviour was a single cache for every instance running on the same machine? If the later, then was the old behaviour that each instance had its own cache (this seems like the ideal goal, so I'm leaning towards the former being the meaning)?

I ask that, because:

  • Expose the list of Caddy instances through caddy.Instances()

So does each running instance now know about the other instances on the same machine? If so, neat!

The tls directive page will definitely need updating. TLS configurations for equal hostnames cannot be different. Any description about how certificates are stored internally might need to change.

This sounds like a pretty good case for recommending using the new snippet syntax (#1921) to avoid duplicated TLS blocks.

@mholt
Copy link
Member Author

mholt commented Feb 4, 2018

@francislavoie Great thoughts, let me respond to each one:

This means we'll now have support for plugins adding their own storage drivers, correct? E.g. Consul (see #1918) - for situations where Caddy is a load balancer, there needs to be a way to ensure instances don't conflict with eachother when attempting to renew

No, sorry for the confusion. This storage is simply a map[interface{}]interface{} that allows plugins to store things scoped to an Instance, instead of globally. This makes for better correctness and efficiency with regards to reloads (SIGUSR1, etc). Our certificate cache is stored in Instance storage instead of globals now. This is not the same as TLS storage, which I'm still thinking of how to develop it fully.

So this is kinda part two of the requirements in #1918, locking. This PR doesn't implement locking for renewals, based on your description, right? For that usecase you described, it sounds like a lock file would be ideal (that could contain a timestamp as well to indicate the last time renewal was attempted) to alert other instances that renewal is already happening, and those other instances could give up. Maybe that's out of scope for this PR though. A lock interface would help for implementing Consul or other storage drivers.

Right: this PR does not totally solve the issue of shared, managed certificates among multiple Caddy instances. However, it should make any problem virtually go away as long as the instances are staggered in their execution by a few seconds to a few minutes, and as long as there aren't too many of them (hundreds or more might be too many to be safe). The TLS storage stuff I am working on in my head should introduce locking and do it properly, but those will be plugins. This solution is baked right into Caddy and should "just work" for the vast majority of people who need something like it.

Maybe I'm getting confused by the terminology here, but does instance here mean individual running instances of Caddy or does it mean something smaller/internal (like an instance per site)? If the former, my understanding is that the old behaviour was a single cache for every instance running on the same machine? If the later, then was the old behaviour that each instance had its own cache (this seems like the ideal goal, so I'm leaning towards the former being the meaning)?

Each Caddy process instantiates a literal *Instance value which holds all the "servers" it is running. A single Caddy process can run more than 1 instance, for example momentarily during graceful reloads, or if being used as a library. This is uncommon but still possible, and scoping things to an Instance helps us avoid global variables and keep things more correct and efficient. Before, there was a single certificate cache for the whole PROCESS, meaning reloading the config and changing sites didn't always flush the certificate cache properly. Now, each Instance value has its own cache, so a reload will just dispose of the old cache.

So does each running instance now know about the other instances on the same machine? If so, neat!

In the same process, yes. If they have to know about them, they can. The updated certificate maintenance routines run one-per-process and have to iterate the list of Instances in order to access their individual certificate caches. Then they iterate the list of certificates and perform renewals, etc.

This sounds like a pretty good case for recommending using the new snippet syntax (#1921) to avoid duplicated TLS blocks.

Good point.

Thanks for your comments! I hope this makes a little more sense!

@francislavoie
Copy link
Member

Ah, yeah that helps. I got a little excited that this would cover #1918 because there's a few scenarios I ran into that it would really help with, i.e. load balancing your load balancers, or having certs managed by the underlying instances that are being load balanced. That's currently not possible with Caddy because they don't share certs - if one renews, the other won't know about it unless they share the same filesystem.

Copy link
Collaborator

@elcore elcore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without further ado, I will approve this PR.

It is well written and documented!!

Have a wonderful day :)

@mholt mholt removed the under review 🧐 Review is pending before merging label Feb 10, 2018
@nodesocket
Copy link

nodesocket commented Feb 13, 2018

a lock file would be ideal (that could contain a timestamp as well to indicate the last time renewal was attempted) to alert other instances that renewal is already happening, and those other instances could give up.

👍

That also means that incoming requests would be blocked until the TLS renew lock is removed right?

@mholt
Copy link
Member Author

mholt commented Feb 13, 2018

@nodesocket

a lock file would be ideal (that could contain a timestamp as well to indicate the last time renewal was attempted) to alert other instances that renewal is already happening, and those other instances could give up.

👍
That also means that incoming requests would be blocked until the TLS renew lock is removed right?

I wouldn't go too far with the quoted suggestion (no offense Francis, ha) -- what we'd rather have is a truly atomic "create-file-if-not-exists" operation, provided by the OS. Right now, Caddy does not utilize any such features, which is why cert management, even with this PR, is not truly synchronized.

And no, incoming requests are NOT blocked while certificate maintenance happens. Maintenance happens in the background and is entirely non-blocking after Caddy starts up.

Also introduce caddy.OnProcessExit which is a list of functions that
run before exiting the process cleanly; these do not count as shutdown
callbacks, so they do not return errors and must execute quickly.
# Conflicts:
#	sigtrap_posix.go
@mholt
Copy link
Member Author

mholt commented Feb 13, 2018

@nodesocket I just pushed some changes which make Caddy synchronize its renewals using the file system, so multiple instances can run at the same time and safely coordinate renewals. You won't have to stagger the execution of Caddy processes anymore, and this should scale well to thousands of instances. Would you please test this and report back?

(NOTE: Use the DNS challenge when behind a load balancer.)

@nodesocket
Copy link

nodesocket commented Feb 13, 2018

@mholt thanks for this. Awesome. Can you give a brief overview of how it works? I assume still need to use a shared file system like NFS (Amazon Elastic File System) correct? This is not doing any rsync magic is it?

@mholt
Copy link
Member Author

mholt commented Feb 13, 2018

@nodesocket It should "just work" - yes, it'll use the file system. No rsync; the .caddy folder (well, $CADDYPATH) must be locally mounted. This design may not be 100% perfect -- I'm guessing more like 99.5% or thereabouts, assuming there are no implementation bugs (ha!) -- but it doesn't need to be perfect, since rate limits have a little bit of tolerance.

By the way, if your company is relying on Caddy, I highly suggest ordering extended support. If you encounter any bugs, it'll ensure a quicker resolution for your business. 👍

@nodesocket
Copy link

@mholt can you provide instructions on how to build this? I assume I need to compile myself?

@mholt
Copy link
Member Author

mholt commented Feb 14, 2018

@nodesocket

can you provide instructions on how to build this? I assume I need to compile myself?

Yes :) Just like you've been building it up to now, except on this branch instead of master. (If you haven't been building from source, then you should get a commercial license if you haven't already.)

The build instructions are in the README -- just switch to this branch.

@dschwar
Copy link

dschwar commented Feb 14, 2018

Did some testing between this branch and master, with the following caddyfile

    subdomain.host.com:443 {
      proxy / web:8000 {
        transparent
        header_upstream X-Request-Start "t={when_unix}000"
      }
      tls /star-subdomain-host-cert/tls.crt /star-subdomain-host-cert/tls.key
    }
    *.subdomain.host.com:443 {
      proxy / web:8000 {
        transparent
        header_upstream X-Request-Start "t={when_unix}000"
      }
      tls /star-subdomain-host-cert/tls.crt /star-subdomain-host-cert/tls.key
    }
    :443 {
      proxy / web:8000 {
        transparent
        header_upstream X-Request-Start "t={when_unix}000"
      }
      tls {
        max_certs 10
        storage mysql
      }
    }
    :80 {
      proxy / web:8000 {
        transparent
        header_upstream X-Request-Start "t={when_unix}000"
      }
      tls {
        max_certs 10
        storage mysql
      }
    }

I find that when I visit subdomain.host.com I get cert a let's encrypt certificate. On master this serves the certificate that I've provided.

@mholt
Copy link
Member Author

mholt commented Feb 14, 2018

@dschwar Thanks for trying it out! What's the storage mysql thing you have going on? That seems custom. Please try again without modifications so I can work on reproducing the issue. That will be helpful so we can be sure we're working with the same code base!

@dschwar
Copy link

dschwar commented Feb 14, 2018

After removing the mysql plugin, I tested master vs your branch. On master it loads the provided cert of subdomain.host.com, on this branch. it looks like it's attempting to get a LE cert, but fails. I probably don't have the filesystem configured properly though.

2018/02/14 18:52:18 [INFO] Obtaining new certificate for subdomain.host.com
2018/02/14 18:52:19 [INFO] acme: Registering account for email@host.com
2018/02/14 18:52:20 http: TLS handshake error from 10.111.0.2:15721: open /root/.caddy/acme/acme-v01.api.letsencrypt.org/sites/subdomain.host.com/subdomain.host.com.crt.lock: no such file or directory

The weird thing is, if I browse a site like secondary.subdomain.host.com, then subsequent requests to subdomain.host.com will be served using the provided cert.

@mholt
Copy link
Member Author

mholt commented Feb 14, 2018

Thanks for giving it another go! We're getting closer.

First thing -- and you should do this immediately -- use the Let's Encrypt staging endpoint for testing! Set it with the -ca flag. Or you may hit rate limits.

Are you running caddy as the same user as who owns that directory, etc? Also, is this subdomain new (i.e. does the containing folder listed in the error message already exist on disk)?

@dschwar
Copy link

dschwar commented Feb 14, 2018

Are you running caddy as the same user as who owns that directory, etc?

Yup

~/.caddy/acme/acme-v01.api.letsencrypt.org # ls -la
total 12
drwx------    3 root     root          4096 Feb 14 18:52 .
drwx------    3 root     root          4096 Feb 14 18:52 ..
drwx------    3 root     root          4096 Feb 14 18:52 users
~/.caddy/acme/acme-v01.api.letsencrypt.org # whoami
root
~/.caddy/acme/acme-v01.api.letsencrypt.org # ps aux
PID   USER     TIME   COMMAND
    1 root       0:00 /usr/bin/caddy --conf /caddy/Caddyfile --log stdout --email email@host.com --host subdomain.host.com

Also, is this subdomain new (i.e. does the containing folder listed in the error message already exist on disk)?

The containing folder doesn't exist (see output pasted above). Should caddy be trying to provision a LE cert for subdomain.host.com since that cert is part of the tls /star-subdomain-host-cert/tls.crt /star-subdomain-host-cert/tls.key?

@mholt
Copy link
Member Author

mholt commented Feb 14, 2018

Now we're getting somewhere. So there's 2 things going on: opening a lock file where the parent dir doesn't exist, and the subdomain thing. I'm looking into it...

@mholt
Copy link
Member Author

mholt commented Feb 14, 2018

@dschwar PS. If you want to help dive into the code and help debug it (since I have to work to reproduce it) -- specifically help with the problem of serving the wrong certificate -- I would really appreciate it. And it would help us get it fixed sooner. You'd want to look in caddytls/handshake.go -- probably putting some prints in some of the getConfig or getCertificate functions should help!

@nodesocket
Copy link

@mholt Let's Encrypt staging is still timing-out for me. I can't believe others are not encountering this after 4+ hours. Just curious, are you testing from AWS EC2 instances? I am.

@dschwar
Copy link

dschwar commented Feb 15, 2018

@dschwar Thanks; I'm wondering, though, how [WARNING] TLS disabled for http://subdomain.host.com made its way into there from your Caddyfile. And also that there is a TLSConfig for "subdomain.host.com" at the same time. This isn't a behavior I'm observing...

This seems to happen because I have the command line -host arg set to subdomain.host.com.

I tried removing the -host command line arg, which gives me the log
[WARNING] TLS disabled for http://
and then subdomain.host.com is served using the appropriate provided certificate, but Let's Encrypt no longer provisions.

So I tried changing the -host arg to https://subdomain.host.com and now I get
WARNING] TLS disabled for http://https://subdomin.host.com

@mholt
Copy link
Member Author

mholt commented Feb 15, 2018

@nodesocket

Let's Encrypt staging is still timing-out for me. I can't believe others are not encountering this after 4+ hours. Just curious, are you testing from AWS EC2 instances? I am.

Nope, not using EC2. Thanks for commenting on their forums -- hopefully they'll find it soon. Not many people use staging except client developers I think.

@dschwar -- okay, I think what you'll need to do is simplify your config down to bare-bones. Run caddy without any extra command line args (except to set the staging endpoint!) and then run Caddy again with a simple Caddyfile. I can help debug it but only if I have all the information: how Caddy was run, what env variables are set, the Caddyfile, log output, requests that are made, etc -- all unredacted and unmodified. Otherwise, I can only give you tips to help you debug it on your end.

(The -host flag, per the docs specifies a default host to listen on; so if the hostname defined in the Caddyfile is empty, it will be filled in with -host. And https://... is not a host, of course. Remember, Caddy is very easy to use, but has enough power that you can easily shoot yourself in the foot. Unlike other web servers, with Caddy, you'll want to trim back the config when things start getting complicated.)

@dschwar
Copy link

dschwar commented Feb 15, 2018 via email

@mholt
Copy link
Member Author

mholt commented Feb 15, 2018

@dschwar Ahhhh, wait, it's only generating the certificate on-demand (i.e. not at startup)? Is subdomain.host.com that you visit in your browser the exact same one as the first site in your Caddyfile, subdomain.host.com:443? If so, then what you found may not be a bug with the TLS changes. If you set -host subdomain.host.com and then define a site in the Caddyfile with an empty hostname (:443 for example) then that site is assigned a default hostname of subdomain.host.com according to your flag. Thus there is a duplicate site name and that should be caught at startup. Let me know what you think.

@dschwar
Copy link

dschwar commented Feb 15, 2018 via email

@mholt
Copy link
Member Author

mholt commented Feb 15, 2018

@dschwar Thanks for clarifying.

I thought a :443 would give me a wildcard host? So that I can generate certs for any domain that points at my server

Well, the -host flag sets the host when it is missing for site configurations. Generally, -host is used without a Caddyfile to even more quickly get a simple site up and running. The -host flag can be used in conjunction with one in some cases for convenience, but your case is not one of them.

(hence why I redact, don’t want spam)

Won't matter -- all hostnames with Let's Encrypt certificates go to public CT logs anyway. In fact, soon all certificates will need to be submitted to CT.

Anyway, I'm thinking of making your situation a "collision" at startup, because otherwise it's not obvious. In other words, using -host subdomain.host.com and with sites subdomain.host.com and :443 in your Caddyfile should be an error because the second site is indistinguishable from the first. Sound good?

See discussion on #2015 for how this situation was discovered. For a
Caddyfile like this:

	localhost {
		...
	}
	:2015 {
		...
	}

Running Caddy like this:

	caddy -host localhost

Produces two sites both defined as `localhost:2015` because the flag
changes the default host value to be `localhost`. This should be an
error since the sites are not distinct and it is confusing. It can also
cause issues with TLS handshakes loading the wrong cert, as the linked
discussion shows.
@mholt
Copy link
Member Author

mholt commented Feb 15, 2018

@dschwar I've pushed that fix to this branch; feel free to try it out and let me know. You should now get an error when you try to start Caddy with the configuration you had. The reason you were getting the wrong certificate is because Caddy couldn't disambiguate between the two site configurations because of your -host flag: the sites weren't distinct, even though they had different configurations. So now we prevent this case, by raising an error at startup.

@dschwar
Copy link

dschwar commented Feb 15, 2018

@dschwar I've pushed that fix to this branch; feel free to try it out and let me know. You should now get an error when you try to start Caddy with the configuration you had. The reason you were getting the wrong certificate is because Caddy couldn't disambiguate between the two site configurations because of your -host flag: the sites weren't distinct, even though they had different configurations. So now we prevent this case, by raising an error at startup.

I think I'm missing something now. I've removed the -host subdomain.host.com arg and now when I visit www.davesdomain.com which CNAME's to subdomain.host.com, Chrome gives me a This site can’t provide a secure connection

and the logs show

2018/02/15 14:49:29 [INFO] Successfully loaded TLS assets from /star-subdomain-hostcom-cert/tls.crt and /star-subdomain-hostcom-cert/tls.key
2018/02/15 14:49:29 [INFO] Successfully loaded TLS assets from /star-subdomain-hostcom-cert/tls.crt and /star-subdomain-hostcom-cert/tls.key
Activating privacy features... done.
2018/02/15 14:49:29 [WARNING] TLS disabled for http://
https://subdomain.host.com
2018/02/15 14:49:29 https://subdomain.host.com
https://*.subdomain.host.com
2018/02/15 14:49:29 https://*.subdomain.host.com
https://
2018/02/15 14:49:29 https://
http://
2018/02/15 14:49:29 http://
2018/02/15 14:52:30 [INFO] getCertificate for name: www.davesdomain.com
(map[string]string) (len=2) {
 (string) (len=32) "*.subdomain.host.com": (string) (len=64) "41z9d420f85af211afdsfsdgdsfhfghfgdhjghjghfkhjkjhkjhgkjhgkjhgc122b7",
 (string) (len=30) "subdomain.host.com": (string) (len=64) "41z9d420f85af211afdsfsdgdsfhfghfgdhjghjghfkhjkjhkjhgkjhgkjhgc122b7"
}
2018/02/15 14:52:30 [INFO] onDemand: false, loadIfNecessary: true, obtainIfNecessary: true
2018/02/15 14:52:30 http: TLS handshake error from 10.234.0.25:23064: no certificate available for www.davesdomain.com

@mholt
Copy link
Member Author

mholt commented Feb 15, 2018

@dschwar Thanks, I think I'm finally onto a fix. Just a hunch, can you verify that replacing the site named :443 in your Caddyfile with *:443 acts different?

@dschwar
Copy link

dschwar commented Feb 15, 2018

Just a hunch, can you verify that replacing the site named :443 in your Caddyfile with *:443 acts different?

Seems to do the same thing

See discussion on #2015; the initial change had removed this check, and
I can't remember why I removed it or if it was accidental. Anyway, it's
back now.
@mholt
Copy link
Member Author

mholt commented Feb 15, 2018

@dschwar Hmm well that's discouraging. Are you sure? If you could pull the latest and try again -- with and without the *, that would be good. I was able to reproduce what I think is the issue, in that it was getting the wrong config for the name. If you could try it after pulling my latest commit, then let me know, that would be great!

@dschwar
Copy link

dschwar commented Feb 15, 2018

It works!

without the *

Caddyfile

    subdomain.host.com:443 {
      proxy / web:8000 {
        transparent
        header_upstream X-Request-Start-Edge {>X-Request-Start}
        header_upstream X-Request-Start "t={when_unix}000"
      }
      tls /star-subdomain-hostcom-cert/tls.crt /star-subdomain-hostcom-cert/tls.key
    }
    *.subdomain.host.com:443 {
      proxy / web:8000 {
        transparent
        header_upstream X-Request-Start-Edge {>X-Request-Start}
        header_upstream X-Request-Start "t={when_unix}000"
      }
      tls /star-subdomain-hostcom-cert/tls.crt /star-subdomain-hostcom-cert/tls.key
    }
    :443 {
      proxy / web:8000 {
        transparent
        header_upstream X-Request-Start-Edge {>X-Request-Start}
        header_upstream X-Request-Start "t={when_unix}000"
      }
      tls {
        max_certs 5000
      }
    }
    :80 {
      proxy / web:8000 {
        transparent
        header_upstream X-Request-Start-Edge {>X-Request-Start}
        header_upstream X-Request-Start "t={when_unix}000"
      }
      tls {
        max_certs 5000
      }
    }

Thanks so much 🎉

@mholt
Copy link
Member Author

mholt commented Feb 15, 2018

@dschwar Great!! It should work with the * as well.

I'll merge this PR as Go 1.10 is released, unless someone speaks up with another bug report before then.

@dschwar
Copy link

dschwar commented Feb 15, 2018

It should work with the * as well.

Testing that (for fun), when you say it should with *, do you mean

    *:443 {
      proxy / web:8000 {
        transparent
        header_upstream X-Request-Start-Edge {>X-Request-Start}
        header_upstream X-Request-Start "t={when_unix}000"
      }
      tls {
        max_certs 5000
      }
    }

or

    * {
      proxy / web:8000 {
        transparent
        header_upstream X-Request-Start-Edge {>X-Request-Start}
        header_upstream X-Request-Start "t={when_unix}000"
      }
      tls {
        max_certs 5000
      }
    }

I just attempted the first one and it doesn't work for me.

@mholt
Copy link
Member Author

mholt commented Feb 15, 2018

The first one. That should work, (it works for me), so if you want to investigate feel free!

@dschwar
Copy link

dschwar commented Feb 15, 2018

The first one. That should work, (it works for me), so if you want to investigate feel free!

It seems to work if I remove the subdomain.host.com:443 and *.subdomain.host.com:443 from my Caddyfile, the resulting Caddyfile becomes

*:443 {
  proxy / web:8000 {
    transparent
    header_upstream X-Request-Start-Edge {>X-Request-Start}
    header_upstream X-Request-Start "t={when_unix}000"
  }
  tls {
    max_certs 5000
  }
}
*:80 {
  proxy / web:8000 {
    transparent
    header_upstream X-Request-Start-Edge {>X-Request-Start}
    header_upstream X-Request-Start "t={when_unix}000"
  }
  tls {
    max_certs 5000
  }
}

with the command line args
["--conf", "/caddy/Caddyfile", "--log", "stdout", "--email", "myemail@host.com", "--ca", "https://acme-staging.api.letsencrypt.org/directory"]

Only strip the port from the Location URL value if the port is NOT the
HTTPSPort (before, we compared against DefaultHTTPSPort instead of
HTTPSPort). The HTTPSPort can be changed, but is done so for port
forwarding, since in reality you can't 'change' the standard HTTPS port,
you can only forward it.
@mholt
Copy link
Member Author

mholt commented Feb 16, 2018

@dschwar Ah, you know what, I'm sorry -- that is correct and expected behavior; I was wrong before. Wildcard characters (*) only match a single domain label (e.g. sub.domain.com cannot be matched by * but or even *.* but can be matched by *.*.*), as per the RFC related to wildcard SSL certificates.

So actually I think that remaining "bug" is a non-issue. (PS. You can't serve tls on port 80 with Caddy, so you can remove that from your configuration.) However, the investigation led me to fix what I think was some logical inconsistencies in the code, so it was all good. I'm gonna merge this real soon. Probably today.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants