
Directive "log - output file" on one site address causes log output from another site address to dump to the console #4877

Closed
thx1111 opened this issue Jul 9, 2022 · 18 comments
Labels
discussion 💬 The right solution needs to be found
Milestone

Comments

@thx1111

thx1111 commented Jul 9, 2022

caddy 2.5.1-1 on Arch Linux

From https://caddyserver.com/docs/caddyfile/concepts :

If you specify a hostname, only requests with a matching Host header will be honored. In other words, if the site address is localhost, then Caddy will not match requests to 127.0.0.1.

{
	admin unix//run/caddy/admin.socket
}
http://localhost {
	root * /home/http/htdocs/
	file_server browse
}
http://10.0.0.2 {
	root * /home/http/htdocs/public/
	file_server browse
	log {
		output file /var/log/caddy/accesslog {
			roll_size 10MiB
			roll_keep 4
		}
		format console {
			time_format rfc3339
			duration_format string
		}
	}
}

Then, sudo caddy start --config /etc/caddy/Caddyfile and point the browser to http://127.0.0.1 . The browser will display a blank page, and the console from which "caddy start" was run will spew logging output for ..., "host": "127.0.0.1", ...!

Logging for the other sites, http://localhost and http://10.0.0.2, works as anticipated.

Log output to the console is not expected and not wanted.

Adding a gratuitous

http://127.0.0.1 {
}

to the Caddyfile changes "logs":{"logger_names":{"10.44.0.20":"log0"},"skip_hosts":["localhost"]} to "logs":{"logger_names":{"10.44.0.20":"log0"},"skip_hosts":["localhost","127.0.0.1"]} in the json config structure, which is an effective workaround.
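
Putting the snippets above together, the full workaround Caddyfile is the original config plus the empty site block (paths and addresses as in the original report):

```caddyfile
{
	admin unix//run/caddy/admin.socket
}
http://localhost {
	root * /home/http/htdocs/
	file_server browse
}
http://10.0.0.2 {
	root * /home/http/htdocs/public/
	file_server browse
	log {
		output file /var/log/caddy/accesslog {
			roll_size 10MiB
			roll_keep 4
		}
		format console {
			time_format rfc3339
			duration_format string
		}
	}
}
http://127.0.0.1 {
}
```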

But still, should that workaround be necessary? Or simply documented?

@mholt
Member

mholt commented Jul 10, 2022

This is kind of working as intended 🤔

Here's what's happening: Caddy is listening on :80, which has access logging enabled. Logging is explicitly skipped only for localhost, because no other hostnames have been specified in the config. This is obvious when you run caddy adapt.

So if you add a localhost site without a log directive, it will be skipped for logging.

Basically, the behavior for logging for an unspecified host is undefined, so 🤷‍♂️

Technically, the server is doing what it is supposed to do. I can see how it's non-obvious though. Maybe we could tune the Caddyfile adapter here.
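
The selection logic can be sketched roughly like this (a Python model of the behavior described in this thread, not Caddy's actual Go code; the "default" logger name is a stand-in):

```python
def select_access_logger(host, logger_names, skip_hosts, default="default"):
    """Rough model of per-server access-log selection (illustrative only)."""
    if host in skip_hosts:
        return None                  # site defined, but no log directive: skipped
    if host in logger_names:
        return logger_names[host]    # site defined with a log directive
    return default                   # unknown host: falls through to the default
                                     # logger (the console in the original report)

# Mirrors the adapted config discussed in this thread:
logger_names = {"10.0.0.2": "log0"}
skip_hosts = ["localhost"]

print(select_access_logger("10.0.0.2", logger_names, skip_hosts))   # log0
print(select_access_logger("localhost", logger_names, skip_hosts))  # None
print(select_access_logger("127.0.0.1", logger_names, skip_hosts))  # default
```

This is why 127.0.0.1 was logged to the console: it was neither mapped to a named logger nor listed in skip_hosts.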

@mholt mholt added this to the 2.x milestone Jul 10, 2022
@mholt mholt added the discussion 💬 The right solution needs to be found label Jul 10, 2022
@mholt
Member

mholt commented Jul 10, 2022

I wonder if the logging config should accept matchers like routes do. Right now we have this clunky skip-host-oriented thing...

@francislavoie
Member

I think #4691 might solve this by providing a workaround. You could add skip_log to the sites you don't want logs from.

@mholt
Member

mholt commented Jul 10, 2022

@francislavoie Hmm, but this issue already has a workaround (simply by specifying the site name you don't want logged). (It's late though so I might be misunderstanding.)

That does remind me I need to get around to that PR...

@thx1111
Author

thx1111 commented Jul 10, 2022

With respect to #4689, #4690 and #4691, my expectation would be for something a little different. It seems more intuitive that there would exist an explicit "skip_hosts" option under the "log" directive, like:

log {
	output <writer_module> ...
	format <encoder_module> ...
	level  <level>
	skip_hosts <host_list>
}

such that the specified "host_list" would be in addition to the current, automatically determined, list of hosts for skip_hosts.

Reading at https://caddyserver.com/docs/caddyfile/matchers#syntax ,

In the Caddyfile, a matcher token immediately following the directive can limit that directive's scope. The matcher token can be one of these forms:

  1. * to match all requests (wildcard; default).
  2. /path start with a forward slash to match a request path.
  3. @name to specify a named matcher.

I don't really see the proposed skip_log <matcher> format as being conformant or consistent with the matcher syntax as defined. In particular, I don't see that any of the "Standard matchers" at https://caddyserver.com/docs/caddyfile/matchers#standard-matchers currently describes an independent list of host names and host addresses. Also, the existing host <hosts...> matcher would be especially confusing for any new Caddy user.

Wouldn't it be fairly straightforward to parse a skip_hosts option in the log directive and add host names and host addresses to the json skip_hosts key-value set?

@francislavoie
Member

francislavoie commented Jul 10, 2022

What I was suggesting is that you would put the skip_log directive inside your localhost site block (with no matcher); that way all requests that reach that site block will not get logged.
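
For illustration, assuming the skip_log directive from #4691 lands as proposed, that would look something like:

```caddyfile
http://localhost {
	skip_log
	root * /home/http/htdocs/
	file_server browse
}
```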

I don't think skip_hosts is a good idea, because it breaks the "encapsulation" of a site block in the Caddyfile. It doesn't make sense to configure to skip a different host from within a log of a particular site, IMO.

@thx1111
Author

thx1111 commented Jul 10, 2022

What I was suggesting is that you would put the skip_log directive inside your localhost site block (with no matcher); that way all requests that reach that site block will not get logged.

That part is not broken. There was no "localhost" site log being generated. Caddy did just what was promised. There is nothing to "fix" there.

I don't think skip_hosts is a good idea, because it breaks the "encapsulation" of a site block in the Caddyfile. It doesn't make sense to configure to skip a different host from within a log of a particular site, IMO.

Well, in that case, the workaround described - an "empty" site block for any site that was being improperly logged - already works fine. Since this remedy already works, the problem would be, instead, a documentation issue.

The thing that was awkward in my case was that the log for "127.0.0.1" was unexpectedly being dumped to the console. If, instead, there was a log for "127.0.0.1" being unexpectedly written to "/var/log/caddy/", that would not have been as much of a problem.

I was surprised to see that the json configuration had a "logging" block at the "top level", separate from any site definition below. The "logging" block itself included a "writer" block with an explicit "output": "file". So, I might have expected that any bug there would, again, have caused a log to be written to a file, as defined, as was already being done, properly, for the "10.0.0.2" site.

That just made the log dump to the console that much more unpredictable. It may be useful to determine why caddy was dumping to the console instead of creating an unexpected log file.

An alternative resolution could be to, instead of an explicit skip_hosts list under the Caddyfile log directive, have an explicit logger_names list of host names and host addresses for which to generate a log, in the manner described in the top level "logging" block. And then, the rule would be "only log sites that are explicitly listed", and there would be no possibility of "phantom" logging. Also, there would then be no need for skip_hosts, since all hosts would be automatically skipped unless actually listed under logger_names.

This approach would also generate logs for any site in the list even if that site were not anywhere otherwise defined in the Caddyfile. Of course, an actual log could only be generated if caddy was also binding to some interface having the corresponding site IP address or DNS hostname. Though still, that would also serve as a kind of "misconfiguration check", if any log were to unexpectedly appear for some site that should not exist, supposing that that "stray" site host had actually been added to the list for some reason.

For the moment, it seems prudent for an administrator to simply review the JSON config structure for logger_names and skip_hosts, to confirm that the sites listed are as expected. Ultimately, that is a caddy internal coding thing. The administrator just has to know how it works.

It just seems a bit counterintuitive at first, to have a distinct "logs" configuration block. If I understand, it is not possible for caddy to have different logging schemes for different sites, which otherwise seems would be possible if caddy were to keep a custom "logging" definition below each site "handler": "file_server" definition, instead of a single definition at the top level, applied to all sites. That approach would equally avoid any possibility of "phantom" logging. Separate site logging schemes could also be a Feature Request.

@francislavoie
Member

The thing that was awkward in my case was that the log for "127.0.0.1" was unexpectedly being dumped to the console.

Ah my bad, misunderstood the problem then. Sorry.

I think what we could do is adjust the Caddyfile adapter to add a skip_log handler by default to the end of the routes for HTTP servers (i.e. non-HTTPS, because HTTPS requires the domain/SNI for the TLS handshake to complete, and logging happens after that... but no harm adding it there too probably) if access logging is enabled for any sites. That way any not-explicitly-configured sites will have their logs skipped. I'll need to think about how it would behave for an http:// site though.

I was surprised to see that the json configuration had a "logging" block at the "top level", separate from any site definition below.

That's because logging in Caddy is for all facilities, not just access logs. Access logs are namespaced as http.log.access. You can configure logging via global options to exclude certain namespaces (like say ignore all TLS logs -- not particularly useful but as an example) and change the writer, encoder, etc. See https://caddyserver.com/docs/logging for a dive into why we designed it that way.

@thx1111
Author

thx1111 commented Jul 11, 2022

I think what we could do is adjust the Caddyfile adapter to add a skip_log handler by default to the end of the routes for HTTP servers (i.e. non-HTTPS, because HTTPS requires the domain/SNI for the TLS handshake to complete, and logging happens after that... but no harm adding it there too probably) if access logging is enabled for any sites. That way any not-explicitly-configured sites will have their logs skipped. I'll need to think about how it would behave for an http:// site though.

Would that have to be both a skip_log handler and a logger_names handler?

You cannot "skip" a site hostname or site IP address when you do not know that it exists, as in the case where "localhost" is defined and "127.0.0.1" is not defined. You would have to start with the rule "Do not log a site unless the site is defined", and only then, can you limit the set of defined sites with another rule saying "Do not log a defined site that is specified as a skip_hosts site".

Hmm - with a little more testing, I notice that caddy is listening on all interfaces! This is not actually documented at https://caddyserver.com/docs/getting-started , https://caddyserver.com/docs/command-line , https://caddyserver.com/docs/caddyfile-tutorial , https://caddyserver.com/docs/caddyfile/concepts , or especially, at https://caddyserver.com/docs/caddyfile/concepts#addresses ! This is totally surprising and actually counterintuitive behavior, given the priority that the Caddyfile gives to the "Site address", as the prefix to a site definition block!

The only comment about caddy binding to all interfaces is at https://caddyserver.com/docs/caddyfile/directives/bind , several layers down in the Caddy menu structure!

So now, I see that access to any interface not specifically defined in Caddyfile will cause caddy to spew logging data to the console! This has nothing to do with DNS/resolver mapping "localhost" to "127.0.0.1". Effectively, the bind directive is an essential directive for any site definition! And the bind directive does not even allow defining the port for the site! It appears to me that someone did not really think this interface/address/domain/host/port thing all the way through.

Caddy, then, has this awkward characteristic of requiring the same site address to be defined in two different places, even though one site address should be inferred from the other site address. This is not good. Rather, I would expect that, wherever it is that this bind directive variable is being used, the value should be inferred from the actual "site address", and that caddy should bind to only the interface associated with that particular site address. It is not clear to me why this would be done any other way. What is the use case?

That's because logging in Caddy is for all facilities, not just access logs. ... See https://caddyserver.com/docs/logging for a dive into why we designed it that way.

How Logging Works
...
Caddy is a log emitter. It does not consume logs, except for the minimum processing required to encode and write logs.
...

  • Too many logs are better than too few
  • Filtering is better than discarding
  • Defer encoding for greater flexibility and interoperability

Aha! Ok, thanks. It's a more general approach. Of course, that does not justify "wildcard" binding to all network interfaces, or justify dumping "phantom" logs to the console.

@francislavoie
Member

Would that have to be both a skip_log handler and a logger_names handler?

No, logger_names is not a handler (a handler is an HTTP middleware).

You cannot "skip" a site hostname or site IP address when you do not know that it exists

Actually that's exactly what I'm suggesting we do, i.e. add the skip_log handler to the end of the HTTP routes, so that if no other HTTP route matches (i.e. by hostname like localhost or whatever), it falls through and hits skip_log.

So basically, taking this Caddyfile:

http://10.0.0.2 {
    log
    respond "ip"
}

http://localhost {
    respond "localhost"
}

which adapts to this JSON

{
  "apps": {
    "http": {
      "servers": {
        "srv0": {
          "listen": [
            ":80"
          ],
          "routes": [
            {
              "match": [
                {
                  "host": [
                    "10.0.0.2"
                  ]
                }
              ],
              "handle": [
                {
                  "handler": "subroute",
                  "routes": [
                    {
                      "handle": [
                        {
                          "body": "ip",
                          "handler": "static_response"
                        }
                      ]
                    }
                  ]
                }
              ],
              "terminal": true
            },
            {
              "match": [
                {
                  "host": [
                    "localhost"
                  ]
                }
              ],
              "handle": [
                {
                  "handler": "subroute",
                  "routes": [
                    {
                      "handle": [
                        {
                          "body": "localhost",
                          "handler": "static_response"
                        }
                      ]
                    }
                  ]
                }
              ],
              "terminal": true
            }
          ],
          "logs": {
            "skip_hosts": [
              "localhost"
            ]
          }
        }
      }
    }
  }
}

I would make this change to the generated output (scroll down):

{
  "apps": {
    "http": {
      "servers": {
        "srv0": {
          "listen": [
            ":80"
          ],
          "routes": [
            {
              "match": [
                {
                  "host": [
                    "10.0.0.2"
                  ]
                }
              ],
              "handle": [
                {
                  "handler": "subroute",
                  "routes": [
                    {
                      "handle": [
                        {
                          "body": "ip",
                          "handler": "static_response"
                        }
                      ]
                    }
                  ]
                }
              ],
              "terminal": true
            },
            {
              "match": [
                {
                  "host": [
                    "localhost"
                  ]
                }
              ],
              "handle": [
                {
                  "handler": "subroute",
                  "routes": [
                    {
                      "handle": [
                        {
                          "body": "localhost",
                          "handler": "static_response"
                        }
                      ]
                    }
                  ]
                }
              ],
              "terminal": true
-            }
+            },
+            {
+              "handle": [
+                {
+                  "handler": "skip_log"
+                }
+              ]
+            }
          ],
          "logs": {
            "skip_hosts": [
              "localhost"
            ]
          }
        }
      }
    }
  }
}

Since the localhost and 10.0.0.2 routes are "terminal": true, they prevent skip_log from happening, but any not-explicitly-configured hosts will have their logs skipped.

Effectively, the bind directive is an essential directive for any site definition!

Not really. Most users actually want to bind to all interfaces. Like if you want to accept traffic from internal services running on the same machine, etc.

You can use the default_bind global option (recently added) to avoid needing to write it in each of your sites, if you need it.
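
For example (address purely illustrative), a global options block like this applies the bind to every site at once:

```caddyfile
{
	default_bind 10.0.0.2
}
```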

This is not actually documented

You're right, we should document it in the site addresses section to mention the address is not the bind/listener. They are different things. The address produces a request matcher (which in turn affects whether automatic HTTPS triggers, and affects what the logger host will be, etc).

And the bind directive does not even allow defining the port for the site!

Yeah, that's not its job. The site address does the port part. The bind does the IP interface part.
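
Sketching that split (names and addresses purely illustrative): the port belongs to the site address, the interface to bind:

```caddyfile
http://example.com:8080 {
	bind 127.0.0.1
}
```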

It appears to me that someone did not really think this interface/address/domain/host/port thing all the way through.

No, we have, this is intended.

Caddy, then, has this awkward characteristic of requiring the same site address to be defined in two different places, even though one site address should be inferred from the other site address.

Like I said, the bind address and the domain/site address are not the same thing, and we cannot assume that they are the same thing. That's not a safe assumption to make.

@thx1111
Author

thx1111 commented Jul 11, 2022

Actually that's exactly what I'm suggesting we do, i.e. add the skip_log handler to the end of the HTTP routes, so that if no other HTTP route matches (i.e. by hostname like localhost or whatever), it falls through and hits skip_log.
...
Since the localhost and 10.0.0.2 routes are "terminal": true, they prevent skip_log from happening, but any not-explicitly-configured hosts will have their logs skipped.

Ah! skip_log is different from skip_hosts, so "fall-through" should work. Sounds like a plan!

... the address is not the bind/listener. They are different things. The address produces a request matcher (which in turn affects whether automatic HTTPS triggers, and affects what the logger host will be, etc). ... The bind does the IP interface part.

Hmm - then, it would be possible to specify a specific bind address that associates with a specific interface, inside a site block having a different address which might not be associated with that same interface. And so, http requests to the site address will never be "heard" by caddy, listening on this unrelated bind address interface - yes?

What is the use case for that?

And related - why not simply specify the bind interface explicitly, by name, instead of using the implied interface, only specified indirectly, by some arbitrary IP address or DNS hostname?

Like if you want to accept traffic from internal services running on the same machine, etc.
You can use the default_bind global option (recently added) to avoid needing to write it in each of your sites, if you need it.

Thanks! I missed that one. Ok, that makes more sense, having the default_bind global option - why aren't these called "global directives"? - along with the site specific bind directive.

I suggest that this issue of interface binding should go "right up front" in the documentation, perhaps even being the first section linked under "Quick-starts", including mention of the distinction between "site address" and "bind interface". Interface binding is a pretty fundamental element of any protocol server, especially when that interface can include a unix socket.

On a tangent, being able to bind to a unix socket can be very useful when using Network Namespaces in Linux, when you might also want a local interface that is not defined by the network. http over a unix socket can also be useful with Docker Containers and simply for secure local server access, as in the example. Of course, a custom http client may be needed for unix socket access, but curl, for one, includes native unix socket support. These are some additional points that should be addressed in an "Interface Binding" section in the Caddy documentation.

Similarly, https://caddyserver.com/docs/api does not currently provide any examples of http access over a unix socket, even though it does mention specifically "configuring the endpoint to bind to a permissioned unix socket".

@francislavoie
Member

francislavoie commented Jul 12, 2022

Hmm - then, it would be possible to specify a specific bind address that associates with a specific interface, inside a site block having a different address which might not be associated with that same interface. And so, http requests to the site address will never be "heard" by caddy, listening on this unrelated bind address interface - yes?

Ish. If someone makes a request to 111.111.111.111 but puts Host: 222.222.222.222 in the headers, and you have a Caddy instance reachable at 111.111.111.111 with a site address like http://222.222.222.222 then it would respond. Because 222.222.222.222 is a host matcher there, and it matches. There's nothing that says that the Host header must match the address being connected to, if it's an IP address.

Obviously this is contrived, but the point stands, they are different concepts and binding them together is not a good idea because it would limit what people could do with Caddy. It's at different layers of the networking stack.

If you configured the site address with a domain (which is what most users do) then there's no IP interface that you can glean from that for bind.

I suggest that this issue of interface binding should go "right up front" in the documentation

I think you're greatly overestimating how much users care about this. It's not as important as you think it is. A huge majority of users don't care to configure bind. I'd say well below 1% ever configure it. Over the few years I've spent on the project and answering questions on the forums, we've probably seen a dozen people asking questions related to bind, compared to the thousands+ that have asked questions on the forums.

But like I said, we will adjust the Caddyfile Concepts page's Addresses section to mention it, you made a valid point there. But it absolutely doesn't make sense in the Getting Started guide. It's very much a technical detail that most people don't care about.

On a tangent, being able to bind to a unix socket can be very useful when using Network Namespaces in Linux

Yep, the bind docs have a unix socket example. I don't really think we need to dive deeper on that topic than that, honestly. If someone needs it, they'll find it there in the docs.

@thx1111
Author

thx1111 commented Jul 13, 2022

Obviously this is contrived, but the point stands, they are different concepts and binding them together is not a good idea because it would limit what people could do with Caddy. It's at different layers of the networking stack.

Yes, I was conceding that point. That's why I suggested, simply specify the bind interface explicitly, by name. Then there's no reason to go down that philosophical rabbit hole.

If you configured the site address with a domain (which is what most users do) then there's no IP interface that you can glean from that for bind.

That's what a DNS lookup does. There would be no reason to configure the site address with a domain or host name, unless the name would actually resolve to a functioning IP address. The more significant point is that caddy itself must then finally associate that IP address with an actual socket interface - yes? So just use the interface name. But then, the counter argument will be that a unix socket does not have an interface name in the same sense as a network socket. And the response to that is, then just allow a unix socket path to be used as the interface name.

And the response to all that is: Feature Request - maybe caddy could accept a network interface name as an additional type of value to the bind directive, and generally? And then we say, "Caddy can accept any of: an interface name, a domain name, an IP address, or a unix domain socket path, as a parameter to any of an admin or default_bind global option, a site address, or a bind directive." And that's a Caddy feature, not a bug. Don't you get that sort of "for free", programming in Go?

I think you're greatly overestimating how much users care about this. It's not as important as you think it is. A huge majority of users don't care to configure bind.

Ha! Yes, fair point, very likely. I can only say that it's important to me, and I really appreciate your taking the trouble to clarify. And hopefully a few useful pointers can make their way into the online documentation.

But it absolutely doesn't make sense in the Getting Started guide. It's very much a technical detail that most people don't care about.

It's easier - well, for some people - when "gotcha issue" warnings are easy to find. Of course, other people do not have to read about binding and interfaces and sockets, if they're not interested. Maybe a link could go into the side-bar menu? Don't make it hard to find.

Yep, the bind docs have a unix socket example. I don't really think we need to dive deeper on that topic than that, honestly. If someone needs it, they'll find it there in the docs.

That's not what I meant. The bind docs unix socket example is "To bind to a Unix domain socket at /run/caddy: bind unix//run/caddy", which is not an API example. By "example", I mean at https://caddyserver.com/docs/api, using a unix domain socket, in addition to the current curl "localhost:2019" examples. This would be things like:
sudo curl --unix-socket /run/caddy/admin.socket -H "host:" http://none/config/
sudo curl --unix-socket /run/caddy/admin.socket -H "Content-Type: text/caddyfile" --data-binary @/etc/caddy/Caddyfile -H "host:" http://none/load
and
sudo curl -X POST --unix-socket /run/caddy/admin.socket -H "host:" http://none/stop

That took a bit of trial and error. There should also be a note there about the http "host" header and its interaction with the admin directive option origins, which is not obvious.

The hostname "none" is just some arbitrary string, and curl will complain if there is nothing there:
curl: (3) URL using bad/illegal format or missing URL

Using the /config/ endpoint, if there is no origins option, and curl is run without the -H "host:" or -H "host;" or -H "Host;", then caddy will give:
ERROR admin.api request error {"error": "host not allowed: none", "status_code": 403}

But, if instead, the admin directive includes origins none then whether curl has no parameter -H "host:" or includes -H "host: none", then caddy instead gives:

ERROR   admin.api       request error   {"error": "host not allowed: none", "status_code": 403}
{"error":"host not allowed: none"}

Yet, with that origins none configuration, sudo caddy adapt --config /etc/caddy/Caddyfile still works, but sudo caddy reload --config /etc/caddy/Caddyfile gives:

ERROR   admin.api       request error   {"error": "host not allowed: ", "status_code": 403}
reload: sending configuration to instance: caddy responded with error: HTTP 403: {"error":"host not allowed: "}

It's confusing and seems finicky, and I probably don't understand what origins is doing, even after reading at https://caddyserver.com/docs/json/admin/origins/, https://caddyserver.com/docs/json/#admin/origins, and at https://caddyserver.com/docs/caddyfile/options under "admin".

Maybe origins doesn't make any sense when using a unix domain socket? Or, some caddy "host" variable is not being set properly when using a unix socket? Or, the error messages are misleading? Can you make sense of that?

@francislavoie
Member

francislavoie commented Jul 13, 2022

That's what a DNS lookup does.

Except that the server's view of DNS does not necessarily match the client's. For example, the server might be in a private network which has a DNS resolver that resolves the domain to a private IP address, so that clients in the private network get a direct connection to the server, while clients from the internet see a public IP address.

This is not uncommon at all, lots of users who run Caddy in their home network do this, because their routers don't support NAT hairpinning so it's the only way they can get their LAN devices to connect to their self-hosted server.

Using DNS typically causes problems for more users than I think it would help -- we play with DNS resolvers for the DNS challenge, and some users have trouble with that, since they end up needing to configure an alternate resolver for the result to make sense, etc. Gets complicated real fast.

I think many users would be surprised if we automatically used their domain as the bind host, and it would be very non-obvious what the problem was if they had issues, because it would be pretty hidden to them unless they understand what commands to run to see what interfaces are in use. We're careful with what we do automatically. It's a balance between visibility/transparency and "ease of use".

then just allow a unix socket path to be used as the interface name

We do. bind unix//run/caddy. But maybe I'm not understanding the point you're trying to make there.

And then we say, "Caddy can accept any of: an interface name, a domain name, an IP address, or a unix domain socket path, as a parameter to any of an admin or default_bind global option, a site address, or a bind directive." [...] Don't you get that sort of "for free", programming in Go?

Re interface names, it's not really "for free", because you need to get the interface with net.InterfaceByName() and then grab the addresses from that, then make a TCPAddr from that, etc. So it would need some extra code to do it correctly. I'm also not sure what heuristic we would use to check if it's a domain vs an interface name... we'd probably have to try both or something, which seems not great.
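
A small Python sketch (a hypothetical heuristic, not anything Caddy does) of why such a classifier is messy: probing the OS is the only way to tell "eth0" from a resolvable hostname, and anything that fails the probe still has to be treated as a possible domain:

```python
import ipaddress
import socket

def classify_bind_token(token):
    """Hypothetical classifier for a bind value; illustrates the ambiguity only."""
    if token.startswith("unix/"):
        return "unix-socket"             # Caddy's existing unix/ prefix form
    try:
        ipaddress.ip_address(token)
        return "ip"
    except ValueError:
        pass
    try:
        socket.if_nametoindex(token)     # raises OSError if no such interface
        return "interface"
    except OSError:
        return "hostname?"               # could be a domain -- or just a typo

print(classify_bind_token("unix//run/caddy"))      # unix-socket
print(classify_bind_token("10.0.0.2"))             # ip
print(classify_bind_token("surely-not-an-iface"))  # hostname?
```

Note the last case: a mistyped interface name silently becomes a "hostname", which is exactly the kind of hidden failure mode being discussed.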

Anyways, open to the idea but it needs some careful design. It's not that easy, I'm sure there will be some tricky implementation details to work out. FWIW I'm not that interested in working on that myself, I would get no benefit from it personally and I'm just volunteering my time for this project. So someone else would need to contribute it.

And hopefully a few useful pointers can make their way into the online documentation.

They have already, I added a note to the end of https://caddyserver.com/docs/caddyfile/concepts#addresses mentioning bind, and I updated the bind docs slightly to point out they accept network addresses (with a network type prefix). Changes in caddyserver/website@7819a84#diff-eac9734924e3116d4d8a5f737a1404629be816248f64f0514d9733b4629fb18a

[Admin API stuff]

I'll let @mholt follow up on that, he plays with that stuff more than I do

@thx1111
Author

thx1111 commented Jul 14, 2022

I think many users would be surprised if we automatically used their domain as the bind host, and it would be very non-obvious what the problem would be if they have issues, because it would be pretty hidden to them unless they understand what commands to run to see what interfaces are in use. We're careful with what we do automatically. It's a balance between visibility/transparency and "ease of use".

Ah! Ok, thanks. It's complicated.

We do. bind unix//run/caddy. But maybe I'm not understanding the point you're trying to make there.

Right - that was in the bind docs. More like I was "thinking out loud" there. Other than raising the question about using interface names, I was just recognizing the generality of caddy. I don't know about other web servers, but I'm fascinated with its ability to serve to a local unix socket. I only just tried it.

http:// {
	bind unix//run/caddy/serve.html
	root * /home/http/htdocs/
	file_server browse
}

And then:

sudo curl --unix-socket /run/caddy/serve.html -H "host:" http://a | lynx --stdin

Ha! And this also works:

sudo curl --unix-socket /run/caddy/serve.html -H "host:" http://a | \
 dillo $(base64 -w0 | cat <(echo -n 'data:text/html;charset=UTF-8;base64,') -)

But that's just a one-shot, and hyperlinks will not work, of course. I haven't found a browser that will attach to a unix socket, though lots of people talk about it. At least one issue is that people don't agree on a URL format.

My preference is based on the "socket path as port" approach, using ":" delimiters, as in "http://unix:/var/run/server/ht.socket:/path/to/resource.html". Here, the nonstandard "domain" part invokes a special handler, but the domain name seems obvious from the socket type, "Unix Domain Socket".

I suppose that a similar approach could be used for an "Interface Name" URL, something like "http://interface:enp4s0:/path/to/resource.html". But then, there is no browser to accept a URL like that, and the interface might as well have an IP address anyway.

FWIW I'm not that interested in working on that myself, I would get no benefit from it personally and I'm just volunteering my time for this project. So someone else would need to contribute it.

It's not a big deal, once I learn how Caddy does things. But I have run into a few more small surprises, after adding explicit bind directives to the Caddyfile. Now, the json configuration looks very different. Initially, sudo caddy reload --config /etc/caddy/Caddyfile throws an error:

ERROR   admin.api       request error   {"error": "loading config: loading new config: http app module: start: tcp: listening on 10.0.0.2:80: listen tcp 10.0.0.2:80: bind: address already in use", "status_code": 400}
reload: sending configuration to instance: caddy responded with error: HTTP 400: {"error":"loading config: loading new config: http app module: start: tcp: listening on 10.0.0.2:80: listen tcp 10.0.0.2:80: bind: address already in use"}

and caddy has to be stopped and restarted, which is easy enough, just not expected.

With the explicit bind directive, access to 127.0.0.1 no longer dumps the phantom log to the console, which is good. But access to IP address 127.0.0.1 also returns nothing, which displays as a blank page, rather than returning ERR_CONNECTION_REFUSED or even a 404 response. By itself, this is not any different from configuring sites without the bind directive.

But I'm wondering again about this Caddy distinction between a "site address" and a "bind address". Creating an empty site block for 127.0.0.1 in the Caddyfile does not help here, and caddy will fail with:

run: loading initial config: loading new config: http app module: start: tcp: listening on localhost:80: listen tcp 127.0.0.1:80: bind: address already in use
start: caddy process exited with error: exit status 1

Even an explicit bind does not seem to help:

http://127.0.0.1 {
        bind 127.0.0.1
}

I'm confused by the bind directive documentation which says:

For example, if two sites on the same port resolve to 127.0.0.1 and only one of those sites is configured with bind 127.0.0.1, then only one site will be accessible since the other will bind to the port without a specific host; the OS will choose the more specific matching socket.

But caddy will not accept "two sites on the same port", and still fails with the same error:

run: loading initial config: loading new config: http app module: start: tcp: listening on localhost:80: listen tcp 127.0.0.1:80: bind: address already in use
start: caddy process exited with error: exit status 1

Assigning a different port to 127.0.0.1 does not help, since the browser is still accessing 127.0.0.1:80:

http://127.0.0.1:81 {
        bind 127.0.0.1
        error * "Not found" 404
}
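For what it's worth, one way I found to sidestep the "address already in use" conflict (a sketch from my own experimenting, not from the docs): list both addresses in a single site block, so they share one listener instead of two blocks each trying to bind 127.0.0.1:80:

```
http://localhost, http://127.0.0.1 {
	bind 127.0.0.1
	root * /home/http/htdocs/
	file_server browse
}
```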

And now I'm back to questioning the assertion about distinguishing a "site address" and a "bind address":

There's nothing that says that the Host header must match the address being connected to, if it's an IP address. Obviously this is contrived, but the point stands, they are different concepts and binding them together is not a good idea because it would limit what people could do with Caddy.

Caddy is not making a distinction between "localhost" and "127.0.0.1", since both "sites" are using the same port 80.

Maybe the description in the bind directive documentation needs clarification? Is there some other way to distinguish a "site address" and a "bind address", perhaps using some form of named matcher?

@mholt
Member

mholt commented Sep 2, 2022

@thx1111 This is still on my list just so you know. Sorry; been really busy lately.

@thx1111
Author

thx1111 commented Sep 2, 2022

@mholt Thanks Matt. While the Caddy configuration file tends to be finicky, I do have a working configuration now. Any improvements you are able to provide will add robustness and avoid confusing surprises for new users.

@francislavoie
Member

francislavoie commented Feb 26, 2023

I'm going to close this. I'm not seeing anything actionable here.

Since this was last discussed, the skip_log directive was added #4691, so users can control in a more fine-grained fashion which requests have access logs written.

@francislavoie francislavoie closed this as not planned Won't fix, can't repro, duplicate, stale Feb 26, 2023