Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Reverse proxy documentation issues with nginx example #9579

Closed
davehayes opened this issue Mar 10, 2021 · 11 comments
Closed

Reverse proxy documentation issues with nginx example #9579

davehayes opened this issue Mar 10, 2021 · 11 comments
Labels
A-Docs things relating to the documentation T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.

Comments

@davehayes
Copy link

Description

I have two issues with your reverse proxy documentation.

First, it suggests leaving out important nginx configuration variables like root. Without a root configuration, I believe it's a compiled in default. With the location block you have in the example, if someone blindly pastes that (I know that they shouldn't) they will expose whatever compiled in default is there to the internet. I suggest you add something like

root: /some/empty/directory

so people know that any other https accesses to this server block should return a 404.

Next, the blurb at the bottom of this document was unknown to me. Perhaps your examples should include a way to filter the unwanted URL acesss, e.g.:

location /_synapse/_admin {
     deny all;
     allow <authorized ip>;
     ...rest of proxy variables...
}

These are fine points to be sure, but I suspect new homeserver admins would appreciate seeing these changes. Thank you for reading.

@anoadragon453
Copy link
Member

By default the regex for the nginx example (^(\/_matrix|\/_synapse\/client)) would not forward /_synapse/admin traffic, so that shouldn't be an issue.

Looking at the docs for the root directive, it looks as if root html; is the default. I'm not quite sure what that does, but if it just sends traffic to a directory called html/, which is unlikely to exist then that's probably fine? The autoindex directive, which is what allows directory listing, is also off by default.

@davehayes
Copy link
Author

I have seen some packages provide a default in some cases, mostly so people know they got their server working. In my opinion, for security purposes and in general, being explicit is better than relying on defaults. It's not a big deal, but it is a fine point.

@erikjohnston erikjohnston added T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. A-Docs things relating to the documentation labels Mar 18, 2021
@Ryonez
Copy link

Ryonez commented Dec 27, 2021

By default the regex for the nginx example (^(\/_matrix|\/_synapse\/client)) would not forward /_synapse/admin traffic, so that shouldn't be an issue.

Looking at the docs for the root directive, it looks as if root html; is the default. I'm not quite sure what that does, but if it just sends traffic to a directory called html/, which is unlikely to exist then that's probably fine? The autoindex directive, which is what allows directory listing, is also off by default.

There are cases that the regex is not a viable solution. For example, I use matrix-media-repo. This required redirecting media traffic, which is imposable with the regex (regex gets priority with location matching), so I can not use it . Documentation for staying safe without regex would be a good idea.

I'd like to note as well that the regex looks like it would forward that with what little I know. URLs starting with /_matrix, /_synapse and /client are forwarded, and given that /_synapse/admin would seem like it'd match the /_synapse section, I would've thought that would have gone through. Keep in mind I have limited knowledge of regex, I have to look things up whenever I use it, and given that extended /_matrix and /client URLs work, there's no reason I would think /_synapse/admin isn't as well. And seeing as there wasn't any mention that not exposing the admin interface is the recommend thing to do on the reverse proxy page, I had absolutely no reason to question that thought line.

@bitcoinlizard
Copy link

I agree with @davehayes. In my experience going to https://matrix.example.com didn't load the "It works! Synapse is running" page without modifying the regex in the provided nginx reverse proxy example. I could only access that page with https://matrix.example.com/_matrix/static." The problem was as @davehayes described, nginx went to a system directory that was empty. I had to modify the regex as follows:

original:
location ~* ^(\/_matrix|\/_synapse\/client)

modified:
location ~* ^(\/_matrix|\/_synapse\/client|/)

Then the "It works! Synapse is running" page loaded correctly with the redirect to https://matrix.example.com/_matrix/static.

This exposes the admin endpoint so I had to do similar to what @davehayes suggested above with the deny for /_synapse/_admin

It would be good to update the example as most people will blindly copy and paste this code. I checked a number of Synapse servers for users that I suspected were running their own instances of Synapse and the vast majority exposed the admin endpoint, most likely without knowing.

@Ryonez
Copy link

Ryonez commented Dec 27, 2021

didn't load the "It works! Synapse is running" page without modifying the regex

Wait, that doesn't even work by default? I didn't notice...

This regex is a backbox to users, and that's not good.

@bitcoinlizard
Copy link

It didn't work for me with the regex statement on Arch Linux without the above noted modification but many other Synapse instances had the redirect working correctly so I sort of suspect it is something specific to the Arch Linux Nginx configuration.

@richvdh
Copy link
Member

richvdh commented Jan 4, 2022

for reference, the documentation in question is here: https://matrix-org.github.io/synapse/latest/reverse_proxy.html#nginx, and it recommends:

    location ~* ^(\/_matrix|\/_synapse\/client) {

this is a relatively simple regular expression, matching anything starting with either /_matrix or /_synapse/client, which are the paths we need to forward to synapse. Each of the examples for other reverse-proxies forward the same set of paths.

The reason we do this is because we cannot assume that synapse is the only thing running on that host (and likewise, that is the reason that we don't redirect the root to the "it works!" page by default). If you're not running anything else on the host, then feel free to forward everything. (Aside: location ~* ^(\/_matrix|\/_synapse\/client|/) is equivalent to location /.) But we're not trying to teach everyone everything about running a reverse-proxy here - we're just trying to give an example that will work in the common case.

What I do wonder is why it is ~* rather than ~ (~* means a case-insensitive match), and why the / characters are escaped with \. I have corrected this in #11680.

I'm going to close this because afaict it's working as intended.

@richvdh richvdh closed this as completed Jan 4, 2022
@davehayes
Copy link
Author

What I do wonder is why it is ~* rather than ~ (~* means a case-insensitive match), and why the / characters are escaped with \. I have corrected this in #11680.

This is pure safety.

We use ~* so that mixed case URLs (if they somehow get to the location block) behave the same as lower case ones. Consider the URL /_sYnApSe/_aDmIn/ for example.

The escape is because perl regular expressions typically use / as a delimiter and it is safer to be more explicit. There's a chance this is not strictly needed, but better to be safe than sorry eh? :)

@richvdh
Copy link
Member

richvdh commented Jan 4, 2022

We use ~* so that mixed case URLs (if they somehow get to the location block) behave the same as lower case ones. Consider the URL /_sYnApSe/_aDmIn/ for example.

Since synapse will (should) 404 such a URL, it's hard to see how this is beneficial; rather it adds complexity for nginx.

but better to be safe than sorry eh? :)

Given the complaints above about the regular expression being a black box that people don't understand, no, I don't think it is. It makes it harder to read.

@davehayes
Copy link
Author

davehayes commented Jan 4, 2022

We use ~* so that mixed case URLs (if they somehow get to the location block) behave the same as lower case ones. Consider the URL /_sYnApSe/_aDmIn/ for example.

Since synapse will (should) 404 such a URL, it's hard to see how this is beneficial; rather it adds complexity for nginx.

Is there a rigorous way to calculate the incremental complexity of adding a "*"?

Regardless, should synapse somehow break the behavior that you imply exists, this single * covers that base. If in the future, someone decides that synapse should handle mixed case URLs, a single * still covers that base.

I'm pretty sure, were we to actually agree on a measurement of complexity, that the complexity in finding out that synapse rejects mixed or upper case URLs with 404 would be much greater than the addition of a "*" which most people configuring nginx as a reverse proxy should know means "case insensitive". I'd say Nginx's behavior is easily searchable using a search engine...would you agree that it is easier than searching for the implied behavior of synapse with regard to mixed case URLs?

but better to be safe than sorry eh? :)

Given the complaints above about the regular expression being a black box that people don't understand, no, I don't think it is. It makes it harder to read.

Given that people paste things indiscriminately, while it might be slightly harder to read, I believe it's less secure. Your mileage may vary and all that.

@Ryonez
Copy link

Ryonez commented Jan 4, 2022

Given that people paste things indiscriminately, while it might be slightly harder to read, I believe it's less secure. Your mileage may vary and all that.

As someone with some experience dealing with ngnix with what I self host, that regex was something new to me, and I only figured it out by starting with what you'd given and sorting the issues I found as I went through things. It's only been through the discussion here that people have talked enough about it that I can see what's going on. Not using the regex in my use case in the end, as regex breaks the location matching I need.

Assuming the users skill level is either know pretty much everything or just copy paste is a not great way at looking at things. While explaining everything nginx isn't something I think is needed, I think going through what's been provided is a good idea, as well as providing more than one example in this case, as that regex does make certain things impossible.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Docs things relating to the documentation T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
Projects
None yet
Development

No branches or pull requests

6 participants