Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

send SNI for federation requests #3439

Merged
merged 13 commits into from
Aug 10, 2018

Conversation

vojeroen
Copy link
Contributor

@vojeroen vojeroen commented Jun 25, 2018

Since I need synapse to be available behind a reverse proxy, I added SNI support to outgoing federation requests. This should resolve #1491. As a side effect, server certificates for federation requests are now validated as well (and connections will fail if validation fails).

To bypass the server certificate validation in development environments, where these are mostly self-signed so validation will fail, I added an option to the config file to disable certificate validation. The bypass for certificate validation could be done better, but with the current twisted API (and unfortunately I'm not an expert on twisted), I couldn't find a better solution.

UPDATE: as discussed below, self-signed certificates will still work as expected by the federation API. The default behavior of twisted to validate SSL certificates has been bypassed.

Signed-off-by: Jeroen van Os vo.jeroen@gmail.com

@matrixbot
Copy link
Member

Can one of the admins verify this patch?

1 similar comment
@matrixbot
Copy link
Member

Can one of the admins verify this patch?

@vojeroen
Copy link
Contributor Author

This will probably also resolve part of #1691.

@vojeroen
Copy link
Contributor Author

The documentation still needs to be adapted. The current documentation on reverse-proxying the federation port states:

A self-signed SSL certificate is fine for federation, so there is no need to automate renewals.

This pull request would change this, requiring an SSL certificate signed by a valid CA in order to establish a connection. Is this acceptable? With Let's Encrypt, certificate issuing and renewing is free and easy. Requiring certificate validation also enhances security so to me it makes sense to enable this, but if you want to keep the policy from the documentation then certificate validation should be disabled.

@krombel
Copy link
Contributor

krombel commented Jun 25, 2018

The validation of the certificates is based on views. So the certificates will only be accepted if other known servers see the same fingerprint.

If this is aimed to be changed it needs a Spec Proposal as it would change the federation api (e. G. here).

So just adding SNI would be OK for this PR but requiring a valid signed certificate would need a spec change and would be too much for this one PR

@vojeroen
Copy link
Contributor Author

vojeroen commented Jun 25, 2018

Certificate validation is a standard feature of the twisted API (https://twistedmatrix.com/documents/18.4.0/api/twisted.internet.ssl.optionsForClientTLS.html), so it was added as a side effect to the support for SNI, which is included in the same API. As it is not compatible with the spec, I'll disable this feature, because my primary aim was to implement support for SNI.

Originally I added a configuration option to enable / disable certificate validation, but as validation should be disabled anyway I suppose there is no point in keeping it so I'll revert this change as well.

@ara4n
Copy link
Member

ara4n commented Jun 26, 2018

@vojeroen thanks for this! as @krombel says, we do application-layer certificate validation (which i'm hoping this preserves). we're currently in a bit of a security crunch but will try to review this asap; lots of people need SNI (especially anyone behind cloudflare who doesn't want to spend $$$ for a non-SNI listener)

@vojeroen
Copy link
Contributor Author

@ara4n I hope it checks out for you. I'm running this patch now on two servers (with CA-signed certificates and self-signed certificates) behind a reverse proxy that forwards to the correct server based on SNI, and it works fine.

When reviewing you'll notice that I used some private methods from the twisted library, but as the public API automatically does certificate validation using CA, I had to reimplement some functions.

Jeroen and others added 3 commits July 9, 2018 08:51
@richvdh richvdh requested a review from a team July 27, 2018 08:18
@richvdh
Copy link
Member

richvdh commented Jul 27, 2018

@matrixbot: test this please

@richvdh
Copy link
Member

richvdh commented Jul 27, 2018

@matrixbot: test this please

@krombel
Copy link
Contributor

krombel commented Jul 29, 2018

@vojeroen To fix the issue with isort just run isort -y -rc synapse

@vojeroen
Copy link
Contributor Author

thanks for the pointer, the issue with isort is fixed now

@richvdh
Copy link
Member

richvdh commented Aug 2, 2018

@matrixbot test this please

Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modulo a few nits as mentioned below, this looks amazing - thank you for doing all this work.

It would also be helpful if you could update the description of the PR - in particular to note that it no longer validates certificates.

One thing I will say is that the interaction with the SSL libs has historically been a big tale of woe - see #2350 and the related https://twistedmatrix.com/trac/ticket/9210#comment:15 for some painful reading.

I'm going to try running this on my personal server for a bit to see what breaks.


def get_options(self, host):
return ClientTLSOptions(
unicode(host),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're in the process of porting to python 3, and this won't work. Maybe you want host.encode('ascii') ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClientTLSOptions should get a unicode string, so the python 3 equivalent of unicode(host) would be str(host). If the same code should run in python 2 and 3, assigning str to unicode for python 3 is a straightforward solution. Is this OK for you?

if sys.version_info.major >= 3:
    unicode = str

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I was being stupid here. Let's not go defining unicode though.

I think you actually want host.decode('ascii') ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason you want to use decode('ascii'), otherwise I would suggest decode('utf-8') to keep it more generic?

Just quickly looking at synapse.http.endpoint.matrix_federation_endpoint (where ClientTLSOptionsFactory is actually used), it seems to me that this function will have more problems with bytes vs str when running in python 3 than just this one. Probably to be looked at in another PR, as this would require some more investigation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason you want to use decode('ascii'), otherwise I would suggest decode('utf-8') to keep it more generic?

Well, I think we're expecting a plain ascii hostname here. If it's not plain ascii, something is going wrong.

And yes, there are currently other python 3 problems in this area. @hawkowl is working on python 3 support...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you already merged this. I did the commit with utf-8, so if you prefer ascii it might be best to change it then.


logger = logging.getLogger(__name__)


class ServerContextFactory(ssl.ContextFactory):
class ServerContextFactory(ContextFactory):
"""Factory for PyOpenSSL SSL contexts that are used to handle incoming
connections and to make connections to remote servers."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I right in thinking this is no longer used for making outbound connections? Could you update the docstring if so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, I'll update it.

def __init__(self, hostname, ctx):
self._ctx = ctx
self._hostname = hostname
self._hostnameBytes = _idnaBytes(hostname)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be nice to copy _idnaBytes and _tolerateErrors (both of which seem to be trivial) rather than using the private impls from twisted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's not best practice to use private implementations. But, if I just copy _tolerateErrors, the pep8 test fails as it contains a bare "except:" statement. I'm not familiar enough with twisted / openssl to know which errors this try/except clause should catch. I can change this to "except Exception:" to both catch all errors and pass the pep8 test, but that doesn't seem right either. However it is the easiest solution. What do you think?

This is the _tolerateErrors code:

def _tolerateErrors(wrapped):
    def infoCallback(connection, where, ret):
        try:
            return wrapped(connection, where, ret)
        except:
            f = Failure()
            logger.exception("Error during info_callback")
            connection.get_app_data().failVerification(f)
    return infoCallback

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can write # noqa: E722 to disable the pep8 test. See https://github.com/matrix-org/synapse/blob/develop/synapse/handlers/message.py#L583 for an example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an even easier solution. I'll add this comment.

@richvdh
Copy link
Member

richvdh commented Aug 8, 2018

(btw it's been running fine for me on my HS, so let's get the niggles cleaned up and we can merge this \o/)

@vojeroen
Copy link
Contributor Author

vojeroen commented Aug 9, 2018

I'm glad it's running fine on your HS. I haven't encountered problems yet either when running this on mine. Let me know if you have any remarks after these commits.

@richvdh
Copy link
Member

richvdh commented Aug 10, 2018

@matrixbot: test this please

@richvdh
Copy link
Member

richvdh commented Aug 10, 2018

ok, let's ship this :) thank you!

@richvdh richvdh merged commit 3c0213a into matrix-org:develop Aug 10, 2018
@richvdh
Copy link
Member

richvdh commented Aug 10, 2018

next challenge: add a test to sytest to check that it doesn't regres...

richvdh added a commit that referenced this pull request Aug 22, 2018
Features
--------

- Add support for the SNI extension to federation TLS connections. Thanks to @vojeroen! ([\#3439](#3439))
- Add /_media/r0/config ([\#3184](#3184))
- speed up /members API and add `at` and `membership` params as per MSC1227 ([\#3568](#3568))
- implement `summary` block in /sync response as per MSC688 ([\#3574](#3574))
- Add lazy-loading support to /messages as per MSC1227 ([\#3589](#3589))
- Add ability to limit number of monthly active users on the server ([\#3633](#3633))
- Support more federation endpoints on workers ([\#3653](#3653))
- Basic support for room versioning ([\#3654](#3654))
- Ability to disable client/server Synapse via conf toggle ([\#3655](#3655))
- Ability to whitelist specific threepids against monthly active user limiting ([\#3662](#3662))
- Add some metrics for the appservice and federation event sending loops ([\#3664](#3664))
- Where server is disabled, block ability for locked out users to read new messages ([\#3670](#3670))
- set admin uri via config, to be used in error messages where the user should contact the administrator ([\#3687](#3687))
- Synapse's presence functionality can now be disabled with the "use_presence" configuration option. ([\#3694](#3694))
- For resource limit blocked users, prevent writing into rooms ([\#3708](#3708))

Bugfixes
--------

- Fix occasional glitches in the synapse_event_persisted_position metric ([\#3658](#3658))
- Fix bug on deleting 3pid when using identity servers that don't support unbind API ([\#3661](#3661))
- Make the tests pass on Twisted < 18.7.0 ([\#3676](#3676))
- Don’t ship recaptcha_ajax.js, use it directly from Google ([\#3677](#3677))
- Fixes test_reap_monthly_active_users so it passes under postgres ([\#3681](#3681))
- Fix mau blocking calulation bug on login ([\#3689](#3689))
- Fix missing yield in synapse.storage.monthly_active_users.initialise_reserved_users ([\#3692](#3692))
- Improve HTTP request logging to include all requests ([\#3700](#3700))
- Avoid timing out requests while we are streaming back the response ([\#3701](#3701))
- Support more federation endpoints on workers ([\#3705](#3705), [\#3713](#3713))
- Fix "Starting db txn 'get_all_updated_receipts' from sentinel context" warning ([\#3710](#3710))
- Fix bug where `state_cache` cache factor ignored environment variables ([\#3719](#3719))
- Fix bug in v0.33.3rc1 which caused infinite loops and OOMs ([\#3723](#3723))
- Fix bug introduced in v0.33.3rc1 which made the ToS give a 500 error ([\#3732](#3732))

Deprecations and Removals
-------------------------

- The Shared-Secret registration method of the legacy v1/register REST endpoint has been removed. For a replacement, please see [the admin/register API documentation](https://github.com/matrix-org/synapse/blob/master/docs/admin_api/register_api.rst). ([\#3703](#3703))

Internal Changes
----------------

- The test suite now can run under PostgreSQL. ([\#3423](#3423))
- Refactor HTTP replication endpoints to reduce code duplication ([\#3632](#3632))
- Tests now correctly execute on Python 3. ([\#3647](#3647))
- Sytests can now be run inside a Docker container. ([\#3660](#3660))
- Port over enough to Python 3 to allow the sytests to start. ([\#3668](#3668))
- Update docker base image from alpine 3.7 to 3.8. ([\#3669](#3669))
- Rename synapse.util.async to synapse.util.async_helpers to mitigate async becoming a keyword on Python 3.7. ([\#3678](#3678))
- Synapse's tests are now formatted with the black autoformatter. ([\#3679](#3679))
- Implemented a new testing base class to reduce test boilerplate. ([\#3684](#3684))
- Rename MAU prometheus metrics ([\#3690](#3690))
- add new error type ResourceLimit ([\#3707](#3707))
- Logcontexts for replication command handlers ([\#3709](#3709))
- Update admin register API documentation to reference a real user ID. ([\#3712](#3712))
@vojeroen
Copy link
Contributor Author

I'm afraid I can't really help extending sytest, I'm not familiar with perl.

richvdh added a commit that referenced this pull request Sep 6, 2018
This probably needs to wait for #3439
to land to avoid conflicting with it.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants