Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Ratelimit cross user room key share requests. #8957

Merged
merged 18 commits into from
Feb 19, 2021

Conversation

clokep
Copy link
Member

@clokep clokep commented Dec 16, 2020

This is an attempt to fix #8677.

A couple of open questions:

  • The current rate limit configuration numbers are arbitrary. They need more thought. (OK, this wasn't a question...)

@clokep clokep force-pushed the clokep/key-share-rate-limit branch from 1797e64 to 142ee5a Compare December 16, 2020 19:36
@clokep clokep requested a review from a team December 16, 2020 19:39
@clokep
Copy link
Member Author

clokep commented Dec 16, 2020

There's some open questions here so requesting review on it, although I'm not quite sure it is ready for merging.

@clokep clokep marked this pull request as draft December 16, 2020 19:40
@clokep clokep removed the request for review from a team December 18, 2020 13:41
@clokep clokep requested a review from erikjohnston December 18, 2020 13:57
@clokep
Copy link
Member Author

clokep commented Dec 18, 2020

@erikjohnston I've made some updates, let me know if this looks better. It is much closer to #8675 now.

Copy link
Member

@erikjohnston erikjohnston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this looks good! I should say that I do think it would be good to rate limit EDUs more broadly, but it's a bit complicated to work out the correct strategy for the various types.

synapse/api/constants.py Show resolved Hide resolved
synapse/handlers/devicemessage.py Outdated Show resolved Hide resolved
docs/sample_config.yaml Outdated Show resolved Hide resolved
@clokep clokep force-pushed the clokep/key-share-rate-limit branch from a4568fb to d36c7b0 Compare December 18, 2020 20:34
@clokep clokep changed the title Ratelimit incoming EDUs over federation. Ratelimit key sharing requests. Dec 21, 2020
@clokep
Copy link
Member Author

clokep commented Dec 21, 2020

I think the failing builds are due to not updating the sytest config. We should settle on reasonable defaults first before updating it.

@clokep
Copy link
Member Author

clokep commented Jan 15, 2021

I think the failing builds are due to not updating the sytest config. We should settle on reasonable defaults first before updating it.

@erikjohnston Any idea of how to get this? Is there something in grafana we can check?

@erikjohnston
Copy link
Member

I think the failing builds are due to not updating the sytest config. We should settle on reasonable defaults first before updating it.

@erikjohnston Any idea of how to get this? Is there something in grafana we can check?

I think I'd probably look in the logs (or DB) to see what the usual pattern was. I'm OK with us having quite a high burst count, as the major issue was when we sent tonnes.

@clokep
Copy link
Member Author

clokep commented Jan 19, 2021

I think I'd probably look in the logs (or DB) to see what the usual pattern was. I'm OK with us having quite a high burst count, as the major issue was when we sent tonnes.

Looking at the logs it does not seem uncommon for a client to send several hundred of these requests per second (sometimes sustained for a few seconds). There are also instances of several thousands sends per second (again sometimes sustained for for a few seconds).

So maybe something like a 50 per second, burst count of 500? I'm not sure if that sounds too high.

@erikjohnston
Copy link
Member

I think I'd probably look in the logs (or DB) to see what the usual pattern was. I'm OK with us having quite a high burst count, as the major issue was when we sent tonnes.

Looking at the logs it does not seem uncommon for a client to send several hundred of these requests per second (sometimes sustained for a few seconds). There are also instances of several thousands sends per second (again sometimes sustained for for a few seconds).

So maybe something like a 50 per second, burst count of 500? I'm not sure if that sounds too high.

Err, wow. What's happening there? I thought you should only send key share requests to your own devices? Are these accounts with huge numbers of devices?

@clokep
Copy link
Member Author

clokep commented Jan 19, 2021

Oh wait, those numbers are all sendToDevice calls, not just room key sharing requests. Let me take a closer look!

@clokep
Copy link
Member Author

clokep commented Jan 19, 2021

Oh wait, those numbers are all sendToDevice calls, not just room key sharing requests. Let me take a closer look!

Looking just at the m.room_key_request (which is what gets rate limited via this PR), it seems that users might burst to 200 - 250 requests, there are some instances of this being sustained for several seconds.

What were you thinking of as a "high" burst count?

@clokep clokep requested a review from erikjohnston January 28, 2021 16:14
@clokep
Copy link
Member Author

clokep commented Jan 28, 2021

@erikjohnston I'm not really sure where to go with this one -- I can share the data I have with you though and see what you think?

@erikjohnston
Copy link
Member

Ugh, I really don't know TBH. I thought that cross user room key requests should be relatively rare, so I have no idea if a high burst count makes sense. OTOH, they shouldn't be toooo important so getting the value wrong isn't the end of the world, plus we only really need to protect from out of control clients.

I think next steps are:

  1. Ask the encryption and client devs what they expect to be going on here; and then
  2. Pick a burst count bigger than that, even if it's a couple of hundred.

@clokep clokep changed the title Ratelimit key sharing requests. Ratelimit cross user room key share requests. Feb 8, 2021
@clokep
Copy link
Member Author

clokep commented Feb 11, 2021

I had a discussion with @callahad where he suggested lowering these to a level we're comfortable with since it is better than the status-quo of dropping them.

@clokep clokep marked this pull request as ready for review February 11, 2021 12:58
@clokep clokep force-pushed the clokep/key-share-rate-limit branch from e363c53 to 4f72389 Compare February 11, 2021 15:35
@clokep clokep requested a review from a team February 17, 2021 18:42
synapse/config/ratelimiting.py Outdated Show resolved Hide resolved
synapse/handlers/devicemessage.py Outdated Show resolved Hide resolved
changelog.d/8957.feature Outdated Show resolved Hide resolved
@clokep clokep merged commit fc8b3d8 into develop Feb 19, 2021
@clokep clokep deleted the clokep/key-share-rate-limit branch February 19, 2021 18:20
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Mar 14, 2021
Synapse 1.29.0 (2021-03-08)
===========================

Note that synapse now expects an `X-Forwarded-Proto` header when used with a reverse proxy. Please see [UPGRADE.rst](UPGRADE.rst#upgrading-to-v1290) for more details on this change.


No significant changes.


Synapse 1.29.0rc1 (2021-03-04)
==============================

Features
--------

- Add rate limiters to cross-user key sharing requests. ([\#8957](matrix-org/synapse#8957))
- Add `order_by` to the admin API `GET /_synapse/admin/v1/users/<user_id>/media`. Contributed by @dklimpel. ([\#8978](matrix-org/synapse#8978))
- Add some configuration settings to make users' profile data more private. ([\#9203](matrix-org/synapse#9203))
- The `no_proxy` and `NO_PROXY` environment variables are now respected in proxied HTTP clients with the lowercase form taking precedence if both are present. Additionally, the lowercase `https_proxy` environment variable is now respected in proxied HTTP clients on top of existing support for the uppercase `HTTPS_PROXY` form and takes precedence if both are present. Contributed by Timothy Leung. ([\#9372](matrix-org/synapse#9372))
- Add a configuration option, `user_directory.prefer_local_users`, which when enabled will make it more likely for users on the same server as you to appear above other users. ([\#9383](matrix-org/synapse#9383), [\#9385](matrix-org/synapse#9385))
- Add support for regenerating thumbnails if they have been deleted but the original image is still stored. ([\#9438](matrix-org/synapse#9438))
- Add support for `X-Forwarded-Proto` header when using a reverse proxy. ([\#9472](matrix-org/synapse#9472), [\#9501](matrix-org/synapse#9501), [\#9512](matrix-org/synapse#9512), [\#9539](matrix-org/synapse#9539))


Bugfixes
--------

- Fix a bug where users' pushers were not all deleted when they deactivated their account. ([\#9285](matrix-org/synapse#9285), [\#9516](matrix-org/synapse#9516))
- Fix a bug where a lot of unnecessary presence updates were sent when joining a room. ([\#9402](matrix-org/synapse#9402))
- Fix a bug that caused multiple calls to the experimental `shared_rooms` endpoint to return stale results. ([\#9416](matrix-org/synapse#9416))
- Fix a bug in single sign-on which could cause a "No session cookie found" error. ([\#9436](matrix-org/synapse#9436))
- Fix bug introduced in v1.27.0 where allowing a user to choose their own username when logging in via single sign-on did not work unless an `idp_icon` was defined. ([\#9440](matrix-org/synapse#9440))
- Fix a bug introduced in v1.26.0 where some sequences were not properly configured when running `synapse_port_db`. ([\#9449](matrix-org/synapse#9449))
- Fix deleting pushers when using sharded pushers. ([\#9465](matrix-org/synapse#9465), [\#9466](matrix-org/synapse#9466), [\#9479](matrix-org/synapse#9479), [\#9536](matrix-org/synapse#9536))
- Fix missing startup checks for the consistency of certain PostgreSQL sequences. ([\#9470](matrix-org/synapse#9470))
- Fix a long-standing bug where the media repository could leak file descriptors while previewing media. ([\#9497](matrix-org/synapse#9497))
- Properly purge the event chain cover index when purging history. ([\#9498](matrix-org/synapse#9498))
- Fix missing chain cover index due to a schema delta not being applied correctly. Only affected servers that ran development versions. ([\#9503](matrix-org/synapse#9503))
- Fix a bug introduced in v1.25.0 where `/_synapse/admin/join/` would fail when given a room alias. ([\#9506](matrix-org/synapse#9506))
- Prevent presence background jobs from running when presence is disabled. ([\#9530](matrix-org/synapse#9530))
- Fix rare edge case that caused a background update to fail if the server had rejected an event that had duplicate auth events. ([\#9537](matrix-org/synapse#9537))


Improved Documentation
----------------------

- Update the example systemd config to propagate reloads to individual units. ([\#9463](matrix-org/synapse#9463))


Internal Changes
----------------

- Add documentation and type hints to `parse_duration`. ([\#9432](matrix-org/synapse#9432))
- Remove vestiges of `uploads_path` configuration setting. ([\#9462](matrix-org/synapse#9462))
- Add a comment about systemd-python. ([\#9464](matrix-org/synapse#9464))
- Test that we require validated email for email pushers. ([\#9496](matrix-org/synapse#9496))
- Allow python to generate bytecode for synapse. ([\#9502](matrix-org/synapse#9502))
- Fix incorrect type hints. ([\#9515](matrix-org/synapse#9515), [\#9518](matrix-org/synapse#9518))
- Add type hints to device and event report admin API. ([\#9519](matrix-org/synapse#9519))
- Add type hints to user admin API. ([\#9521](matrix-org/synapse#9521))
- Bump the versions of mypy and mypy-zope used for static type checking. ([\#9529](matrix-org/synapse#9529))
anoadragon453 added a commit that referenced this pull request Mar 23, 2021
)" (#9668)

We patched `matrix-org-hotfixes` a little while ago in #8675 to drop any cross-user key share requests while they were being accidentally spammed by a client. This was a temporary fix until we had some rate-limiting in place.

Rate-limiting landed in #8957. Note that the rate-limit can't be configured, but has what appear to be [sensible defaults](https://github.com/matrix-org/synapse/blob/db2efa9c50569adbfab102b1f447f5a8312b95f3/synapse/config/ratelimiting.py#L105-L113).

Note that the original patch was already actually overridden partially when the rate-limit PR landed, as they conflicted. So we've already lifted the restriction between local devices on matrix.org, but requests were still blocked from being sent over federation. This PR cleans up the remaining bits.

This reverts commit d60af93.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rate limit key share requests
2 participants