Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Performance improvements and refactor of Ratelimiter #7595

Merged
merged 33 commits into from
Jun 5, 2020

Conversation

anoadragon453
Copy link
Member

@anoadragon453 anoadragon453 commented May 28, 2020

While working on #5665 I found myself digging into the Ratelimiter class and seeing that it was both:

  • Rather undocumented, and
  • causing a lot of config checks

This PR attempts to refactor and comment the Ratelimiter class, as well as encourage config file accesses to only be done at instantiation.

Best to be reviewed commit-by-commit.

@anoadragon453 anoadragon453 requested a review from a team May 28, 2020 22:19
@clokep clokep self-assigned this May 29, 2020
clokep
clokep previously requested changes May 29, 2020
Copy link
Member

@clokep clokep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this seems OK-ish, but I'm not convinced that moving the rate limiters to the homeserver object is an improvement, it seems to just move them farther from where they're used. Wouldn't we get the same benefit by instantiating them with the config values in the each handler's __init__? Is there another benefit I'm missing.

Instantiating the rate limiters with their config is much cleaner, but it is weird that we then touch internals of the rate limiters in one case, I think that needs a bit more of a think.

synapse/api/ratelimiting.py Outdated Show resolved Hide resolved
synapse/api/ratelimiting.py Outdated Show resolved Hide resolved
tests/rest/client/v1/test_rooms.py Outdated Show resolved Hide resolved
synapse/handlers/auth.py Outdated Show resolved Hide resolved
synapse/handlers/_base.py Outdated Show resolved Hide resolved
synapse/handlers/_base.py Outdated Show resolved Hide resolved
@anoadragon453
Copy link
Member Author

anoadragon453 commented May 29, 2020

Overall this seems OK-ish, but I'm not convinced that moving the rate limiters to the homeserver object is an improvement, it seems to just move them farther from where they're used.

The benefit was supposed to be that even if it was used in multiple places in Synapse, the tests could hook in at a single point and modify it, however I now realize that:

  1. By splitting the ratelimiters out, each one is only ever pulled from a single place in Synapse (sans the tests)
  2. The tests should probably just modify the config file instead.

Edit: Ah, actually the registration ratelimiter is used in two different files. I could leave it in server or just put it in one and pull it from the other...

Edit 2:

The tests should probably just modify the config file instead.

Actually, it's a little dirty to set rate_hz and burst_count to 99999999 in the tests, so mocking the methods is probably still the cleanest way to go.

Other than the registration ratelimiter, as that's used in multiple classes and
needs to keep the same state
@anoadragon453 anoadragon453 force-pushed the anoa/ratelimit_config_perf branch 3 times, most recently from d692e1a to 301965d Compare June 1, 2020 18:13
@anoadragon453
Copy link
Member Author

Right, so in short, in this PR we:

  • Refactor Ratelimiter class to allow setting clock, default rate_hz and burst_count on object instantiation, while still keeping the ability to override per request. Nothing else should've changed here functionality-wise.
  • Move most of the instantiation of Ratelimiter into the classes that actually used them. The registration ratelimiter remains in HomeServer as it's used across multiple classes.
  • Change up the tests a bit to patch Ratelimiter in some areas instead of mocking it. This is easier than grabbing each individual ratelimiter and mocking its member functions, and the tests in those classes don't test for ratelimit stuff anywho.
  • For the tests that do care about ratelimiting, those tests have been modified to set config variables instead of directly modifying runtime objects.

This PR has the advantage of:

  • Making the Ratelimiter class easier to maintain.
  • Making ratelimiter calling code simpler.
  • Making less config calls.

@anoadragon453
Copy link
Member Author

@clokep Should hopefully be reviewable by commit from where you last left off. Excuse the cruft at the end. Let me know if I could break things down further in any way that would make review easier.

changelog.d/7595.misc Outdated Show resolved Hide resolved
synapse/api/ratelimiting.py Outdated Show resolved Hide resolved
@anoadragon453 anoadragon453 requested a review from clokep June 2, 2020 18:44
# This does not apply
# XXX: Why is this -1? This seems to only be used in
# self.ratelimit. I guess so that clients get a time in the past and don't
# feel afraid to try again immediately
Copy link
Member Author

@anoadragon453 anoadragon453 Jun 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I'm tempted to resolve this, but I don't want to change the behaviour of Ratelimiter in this PR any more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this sounds like a follow-up!

@clokep clokep force-pushed the anoa/ratelimit_config_perf branch from 1d58e90 to 12b4d47 Compare June 4, 2020 16:14
@clokep
Copy link
Member

clokep commented Jun 4, 2020

@anoadragon453 I pushed a couple of commits c145c81 and 12b4d47 that clean-up the tests a bit. Turns out we don't seem to need those ratelimiters defined there? Let me know if you think it looks wrong though.

Copy link
Member

@clokep clokep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good! I asked for some minor tweaks to the tests to make them clearer.

synapse/api/ratelimiting.py Outdated Show resolved Hide resolved
self.message_counts = (
OrderedDict()
) # type: OrderedDict[Any, Tuple[float, int, Optional[float]]]
def __init__(self, clock: Clock, rate_hz: float, burst_count: int):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I debated two things here:

  • Making rate_hz / burst_count optional so we don't have the case where we define them as 0 to override them each time.
  • Having two rate limiters (with one being a sub-class of the other) for the two situations (per key limit vs. per rate limiter limit).

I think these are both just bikeshedding at this point though and it isn't worth poking at more. I just wanted to write them down!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat ideas, will keep them in mind if we decide to do another refactor at some point. Thanks!

tests/api/test_ratelimiting.py Outdated Show resolved Hide resolved
Comment on lines 59 to 66
# We expect a LimitExceededError to be raised
try:
limiter.ratelimit(("test_id",), _time_now_s=time_now, update=False)

# We shouldn't reach here
self.assertTrue(False, "LimitExceededError was not raised")
except LimitExceededError as e:
self.assertEquals(e.retry_after_ms / 1000, expected_allowed - time_now)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be nicer using the context manager style:

with self.assertRaises(LimiteExceedError) as e:
    limiter.rateLimit(...)

# Use e below
self.assertEquals(...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was not aware this was supported by self.assertRaises. Thanks for the tip!

tests/api/test_ratelimiting.py Outdated Show resolved Hide resolved
@anoadragon453 anoadragon453 requested a review from clokep June 4, 2020 18:54
Copy link
Member

@clokep clokep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks good. 🤞

@anoadragon453 anoadragon453 merged commit f4e6495 into develop Jun 5, 2020
@anoadragon453 anoadragon453 deleted the anoa/ratelimit_config_perf branch June 5, 2020 09:47
babolivier added a commit that referenced this pull request Jun 10, 2020
…rg-hotfixes

Synapse 1.15.0rc1 (2020-06-09)
==============================

Features
--------

- Advertise support for Client-Server API r0.6.0 and remove related unstable feature flags. ([\#6585](#6585))
- Add an option to disable autojoining rooms for guest accounts. ([\#6637](#6637))
- For SAML authentication, add the ability to pass email addresses to be added to new users' accounts via SAML attributes. Contributed by Christopher Cooper. ([\#7385](#7385))
- Add admin APIs to allow server admins to manage users' devices. Contributed by @dklimpel. ([\#7481](#7481))
- Add support for generating thumbnails for WebP images. Previously, users would see an empty box instead of preview image. ([\#7586](#7586))
- Support the standardized `m.login.sso` user-interactive authentication flow. ([\#7630](#7630))

Bugfixes
--------

- Allow new users to be registered via the admin API even if the monthly active user limit has been reached. Contributed by @dkimpel. ([\#7263](#7263))
- Fix email notifications not being enabled for new users when created via the Admin API. ([\#7267](#7267))
- Fix str placeholders in an instance of `PrepareDatabaseException`. Introduced in Synapse v1.8.0. ([\#7575](#7575))
- Fix a bug in automatic user creation during first time login with `m.login.jwt`. Regression in v1.6.0. Contributed by @olof. ([\#7585](#7585))
- Fix a bug causing the cross-signing keys to be ignored when resyncing a device list. ([\#7594](#7594))
- Fix metrics failing when there is a large number of active background processes. ([\#7597](#7597))
- Fix bug where returning rooms for a group would fail if it included a room that the server was not in. ([\#7599](#7599))
- Fix duplicate key violation when persisting read markers. ([\#7607](#7607))
- Prevent an entire iteration of the device list resync loop from failing if one server responds with a malformed result. ([\#7609](#7609))
- Fix exceptions when fetching events from a remote host fails. ([\#7622](#7622))
- Make `synctl restart` start synapse if it wasn't running. ([\#7624](#7624))
- Pass device information through to the login endpoint when using the login fallback. ([\#7629](#7629))
- Advertise the `m.login.token` login flow when OpenID Connect is enabled. ([\#7631](#7631))
- Fix bug in account data replication stream. ([\#7656](#7656))

Improved Documentation
----------------------

- Update the OpenBSD installation instructions. ([\#7587](#7587))
- Advertise Python 3.8 support in `setup.py`. ([\#7602](#7602))
- Add a link to `#synapse:matrix.org` in the troubleshooting section of the README. ([\#7603](#7603))
- Clarifications to the admin api documentation. ([\#7647](#7647))

Internal Changes
----------------

- Convert the identity handler to async/await. ([\#7561](#7561))
- Improve query performance for fetching state from a PostgreSQL database. ([\#7567](#7567))
- Speed up processing of federation stream RDATA rows. ([\#7584](#7584))
- Add comment to systemd example to show postgresql dependency. ([\#7591](#7591))
- Refactor `Ratelimiter` to limit the amount of expensive config value accesses. ([\#7595](#7595))
- Convert groups handlers to async/await. ([\#7600](#7600))
- Clean up exception handling in `SAML2ResponseResource`. ([\#7614](#7614))
- Check that all asynchronous tasks succeed and general cleanup of `MonthlyActiveUsersTestCase` and `TestMauLimit`. ([\#7619](#7619))
- Convert `get_user_id_by_threepid` to async/await. ([\#7620](#7620))
- Switch to upstream `dh-virtualenv` rather than our fork for Debian package builds. ([\#7621](#7621))
- Update CI scripts to check the number in the newsfile fragment. ([\#7623](#7623))
- Check if the localpart of a Matrix ID is reserved for guest users earlier in the registration flow, as well as when responding to requests to `/register/available`. ([\#7625](#7625))
- Minor cleanups to OpenID Connect integration. ([\#7628](#7628))
- Attempt to fix flaky test: `PhoneHomeStatsTestCase.test_performance_100`. ([\#7634](#7634))
- Fix typos of `m.olm.curve25519-aes-sha2` and `m.megolm.v1.aes-sha2` in comments, test files. ([\#7637](#7637))
- Convert user directory, state deltas, and stats handlers to async/await. ([\#7640](#7640))
- Remove some unused constants. ([\#7644](#7644))
- Fix type information on `assert_*_is_admin` methods. ([\#7645](#7645))
- Convert registration handler to async/await. ([\#7649](#7649))
phil-flex pushed a commit to phil-flex/synapse that referenced this pull request Jun 16, 2020
While working on matrix-org#5665 I found myself digging into the `Ratelimiter` class and seeing that it was both:

* Rather undocumented, and
* causing a *lot* of config checks

This PR attempts to refactor and comment the `Ratelimiter` class, as well as encourage config file accesses to only be done at instantiation. 

Best to be reviewed commit-by-commit.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants