Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Describe which rate limiter was hit in logs #16135

Merged
merged 9 commits into from
Aug 29, 2023

Conversation

DMRobertson
Copy link
Contributor

@DMRobertson DMRobertson commented Aug 18, 2023

I've threatened to do this before, and found the need today while trying to run a bunch of tests locally.

I originally punted telling this to the clients. But it might be a bad idea to tell bad actors which rate limit they've tripped. Instead, err towards caution and only log this info.

Commitwise reviewable.

Copy link
Member

@clokep clokep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable overall.

self.per_second = config.get("per_second", defaults["per_second"])
self.burst_count = int(config.get("burst_count", defaults["burst_count"]))

def parse_rate_limit(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also use a class-method, which is what we usually do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was my first instinct---though I shied away from it because it's mildly irritating to type classmethods correctly in an inheritance-aware way. (typing.Self is the one true way, but that's in Python 3.11 and I didn't want to fiddle with typing_extensions etc).

I'll quickly make this a class method and just -> "RatelimitSettings", which is good enough.

tests/api/test_errors.py Outdated Show resolved Hide resolved
Comment on lines +525 to +527
@property
def debug_context(self) -> Optional[str]:
return self.limiter_name
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debug_context could probably just be a class-property too and __init__ could set self.debug_context = limiter_name. I don't feel strongly either way though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No strong feelings here either. Suggest we leave it as-is in the interests of landing this sooner rather than later.

@DMRobertson DMRobertson marked this pull request as ready for review August 29, 2023 17:56
@DMRobertson DMRobertson requested a review from a team as a code owner August 29, 2023 17:56
@DMRobertson
Copy link
Contributor Author

Would you mind taking another look? I had to merge in #16136, but I think I did it properly. (Let's see if the tests pass, I guess!)

@DMRobertson
Copy link
Contributor Author

Test failure was a known flake.

@DMRobertson DMRobertson merged commit 62a1a9b into develop Aug 29, 2023
35 of 37 checks passed
@DMRobertson DMRobertson deleted the dmr/ratelimit-name-in-err branch August 29, 2023 23:39
self,
store: DataStore,
clock: Clock,
cfg: RatelimitSettings,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sumnerevans pointed out to me that the docstring needs updating for this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #16255.

DMRobertson pushed a commit that referenced this pull request Sep 5, 2023
- Add configuration setting for CAS protocol version. Contributed by Aurélien Grimpard. ([\#15816](#15816))
- Suppress notifications from message edits per [MSC3958](matrix-org/matrix-spec-proposals#3958). ([\#16113](#16113))
- Return a `Retry-After` with `M_LIMIT_EXCEEDED` error responses. ([\#16136](#16136))
- Add `last_seen_ts` to the [admin users API](https://matrix-org.github.io/synapse/latest/admin_api/user_admin_api.html). ([\#16218](#16218))
- Improve resource usage when sending data to a large number of remote hosts that are marked as "down". ([\#16223](#16223))

- Fix IPv6-related bugs on SMTP settings, adding groundwork to fix similar issues. Contributed by @evilham and @telmich (ungleich.ch). ([\#16155](#16155))
- Fix a spec compliance issue where requests to the `/publicRooms` federation API would specify `include_all_networks` as a string. ([\#16185](#16185))
- Fix inaccurate error message while attempting to ban or unban a user with the same or higher PL by spliting the conditional statements. Contributed by @leviosacz. ([\#16205](#16205))
- Fix a rare bug that broke looping calls, which could lead to e.g. linearly increasing memory usage. Introduced in v1.90.0. ([\#16210](#16210))
- Fix a long-standing bug where uploading images would fail if we could not generate thumbnails for them. ([\#16211](#16211))
- Fix a long-standing bug where we did not correctly back off from servers that had "gone" if they returned 4xx series error codes. ([\#16221](#16221))

- Update links to the [matrix.org blog](https://matrix.org/blog/). ([\#16008](#16008))
- Document which [admin APIs](https://matrix-org.github.io/synapse/latest/usage/administration/admin_api/index.html) are disabled when experimental [MSC3861](matrix-org/matrix-spec-proposals#3861) support is enabled. ([\#16168](#16168))
- Document [`exclude_rooms_from_sync`](https://matrix-org.github.io/synapse/v1.92/usage/configuration/config_documentation.html#exclude_rooms_from_sync) configuration option. ([\#16178](#16178))

- Prepare unit tests for Python 3.12. ([\#16099](#16099))
- Fix nightly CI jobs. ([\#16121](#16121), [\#16213](#16213))
- Describe which rate limiter was hit in logs. ([\#16135](#16135))
- Simplify presence code when using workers. ([\#16170](#16170))
- Track per-device information in the presence code. ([\#16171](#16171), [\#16172](#16172))
- Stop using the `event_txn_id` table. ([\#16175](#16175))
- Use `AsyncMock` instead of custom code. ([\#16179](#16179), [\#16180](#16180))
- Improve error reporting of invalid data passed to `/_matrix/key/v2/query`. ([\#16183](#16183))
- Task scheduler: add replication notify for new task to launch ASAP. ([\#16184](#16184))
- Improve type hints. ([\#16186](#16186), [\#16188](#16188), [\#16201](#16201))
- Bump black version to 23.7.0. ([\#16187](#16187))
- Log the details of background update failures. ([\#16212](#16212))
- Cache device resync requests over replication. ([\#16241](#16241))

* Bump anyhow from 1.0.72 to 1.0.75. ([\#16141](#16141))
* Bump furo from 2023.7.26 to 2023.8.19. ([\#16238](#16238))
* Bump phonenumbers from 8.13.18 to 8.13.19. ([\#16237](#16237))
* Bump psycopg2 from 2.9.6 to 2.9.7. ([\#16196](#16196))
* Bump regex from 1.9.3 to 1.9.4. ([\#16195](#16195))
* Bump ruff from 0.0.277 to 0.0.286. ([\#16198](#16198))
* Bump sentry-sdk from 1.29.2 to 1.30.0. ([\#16236](#16236))
* Bump serde from 1.0.184 to 1.0.188. ([\#16194](#16194))
* Bump serde_json from 1.0.104 to 1.0.105. ([\#16140](#16140))
* Bump types-psycopg2 from 2.9.21.10 to 2.9.21.11. ([\#16200](#16200))
* Bump types-pyyaml from 6.0.12.10 to 6.0.12.11. ([\#16199](#16199))
@jcgruenhage
Copy link
Contributor

I originally punted telling this to the clients. But it might be a bad idea to tell bad actors which rate limit they've tripped. Instead, err towards caution and only log this info.

We're already telling the client exactly how long they have to wait. How is telling them which of the limits they ran into worse? I'd like to return this info to the clients, as it's more accessible than the logs.

Would you be open to a PR for changing that behaviour? If necessary, behind a config flag?

@clokep
Copy link
Member

clokep commented Sep 12, 2023

We're already telling the client exactly how long they have to wait. How is telling them which of the limits they ran into worse? I'd like to return this info to the clients, as it's more accessible than the logs.

Would you be open to a PR for changing that behaviour? If necessary, behind a config flag?

I do not believe we'd accept a PR to change this behavior. There's two reasons:

  1. It leaks additional information that the client does not need to know. (It would allow probing of the exact configuration of rate limiters. This might be possible already today, but it doesn't seem prudent to make this easier.)
  2. It would cause the rate limiting error responses of Synapse to no longer be spec compliant. (And rate limiters across homeserver implementations are not likely to be similar anyway, so this can't be used in a spec compliant manner.)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants