-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Track when the pulled event signature fails #13815
Track when the pulled event signature fails #13815
Conversation
@@ -116,6 +122,9 @@ async def _check_sigs_and_hash( | |||
"event_id": pdu.event_id, | |||
} | |||
) | |||
await self._store.record_event_failed_pull_attempt( | |||
pdu.room_id, pdu.event_id, "Event content has been tampered with" | |||
) | |||
return redacted_event | |||
|
|||
spam_check = await self.spam_checker.check_event_for_spam(pdu) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future, we will potentially record a failed attempt when the spam checker soft-fails an event. Part of #13700
…e is invalid Related to - #13622 - #13635 - #13676 Follow-up to #13815 which tracks event signature failures. This PR aims to stop us from trying to `_get_state_ids_after_missing_prev_event` after we already know that the prev_event will fail from a previous attempt To explain an exact scenario around `/messages` -> backfill, we call `/backfill` and first check the signatures of the 100 events. We see bad signature for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` and `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` (both member events). Then we process the 98 events remaining that have valid signatures but one of the events references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event`. So we have to do the whole `_get_state_ids_after_missing_prev_event` rigmarole which pulls in those same events which fail again because the signatures are still invalid. - `backfill` - `outgoing-federation-request` `/backfill` - `_check_sigs_and_hash_and_fetch` - `_check_sigs_and_hash_and_fetch_one` for each event received over backfill - ❗ `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)` - ❗ `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)` - `_process_pulled_events` - `_process_pulled_event` for each validated event - ❗ Event `$Q0iMdqtz3IJYfZQU2Xk2WjB5NDF8Gg8cFSYYyKQgKJ0` references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event` which is missing so we try to get it - `_get_state_ids_after_missing_prev_event` - `outgoing-federation-request` `/state_ids` - ❗ `get_pdu` for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` which fails the signature check again - ❗ `get_pdu` for `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` which fails the signature check With this PR, we no longer call the burdensome `_get_state_ids_after_missing_prev_event` because the signature failure will count as an attempt before we try to run this.
) | ||
|
||
return valid_state_events, valid_auth_events | ||
|
||
@trace | ||
async def _check_sigs_and_hash_and_fetch( | ||
async def _check_sigs_and_hash_for_pulled_events_and_fetch( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed this so we stick to using this only for pulled event scenarios
record_failure_callback: Optional[ | ||
Callable[[EventBase, str], Awaitable[None]] | ||
] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For context, see #13815 (comment) for why we're using a callback pattern here.
await _check_sigs_on_pdu(self.keyring, room_version, pdu) | ||
try: | ||
await _check_sigs_on_pdu(self.keyring, room_version, pdu) | ||
except Exception as exc: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This except clause seems too broad; I'd worry it might swallow errors arising from bugs rather than solely invalid signatures.
Can we narrow down the interesting types of Exceptions that we care about here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. this will catch CancelledError
s which sounds dodgy (I don't know if cancellation is enabled for backfill, but it doesn't sound like a good reason to record a failed pull attempt)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we can just worry about InvalidEventSignatureError
here 👍
But all of the downstream places in _check_sigs_on_pdu
do the same except Exception as e:
🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What we decide here, we should also apply to #13814
Is there a way we can make the errors we care about opt-out instead of opt-in? It seems like it's going to suck to keep some of these error lists up to date whenever the downstream functions change and raise different errors.
Maybe it's better to be opt-in to handle just what you know but as an example at the following spot, we probably need to look at FederationError
, RuntimeError
, InvalidResponseError
from a quick glance at some of the downstream code and this list seems like it will morph under us since there is so much going on in that block. Is there a good way to get a general sense of all exception types that a code path can raise (doesn't need to be comprehensive, maybe info just from docstrings)?
synapse/synapse/handlers/federation_event.py
Lines 916 to 920 in 6f0c3e6
except Exception as exc: | |
await self._store.record_event_failed_pull_attempt( | |
event.room_id, event_id, str(exc) | |
) | |
raise exc |
(created a draft PR to track this point, #13969, and will update with what we think about here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't imagine opt-out is easier — I don't think RuntimeError
, ValueError
, AssertionError
, KeyError
etc should be included, for example. These represent programming errors rather than the remote not acting according to the way we expect.
Tracking the set of exceptions that some code can throw is difficult in Python, unfortunately.
I would suggest trying some cases of where you want to track a failed pull attempt, perhaps even adding tests that reproduce these cases. I can think of at least these classes of problems:
- DNS errors
- TCP-level Connection errors
- TLS problems such as untrusted self-signed certs
- Homeserver not on federation whitelist (? what should we do here?)
- Bad response code
- Response isn't JSON
- JSON response isn't in an expected format (e.g. doesn't include the right keys)... bearing in mind that our validation is probably rubbish and this might currently manifest itself as a
KeyError
(IMO this should be fixed so it sounds more like a validation error)
I realise this sounds a bit painful but we ought to figure out what we should do for these cases. Should a network problem actually be tracked against the event at all; won't the existing backoff behaviour be sufficient there? ....
Perhaps, rather than accepting that the code underneath is always going to be messy, try and improve the downstream situation — can we define superclasses for some of these interesting groups of exceptions a e.g. TemporaryNetworkError
exception class and make relevant exceptions derived from those or get re-raised as those?
We can then add to the docstrings of the downstream functions what exceptions they're allowed to raise for these kinds of circumstances.
Just some thoughts, realise it may not necessarily be easy..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think RuntimeError, ValueError, AssertionError, KeyError etc should be included, for example. These represent programming errors rather than the remote not acting according to the way we expect.
Fair enough on RuntimeError
and friends, I saw this case and during my quick scan and seemed relevant (it's not) although if it was, would be a good candidate to make it throw the proper error type 👍
I would suggest trying some cases of where you want to track a failed pull attempt, perhaps even adding tests that reproduce these cases
Having these tests would be great. We'd want to apply them to /event/{eventId}
, /event_auth/{roomId}/{eventId}
, /state/{roomId}?event_id=$xxx
, /state_ids/{roomId}?event_id=$xxx
(although related and probably covered by the existing backoff and NotRetryingDestination
in the next point below). For /backfill
and /get_missing_events
, we can't track server/network problems because there is no specific event we're requesting but we can track whether the events in their responses are rejected, etc.
Should a network problem actually be tracked against the event at all; won't the existing backoff behaviour be sufficient there? ....
In general, if anything goes wrong with trying to pull an event from another server, we should backoff and do it in the background next time (#13623).
The existing backoff underlying all of the federation requests (matrixfederationclient.py
and NotRetryingDestination
) seems like it could be sufficient for network related errors. I haven't experienced network problems being the cause of /messages
being slow so I haven't put as much thought in this area.
Homeserver not on federation whitelist (? what should we do here?)
Seems like we shouldn't be asking homeservers if they aren't on our whitelist in the first place. I guess we just let FederationDeniedError
handle this and continue
to the next destination which accomplishes this though.
Thanks for the thoughtful response on this @reivilibre! Really helping me form the right thoughts around all of this!
pdu: EventBase, | ||
record_failure_callback: Optional[ | ||
Callable[[EventBase, str], Awaitable[None]] | ||
] = None, | ||
) -> EventBase: | ||
"""Checks that event is correctly signed by the sending server. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to see a description of at least record_failure_callback
— I think the other 2 are fairly self-evident so not fussed about those particularly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(done in a later commit; in future please consider collapsing those into the same commit so that commit by commit review is possible for mere mortals like me who can't track the full thing ;-))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My PR's are definitely not commit by commit review friendly. I find working that way in the first place is impossible.
And rebasing and splitting multiple changes in the same file is a super chore. I'd rather split stuff by PR which feel free to call out if you see something to split out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you can make them a bit more CbC friendly, I would really appreciate it to make reviews less daunting if anything. (Maybe I'm the only one though? Definitely not a requirement, but just a preference from me especially when I'm not familiar with the area of code)
Since you seem to have the idea that it's difficult to do: My technique for things like this is to use lazygit
(a TUI git frontend but with good mouse support too), which lets you stage changes line-by-line and easily amend (shift+A) them into earlier commits. It also makes it easy to reorder commits etc. I think some other tools support this kind of thing as well, but lazygit is the best I've found for easily crafting the commits into a meaningful arrangement.
I think you may be referring to git's interactive rebasing and interactive staging; you can definitely do this a lot more efficiently than that :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.S. I find splitting by commits especially useful to avoid conflating 'style' changes (e.g. moving methods, renaming methods) and semantic changes. I think asking you to put these in separate PRs would be too much, though and it would certainly hurt the latency of review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the team thoughts around rebasing? Anyone annoyed/worried about references changing after they already pulled the branch?
If you're just looking for some of the low hanging fruit of pulling out some style changes or merging some subsequent changes back to the original chunk, that seems a lot more do-able. If we're talking about building up iteratively commit by commit, I have a lot more apprehension to put effort into doing that.
I assume you don't want me to rebase things together from different review cycles so you can easily review the things that have changed.
What about merge commits? I find it a lot harder to rebase around merge commits and would prefer to just rebase my branch than merge develop
if I started doing this. But then the rebase would rewrite all of the history including stuff you already reviewed. Do you have a good way to see what's new even though I rebased on the latest develop
? (does that even matter to you)
A note around rebased history: I really like the GitLab diff versioning so even if you rebase, you can still reference old diff. I used to use this all the time over there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A note around rebased history: I really like the GitLab diff versioning so even if you rebase, you can still reference old diff. I used to use this all the time over there.
I have similar experiences with GL and reviewboard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the team thoughts around rebasing? Anyone annoyed/worried about references changing after they already pulled the branch?
I believe the rough consensus is: "don't rebase after someone has started to review the changes", because doing so resets the "I have viewed this file" in Github's review UI.
See #13815 (comment) So wes can avoid things like `CancelledError` which is a valid error but probably should not count as an error
* Maybe not catch all errors to avoid things in the nature-of CancelledError See #13815 (comment) * Remove general exception tracking * Add changelog
Thanks for the review @reivilibre 🐹 |
Synapse 1.69.0rc1 (2022-10-04) ============================== Please note that legacy Prometheus metric names are now deprecated and will be removed in Synapse 1.73.0. Server administrators should update their dashboards and alerting rules to avoid using the deprecated metric names. See the [upgrade notes](https://matrix-org.github.io/synapse/v1.69/upgrade.html#upgrading-to-v1690) for more details. Features -------- - Allow application services to set the `origin_server_ts` of a state event by providing the query parameter `ts` in [`PUT /_matrix/client/r0/rooms/{roomId}/state/{eventType}/{stateKey}`](https://spec.matrix.org/v1.4/client-server-api/#put_matrixclientv3roomsroomidstateeventtypestatekey), per [MSC3316](matrix-org/matrix-spec-proposals#3316). Contributed by @lukasdenk. ([\#11866](#11866)) - Allow server admins to require a manual approval process before new accounts can be used (using [MSC3866](matrix-org/matrix-spec-proposals#3866)). ([\#13556](#13556)) - Exponentially backoff from backfilling the same event over and over. ([\#13635](#13635), [\#13936](#13936)) - Add cache invalidation across workers to module API. ([\#13667](#13667), [\#13947](#13947)) - Experimental implementation of [MSC3882](matrix-org/matrix-spec-proposals#3882) to allow an existing device/session to generate a login token for use on a new device/session. ([\#13722](#13722), [\#13868](#13868)) - Experimental support for thread-specific receipts ([MSC3771](matrix-org/matrix-spec-proposals#3771)). ([\#13782](#13782), [\#13893](#13893), [\#13932](#13932), [\#13937](#13937), [\#13939](#13939)) - Add experimental support for [MSC3881: Remotely toggle push notifications for another client](matrix-org/matrix-spec-proposals#3881). ([\#13799](#13799), [\#13831](#13831), [\#13860](#13860)) - Keep track when an event pulled over federation fails its signature check so we can intelligently back-off in the future. ([\#13815](#13815)) - Improve validation for the unspecced, internal-only `_matrix/client/unstable/add_threepid/msisdn/submit_token` endpoint. ([\#13832](#13832)) - Faster remote room joins: record _when_ we first partial-join to a room. ([\#13892](#13892)) - Support a `dir` parameter on the `/relations` endpoint per [MSC3715](matrix-org/matrix-spec-proposals#3715). ([\#13920](#13920)) - Ask mail servers receiving emails from Synapse to not send automatic replies (e.g. out-of-office responses). ([\#13957](#13957)) Bugfixes -------- - Send push notifications for invites received over federation. ([\#13719](#13719), [\#14014](#14014)) - Fix a long-standing bug where typing events would be accepted from remote servers not present in a room. Also fix a bug where incoming typing events would cause other incoming events to get stuck during a fast join. ([\#13830](#13830)) - Fix a bug introduced in Synapse v1.53.0 where the experimental implementation of [MSC3715](matrix-org/matrix-spec-proposals#3715) would give incorrect results when paginating forward. ([\#13840](#13840)) - Fix access token leak to logs from proxy agent. ([\#13855](#13855)) - Fix `have_seen_event` cache not being invalidated after we persist an event which causes inefficiency effects like extra `/state` federation calls. ([\#13863](#13863)) - Faster room joins: Fix a bug introduced in 1.66.0 where an error would be logged when syncing after joining a room. ([\#13872](#13872)) - Fix a bug introduced in 1.66.0 where some required fields in the pushrules sent to clients were not present anymore. Contributed by Nico. ([\#13904](#13904)) - Fix packaging to include `Cargo.lock` in `sdist`. ([\#13909](#13909)) - Fix a long-standing bug where device updates could cause delays sending out to-device messages over federation. ([\#13922](#13922)) - Fix a bug introduced in v1.68.0 where Synapse would require `setuptools_rust` at runtime, even though the package is only required at build time. ([\#13952](#13952)) - Fix a long-standing bug where `POST /_matrix/client/v3/keys/query` requests could result in excessively large SQL queries. ([\#13956](#13956)) - Fix a performance regression in the `get_users_in_room` database query. Introduced in v1.67.0. ([\#13972](#13972)) - Fix a bug introduced in v1.68.0 bug where Rust extension wasn't built in `release` mode when using `poetry install`. ([\#14009](#14009)) - Do not return an unspecified `original_event` field when using the stable `/relations` endpoint. Introduced in Synapse v1.57.0. ([\#14025](#14025)) - Correctly handle a race with device lists when a remote user leaves during a partial join. ([\#13885](#13885)) - Correctly handle sending local device list updates to remote servers during a partial join. ([\#13934](#13934)) Improved Documentation ---------------------- - Add `worker_main_http_uri` for the worker generator bash script. ([\#13772](#13772)) - Update URL for the NixOS module for Synapse. ([\#13818](#13818)) - Fix a mistake in sso_mapping_providers.md: `map_user_attributes` is expected to return `display_name`, not `displayname`. ([\#13836](#13836)) - Fix a cross-link from the registration admin API to the `registration_shared_secret` configuration documentation. ([\#13870](#13870)) - Update the man page for the `hash_password` script to correct the default number of bcrypt rounds performed. ([\#13911](#13911), [\#13930](#13930)) - Emphasize the right reasons when to use `(room_id, event_id)` in a database schema. ([\#13915](#13915)) - Add instruction to contributing guide for running unit tests in parallel. Contributed by @ashfame. ([\#13928](#13928)) - Clarify that the `auto_join_rooms` config option can also be used with Space aliases. ([\#13931](#13931)) - Add some cross references to worker documentation. ([\#13974](#13974)) - Linkify urls in config documentation. ([\#14003](#14003)) Deprecations and Removals ------------------------- - Remove the `complete_sso_login` method from the Module API which was deprecated in Synapse 1.13.0. ([\#13843](#13843)) - Announce that legacy metric names are deprecated, will be turned off by default in Synapse v1.71.0 and removed altogether in Synapse v1.73.0. See the upgrade notes for more information. ([\#14024](#14024)) Internal Changes ---------------- - Speed up creation of DM rooms. ([\#13487](#13487), [\#13800](#13800)) - Port push rules to using Rust. ([\#13768](#13768), [\#13838](#13838), [\#13889](#13889)) - Optimise get rooms for user calls. Contributed by Nick @ Beeper (@Fizzadar). ([\#13787](#13787)) - Update the script which makes full schema dumps. ([\#13792](#13792)) - Use shared methods for cache invalidation when persisting events, remove duplicate codepaths. Contributed by Nick @ Beeper (@Fizzadar). ([\#13796](#13796)) - Improve the `synapse.api.auth.Auth` mock used in unit tests. ([\#13809](#13809)) - Faster Remote Room Joins: tell remote homeservers that we are unable to authorise them if they query a room which has partial state on our server. ([\#13823](#13823)) - Carry IdP Session IDs through user-mapping sessions. ([\#13839](#13839)) - Fix the release script not publishing binary wheels. ([\#13850](#13850)) - Raise issue if complement fails with latest deps. ([\#13859](#13859)) - Correct the comments in the complement dockerfile. ([\#13867](#13867)) - Create a new snapshot of the database schema. ([\#13873](#13873)) - Faster room joins: Send device list updates to most servers in rooms with partial state. ([\#13874](#13874), [\#14013](#14013)) - Add comments to the Prometheus recording rules to make it clear which set of rules you need for Grafana or Prometheus Console. ([\#13876](#13876)) - Only pull relevant backfill points from the database based on the current depth and limit (instead of all) every time we want to `/backfill`. ([\#13879](#13879)) - Faster room joins: Avoid waiting for full state when processing `/keys/changes` requests. ([\#13888](#13888)) - Improve backfill robustness by trying more servers when we get a `4xx` error back. ([\#13890](#13890)) - Fix mypy errors with canonicaljson 1.6.3. ([\#13905](#13905)) - Faster remote room joins: correctly handle remote device list updates during a partial join. ([\#13913](#13913)) - Complement image: propagate SIGTERM to all workers. ([\#13914](#13914)) - Update an innaccurate comment in Synapse's upsert database helper. ([\#13924](#13924)) - Update mypy (0.950 -> 0.981) and mypy-zope (0.3.7 -> 0.3.11). ([\#13925](#13925), [\#13993](#13993)) - Use dedicated `get_local_users_in_room(room_id)` function to find local users when calculating users to copy over during a room upgrade. ([\#13960](#13960)) - Refactor language in user directory `_track_user_joined_room` code to make it more clear that we use both local and remote users. ([\#13966](#13966)) - Revert catch-all exceptions being recorded as event pull attempt failures (only handle what we know about). ([\#13969](#13969)) - Speed up calculating push actions in large rooms. ([\#13973](#13973), [\#13992](#13992)) - Enable update notifications from Github's dependabot. ([\#13976](#13976)) - Prototype a workflow to automatically add changelogs to dependabot PRs. ([\#13998](#13998), [\#14011](#14011), [\#14017](#14017), [\#14021](#14021), [\#14027](#14027)) - Fix type annotations to be compatible with new annotations in development versions of twisted. ([\#14012](#14012)) - Clear out stale entries in `event_push_actions_staging` table. ([\#14020](#14020)) - Bump versions of GitHub actions. ([\#13978](#13978), [\#13979](#13979), [\#13980](#13980), [\#13982](#13982), [\#14015](#14015), [\#14019](#14019), [\#14022](#14022), [\#14023](#14023))
…ure is invalid (#13816) While #13635 stops us from doing the slow thing after we've already done it once, this PR stops us from doing one of the slow things in the first place. Related to - #13622 - #13635 - #13676 Part of #13356 Follow-up to #13815 which tracks event signature failures. With this PR, we avoid the call to the costly `_get_state_ids_after_missing_prev_event` because the signature failure will count as an attempt before and we filter events based on the backoff before calling `_get_state_ids_after_missing_prev_event` now. For example, this will save us 156s out of the 185s total that this `matrix.org` `/messages` request. If you want to see the full Jaeger trace of this, you can drag and drop this `trace.json` into your own Jaeger, https://gist.github.com/MadLittleMods/4b12d0d0afe88c2f65ffcc907306b761 To explain this exact scenario around `/messages` -> backfill, we call `/backfill` and first check the signatures of the 100 events. We see bad signature for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` and `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` (both member events). Then we process the 98 events remaining that have valid signatures but one of the events references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event`. So we have to do the whole `_get_state_ids_after_missing_prev_event` rigmarole which pulls in those same events which fail again because the signatures are still invalid. - `backfill` - `outgoing-federation-request` `/backfill` - `_check_sigs_and_hash_and_fetch` - `_check_sigs_and_hash_and_fetch_one` for each event received over backfill - ❗ `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)` - ❗ `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)` - `_process_pulled_events` - `_process_pulled_event` for each validated event - ❗ Event `$Q0iMdqtz3IJYfZQU2Xk2WjB5NDF8Gg8cFSYYyKQgKJ0` references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event` which is missing so we try to get it - `_get_state_ids_after_missing_prev_event` - `outgoing-federation-request` `/state_ids` - ❗ `get_pdu` for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` which fails the signature check again - ❗ `get_pdu` for `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` which fails the signature check
NOTE: this is absolutely *not* safe for Beeper usage as-is. I have merged all of the Python code in but all our customizations to the base rules and push rule evaluator are not yet present in the new Rust module. This will fail tests as-is and future commits will re-apply our changes in Rust. Synapse 1.69.0 (2022-10-17) =========================== Please note that legacy Prometheus metric names are now deprecated and will be removed in Synapse 1.73.0. Server administrators should update their dashboards and alerting rules to avoid using the deprecated metric names. See the [upgrade notes](https://matrix-org.github.io/synapse/v1.69/upgrade.html#upgrading-to-v1690) for more details. No significant changes since 1.69.0rc4. Synapse 1.69.0rc4 (2022-10-14) ============================== Bugfixes -------- - Fix poor performance of the `event_push_backfill_thread_id` background update, which was introduced in Synapse 1.68.0rc1. ([\matrix-org#14172](matrix-org#14172), [\matrix-org#14181](matrix-org#14181)) Updates to the Docker image --------------------------- - Fix docker build OOMing in CI for arm64 builds. ([\matrix-org#14173](matrix-org#14173)) Synapse 1.69.0rc3 (2022-10-12) ============================== Bugfixes -------- - Fix an issue with Docker images causing the Rust dependencies to not be pinned correctly. Introduced in v1.68.0 ([\matrix-org#14129](matrix-org#14129)) - Fix a bug introduced in Synapse 1.69.0rc1 which would cause registration replication requests to fail if the worker sending the request is not running Synapse 1.69. ([\matrix-org#14135](matrix-org#14135)) - Fix error in background update when rotating existing notifications. Introduced in v1.69.0rc2. ([\matrix-org#14138](matrix-org#14138)) Internal Changes ---------------- - Rename the `url_preview` extra to `url-preview`, for compatability with poetry-core 1.3.0 and [PEP 685](https://peps.python.org/pep-0685/). From-source installations using this extra will need to install using the new name. ([\matrix-org#14085](matrix-org#14085)) Synapse 1.69.0rc2 (2022-10-06) ============================== Deprecations and Removals ------------------------- - Deprecate the `generate_short_term_login_token` method in favor of an async `create_login_token` method in the Module API. ([\matrix-org#13842](matrix-org#13842)) Internal Changes ---------------- - Ensure Synapse v1.69 works with upcoming database changes in v1.70. ([\matrix-org#14045](matrix-org#14045)) - Fix a bug introduced in Synapse v1.68.0 where messages could not be sent in rooms with non-integer `notifications` power level. ([\matrix-org#14073](matrix-org#14073)) - Temporarily pin build-system requirements to workaround an incompatibility with poetry-core 1.3.0. This will be reverted before the v1.69.0 release proper, see [\matrix-org#14079](matrix-org#14079). ([\matrix-org#14080](matrix-org#14080)) Synapse 1.69.0rc1 (2022-10-04) ============================== Features -------- - Allow application services to set the `origin_server_ts` of a state event by providing the query parameter `ts` in [`PUT /_matrix/client/r0/rooms/{roomId}/state/{eventType}/{stateKey}`](https://spec.matrix.org/v1.4/client-server-api/#put_matrixclientv3roomsroomidstateeventtypestatekey), per [MSC3316](matrix-org/matrix-spec-proposals#3316). Contributed by @lukasdenk. ([\matrix-org#11866](matrix-org#11866)) - Allow server admins to require a manual approval process before new accounts can be used (using [MSC3866](matrix-org/matrix-spec-proposals#3866)). ([\matrix-org#13556](matrix-org#13556)) - Exponentially backoff from backfilling the same event over and over. ([\matrix-org#13635](matrix-org#13635), [\matrix-org#13936](matrix-org#13936)) - Add cache invalidation across workers to module API. ([\matrix-org#13667](matrix-org#13667), [\matrix-org#13947](matrix-org#13947)) - Experimental implementation of [MSC3882](matrix-org/matrix-spec-proposals#3882) to allow an existing device/session to generate a login token for use on a new device/session. ([\matrix-org#13722](matrix-org#13722), [\matrix-org#13868](matrix-org#13868)) - Experimental support for thread-specific receipts ([MSC3771](matrix-org/matrix-spec-proposals#3771)). ([\matrix-org#13782](matrix-org#13782), [\matrix-org#13893](matrix-org#13893), [\matrix-org#13932](matrix-org#13932), [\matrix-org#13937](matrix-org#13937), [\matrix-org#13939](matrix-org#13939)) - Add experimental support for [MSC3881: Remotely toggle push notifications for another client](matrix-org/matrix-spec-proposals#3881). ([\matrix-org#13799](matrix-org#13799), [\matrix-org#13831](matrix-org#13831), [\matrix-org#13860](matrix-org#13860)) - Keep track when an event pulled over federation fails its signature check so we can intelligently back-off in the future. ([\matrix-org#13815](matrix-org#13815)) - Improve validation for the unspecced, internal-only `_matrix/client/unstable/add_threepid/msisdn/submit_token` endpoint. ([\matrix-org#13832](matrix-org#13832)) - Faster remote room joins: record _when_ we first partial-join to a room. ([\matrix-org#13892](matrix-org#13892)) - Support a `dir` parameter on the `/relations` endpoint per [MSC3715](matrix-org/matrix-spec-proposals#3715). ([\matrix-org#13920](matrix-org#13920)) - Ask mail servers receiving emails from Synapse to not send automatic replies (e.g. out-of-office responses). ([\matrix-org#13957](matrix-org#13957)) Bugfixes -------- - Send push notifications for invites received over federation. ([\matrix-org#13719](matrix-org#13719), [\matrix-org#14014](matrix-org#14014)) - Fix a long-standing bug where typing events would be accepted from remote servers not present in a room. Also fix a bug where incoming typing events would cause other incoming events to get stuck during a fast join. ([\matrix-org#13830](matrix-org#13830)) - Fix a bug introduced in Synapse v1.53.0 where the experimental implementation of [MSC3715](matrix-org/matrix-spec-proposals#3715) would give incorrect results when paginating forward. ([\matrix-org#13840](matrix-org#13840)) - Fix access token leak to logs from proxy agent. ([\matrix-org#13855](matrix-org#13855)) - Fix `have_seen_event` cache not being invalidated after we persist an event which causes inefficiency effects like extra `/state` federation calls. ([\matrix-org#13863](matrix-org#13863)) - Faster room joins: Fix a bug introduced in 1.66.0 where an error would be logged when syncing after joining a room. ([\matrix-org#13872](matrix-org#13872)) - Fix a bug introduced in 1.66.0 where some required fields in the pushrules sent to clients were not present anymore. Contributed by Nico. ([\matrix-org#13904](matrix-org#13904)) - Fix packaging to include `Cargo.lock` in `sdist`. ([\matrix-org#13909](matrix-org#13909)) - Fix a long-standing bug where device updates could cause delays sending out to-device messages over federation. ([\matrix-org#13922](matrix-org#13922)) - Fix a bug introduced in v1.68.0 where Synapse would require `setuptools_rust` at runtime, even though the package is only required at build time. ([\matrix-org#13952](matrix-org#13952)) - Fix a long-standing bug where `POST /_matrix/client/v3/keys/query` requests could result in excessively large SQL queries. ([\matrix-org#13956](matrix-org#13956)) - Fix a performance regression in the `get_users_in_room` database query. Introduced in v1.67.0. ([\matrix-org#13972](matrix-org#13972)) - Fix a bug introduced in v1.68.0 bug where Rust extension wasn't built in `release` mode when using `poetry install`. ([\matrix-org#14009](matrix-org#14009)) - Do not return an unspecified `original_event` field when using the stable `/relations` endpoint. Introduced in Synapse v1.57.0. ([\matrix-org#14025](matrix-org#14025)) - Correctly handle a race with device lists when a remote user leaves during a partial join. ([\matrix-org#13885](matrix-org#13885)) - Correctly handle sending local device list updates to remote servers during a partial join. ([\matrix-org#13934](matrix-org#13934)) Improved Documentation ---------------------- - Add `worker_main_http_uri` for the worker generator bash script. ([\matrix-org#13772](matrix-org#13772)) - Update URL for the NixOS module for Synapse. ([\matrix-org#13818](matrix-org#13818)) - Fix a mistake in sso_mapping_providers.md: `map_user_attributes` is expected to return `display_name`, not `displayname`. ([\matrix-org#13836](matrix-org#13836)) - Fix a cross-link from the registration admin API to the `registration_shared_secret` configuration documentation. ([\matrix-org#13870](matrix-org#13870)) - Update the man page for the `hash_password` script to correct the default number of bcrypt rounds performed. ([\matrix-org#13911](matrix-org#13911), [\matrix-org#13930](matrix-org#13930)) - Emphasize the right reasons when to use `(room_id, event_id)` in a database schema. ([\matrix-org#13915](matrix-org#13915)) - Add instruction to contributing guide for running unit tests in parallel. Contributed by @ashfame. ([\matrix-org#13928](matrix-org#13928)) - Clarify that the `auto_join_rooms` config option can also be used with Space aliases. ([\matrix-org#13931](matrix-org#13931)) - Add some cross references to worker documentation. ([\matrix-org#13974](matrix-org#13974)) - Linkify urls in config documentation. ([\matrix-org#14003](matrix-org#14003)) Deprecations and Removals ------------------------- - Remove the `complete_sso_login` method from the Module API which was deprecated in Synapse 1.13.0. ([\matrix-org#13843](matrix-org#13843)) - Announce that legacy metric names are deprecated, will be turned off by default in Synapse v1.71.0 and removed altogether in Synapse v1.73.0. See the upgrade notes for more information. ([\matrix-org#14024](matrix-org#14024)) Internal Changes ---------------- - Speed up creation of DM rooms. ([\matrix-org#13487](matrix-org#13487), [\matrix-org#13800](matrix-org#13800)) - Port push rules to using Rust. ([\matrix-org#13768](matrix-org#13768), [\matrix-org#13838](matrix-org#13838), [\matrix-org#13889](matrix-org#13889)) - Optimise get rooms for user calls. Contributed by Nick @ Beeper (@Fizzadar). ([\matrix-org#13787](matrix-org#13787)) - Update the script which makes full schema dumps. ([\matrix-org#13792](matrix-org#13792)) - Use shared methods for cache invalidation when persisting events, remove duplicate codepaths. Contributed by Nick @ Beeper (@Fizzadar). ([\matrix-org#13796](matrix-org#13796)) - Improve the `synapse.api.auth.Auth` mock used in unit tests. ([\matrix-org#13809](matrix-org#13809)) - Faster Remote Room Joins: tell remote homeservers that we are unable to authorise them if they query a room which has partial state on our server. ([\matrix-org#13823](matrix-org#13823)) - Carry IdP Session IDs through user-mapping sessions. ([\matrix-org#13839](matrix-org#13839)) - Fix the release script not publishing binary wheels. ([\matrix-org#13850](matrix-org#13850)) - Raise issue if complement fails with latest deps. ([\matrix-org#13859](matrix-org#13859)) - Correct the comments in the complement dockerfile. ([\matrix-org#13867](matrix-org#13867)) - Create a new snapshot of the database schema. ([\matrix-org#13873](matrix-org#13873)) - Faster room joins: Send device list updates to most servers in rooms with partial state. ([\matrix-org#13874](matrix-org#13874), [\matrix-org#14013](matrix-org#14013)) - Add comments to the Prometheus recording rules to make it clear which set of rules you need for Grafana or Prometheus Console. ([\matrix-org#13876](matrix-org#13876)) - Only pull relevant backfill points from the database based on the current depth and limit (instead of all) every time we want to `/backfill`. ([\matrix-org#13879](matrix-org#13879)) - Faster room joins: Avoid waiting for full state when processing `/keys/changes` requests. ([\matrix-org#13888](matrix-org#13888)) - Improve backfill robustness by trying more servers when we get a `4xx` error back. ([\matrix-org#13890](matrix-org#13890)) - Fix mypy errors with canonicaljson 1.6.3. ([\matrix-org#13905](matrix-org#13905)) - Faster remote room joins: correctly handle remote device list updates during a partial join. ([\matrix-org#13913](matrix-org#13913)) - Complement image: propagate SIGTERM to all workers. ([\matrix-org#13914](matrix-org#13914)) - Update an innaccurate comment in Synapse's upsert database helper. ([\matrix-org#13924](matrix-org#13924)) - Update mypy (0.950 -> 0.981) and mypy-zope (0.3.7 -> 0.3.11). ([\matrix-org#13925](matrix-org#13925), [\matrix-org#13993](matrix-org#13993)) - Use dedicated `get_local_users_in_room(room_id)` function to find local users when calculating users to copy over during a room upgrade. ([\matrix-org#13960](matrix-org#13960)) - Refactor language in user directory `_track_user_joined_room` code to make it more clear that we use both local and remote users. ([\matrix-org#13966](matrix-org#13966)) - Revert catch-all exceptions being recorded as event pull attempt failures (only handle what we know about). ([\matrix-org#13969](matrix-org#13969)) - Speed up calculating push actions in large rooms. ([\matrix-org#13973](matrix-org#13973), [\matrix-org#13992](matrix-org#13992)) - Enable update notifications from Github's dependabot. ([\matrix-org#13976](matrix-org#13976)) - Prototype a workflow to automatically add changelogs to dependabot PRs. ([\matrix-org#13998](matrix-org#13998), [\matrix-org#14011](matrix-org#14011), [\matrix-org#14017](matrix-org#14017), [\matrix-org#14021](matrix-org#14021), [\matrix-org#14027](matrix-org#14027)) - Fix type annotations to be compatible with new annotations in development versions of twisted. ([\matrix-org#14012](matrix-org#14012)) - Clear out stale entries in `event_push_actions_staging` table. ([\matrix-org#14020](matrix-org#14020)) - Bump versions of GitHub actions. ([\matrix-org#13978](matrix-org#13978), [\matrix-org#13979](matrix-org#13979), [\matrix-org#13980](matrix-org#13980), [\matrix-org#13982](matrix-org#13982), [\matrix-org#14015](matrix-org#14015), [\matrix-org#14019](matrix-org#14019), [\matrix-org#14022](matrix-org#14022), [\matrix-org#14023](matrix-org#14023)) # -----BEGIN PGP SIGNATURE----- # # iQFEBAABCgAuFiEEBTGR3/RnAzBGUif3pULk7RsPrAkFAmNNMOMQHGVyaWtAbWF0 # cml4Lm9yZwAKCRClQuTtGw+sCVnjB/9jpJRVnicteEpDfVX9iLo2qfIfcO/GhUJK # pJhv4yuY9whAldvJpmNw2f9tfUbAMcvrjlFvNrjihWmXcAGFazC6i3fNBjPgZW2e # Sxsuuy8xc9X/OqH2EUpHtNZQX3FfSbdBS93Z62ZO3R8tEbCQvjw6FXBdjjjf5uLO # y5Lsx94+41FJYOhs1Kt4fN92B9WMACR6e/O1YcsDjIXsoZI3uqO1h8filbQIZee7 # DTATE7eIPtShs2Ezaaeuc7tZGVDyPvgWIbuxuT6OGx20zmuChYJgIcVaD1me4UzJ # i9bVigtpYN0eUxuWnjLf7YC6Ys/Y9wZ7/lhdgaBwdbQKEJdpi+S4 # =JWaO # -----END PGP SIGNATURE----- # gpg: Signature made Mon Oct 17 11:39:31 2022 BST # gpg: using RSA key 053191DFF4670330465227F7A542E4ED1B0FAC09 # gpg: issuer "erik@matrix.org" # gpg: Can't check signature: No public key # Conflicts: # docker/Dockerfile # pyproject.toml # synapse/_scripts/update_synapse_database.py # synapse/handlers/message.py # synapse/handlers/receipts.py # synapse/logging/context.py # synapse/push/baserules.py # synapse/push/bulk_push_rule_evaluator.py # synapse/push/push_rule_evaluator.py # synapse/replication/http/send_event.py # synapse/rest/admin/users.py # synapse/rest/client/read_marker.py # synapse/rest/client/receipts.py # synapse/rest/client/room.py # synapse/storage/_base.py # synapse/storage/databases/main/__init__.py # synapse/storage/databases/main/cache.py # synapse/storage/databases/main/events.py # synapse/storage/databases/main/receipts.py # tests/push/test_push_rule_evaluator.py
Synapse 1.69.0 (2022-10-17) =========================== Please note that legacy Prometheus metric names are now deprecated and will be removed in Synapse 1.73.0. Server administrators should update their dashboards and alerting rules to avoid using the deprecated metric names. See the [upgrade notes](https://matrix-org.github.io/synapse/v1.69/upgrade.html#upgrading-to-v1690) for more details. Deprecations and Removals ------------------------- - Remove the `complete_sso_login` method from the Module API which was deprecated in Synapse 1.13.0. ([\#13843](matrix-org/synapse#13843)) - Announce that legacy metric names are deprecated, will be turned off by default in Synapse v1.71.0 and removed altogether in Synapse v1.73.0. See the upgrade notes for more information. ([\#14024](matrix-org/synapse#14024)) - Deprecate the `generate_short_term_login_token` method in favor of an async `create_login_token` method in the Module API. ([\#13842](matrix-org/synapse#13842)) Features -------- - Allow application services to set the `origin_server_ts` of a state event by providing the query parameter `ts` in [`PUT /_matrix/client/r0/rooms/{roomId}/state/{eventType}/{stateKey}`](https://spec.matrix.org/v1.4/client-server-api/#put_matrixclientv3roomsroomidstateeventtypestatekey), per [MSC3316](matrix-org/matrix-spec-proposals#3316). Contributed by @lukasdenk. ([\#11866](matrix-org/synapse#11866)) - Allow server admins to require a manual approval process before new accounts can be used (using [MSC3866](matrix-org/matrix-spec-proposals#3866)). ([\#13556](matrix-org/synapse#13556)) - Exponentially backoff from backfilling the same event over and over. ([\#13635](matrix-org/synapse#13635), [\#13936](matrix-org/synapse#13936)) - Add cache invalidation across workers to module API. ([\#13667](matrix-org/synapse#13667), [\#13947](matrix-org/synapse#13947)) - Experimental implementation of [MSC3882](matrix-org/matrix-spec-proposals#3882) to allow an existing device/session to generate a login token for use on a new device/session. ([\#13722](matrix-org/synapse#13722), [\#13868](matrix-org/synapse#13868)) - Experimental support for thread-specific receipts ([MSC3771](matrix-org/matrix-spec-proposals#3771)). ([\#13782](matrix-org/synapse#13782), [\#13893](matrix-org/synapse#13893), [\#13932](matrix-org/synapse#13932), [\#13937](matrix-org/synapse#13937), [\#13939](matrix-org/synapse#13939)) - Add experimental support for [MSC3881: Remotely toggle push notifications for another client](matrix-org/matrix-spec-proposals#3881). ([\#13799](matrix-org/synapse#13799), [\#13831](matrix-org/synapse#13831), [\#13860](matrix-org/synapse#13860)) - Keep track when an event pulled over federation fails its signature check so we can intelligently back-off in the future. ([\#13815](matrix-org/synapse#13815)) - Improve validation for the unspecced, internal-only `_matrix/client/unstable/add_threepid/msisdn/submit_token` endpoint. ([\#13832](matrix-org/synapse#13832)) - Faster remote room joins: record _when_ we first partial-join to a room. ([\#13892](matrix-org/synapse#13892)) - Support a `dir` parameter on the `/relations` endpoint per [MSC3715](matrix-org/matrix-spec-proposals#3715). ([\#13920](matrix-org/synapse#13920)) - Ask mail servers receiving emails from Synapse to not send automatic replies (e.g. out-of-office responses). ([\#13957](matrix-org/synapse#13957))
Track when the pulled event signature fails. Because we're doing the recording in
_check_sigs_and_hash_for_pulled_events_and_fetch
(previously named_check_sigs_and_hash_and_fetch
), this means we will track signature failures forbackfill
,get_room_state
,get_event_auth
, andget_missing_events
(all pulled event scenarios). And we also record signature failures fromget_pdu
.Part of #13700
Part of #13676 and #13356
This PR will be especially important for #13816 so we can avoid the costly
_get_state_ids_after_missing_prev_event
down the line when/messages
calls backfill.Dev notes
Other usages of
_check_sigs_and_hash
. It looks like it's all processing events that come from another homeserver.Pull Request Checklist
EventStore
toEventWorkerStore
.".code blocks
.Pull request includes a sign off(run the linters)