This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Fix race in federation sender that delayed device updates. #6799
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
We were sending device updates down both the federation stream and device streams. This mean there was a race if the federation sender worker processed the federation stream first, as when the sender checked if there were new device updates the slaved ID generator hadn't been updated with the new stream IDs and so returned nothing. This situation is correctly handled by events/receipts/etc by not sending updates down the federation stream and instead having the federation sender worker listen on the other streams and poke the transaction queues as appropriate.
babolivier
approved these changes
Jan 29, 2020
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm provided the CI agrees that it deflakes the flakey test
richvdh
reviewed
Jan 29, 2020
device_messages = {v: k for k, v in self.device_messages.items()[i:j]} | ||
|
||
for (destination, pos) in iteritems(device_messages): | ||
rows.append((pos, DeviceRow(destination=destination))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this mean that DeviceRow can be removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Err, yes: #6800
babolivier
added a commit
that referenced
this pull request
Feb 12, 2020
Synapse 1.10.0 (2020-02-12) =========================== **WARNING to client developers**: As of this release Synapse validates `client_secret` parameters in the Client-Server API as per the spec. See [\#6766](#6766) for details. Updates to the Docker image --------------------------- - Update the docker images to Alpine Linux 3.11. ([\#6897](#6897)) Synapse 1.10.0rc5 (2020-02-11) ============================== Bugfixes -------- - Fix the filtering introduced in 1.10.0rc3 to also apply to the state blocks returned by `/sync`. ([\#6884](#6884)) Synapse 1.10.0rc4 (2020-02-11) ============================== This release candidate was built incorrectly and is superceded by 1.10.0rc5. Synapse 1.10.0rc3 (2020-02-10) ============================== Features -------- - Filter out `m.room.aliases` from the CS API to mitigate abuse while a better solution is specced. ([\#6878](#6878)) Internal Changes ---------------- - Fix continuous integration failures with old versions of `pip`, which were introduced by a release of the `zipp` library. ([\#6880](#6880)) Synapse 1.10.0rc2 (2020-02-06) ============================== Bugfixes -------- - Fix an issue with cross-signing where device signatures were not sent to remote servers. ([\#6844](#6844)) - Fix to the unknown remote device detection which was introduced in 1.10.rc1. ([\#6848](#6848)) Internal Changes ---------------- - Detect unexpected sender keys on remote encrypted events and resync device lists. ([\#6850](#6850)) Synapse 1.10.0rc1 (2020-01-31) ============================== Features -------- - Add experimental support for updated authorization rules for aliases events, from [MSC2260](matrix-org/matrix-spec-proposals#2260). ([\#6787](#6787), [\#6790](#6790), [\#6794](#6794)) Bugfixes -------- - Warn if postgres database has a non-C locale, as that can cause issues when upgrading locales (e.g. due to upgrading OS). ([\#6734](#6734)) - Minor fixes to `PUT /_synapse/admin/v2/users` admin api. ([\#6761](#6761)) - Validate `client_secret` parameter using the regex provided by the Client-Server API, temporarily allowing `:` characters for older clients. The `:` character will be removed in a future release. ([\#6767](#6767)) - Fix persisting redaction events that have been redacted (or otherwise don't have a redacts key). ([\#6771](#6771)) - Fix outbound federation request metrics. ([\#6795](#6795)) - Fix bug where querying a remote user's device keys that weren't cached resulted in only returning a single device. ([\#6796](#6796)) - Fix race in federation sender worker that delayed sending of device updates. ([\#6799](#6799), [\#6800](#6800)) - Fix bug where Synapse didn't invalidate cache of remote users' devices when Synapse left a room. ([\#6801](#6801)) - Fix waking up other workers when remote server is detected to have come back online. ([\#6811](#6811)) Improved Documentation ---------------------- - Clarify documentation related to `user_dir` and `federation_reader` workers. ([\#6775](#6775)) Internal Changes ---------------- - Record room versions in the `rooms` table. ([\#6729](#6729), [\#6788](#6788), [\#6810](#6810)) - Propagate cache invalidates from workers to other workers. ([\#6748](#6748)) - Remove some unnecessary admin handler abstraction methods. ([\#6751](#6751)) - Add some debugging for media storage providers. ([\#6757](#6757)) - Detect unknown remote devices and mark cache as stale. ([\#6776](#6776), [\#6819](#6819)) - Attempt to resync remote users' devices when detected as stale. ([\#6786](#6786)) - Delete current state from the database when server leaves a room. ([\#6792](#6792)) - When a client asks for a remote user's device keys check if the local cache for that user has been marked as potentially stale. ([\#6797](#6797)) - Add background update to clean out left rooms from current state. ([\#6802](#6802), [\#6816](#6816)) - Refactoring work in preparation for changing the event redaction algorithm. ([\#6803](#6803), [\#6805](#6805), [\#6806](#6806), [\#6807](#6807), [\#6820](#6820))
babolivier
pushed a commit
that referenced
this pull request
Sep 1, 2021
* commit '6b9e1014c': Fix race in federation sender that delayed device updates. (#6799)
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
We were sending device updates down both the federation stream and
device streams. This mean there was a race if the federation sender
worker processed the federation stream first, as when the sender checked
if there were new device updates the slaved ID generator hadn't been
updated with the new stream IDs and so returned nothing.
This situation is correctly handled by events/receipts/etc by not
sending updates down the federation stream and instead having the
federation sender worker listen on the other streams and poke the
transaction queues as appropriate.