Fix race in federation sender that delayed device updates. #6799

erikjohnston · 2020-01-29T10:51:43Z

We were sending device updates down both the federation stream and the
device streams. This meant there was a race if the federation sender
worker processed the federation stream first: when the sender checked
for new device updates, the slaved ID generator hadn't yet been
updated with the new stream IDs, and so it returned nothing.
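
The race described above can be reproduced with a toy model. This is only a sketch: `SlavedIdTracker`, `DeviceStore` and `get_device_updates` are hypothetical, simplified stand-ins and not Synapse's actual classes. It assumes the worker can only see rows up to the position its ID tracker has reached.

```python
class SlavedIdTracker:
    """Hypothetical stand-in: the highest stream ID this worker has seen."""

    def __init__(self):
        self.current = 0

    def advance(self, new_id):
        self.current = max(self.current, new_id)


class DeviceStore:
    """Hypothetical store holding device updates keyed by stream ID."""

    def __init__(self):
        self.tracker = SlavedIdTracker()
        self.updates = {}  # stream_id -> destination

    def get_device_updates(self, from_id):
        # Only rows at or below the tracker's position are visible.
        return [
            dest
            for sid, dest in sorted(self.updates.items())
            if from_id < sid <= self.tracker.current
        ]


store = DeviceStore()
store.updates[1] = "remote.example"  # the update exists at stream ID 1

# The federation stream row is processed first: the tracker is still at 0,
# so the check for new device updates finds nothing.
seen_on_federation_poke = store.get_device_updates(from_id=0)

# The device stream row is processed afterwards and advances the tracker;
# only now does the same query return the update.
store.tracker.advance(1)
seen_after_device_row = store.get_device_updates(from_id=0)
```

In this model the update is not lost, only delayed until something else pokes the sender after the tracker has caught up.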

This situation is correctly handled by events/receipts/etc by not
sending updates down the federation stream and instead having the
federation sender worker listen on the other streams and poke the
transaction queues as appropriate.
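
A minimal sketch of that pattern follows, assuming a hypothetical `FederationSenderSketch` class, an `on_stream_rows` callback and a `"device_lists"` stream label, none of which are taken from the Synapse code base. The point it illustrates is the ordering: the worker advances its own position from the device stream row and only then pokes the per-destination queues.

```python
from collections import defaultdict


class FederationSenderSketch:
    """Hypothetical worker that reacts to the device stream directly."""

    def __init__(self):
        self.device_stream_pos = 0
        # destination -> stream positions at which its queue was poked
        self.pokes = defaultdict(list)

    def on_stream_rows(self, stream_name, token, destinations):
        # Device updates no longer travel on the federation stream,
        # so rows from any other stream are ignored here.
        if stream_name != "device_lists":
            return
        # Advance the position first, then poke the queues, so a poked
        # queue always sees the rows that triggered the poke.
        self.device_stream_pos = max(self.device_stream_pos, token)
        for destination in destinations:
            self.pokes[destination].append(self.device_stream_pos)


sender = FederationSenderSketch()
sender.on_stream_rows("federation", 1, ["remote.example"])    # ignored
sender.on_stream_rows("device_lists", 1, ["remote.example"])  # poke at 1
pokes = dict(sender.pokes)
```

Because a single stream carries both the position and the trigger, there is no second stream whose ordering could race with it.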


babolivier

lgtm provided the CI agrees that it deflakes the flakey test

richvdh · 2020-01-29T11:41:00Z

synapse/federation/send_queue.py

-        device_messages = {v: k for k, v in self.device_messages.items()[i:j]}
-
-        for (destination, pos) in iteritems(device_messages):
-            rows.append((pos, DeviceRow(destination=destination)))


does this mean that DeviceRow can be removed?

Err, yes: #6800

Synapse 1.10.0 (2020-02-12)
===========================

**WARNING to client developers**: As of this release Synapse validates `client_secret` parameters in the Client-Server API as per the spec. See [\#6766](#6766) for details.

Updates to the Docker image
---------------------------

- Update the docker images to Alpine Linux 3.11. ([\#6897](#6897))

Synapse 1.10.0rc5 (2020-02-11)
==============================

Bugfixes
--------

- Fix the filtering introduced in 1.10.0rc3 to also apply to the state blocks returned by `/sync`. ([\#6884](#6884))

Synapse 1.10.0rc4 (2020-02-11)
==============================

This release candidate was built incorrectly and is superceded by 1.10.0rc5.

Synapse 1.10.0rc3 (2020-02-10)
==============================

Features
--------

- Filter out `m.room.aliases` from the CS API to mitigate abuse while a better solution is specced. ([\#6878](#6878))

Internal Changes
----------------

- Fix continuous integration failures with old versions of `pip`, which were introduced by a release of the `zipp` library. ([\#6880](#6880))

Synapse 1.10.0rc2 (2020-02-06)
==============================

Bugfixes
--------

- Fix an issue with cross-signing where device signatures were not sent to remote servers. ([\#6844](#6844))
- Fix to the unknown remote device detection which was introduced in 1.10.rc1. ([\#6848](#6848))

Internal Changes
----------------

- Detect unexpected sender keys on remote encrypted events and resync device lists. ([\#6850](#6850))

Synapse 1.10.0rc1 (2020-01-31)
==============================

Features
--------

- Add experimental support for updated authorization rules for aliases events, from [MSC2260](matrix-org/matrix-spec-proposals#2260). ([\#6787](#6787), [\#6790](#6790), [\#6794](#6794))

Bugfixes
--------

- Warn if postgres database has a non-C locale, as that can cause issues when upgrading locales (e.g. due to upgrading OS). ([\#6734](#6734))
- Minor fixes to `PUT /_synapse/admin/v2/users` admin api. ([\#6761](#6761))
- Validate `client_secret` parameter using the regex provided by the Client-Server API, temporarily allowing `:` characters for older clients. The `:` character will be removed in a future release. ([\#6767](#6767))
- Fix persisting redaction events that have been redacted (or otherwise don't have a redacts key). ([\#6771](#6771))
- Fix outbound federation request metrics. ([\#6795](#6795))
- Fix bug where querying a remote user's device keys that weren't cached resulted in only returning a single device. ([\#6796](#6796))
- Fix race in federation sender worker that delayed sending of device updates. ([\#6799](#6799), [\#6800](#6800))
- Fix bug where Synapse didn't invalidate cache of remote users' devices when Synapse left a room. ([\#6801](#6801))
- Fix waking up other workers when remote server is detected to have come back online. ([\#6811](#6811))

Improved Documentation
----------------------

- Clarify documentation related to `user_dir` and `federation_reader` workers. ([\#6775](#6775))

Internal Changes
----------------

- Record room versions in the `rooms` table. ([\#6729](#6729), [\#6788](#6788), [\#6810](#6810))
- Propagate cache invalidates from workers to other workers. ([\#6748](#6748))
- Remove some unnecessary admin handler abstraction methods. ([\#6751](#6751))
- Add some debugging for media storage providers. ([\#6757](#6757))
- Detect unknown remote devices and mark cache as stale. ([\#6776](#6776), [\#6819](#6819))
- Attempt to resync remote users' devices when detected as stale. ([\#6786](#6786))
- Delete current state from the database when server leaves a room. ([\#6792](#6792))
- When a client asks for a remote user's device keys check if the local cache for that user has been marked as potentially stale. ([\#6797](#6797))
- Add background update to clean out left rooms from current state. ([\#6802](#6802), [\#6816](#6816))
- Refactoring work in preparation for changing the event redaction algorithm. ([\#6803](#6803), [\#6805](#6805), [\#6806](#6806), [\#6807](#6807), [\#6820](#6820))

* commit '6b9e1014c': Fix race in federation sender that delayed device updates. (#6799)

erikjohnston added 2 commits January 29, 2020 10:47

Newsfile

c29d911

erikjohnston requested a review from a team January 29, 2020 10:55

babolivier approved these changes Jan 29, 2020


erikjohnston merged commit 6b9e101 into develop Jan 29, 2020

richvdh reviewed Jan 29, 2020


erikjohnston mentioned this pull request Jan 29, 2020

Remove unused DeviceRow class #6800

Merged

erikjohnston deleted the erikj/device_message_streams branch February 5, 2020 17:34

babolivier pushed a commit that referenced this pull request Sep 1, 2021

Fix race in federation sender that delayed device updates. (#6799)

70e37d5

