Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Fix race in federation sender that delayed device updates. #6799

Merged
merged 2 commits into from
Jan 29, 2020

Conversation

erikjohnston
Copy link
Member

We were sending device updates down both the federation stream and
device streams. This mean there was a race if the federation sender
worker processed the federation stream first, as when the sender checked
if there were new device updates the slaved ID generator hadn't been
updated with the new stream IDs and so returned nothing.

This situation is correctly handled by events/receipts/etc by not
sending updates down the federation stream and instead having the
federation sender worker listen on the other streams and poke the
transaction queues as appropriate.

We were sending device updates down both the federation stream and
device streams. This mean there was a race if the federation sender
worker processed the federation stream first, as when the sender checked
if there were new device updates the slaved ID generator hadn't been
updated with the new stream IDs and so returned nothing.

This situation is correctly handled by events/receipts/etc by not
sending updates down the federation stream and instead having the
federation sender worker listen on the other streams and poke the
transaction queues as appropriate.
@erikjohnston erikjohnston requested a review from a team January 29, 2020 10:55
Copy link
Contributor

@babolivier babolivier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm provided the CI agrees that it deflakes the flakey test

@erikjohnston erikjohnston merged commit 6b9e101 into develop Jan 29, 2020
device_messages = {v: k for k, v in self.device_messages.items()[i:j]}

for (destination, pos) in iteritems(device_messages):
rows.append((pos, DeviceRow(destination=destination)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean that DeviceRow can be removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err, yes: #6800

@erikjohnston erikjohnston deleted the erikj/device_message_streams branch February 5, 2020 17:34
babolivier added a commit that referenced this pull request Feb 12, 2020
Synapse 1.10.0 (2020-02-12)
===========================

**WARNING to client developers**: As of this release Synapse validates `client_secret` parameters in the Client-Server API as per the spec. See [\#6766](#6766) for details.

Updates to the Docker image
---------------------------

- Update the docker images to Alpine Linux 3.11. ([\#6897](#6897))

Synapse 1.10.0rc5 (2020-02-11)
==============================

Bugfixes
--------

- Fix the filtering introduced in 1.10.0rc3 to also apply to the state blocks returned by `/sync`. ([\#6884](#6884))

Synapse 1.10.0rc4 (2020-02-11)
==============================

This release candidate was built incorrectly and is superceded by 1.10.0rc5.

Synapse 1.10.0rc3 (2020-02-10)
==============================

Features
--------

- Filter out `m.room.aliases` from the CS API to mitigate abuse while a better solution is specced. ([\#6878](#6878))

Internal Changes
----------------

- Fix continuous integration failures with old versions of `pip`, which were introduced by a release of the `zipp` library. ([\#6880](#6880))

Synapse 1.10.0rc2 (2020-02-06)
==============================

Bugfixes
--------

- Fix an issue with cross-signing where device signatures were not sent to remote servers. ([\#6844](#6844))
- Fix to the unknown remote device detection which was introduced in 1.10.rc1. ([\#6848](#6848))

Internal Changes
----------------

- Detect unexpected sender keys on remote encrypted events and resync device lists. ([\#6850](#6850))

Synapse 1.10.0rc1 (2020-01-31)
==============================

Features
--------

- Add experimental support for updated authorization rules for aliases events, from [MSC2260](matrix-org/matrix-spec-proposals#2260). ([\#6787](#6787), [\#6790](#6790), [\#6794](#6794))

Bugfixes
--------

- Warn if postgres database has a non-C locale, as that can cause issues when upgrading locales (e.g. due to upgrading OS). ([\#6734](#6734))
- Minor fixes to `PUT /_synapse/admin/v2/users` admin api. ([\#6761](#6761))
- Validate `client_secret` parameter using the regex provided by the Client-Server API, temporarily allowing `:` characters for older clients. The `:` character will be removed in a future release. ([\#6767](#6767))
- Fix persisting redaction events that have been redacted (or otherwise don't have a redacts key). ([\#6771](#6771))
- Fix outbound federation request metrics. ([\#6795](#6795))
- Fix bug where querying a remote user's device keys that weren't cached resulted in only returning a single device. ([\#6796](#6796))
- Fix race in federation sender worker that delayed sending of device updates. ([\#6799](#6799), [\#6800](#6800))
- Fix bug where Synapse didn't invalidate cache of remote users' devices when Synapse left a room. ([\#6801](#6801))
- Fix waking up other workers when remote server is detected to have come back online. ([\#6811](#6811))

Improved Documentation
----------------------

- Clarify documentation related to `user_dir` and `federation_reader` workers. ([\#6775](#6775))

Internal Changes
----------------

- Record room versions in the `rooms` table. ([\#6729](#6729), [\#6788](#6788), [\#6810](#6810))
- Propagate cache invalidates from workers to other workers. ([\#6748](#6748))
- Remove some unnecessary admin handler abstraction methods. ([\#6751](#6751))
- Add some debugging for media storage providers. ([\#6757](#6757))
- Detect unknown remote devices and mark cache as stale. ([\#6776](#6776), [\#6819](#6819))
- Attempt to resync remote users' devices when detected as stale. ([\#6786](#6786))
- Delete current state from the database when server leaves a room. ([\#6792](#6792))
- When a client asks for a remote user's device keys check if the local cache for that user has been marked as potentially stale. ([\#6797](#6797))
- Add background update to clean out left rooms from current state. ([\#6802](#6802), [\#6816](#6816))
- Refactoring work in preparation for changing the event redaction algorithm. ([\#6803](#6803), [\#6805](#6805), [\#6806](#6806), [\#6807](#6807), [\#6820](#6820))
babolivier pushed a commit that referenced this pull request Sep 1, 2021
* commit '6b9e1014c':
  Fix race in federation sender that delayed device updates. (#6799)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants