
Synapse doesn't send out device list updates to previously unseen homeservers when joining a room #11374

Open
Tracked by #2411
matrixbot opened this issue Dec 19, 2023 · 5 comments

Comments

matrixbot (Collaborator) commented Dec 19, 2023

This issue has been migrated from #11374.


The spec says that:

Servers must send m.device_list_update EDUs to all the servers who share a room with a given local user, and must be sent whenever that user’s device list changes (i.e. for new or deleted devices, when that user joins a room which contains servers which are not already receiving updates for that user’s device list, or changes in device information such as the device’s human-readable name).

To be clear, this means that when a local user joins a room, we should:

  1. Check if any of the other homeservers in the room are new to us (their users don't share any other rooms with us)
  2. Send device list updates of the joining user to those homeservers.

It doesn't appear that Synapse actually does this anywhere, currently.

We also need to do this for presence (matrix-org/synapse#8956), but the current presence-related TODO in the code may be a good inspiration for what a device-list-related implementation would look like:

https://github.com/matrix-org/synapse/blob/75ca0a6168f92dab3255839cf85fb0df3a0076c3/synapse/handlers/presence.py#L1368-L1379
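For illustration only, here is a minimal sketch of the missing step, modelled loosely on the presence TODO linked above. All helper names (get_current_hosts_in_room, get_hosts_already_sharing_room_with_user, send_device_list_update) are hypothetical placeholders, not actual Synapse APIs:

```python
# Sketch only: every helper here is a hypothetical placeholder, not a real
# Synapse API. The idea is to run this whenever a local user joins a room.

async def notify_new_hosts_of_device_list(store, federation_sender, user_id, room_id):
    # Every remote homeserver currently in the room the user just joined.
    hosts_in_room = await store.get_current_hosts_in_room(room_id)

    # Remote homeservers that already share some other room with this user,
    # and are therefore already receiving device list updates for them.
    already_tracking = await store.get_hosts_already_sharing_room_with_user(user_id)

    # Hosts that have never seen this user before need a device list update.
    new_hosts = set(hosts_in_room) - set(already_tracking)

    for host in new_hosts:
        # Queue an m.device_list_update EDU covering all of the user's devices.
        await federation_sender.send_device_list_update(host, user_id)
```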

matrixbot changed the title from "Dummy issue" to "Synapse doesn't send out device list updates to previously unseen homeservers when joining a room" Dec 21, 2023
matrixbot reopened this Dec 21, 2023
richvdh (Member) commented Jan 12, 2024

@kegsay questions whether this is still true, and will test

kegsay self-assigned this Jan 12, 2024
kegsay (Contributor) commented Jan 24, 2024

I can unfortunately confirm that Synapse does not send an m.device_list_update EDU over federation to the servers already in the room when a local user joins it. At best, this is a spec violation. At worst, this could cause device lists to not sync correctly.

Next I'm trying to see what the impact of this is on a real homeserver, rather than a test federation server. For example, it could be that (sketched in the snippet after this list):

  • The federated server sees the join event.
  • When a user on the federated server syncs, it puts the joined user in device_lists.changed.
  • The user sees this and hits /keys/query.
  • The server hits the origin server to satisfy this query.
  • This then synchronises the device list correctly.
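A rough illustration of that lazy path at the Client-Server API level; the endpoints and response fields are from the spec, while the base URL, access token and since-token are placeholders:

```python
# Rough sketch of the lazy recovery path: Bob's client notices Alice under
# device_lists.changed in /sync and calls /keys/query, which forces Bob's
# server to fetch Alice's devices from her homeserver.
import requests

BASE = "https://bob-homeserver.example"  # placeholder homeserver
headers = {"Authorization": "Bearer ..."}  # placeholder access token

# 1. Incremental sync: Alice shows up under device_lists.changed.
sync = requests.get(
    f"{BASE}/_matrix/client/v3/sync",
    headers=headers,
    params={"since": "s_prev_batch"},  # placeholder since-token
).json()
changed_users = sync.get("device_lists", {}).get("changed", [])

# 2. Query keys for the changed users; Bob's server has nothing cached,
#    so it must contact Alice's server to answer this request.
if changed_users:
    keys = requests.post(
        f"{BASE}/_matrix/client/v3/keys/query",
        headers=headers,
        json={"device_keys": {user_id: [] for user_id in changed_users}},
    ).json()
```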

If this is happening, then aside from it being a bit messy, I think this isn't a UTD cause in the general case. It would be in the edge case where the origin server is dead/offline by the time the remote server hits the origin though...

Either way, writing more tests to confirm what is happening here...

kegsay (Contributor) commented Jan 24, 2024

...which is exactly what is happening here. Added another Complement test to assert this.

So what does this mean?

  • Synapse (and Dendrite!) are not spec compliant here.
  • The EDU is not eagerly sent to the other server, but this is okay because we fetch the device keys when the receiver asks for them via /keys/query. This is why this isn't a big problem in the wild.
  • But... this assumes the server is reachable then. It may not be, in which case we will be unable to secure an Olm session for that user, causing a UTD.

In general, I would advocate for eager sending of necessary data (see MSC4081 for reasoned arguments to this effect).

It's hard to say how frequently this would cause UTDs. The type of UTD would be m.no_olm, but existing telemetry via m.room_key.withheld is insufficient here, as we don't know the device ID to send the withheld message to! This would eventually fix itself when the sender manages to /keys/query and then /keys/claim for each device (a rough sketch of that recovery path follows below).
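For context, the sender-side recovery path boils down to querying devices and then claiming one-time keys so Olm sessions can finally be established. A sketch against the Client-Server API, with the base URL, token and user/device IDs as placeholders:

```python
# Rough sketch of the sender-side recovery: once /keys/query and /keys/claim
# succeed against the recipient's devices, Olm sessions can be set up and the
# m.no_olm UTDs stop. URLs, token and IDs are placeholders.
import requests

BASE = "https://alice-homeserver.example"  # placeholder homeserver
headers = {"Authorization": "Bearer ..."}  # placeholder access token

# 1. Discover the recipient's devices and identity keys.
query = requests.post(
    f"{BASE}/_matrix/client/v3/keys/query",
    headers=headers,
    json={"device_keys": {"@bob:example.org": []}},
).json()
bob_devices = query.get("device_keys", {}).get("@bob:example.org", {})

# 2. Claim a one-time key for each device so an Olm session can be created.
claim = requests.post(
    f"{BASE}/_matrix/client/v3/keys/claim",
    headers=headers,
    json={
        "one_time_keys": {
            "@bob:example.org": {d: "signed_curve25519" for d in bob_devices}
        }
    },
).json()
```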

kegsay (Contributor) commented Jan 24, 2024

The current behaviour (Alice and Bob on different homeservers):

  • Alice joins a room with Bob. Alice and Bob have previously never shared a room.
  • Alice's server does not send an m.device_list_update EDU to Bob's server.
  • Bob syncs.
  • Bob's server puts Alice's user ID in device_lists.changed in the /sync response.
  • Bob's client hits /keys/query as a result, asking about Alice's user ID.
  • Bob's server then asks Alice's server about Alice's devices and device keys, returning the response to Bob's client.
  • Bob's client now knows the keys and devices for Alice, so all is well.

Versus what the specification says:

  • Alice joins a room with Bob. Alice and Bob have previously never shared a room.
  • Alice's server sends an m.device_list_update EDU to Bob's server.
  • Bob syncs.
  • Bob's server puts Alice's user ID in device_lists.changed in the /sync response.
  • Bob's client hits /keys/query as a result, asking about Alice's user ID.
  • Bob's server has a cached copy it can return in case Alice's server is down.
  • Bob's server then asks Alice's server about Alice's devices and device keys, returning the response to Bob's client.
  • Bob's client now knows the keys and devices for Alice, so all is well.

Things break currently when:

  • Alice joins a room with Bob. Alice and Bob have previously never shared a room.
  • Bob does not sync because he is offline.
  • Alice's server goes down / is network partitioned.
  • Bob comes online and syncs.

In this scenario, Synapse does not have cached keys to give to Bob, so a UTD is inevitable. If keys were cached, there would be no UTD. In other words, the specification solution is more robust to network failures than Synapse is currently.
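To make the robustness argument concrete, here is a sketch of how an eagerly received m.device_list_update EDU would let the receiving server answer /keys/query from cache even if the origin later goes down. The handler and store names are hypothetical, not actual Synapse internals; the EDU fields (user_id, device_id, keys) are as per the spec:

```python
# Sketch only: handler and store method names are hypothetical placeholders,
# not real Synapse internals.

async def on_device_list_update_edu(store, origin, content):
    # Cache whatever the EDU tells us about this device so that later
    # /keys/query requests can be served without contacting the origin.
    if "keys" in content:
        await store.cache_remote_device_keys(
            content["user_id"], content["device_id"], content["keys"]
        )

async def answer_local_keys_query(store, federation_client, user_id):
    cached = await store.get_cached_remote_device_keys(user_id)
    if cached:
        # Spec behaviour: Alice's server being down no longer matters.
        return cached
    # Current Synapse behaviour: nothing cached, so we must hit the origin;
    # if it is unreachable, Bob's client gets nothing and a UTD follows.
    return await federation_client.query_user_devices(user_id)
```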

kegsay (Contributor) commented Sep 6, 2024

Simplified Sliding Sync currently exacerbates this because it seems not to return data to the client immediately. Because of this, the /keys/query request will be delayed. This is important because that request clears backoff timers on the other HS, and until that is done, any /keys/claim requests will immediately fail with failures={"hs2": Object {"status": Number(503), "message": String("Not ready for retry")}}.

See matrix-org/complement-crypto#129 for a test which failed in SSS due to this.
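To illustrate the failure mode: /keys/claim does not fail at the HTTP level; the per-server failures map has to be inspected and the request retried once the backoff has cleared. A rough sketch, with the base URL, token and IDs as placeholders:

```python
# Rough sketch: detect the per-server failure returned while the remote
# homeserver is still backed off, and retry with increasing delay.
import time
import requests

BASE = "https://hs1.example"  # placeholder homeserver
headers = {"Authorization": "Bearer ..."}  # placeholder access token
body = {"one_time_keys": {"@alice:hs2": {"DEVICEID": "signed_curve25519"}}}

for attempt in range(5):
    resp = requests.post(
        f"{BASE}/_matrix/client/v3/keys/claim", headers=headers, json=body
    ).json()
    if "hs2" not in resp.get("failures", {}):
        break  # one-time key claimed successfully
    # While hs2 is still backed off, the response contains something like:
    # failures={"hs2": {"status": 503, "message": "Not ready for retry"}}
    time.sleep(2 ** attempt)
```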
