Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Make outgoing device list tracking work after we partial-join #13891

Closed
DMRobertson opened this issue Sep 23, 2022 · 6 comments · Fixed by #13934 or #13874
Closed

Make outgoing device list tracking work after we partial-join #13891

DMRobertson opened this issue Sep 23, 2022 · 6 comments · Fixed by #13934 or #13874
Assignees
Labels
A-Device-List-Tracking Telling clients about other devices. Often related to E2EE. A-Federated-Join joins over federation generally suck T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.

Comments

@DMRobertson
Copy link
Contributor

DMRobertson commented Sep 23, 2022

To fix "/keys/query requests on a remote homeserver must not return stale data.":

  • Store the current device list stream ID when we start a partial join
  • When we transition from partial to fully joined, send all device list updates in device_list_changes_in_room since we last joined the room (by referring to the recorded stream ID) for the room to all servers that are either:
    1. are in the room; or
    2. have had a membership change between the state at the initial join and the new current state.

Originally posted by @erikjohnston in #12993 (comment)

@squahtx
Copy link
Contributor

squahtx commented Sep 23, 2022

There are complement tests for this on the squah/faster_room_joins_device_list_tracking branch (see the TestPartialStateJoin/Outgoing_device_list_updates/* tests)

@DMRobertson
Copy link
Contributor Author

To do this:

  • New migration that adds another column to the "paritally joined rooms" table

  • In/near store_partial_state_room write the device list stream ID to that column

  • maybe also store the join event ID we get back from the send_join response?

  • When we transition ->
    https://github.com/matrix-org/synapse/blob/e3512a7719e90f49e1056e1480358e9ff9de1d9d/synapse/storage /controllers/persist_events.py#L481:L481

    1. work out which servers to send to. For a): standard function get current hosts or similar. For b) get state at the join event ID (which should now exist) and compare to the current state. Want some kind of symmetric diference of dicts.
    2. work out what to send them. Should be a simple select from device_list_changes_in_room with a fixed room_id and where stream_id > ?.

Send out device list updates: add_device_list_outbound_pokes

@squahtx
Copy link
Contributor

squahtx commented Sep 23, 2022

I should mention that while we have partial state, we always send out device list updates to the servers listed in the servers_in_room /send_join response: #13874
so there may be opportunity for optimization based on that

@DMRobertson DMRobertson added A-Federated-Join joins over federation generally suck A-Device-List-Tracking Telling clients about other devices. Often related to E2EE. labels Sep 27, 2022
@erikjohnston
Copy link
Member

There are complement tests for this on the squah/faster_room_joins_device_list_tracking branch (see the TestPartialStateJoin/Outgoing_device_list_updates/* tests)

I get:

=== RUN   TestPartialStateJoin/Outgoing_device_list_updates/Device_list_updates_reach_incorrectly_absent_servers_once_partial_state_join_completes_even_though_remote_server_left_room
    client.go:604: [CSAPI] POST hs1/_matrix/client/v3/register => 400 Bad Request (11.298801ms)
    federation_room_join_partial_state_test.go:1508: CSAPI.MustDo POST http://localhost:49249/_matrix/client/v3/register returned HTTP 400 : {"errcode":"M_USER_IN_USE","error":"User ID already taken."}

failure on that branch

@squahtx
Copy link
Contributor

squahtx commented Sep 28, 2022

=== RUN   TestPartialStateJoin/Outgoing_device_list_updates/Device_list_updates_reach_incorrectly_absent_servers_once_partial_state_join_completes_even_though_remote_server_left_room
    client.go:604: [CSAPI] POST hs1/_matrix/client/v3/register => 400 Bad Request (11.298801ms)
    federation_room_join_partial_state_test.go:1508: CSAPI.MustDo POST http://localhost:49249/_matrix/client/v3/register returned HTTP 400 : {"errcode":"M_USER_IN_USE","error":"User ID already taken."}

failure on that branch

My mistake, that branch is ever so slightly out of date. squah/faster_room_joins_device_list_tracking_outgoing_2 is the right one to use and all its tests are passing with #13934.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Device-List-Tracking Telling clients about other devices. Often related to E2EE. A-Federated-Join joins over federation generally suck T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
Projects
None yet
3 participants