Faster remote room joins: Support partial join re-syncing on workers other than the master #14544

matrixbot · 2023-12-20T19:38:22Z

This issue has been migrated from #14544.

An enhancement of: #12994 (worker-mode support for Faster Remote Room Joins).

Instead of relying on the master to perform the re-syncing of the rooms, we should allow other workers to be involved.
Part of the difficulty is in choosing a worker to perform the re-sync for a room, ensuring that even after a crash/restart, exactly one worker will pick up the job of re-syncing that room again.
We should be mindful that in a hypothetical deployment, workers can be taken out of service. A room shouldn't be locked to one worker forever, because if that worker goes away the re-sync would never progress.
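
One way to get "exactly one worker, but not one worker forever" is a lease that expires unless its holder keeps renewing it. The sketch below is a minimal in-memory illustration of that idea, not Synapse code: `LeaseTable`, its methods, and the timeout value are all hypothetical names, and a real deployment would presumably keep the lease in a database row instead.

```python
import time
from typing import Callable, Dict, Tuple


class LeaseTable:
    """Hypothetical stand-in for a database table of per-room resync leases."""

    def __init__(
        self, timeout: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._timeout = timeout
        self._clock = clock
        # room_id -> (worker name, time the lease was last taken or renewed)
        self._leases: Dict[str, Tuple[str, float]] = {}

    def try_acquire(self, room_id: str, worker: str) -> bool:
        """Claim the room if it is free, already ours, or the lease expired.

        Calling this again as the current owner renews the lease.
        """
        now = self._clock()
        held = self._leases.get(room_id)
        if held is not None:
            owner, renewed_at = held
            if owner != worker and now - renewed_at < self._timeout:
                return False  # another worker still holds a live lease
        self._leases[room_id] = (worker, now)
        return True

    def release(self, room_id: str, worker: str) -> None:
        """Drop the lease, but only if this worker is the one holding it."""
        held = self._leases.get(room_id)
        if held is not None and held[0] == worker:
            del self._leases[room_id]
```

A worker that crashes simply stops renewing, so after `timeout` seconds any other worker calling `try_acquire` for that room takes over the re-sync.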

Aside: in future we should consider moving the /send_join request out of the master process. The obvious candidate is the "client reader" that receives the client-side /join request (and hence currently makes the request to ReplicationRemoteJoinRestServlet). The main thing to worry about then is locking (to ensure that we don't have multiple workers all trying to do the remote-join dance at once). For prior art in that department, we should look at the code that handles incoming events received over federation (https://github.com/matrix-org/synapse/blob/v1.69.0rc2/synapse/federation/federation_server.py#L1108-L1116), which uses a database row to hold a lock: we can simply call try_acquire_lock before starting a resync operation.

That still leaves us with the problem of making sure we resume the partial-state resync if the client reader that is currently processing it gets restarted (or, worse, turned off, never to return). Again following the example of incoming events: in that case, we kick off a processing job as soon as a worker discovers itself to be a "federation inbound" worker by receiving a /send request. Probably we could do the same here on a /_matrix/client/v3/rooms/.*/(send|join|invite|leave|ban|unban|kick) request?
— matrix-org/synapse#12994 (comment)
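
The "kick off a resync when a relevant request arrives" idea from the comment above could look roughly like the following. This is only a sketch under assumptions: `ResyncKicker` and its constructor arguments are hypothetical, the `try_acquire_lock` parameter here is a plain callable standing in for whatever lock the real code would use, and the room ID is matched with `[^/]+` where the comment writes `.*`.

```python
import re
from typing import Callable, Set

# Path pattern based on the one suggested in the comment; capturing the
# room ID with [^/]+ is an assumption made for this sketch.
ROOM_WRITE_PATH = re.compile(
    r"/_matrix/client/v3/rooms/(?P<room_id>[^/]+)"
    r"/(send|join|invite|leave|ban|unban|kick)"
)


class ResyncKicker:
    """Hypothetical helper: starts a resync when a room-write request arrives."""

    def __init__(
        self,
        partial_state_rooms: Set[str],
        try_acquire_lock: Callable[[str], bool],
        start_resync: Callable[[str], None],
    ) -> None:
        self._partial_state_rooms = partial_state_rooms
        self._try_acquire_lock = try_acquire_lock
        self._start_resync = start_resync

    def on_request(self, path: str) -> bool:
        """Return True if this request caused a resync to start on this worker."""
        match = ROOM_WRITE_PATH.match(path)
        if match is None:
            return False  # not one of the room-write endpoints
        room_id = match.group("room_id")
        if room_id not in self._partial_state_rooms:
            return False  # room already has full state
        if not self._try_acquire_lock(room_id):
            return False  # another worker is already resyncing this room
        self._start_resync(room_id)
        return True
```

Because the lock is taken before the resync starts, several workers seeing requests for the same partial-state room would not all begin the same job.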

matrixbot closed this as completed Dec 20, 2023

matrixbot changed the title ~~Dummy issue~~ Faster remote room joins: Support partial join re-syncing on workers other than the master Dec 21, 2023

matrixbot added A-Federated-Join T-Enhancement labels Dec 21, 2023

matrixbot reopened this Dec 21, 2023
