
Replace HTTP replication with TCP replication #2069

Closed · wants to merge 11 commits from erikj/repl_tcp

Conversation

@erikjohnston (Member) commented on Mar 27, 2017

Apologies for the size of this PR; I hope, however, that each individual commit will be manageable and reviewable. A large portion of this PR should also be documentation of what's going on, as opposed to code.

In short, this PR replaces the HTTP long-poll based replication with a custom TCP protocol. The reason this isn't completely insane is that a TCP protocol better fits what we're trying to do:

  • everything is fire and forget, nothing is request/response, and nothing requires reliability (other than the streams themselves, which have built-in reliability due to the tokens)
  • the master wants to be able to poke each worker at the same time when it has updates to send, but currently the master has to wait for the workers to request the data, making it much harder to avoid duplicating work across all workers

The major things that we need to be careful of when creating a TCP protocol (see the sketch after this list):

  • Ensuring we quickly detect when the other side has gone away; this is done via periodic keep-alives
  • Handling upstream congestion: if the remote can't keep up, Twisted will buffer bytes in memory. This can be handled by ensuring that we knife the connection if the buffers become too large.
  • Making TCP reconnects relatively seamless; in particular, it's important that the client can reconnect quickly to the server and fetch any updates to streams it missed. (Slight) back-off and retries are important here.
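Here is that sketch: a minimal illustration of the keep-alive and congestion handling using Twisted's LineOnlyReceiver and push-producer APIs. The class name, constants and queueing details are illustrative assumptions, not the PR's actual protocol.py:

    import time

    from twisted.internet import task
    from twisted.protocols.basic import LineOnlyReceiver

    PING_INTERVAL = 5            # seconds between keep-alives (illustrative)
    PING_TIMEOUT = 30            # give up on a silent peer after this long
    MAX_QUEUED_COMMANDS = 10000  # knife the connection beyond this backlog

    class ReplicationProtocolSketch(LineOnlyReceiver):
        """Keep-alives plus a congestion guard, sketched with Twisted."""

        delimiter = b"\n"

        def connectionMade(self):
            self.last_received = time.time()
            self.paused = False
            self.queued = []
            # Register as a streaming (push) producer so Twisted tells us
            # when its outbound buffers fill up.
            self.transport.registerProducer(self, True)
            self.pinger = task.LoopingCall(self.check_liveness)
            self.pinger.start(PING_INTERVAL)

        def lineReceived(self, line):
            self.last_received = time.time()
            # ... dispatch PING / NAME / REPLICATE / RDATA etc. here ...

        def check_liveness(self):
            now = time.time()
            if now - self.last_received > PING_TIMEOUT:
                # The peer has gone away; don't wait for TCP to notice.
                self.transport.abortConnection()
            else:
                self.send_command(b"PING %d" % int(now * 1000))

        # IPushProducer callbacks: Twisted invokes these as its send
        # buffer fills up and drains again.
        def pauseProducing(self):
            self.paused = True

        def resumeProducing(self):
            self.paused = False
            queued, self.queued = self.queued, []
            for line in queued:
                self.send_command(line)

        def stopProducing(self):
            pass

        def send_command(self, line):
            if self.paused:
                # The remote can't keep up; queue, but only up to a point.
                self.queued.append(line)
                if len(self.queued) > MAX_QUEUED_COMMANDS:
                    self.transport.abortConnection()
            else:
                self.sendLine(line)

        def connectionLost(self, reason):
            if self.pinger.running:
                self.pinger.stop()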

We also use a different philosophy for what we send and how we send it. The HTTP API was designed to replicate entire "db rows" so that a new database could theoretically be constructed from the various streams; it was also self-describing, i.e. we sent the column names for each stream.
Conversely, the TCP protocol aims to send only the data that is needed, which is usually just enough to know which caches to invalidate. The format of the updates is also not self-describing: they are simply rows defined in synapse (as synapse only talks to itself, there is little reason to make the updates self-describing).
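For example, the caches row in the exchange below carries just a cache function name, the keys to invalidate and a timestamp; a worker could apply it roughly like this (a hypothetical helper, not the actual worker code):

    def process_caches_row(store, row):
        # Row layout as in the example exchange:
        # ["get_user_by_id", ["@01register-user:localhost:8823"], 1490197670513]
        cache_func_name, keys, invalidation_ts_ms = row
        getattr(store, cache_func_name).invalidate(tuple(keys))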

An example exchange:

    * connection established *
    > SERVER localhost:8823
    > PING 1490197665618
    < NAME synapse.app.appservice
    < PING 1490197665618
    < REPLICATE events 1
    < REPLICATE caches 1
    > POSITION events 1
    > POSITION caches 1
    > RDATA caches 2 ["get_user_by_id",["@01register-user:localhost:8823"],1490197670513]
    > RDATA events 14 ["$149019767112vOHxz:localhost:8823", "!AFDCvgApUmpdfVjIXm:localhost:8823","m.room.guest_access","",null]
    < PING 1490197675618
    > ERROR server stopping
    * connection closed by server *

(where the arrows indicate the direction of the data being sent, and are not part of the protocol)

A full description of the protocol can be found in synapse/replication/tcp/__init__.py, protocol.py and commands.py.
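To give a flavour of the line format those files describe, here is a stripped-down sketch of an RDATA command class (illustrative only; the real classes live in commands.py):

    import json

    class RdataCommand(object):
        """Sketch of a line-based command: "RDATA <stream> <token> <json row>"."""

        NAME = "RDATA"

        def __init__(self, stream_name, token, row):
            self.stream_name = stream_name
            self.token = token
            self.row = row

        @classmethod
        def from_line(cls, line):
            # `line` is everything after the command name on the wire.
            stream_name, token, row_json = line.split(" ", 2)
            return cls(stream_name, int(token), json.loads(row_json))

        def to_line(self):
            return "%s %s %s" % (self.stream_name, self.token, json.dumps(self.row))

    # e.g. parsing the caches line from the exchange above:
    cmd = RdataCommand.from_line(
        'caches 2 ["get_user_by_id",["@01register-user:localhost:8823"],1490197670513]'
    )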

Commit messages from the branch:

  • The new replication protocol will keep all the streams separate, rather than muxing multiple streams into one.
  • This defines the low level TCP replication protocol.
  • The TCP replication protocol streams deltas of who has started or stopped syncing. This is different from the HTTP API, which periodically sends the full list of users who are syncing. This commit adds support for the new TCP style of sending deltas (illustrated below).
  • As the TCP replication uses a slightly different API and streams than the HTTP replication, this breaks HTTP replication.
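
As an illustration of those deltas on the wire, a fabricated snippet (the exact USER_SYNC layout is an assumption here; the authoritative format is defined in commands.py):

    < USER_SYNC @alice:localhost:8823 start 1490197665618
    < USER_SYNC @alice:localhost:8823 end 1490197670618
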
@richvdh (Member) left a comment:

this is too big a change for me to hold properly in my head, but generally it seems like a sane path.

"""This module implements the TCP replication protocol used by synapse to
communicate between the master process and its workers (when they're enabled).

The protocol is based on fire and forget, line based commands. An example flow
richvdh (Member):

I highly recommend moving the protocol docs out to separate rst files in docs. It'll be easier to find, and easier to read, there.


# Congestion

If the server sends messaegs faster than the client can consume them the server
richvdh (Member):

messaegs


If the server sends messaegs faster than the client can consume them the server
will first buffer a (fairly large) number of commands and then disconnect the
client. This ensure that we don't queue up an unbounded number of commands in
richvdh (Member):

ensures

since e.g. commands are not resent if the connection disappears.

The exception to that are the replication streams, i.e. RDATA commands, since
theses include tokens which can be used to restart the stream on connection
richvdh (Member):

theses

> POSITION caches 1
> RDATA caches 2 ["get_user_by_id",["@01register-user:localhost:8823"],1490197670513]
> RDATA events 14 ["$149019767112vOHxz:localhost:8823",
"!AFDCvgApUmpdfVjIXm:localhost:8823","m.room.guest_access","",null]
richvdh (Member):

could you fabricate an example of a batched result? I'm still slightly unclear what they would look like at this point.
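
For illustration, a batched result could look something like this (a fabricated example, assuming a special batch token so that only the final row carries the real stream token):

    > RDATA caches batch ["get_user_by_id",["@test:localhost:8823"],1490197670513]
    > RDATA caches batch ["get_user_by_id",["@test2:localhost:8823"],1490197670513]
    > RDATA caches 54 ["get_user_by_id",["@test3:localhost:8823"],1490197670513]

Under that assumption, the client would not advance its caches position until it sees the final row.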

self.streamer.on_invalidate_cache(cmd.cache_func, cmd.keys)

@defer.inlineCallbacks
def subscripe_to_stream(self, stream_name, token):
richvdh (Member):

subscripe

))
process_presence.add(user_id)
elif user_id in process_presence:
updates.append(prev_state.copy_and_replace(
richvdh (Member):

this looks incorrect for the case where is_syncing==False ?

erikjohnston (Member, Author):

No, as we don't immediately set a user's presence to offline if they "stopped" syncing, as they'll probably just come back. We instead set the last_user_sync_ts and time them out later.

OTOH, we might want to not send "stopped syncing" notifs from the synchrotron each time a /sync call finishes
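
A rough sketch of that timeout sweep (SYNC_TIMEOUT_MS, the state layout and the "offline" constant are hypothetical; only copy_and_replace and last_user_sync_ts appear in the diff):

    SYNC_TIMEOUT_MS = 30 * 1000  # hypothetical grace period

    def sweep_stopped_syncers(prev_states, currently_syncing, now_ms):
        """Mark users offline only once they have been quiet long enough."""
        updates = []
        for user_id, prev_state in prev_states.items():
            if user_id in currently_syncing:
                continue  # still syncing; leave them alone
            if now_ms - prev_state.last_user_sync_ts > SYNC_TIMEOUT_MS:
                updates.append(prev_state.copy_and_replace(state="offline"))
        return updates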

richvdh (Member):

shouldn't we be removing user_id from process_presence, though?

erikjohnston (Member, Author):

Oh, I got confused with the other comment about state = <offline> :/ Yes, we should remove it; that's done in the other PR.


yield self._update_states([
prev_state.copy_and_replace(
last_user_sync_ts=time_now_ms,
richvdh (Member):

do we not need to set state = <offline> here?

def get_currently_syncing_users(self):
"""Get the list of currently syncing users (if any). This is called
when a connection has been established and we need to send the
currently syncing users. (Overriden by the synchrotron's only)
richvdh (Member):

s/'//

)
self.get_account_data_for_user.invalidate((user_id,))
def process_replication_rows(self, stream_name, token, rows):
if stream_name == "tag_account_data":
richvdh (Member):

why is all this changing around?

erikjohnston (Member, Author):

What in particular? The rename of process_replication to process_replication_rows? Or the fact that I think I have split the account_data stream into two?

richvdh (Member):

we seem to have started with user_account_data, room_account_data, and tag_account_data (in that order), and have ended up with tag_account_data and account_data.

I guess account_data is a combination of user_account_data and room_account_data; but I'm also confused by the reordering.

This change might make sense, but it's the sort of thing I'd prefer to see happening independently of the change of protocol.

erikjohnston (Member, Author):

This was done as part of making it such that you only get the stream you asked for, as all three would come down if you asked for "account_data". I appreciate it would have been better to do it separately.

@erikjohnston (Member, Author) commented:

> this is too big a change for me to hold properly in my head, but generally it seems like a sane path.

Suggestions on how to make this better are welcome; given I had all the code, I didn't know the best way of splitting it up.

Would it perhaps make sense to split this so that we can land only the server-side portion of it? That may allow us to actually try it live, and would make it easier to review. I honestly don't know, though.

@erikjohnston (Member, Author) commented:

Closed in favour of #2082, which is just the server-side component.

@hawkowl deleted the erikj/repl_tcp branch on September 20, 2018.