Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Replace HTTP replication with TCP replication (Server side part) #2082

Merged
merged 17 commits into from
Apr 4, 2017

Conversation

erikjohnston
Copy link
Member

@erikjohnston erikjohnston commented Mar 30, 2017

This is the server side component of #2069, including the requested changes.

Docs rendered

The new replication protocol will keep all the streams separate, rather
than muxing multiple streams into one.
This defines the low level TCP replication protocol
The TCP replication protocol streams deltas of who has started or
stopped syncing. This is different from the HTTP API which periodically
sends the full list of users who are syncing. This commit adds support
for the new TCP style of sending deltas.
"""Returns a deferred which resolves when there is new data for
replication to handle.
"""
return self.replication_deferred.observe()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be nicer to let the replication resource register a callback rather than bouncing through a deferred?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I think it would. it may be more efficient too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise I think it needs a make_deferred_yieldable.

connections if there are.

This should get called each time new data is available, even if it
is currently being executed, so that nothing gets missed
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be a lot clearer if this function looped until there were no changes? Although that is less immediately obvious is correct.

It may be more obvious if we can register this function directly with the notifer, rather than bouncing via a deferred in a loop.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to registering a callback
-1 to looping until there are no changes. It will complicate this function and I don't think it will make anything clearer.

The example shows the server accepting a new connection and sending its identity
with the ``SERVER`` command, followed by the client asking to subscribe to the
``events`` stream from the token ``53``. The server then periodically sends ``RDATA``
commands which have the format ``RDATA <stream_name> <token> <row>```, where the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excess `

Motivation
----------

The HTTP API used long poll from the workers to the master, this has the problem
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This paragraph is going to look out of date real soon. I would go straight for:

Previously the workers used an HTTP long poll mechanism to get updates from the master, which had the problem of causing a lot of duplicate work on the server. This TCP protocol replaces those APIs with the aim of increased efficiency.

[or something]

TCP Replication
===============

This describes the TCP replication protocol that replaces the HTTP protocol.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get rid of this line, it's too vague to be useful


Blank lines are ignored.


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be nice to give a complete list of the commands here, with the command syntax, the direction of transmission, a quick summary and a reference to the section where it is explained in more detail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Reliability
~~~~~~~~~~~

In general the replication stream should be consisdered an unreliable transport
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consisdered

connections if there are.

This should get called each time new data is available, even if it
is currently being executed, so that nothing gets missed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to registering a callback
-1 to looping until there are no changes. It will complicate this function and I don't think it will make anything clearer.

> RDATA events 55 ["$foo4:bar.com", ...]

The example shows the server accepting a new connection and sending its identity
with the `SERVER` command, followed by the client asking to subscribe to the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not going to insist you go through and change them, but for future reference: AIUI docstrings are interpreted as RST and really ought to have double-backticks.

"""Get all the pushers that have changed between the given tokens.

Returns:
list(tuple): each tuple consists of:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it returns a Deferred, which resolves to that lot.

(I'd prefer it if we even put this on @defer.inlineCallbacks, but I won't insist there because it's "obvious". Here though, there is no clue that it actually returns a Deferred, rather than a list.)

"""Returns a deferred which resolves when there is new data for
replication to handle.
"""
return self.replication_deferred.observe()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise I think it needs a make_deferred_yieldable.

True then limit is provided, otherwise it's not.

Returns:
list(tuple): the first entry in the tuple is the token for that
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is valid for this function to return a Deferred, say so.

@richvdh richvdh assigned erikjohnston and unassigned richvdh Mar 30, 2017
@erikjohnston
Copy link
Member Author

erikjohnston commented Mar 31, 2017

I think I've addressed all the issues raised.

I also cheekily added a <last_sync_ms> to the USER_SYNC command (which is the last sync time), as I think a) its more correct and b) is required if we want to batch up the USER_SYNC stuff on the workers to reduce traffic.

Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, modulo the conflict

@richvdh richvdh assigned erikjohnston and unassigned richvdh Apr 3, 2017
@erikjohnston erikjohnston merged commit 27cc627 into develop Apr 4, 2017
psaavedra added a commit to psaavedra/synapse that referenced this pull request May 19, 2017
Changes in synapse v0.21.0 (2017-05-18)
=======================================

No changes since v0.21.0-rc3

Changes in synapse v0.21.0-rc3 (2017-05-17)
===========================================

Features:

* Add per user rate-limiting overrides (PR matrix-org#2208)
* Add config option to limit maximum number of events requested by ``/sync``
  and ``/messages`` (PR matrix-org#2221) Thanks to @psaavedra!

Changes:

* Various small performance fixes (PR matrix-org#2201, matrix-org#2202, matrix-org#2224, matrix-org#2226, matrix-org#2227, matrix-org#2228,
  matrix-org#2229)
* Update username availability checker API (PR matrix-org#2209, matrix-org#2213)
* When purging, don't de-delta state groups we're about to delete (PR matrix-org#2214)
* Documentation to check synapse version (PR matrix-org#2215) Thanks to @hamber-dick!
* Add an index to event_search to speed up purge history API (PR matrix-org#2218)

Bug fixes:

* Fix API to allow clients to upload one-time-keys with new sigs (PR matrix-org#2206)

Changes in synapse v0.21.0-rc2 (2017-05-08)
===========================================

Changes:

* Always mark remotes as up if we receive a signed request from them (PR matrix-org#2190)

Bug fixes:

* Fix bug where users got pushed for rooms they had muted (PR matrix-org#2200)

Changes in synapse v0.21.0-rc1 (2017-05-08)
===========================================

Features:

* Add username availability checker API (PR matrix-org#2183)
* Add read marker API (PR matrix-org#2120)

Changes:

* Enable guest access for the 3pl/3pid APIs (PR matrix-org#1986)
* Add setting to support TURN for guests (PR matrix-org#2011)
* Various performance improvements (PR matrix-org#2075, matrix-org#2076, matrix-org#2080, matrix-org#2083, matrix-org#2108,
  matrix-org#2158, matrix-org#2176, matrix-org#2185)
* Make synctl a bit more user friendly (PR matrix-org#2078, matrix-org#2127) Thanks @APwhitehat!
* Replace HTTP replication with TCP replication (PR matrix-org#2082, matrix-org#2097, matrix-org#2098,
  matrix-org#2099, matrix-org#2103, matrix-org#2014, matrix-org#2016, matrix-org#2115, matrix-org#2116, matrix-org#2117)
* Support authenticated SMTP (PR matrix-org#2102) Thanks @DanielDent!
* Add a counter metric for successfully-sent transactions (PR matrix-org#2121)
* Propagate errors sensibly from proxied IS requests (PR matrix-org#2147)
* Add more granular event send metrics (PR matrix-org#2178)

Bug fixes:

* Fix nuke-room script to work with current schema (PR matrix-org#1927) Thanks
  @zuckschwerdt!
* Fix db port script to not assume postgres tables are in the public schema
  (PR matrix-org#2024) Thanks @jerrykan!
* Fix getting latest device IP for user with no devices (PR matrix-org#2118)
* Fix rejection of invites to unreachable servers (PR matrix-org#2145)
* Fix code for reporting old verify keys in synapse (PR matrix-org#2156)
* Fix invite state to always include all events (PR matrix-org#2163)
* Fix bug where synapse would always fetch state for any missing event (PR matrix-org#2170)
* Fix a leak with timed out HTTP connections (PR matrix-org#2180)
* Fix bug where we didn't time out HTTP requests to ASes  (PR matrix-org#2192)

Docs:

* Clarify doc for SQLite to PostgreSQL port (PR matrix-org#1961) Thanks @benhylau!
* Fix typo in synctl help (PR matrix-org#2107) Thanks @HarHarLinks!
* ``web_client_location`` documentation fix (PR matrix-org#2131) Thanks @matthewjwolff!
* Update README.rst with FreeBSD changes (PR matrix-org#2132) Thanks @feld!
* Clarify setting up metrics (PR matrix-org#2149) Thanks @encks!
@erikjohnston erikjohnston deleted the erikj/repl_tcp_server branch October 26, 2017 11:00
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants