Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Prevent account_data content from being sent over TCP replication #6333

Merged
merged 4 commits into from
Nov 26, 2019

Conversation

anoadragon453
Copy link
Member

@anoadragon453 anoadragon453 commented Nov 5, 2019

There was an issue with changes to account_data not being reflected upon a /sync request from a client, specifically changes to m.direct. The client would send a PUT /.../m.direct, but never receive the new version down /sync.

This seemed to only happen on servers running in worker mode for large user accounts. It turns out that m.direct for large user accounts is huge, literally a dictionary for every user_id and room_id for a direct chat that your account has.

When a change is made on the master process, the updated account_data entry is replicated over to synchrotron workers, to tell them that they need to send down a new copy to clients when they ask for it. Unfortunately, the content of this replication was so large that twisted would deny it and drop the request:

Traceback (most recent call last):
  File "/home/ops/.synapse3/env3/lib/python3.5/site-packages/twisted/internet/defer.py", line 1418, in _inlineCallbacks
    result = g.send(result)
StopIteration: ([(505374, ('@andrewm:amorgan.xyz', None, 'm.direct', '{"@xxx:matrix.org": ["!WUAgSOAjsAMjDlDkhf:amorgan.xyz", "!uLpJrldJgpBjerWFJD:matrix.org", "!aSlnrTeGrTkKwziiij:matrix.org"], "@yyy:matrix.org": ["!yZHTGeDKZUeKaqeTeU:matrix.org"], <truncated>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ops/.synapse3/env3/lib/python3.5/site-packages/synapse/replication/tcp/resource.py", line 225, in _run_notifier_loop
    conn.stream_update(stream.NAME, token, row)
  File "/home/ops/.synapse3/env3/lib/python3.5/site-packages/synapse/replication/tcp/protocol.py", line 544, in stream_update
    self.send_command(RdataCommand(stream_name, token, data))
  File "/home/ops/.synapse3/env3/lib/python3.5/site-packages/synapse/replication/tcp/protocol.py", line 293, in send_command
    % (cmd.NAME, len(encoded_string), self.MAX_LENGTH)
Exception: Failed to send command RDATA as too long (21544 > 16384)

This means the synchrotron would never get the update, and never know to send the updated account_data entry to clients.

Instead, since the account_data cache on the synchrotron is already invalidated through replication, we can simply stop sending the content of account_data.m.direct over instead. Replications will, the synchrotron will know to pull m.direct data out of the database a fresh, and everything works again.

@anoadragon453 anoadragon453 self-assigned this Nov 5, 2019
@erikjohnston
Copy link
Member

To be honest I'd be tempted to only send the account_data key name, and not the contents. Workers should hopefully be able to pull that from the database as necessary. Ideally each item sent down the stream should be well bounded.

Throwing huge JSON blobs to workers is probably not going to help the performance any.

@anoadragon453
Copy link
Member Author

To be honest I'd be tempted to only send the account_data key name, and not the contents. Workers should hopefully be able to pull that from the database as necessary. Ideally each item sent down the stream should be well bounded.

Throwing huge JSON blobs to workers is probably not going to help the performance any.

Not a bad point. I'll check what's involved in that.

@anoadragon453
Copy link
Member Author

Looks like:

Returns:
Deferred[Tuple[List[Tuple[int, Any]], int]:
Resolves to a pair ``(updates, current_token)``, where ``updates`` is a
list of ``(token, row)`` entries. ``row`` will be json-serialised and
sent over the replication steam.

is where the data is taken out of the database. This seems like it's fundamental to a lot of things that get sent over replication, and might be kinda scary to change. Will investigate more tomorrow.

@anoadragon453 anoadragon453 requested a review from a team November 8, 2019 15:51
…_account_data_sync

* 'develop' of github.com:matrix-org/synapse:
  Blacklist PurgeRoomTestCase (#6361)
  Set room version default to 5
@anoadragon453 anoadragon453 changed the title Remove max line length constraint on TCP replication sends Prevent account_data content from being sent over TCP replication Nov 14, 2019
@anoadragon453 anoadragon453 merged commit a8175d0 into develop Nov 26, 2019
@anoadragon453 anoadragon453 deleted the anoa/fix_account_data_sync branch November 26, 2019 13:58
neilisfragile added a commit that referenced this pull request Dec 9, 2019
Synapse 1.7.0rc1 (2019-12-09)
=============================

Features
--------

- Implement per-room message retention policies. ([\#5815](#5815), [\#6436](#6436))
- Add etag and count fields to key backup endpoints to help clients guess if there are new keys. ([\#5858](#5858))
- Add `/admin/v2/users` endpoint with pagination. Contributed by Awesome Technologies Innovationslabor GmbH. ([\#5925](#5925))
- Require User-Interactive Authentication for `/account/3pid/add`, meaning the user's password will be required to add a third-party ID to their account. ([\#6119](#6119))
- Implement the `/_matrix/federation/unstable/net.atleastfornow/state/<context>` API as drafted in MSC2314. ([\#6176](#6176))
- Configure privacy-preserving settings by default for the room directory. ([\#6354](#6354))
- Add ephemeral messages support by partially implementing [MSC2228](matrix-org/matrix-spec-proposals#2228). ([\#6409](#6409))
- Add support for [MSC 2367](matrix-org/matrix-spec-proposals#2367), which allows specifying a reason on all membership events. ([\#6434](#6434))

Bugfixes
--------

- Transfer non-standard power levels on room upgrade. ([\#6237](#6237))
- Fix error from the Pillow library when uploading RGBA images. ([\#6241](#6241))
- Correctly apply the event filter to the `state`, `events_before` and `events_after` fields in the response to `/context` requests. ([\#6329](#6329))
- Fix caching devices for remote users when using workers, so that we don't attempt to refetch (and potentially fail) each time a user requests devices. ([\#6332](#6332))
- Prevent account data syncs getting lost across TCP replication. ([\#6333](#6333))
- Fix bug: TypeError in `register_user()` while using LDAP auth module. ([\#6406](#6406))
- Fix an intermittent exception when handling read-receipts. ([\#6408](#6408))
- Fix broken guest registration when there are existing blocks of numeric user IDs. ([\#6420](#6420))
- Fix startup error when http proxy is defined. ([\#6421](#6421))
- Fix error when using synapse_port_db on a vanilla synapse db. ([\#6449](#6449))
- Fix uploading multiple cross signing signatures for the same user. ([\#6451](#6451))
- Fix bug which lead to exceptions being thrown in a loop when a cross-signed device is deleted. ([\#6462](#6462))
- Fix `synapse_port_db` not exiting with a 0 code if something went wrong during the port process. ([\#6470](#6470))
- Improve sanity-checking when receiving events over federation. ([\#6472](#6472))
- Fix inaccurate per-block Prometheus metrics. ([\#6491](#6491))
- Fix small performance regression for sending invites. ([\#6493](#6493))
- Back out cross-signing code added in Synapse 1.5.0, which caused a performance regression. ([\#6494](#6494))

Improved Documentation
----------------------

- Update documentation and variables in user contributed systemd reference file. ([\#6369](#6369), [\#6490](#6490))
- Fix link in the user directory documentation. ([\#6388](#6388))
- Add build instructions to the docker readme. ([\#6390](#6390))
- Switch Ubuntu package install recommendation to use python3 packages in INSTALL.md. ([\#6443](#6443))
- Write some docs for the quarantine_media api. ([\#6458](#6458))
- Convert CONTRIBUTING.rst to markdown (among other small fixes). ([\#6461](#6461))

Deprecations and Removals
-------------------------

- Remove admin/v1/users_paginate endpoint. Contributed by Awesome Technologies Innovationslabor GmbH. ([\#5925](#5925))
- Remove fallback for federation with old servers which lack the /federation/v1/state_ids API. ([\#6488](#6488))

Internal Changes
----------------

- Add benchmarks for structured logging and improve output performance. ([\#6266](#6266))
- Improve the performance of outputting structured logging. ([\#6322](#6322))
- Refactor some code in the event authentication path for clarity. ([\#6343](#6343), [\#6468](#6468), [\#6480](#6480))
- Clean up some unnecessary quotation marks around the codebase. ([\#6362](#6362))
- Complain on startup instead of 500'ing during runtime when `public_baseurl` isn't set when necessary. ([\#6379](#6379))
- Add a test scenario to make sure room history purges don't break `/messages` in the future. ([\#6392](#6392))
- Clarifications for the email configuration settings. ([\#6423](#6423))
- Add more tests to the blacklist when running in worker mode. ([\#6429](#6429))
- Refactor data store layer to support multiple databases in the future. ([\#6454](#6454), [\#6464](#6464), [\#6469](#6469), [\#6487](#6487))
- Port synapse.rest.client.v1 to async/await. ([\#6482](#6482))
- Port synapse.rest.client.v2_alpha to async/await. ([\#6483](#6483))
- Port SyncHandler to async/await. ([\#6484](#6484))
erikjohnston added a commit that referenced this pull request Dec 13, 2019
Synapse 1.7.0 (2019-12-13)
==========================

This release changes the default settings so that only local authenticated users can query the server's room directory. See the [upgrade notes](UPGRADE.rst#upgrading-to-v170) for details.

Support for SQLite versions before 3.11 is now deprecated. A future release will refuse to start if used with an SQLite version before 3.11.

Administrators are reminded that SQLite should not be used for production instances. Instructions for migrating to Postgres are available [here](docs/postgres.md). A future release of synapse will, by default, disable federation for servers using SQLite.

No significant changes since 1.7.0rc2.

Synapse 1.7.0rc2 (2019-12-11)
=============================

Bugfixes
--------

- Fix incorrect error message for invalid requests when setting user's avatar URL. ([\#6497](#6497))
- Fix support for SQLite 3.7. ([\#6499](#6499))
- Fix regression where sending email push would not work when using a pusher worker. ([\#6507](#6507), [\#6509](#6509))

Synapse 1.7.0rc1 (2019-12-09)
=============================

Features
--------

- Implement per-room message retention policies. ([\#5815](#5815), [\#6436](#6436))
- Add etag and count fields to key backup endpoints to help clients guess if there are new keys. ([\#5858](#5858))
- Add `/admin/v2/users` endpoint with pagination. Contributed by Awesome Technologies Innovationslabor GmbH. ([\#5925](#5925))
- Require User-Interactive Authentication for `/account/3pid/add`, meaning the user's password will be required to add a third-party ID to their account. ([\#6119](#6119))
- Implement the `/_matrix/federation/unstable/net.atleastfornow/state/<context>` API as drafted in MSC2314. ([\#6176](#6176))
- Configure privacy-preserving settings by default for the room directory. ([\#6355](#6355))
- Add ephemeral messages support by partially implementing [MSC2228](matrix-org/matrix-spec-proposals#2228). ([\#6409](#6409))
- Add support for [MSC 2367](matrix-org/matrix-spec-proposals#2367), which allows specifying a reason on all membership events. ([\#6434](#6434))

Bugfixes
--------

- Transfer non-standard power levels on room upgrade. ([\#6237](#6237))
- Fix error from the Pillow library when uploading RGBA images. ([\#6241](#6241))
- Correctly apply the event filter to the `state`, `events_before` and `events_after` fields in the response to `/context` requests. ([\#6329](#6329))
- Fix caching devices for remote users when using workers, so that we don't attempt to refetch (and potentially fail) each time a user requests devices. ([\#6332](#6332))
- Prevent account data syncs getting lost across TCP replication. ([\#6333](#6333))
- Fix bug: TypeError in `register_user()` while using LDAP auth module. ([\#6406](#6406))
- Fix an intermittent exception when handling read-receipts. ([\#6408](#6408))
- Fix broken guest registration when there are existing blocks of numeric user IDs. ([\#6420](#6420))
- Fix startup error when http proxy is defined. ([\#6421](#6421))
- Fix error when using synapse_port_db on a vanilla synapse db. ([\#6449](#6449))
- Fix uploading multiple cross signing signatures for the same user. ([\#6451](#6451))
- Fix bug which lead to exceptions being thrown in a loop when a cross-signed device is deleted. ([\#6462](#6462))
- Fix `synapse_port_db` not exiting with a 0 code if something went wrong during the port process. ([\#6470](#6470))
- Improve sanity-checking when receiving events over federation. ([\#6472](#6472))
- Fix inaccurate per-block Prometheus metrics. ([\#6491](#6491))
- Fix small performance regression for sending invites. ([\#6493](#6493))
- Back out cross-signing code added in Synapse 1.5.0, which caused a performance regression. ([\#6494](#6494))

Improved Documentation
----------------------

- Update documentation and variables in user contributed systemd reference file. ([\#6369](#6369), [\#6490](#6490))
- Fix link in the user directory documentation. ([\#6388](#6388))
- Add build instructions to the docker readme. ([\#6390](#6390))
- Switch Ubuntu package install recommendation to use python3 packages in INSTALL.md. ([\#6443](#6443))
- Write some docs for the quarantine_media api. ([\#6458](#6458))
- Convert CONTRIBUTING.rst to markdown (among other small fixes). ([\#6461](#6461))

Deprecations and Removals
-------------------------

- Remove admin/v1/users_paginate endpoint. Contributed by Awesome Technologies Innovationslabor GmbH. ([\#5925](#5925))
- Remove fallback for federation with old servers which lack the /federation/v1/state_ids API. ([\#6488](#6488))

Internal Changes
----------------

- Add benchmarks for structured logging and improve output performance. ([\#6266](#6266))
- Improve the performance of outputting structured logging. ([\#6322](#6322))
- Refactor some code in the event authentication path for clarity. ([\#6343](#6343), [\#6468](#6468), [\#6480](#6480))
- Clean up some unnecessary quotation marks around the codebase. ([\#6362](#6362))
- Complain on startup instead of 500'ing during runtime when `public_baseurl` isn't set when necessary. ([\#6379](#6379))
- Add a test scenario to make sure room history purges don't break `/messages` in the future. ([\#6392](#6392))
- Clarifications for the email configuration settings. ([\#6423](#6423))
- Add more tests to the blacklist when running in worker mode. ([\#6429](#6429))
- Refactor data store layer to support multiple databases in the future. ([\#6454](#6454), [\#6464](#6464), [\#6469](#6469), [\#6487](#6487))
- Port synapse.rest.client.v1 to async/await. ([\#6482](#6482))
- Port synapse.rest.client.v2_alpha to async/await. ([\#6483](#6483))
- Port SyncHandler to async/await. ([\#6484](#6484))
babolivier pushed a commit that referenced this pull request Sep 1, 2021
)

* commit 'a8175d0f9':
  lint
  Add changelog
  Remove content from being sent for account data rdata stream
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants