getting rid of the replication "stream" tables #13456

richvdh · 2022-08-04T14:13:24Z

Currently we have a number of tables in the database that exist only to record data for the replication streams. These include:

cache_invalidation_stream_by_instance
current_state_delta_stream
ex_outlier_stream
presence_stream
push_rules_stream

(and there may well be others).

Essentially, whenever we record a change to a table that needs to be replicated to workers, we add a row to one of these tables; the rows are then used in one of two ways:

ReplicationStreamer._run_notifier_loop regularly polls them (via the *Stream._update_function methods) and sends out a NOTIFY over Redis pubsub with the data from the table.
If a worker gets disconnected from Redis (so misses notifications), it can catch up with any missed notification by reading the relevant table itself.

The reason we use this arrangement is twofold:

It allows workers which miss the memo (because they were disconnected from Redis) to catch up with anything they missed.
Since the Redis notifications are sent out asynchronously by _run_notifier_loop, it is possible for the "writing" process to abort between updating the database and sending the notification to Redis. Persisting the data in postgres ensures that we can replay anything that wasn't sent when we restart.
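The second point can be illustrated with a toy model. The change and its stream row are committed together, so if the writer dies before publishing, nothing is lost. After a restart, the worker compares the writer's current stream position with its own token and reads the gap from the table. Everything here is a hypothetical simplification, not Synapse's code: the `StreamTable` and `Worker` classes, and the assumption that the writer re-announces its position on startup.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class StreamTable:
    # Stands in for a durable Postgres stream table: it survives process restarts.
    rows: List[Tuple[int, str]] = field(default_factory=list)

    def append(self, payload: str) -> int:
        stream_id = len(self.rows) + 1
        self.rows.append((stream_id, payload))
        return stream_id

    def current_position(self) -> int:
        return self.rows[-1][0] if self.rows else 0

    def since(self, token: int) -> List[Tuple[int, str]]:
        return [row for row in self.rows if row[0] > token]


@dataclass
class Worker:
    token: int = 0
    seen: List[str] = field(default_factory=list)

    def on_notify(self, stream_id: int, payload: str) -> None:
        self.seen.append(payload)
        self.token = stream_id

    def on_position(self, table: StreamTable, position: int) -> None:
        # The writer is ahead of us: fetch whatever we never got notified about.
        if position > self.token:
            for stream_id, payload in table.since(self.token):
                self.on_notify(stream_id, payload)


table = StreamTable()
worker = Worker()

# Normal path: commit the row, then publish it.
sid = table.append("change-1")
worker.on_notify(sid, "change-1")

# Crash path: the row is committed, but the writer dies before publishing.
table.append("change-2")

# After restart, the writer announces its current position and the worker catches up.
worker.on_position(table, table.current_position())
```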

However, I assert that these extra tables are a source of complexity, as well as increased database I/O and storage (not least because we never clear them out (#5888)). Worse, whenever we need to add a new type of replication stream, we have to add a load of extra paraphernalia in the shape of a new stream table. It would be good to consider how to get rid of them.


richvdh · 2022-08-04T14:34:22Z

We had some thoughts. Let's deal with the two use cases separately:

"Workers that miss the memo" could be dealt with by using Redis streams instead of pub/sub. (We'd need to kill off TCP replication (#11728).) Conceptually this moves the task of maintaining the "stream" table into Redis.
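The property that matters is that Redis pub/sub drops messages for a subscriber that is not connected. A Redis stream, by contrast, is an append-only log: `XADD` appends an entry, and a reader can use `XREAD` to fetch every entry after its last-seen ID. The toy model below imitates just those semantics in plain Python so that it runs without a Redis server. The class names are hypothetical, and real stream IDs are `<milliseconds>-<sequence>` strings rather than the integers used here.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple


class PubSubChannel:
    """Fire-and-forget: only currently connected subscribers receive a message."""

    def __init__(self) -> None:
        self.connected: Set[str] = set()
        self.inbox: DefaultDict[str, List[str]] = defaultdict(list)

    def publish(self, msg: str) -> None:
        for sub in self.connected:
            self.inbox[sub].append(msg)


class LogStream:
    """Append-only log, loosely like a Redis stream: readers pull by last-seen ID."""

    def __init__(self) -> None:
        self.entries: List[Tuple[int, str]] = []

    def xadd(self, msg: str) -> int:
        entry_id = len(self.entries) + 1
        self.entries.append((entry_id, msg))
        return entry_id

    def xread(self, last_id: int) -> List[Tuple[int, str]]:
        return [(i, m) for i, m in self.entries if i > last_id]


pubsub = PubSubChannel()
stream = LogStream()

pubsub.connected.add("worker1")
pubsub.publish("a")
stream.xadd("a")

pubsub.connected.discard("worker1")  # worker1 loses its Redis connection
pubsub.publish("b")
stream.xadd("b")
pubsub.connected.add("worker1")  # ...and later reconnects

missed_via_pubsub = pubsub.inbox["worker1"]  # "b" never arrives
caught_up_via_stream = stream.xread(last_id=1)  # "b" is still there to read
```

In other words, the log that the Postgres stream tables currently provide would live in Redis instead. Each worker would only need to remember the last entry ID it processed.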

"Writing worker dies before notifying" could be dealt with by sending out the Redis notification before committing the Postgres transaction. This brings further problems, though; in particular, the receiving worker needs to know when it is actually safe to read the new data from the database.

We have two proposals for dealing with this new problem:

(h/t @squahtx): We could include the postgres txid_current() in the redis notification, and have the receiving process poll txid_status() until it completes. Interestingly this might also allow the worker to safely read from an asynchronous postgres replica.
Use Postgres locking. For example, have the writing worker take out a FOR UPDATE row lock, and the reading worker take a FOR SHARE lock. (That will work for updates, but not inserts. For inserts, provided you have an incrementing primary key, you can safely use the current mechanism of just polling the table asynchronously. Alternatively, you could use Postgres advisory locks.)
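Under the first proposal, the writer reads `txid_current()` inside its open transaction, includes the value in the Redis notification, and only then commits. The receiver then runs a wait loop, polling `SELECT txid_status(...)` until the transaction is no longer `in progress`. In Postgres, `txid_status` returns `committed`, `aborted` or `in progress`. It returns NULL if the transaction is too old for its commit status to still be known. The sketch below injects the status lookup as a callable so that it runs without a database. `wait_for_txid`, its parameters, and its timeout behaviour are assumptions for illustration.

```python
import time
from typing import Callable, Optional


def wait_for_txid(
    get_status: Callable[[int], Optional[str]],
    txid: int,
    poll_interval: float = 0.05,
    timeout: float = 10.0,
) -> str:
    """Block until transaction `txid` is no longer in progress.

    Returns "committed", "aborted", or "unknown". In a real deployment
    `get_status` would run `SELECT txid_status(%s)` against Postgres.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(txid)
        if status in ("committed", "aborted"):
            return status
        if status is None:
            # Postgres no longer tracks this txid, so it finished long ago.
            # Either way it is not in progress, so reading is safe.
            return "unknown"
        if time.monotonic() >= deadline:
            raise TimeoutError(f"transaction {txid} still in progress after {timeout}s")
        time.sleep(poll_interval)


# Fake status source: the writer's transaction stays open for two polls, then commits.
responses = iter(["in progress", "in progress", "committed"])
result = wait_for_txid(lambda _txid: next(responses), txid=12345, poll_interval=0.01)
```

The poll interval trades notification latency against query load on the database. A real design would also have to decide what the reader does on timeout or on `aborted`. For `aborted`, presumably the reader would just discard the notification.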

ArtObr · 2022-09-06T20:56:38Z

Hello there, @richvdh
So if my Synapse app is running in monolith mode, I don't need any of the above tables?
Would it be safe to just delete all of the content in that case? (e.g. DELETE FROM cache_invalidation_stream_by_instance)

reivilibre added the T-Task label ("Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.") Aug 4, 2022

richvdh mentioned this issue Aug 5, 2022

cache_invalidation_stream_by_instance grows without bounds and causes slow startup #8269 (Closed)

MadLittleMods added the A-Database label ("DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db") Sep 20, 2022

richvdh mentioned this issue Oct 10, 2022

Faster joins: support worker-mode deployments #12994 (Closed)

reivilibre mentioned this issue Nov 11, 2022

Faster joins: create an un-partial-stated events stream for notifying workers that events have been un-partial-stated #14418 (Closed)

This was referenced Feb 2, 2023

state_groups (& state_groups_state & state_group_edges) are not fully purged alongside the rooms #12821 (Open)

Streams tables are never cleared out #5888 (Open)

MadLittleMods added the Z-Cleanup label ("Things we want to get rid of, but aren't actively causing pain") Apr 25, 2023

richvdh mentioned this issue Jul 4, 2023

we never clear out cache_invalidation_stream #3665 (Closed)

realtyem mentioned this issue Jul 5, 2023

Current state of Presence #15877 (Open)

matrixbot mentioned this issue Dec 21, 2023

getting rid of the replication "stream" tables element-hq/synapse#13456 (Open)
