This repository has been archived by the owner on Apr 26, 2024. It is now read-only.
getting rid of the replication "stream" tables #13456
Currently we have a number of tables in the database that exist only to record data for the replication streams. These include:
cache_invalidation_stream_by_instance
current_state_delta_stream
ex_outlier_stream
presence_stream
push_rules_stream
(and there may well be others).
Essentially, whenever we record a change to a table that needs to be replicated to workers, we add a row to one of these tables; the rows are then used in one of two ways:
ReplicationStreamer._run_notifier_loop regularly polls them (via the *Stream._update_function methods) and sends out a NOTIFY over Redis pubsub with the data from the table.
The reason we use this arrangement is twofold:
[...] _run_notifier_loop, it is possible for the "writing" process to abort between updating the database and sending the notification to Redis. Persisting the data in Postgres ensures that we can replay anything that wasn't sent when we restart.
However, I assert that these extra tables are a source of complexity, as well as increased database I/O and storage (not least because we never clear them out (#5888)). Worse, whenever we need to add a new type of replication stream, we have to add a load of extra paraphernalia in the shape of a new stream table. It would be good to consider how to get rid of them.
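For anyone unfamiliar with the pattern, here is a minimal sketch of the stream-table arrangement described above. It is not Synapse's actual code: it uses SQLite instead of Postgres so that it runs standalone, the two-column push_rules_stream schema is a simplified assumption, and write_push_rule, get_updates, notifier_pass and the publish callback (a stand-in for the Redis send) are hypothetical names. The caller is assumed to remember the last stream token it sent.

```python
import sqlite3


def make_db():
    # In-memory SQLite stands in for Postgres; the schema is a simplified assumption.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE push_rules (user_id TEXT PRIMARY KEY, rule TEXT);
        CREATE TABLE push_rules_stream (
            stream_id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id TEXT NOT NULL
        );
        """
    )
    return conn


def write_push_rule(conn, user_id, rule):
    # The data change and its stream row are committed in a single transaction,
    # so a change can never exist without a matching stream row.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO push_rules VALUES (?, ?)", (user_id, rule)
        )
        conn.execute(
            "INSERT INTO push_rules_stream (user_id) VALUES (?)", (user_id,)
        )


def get_updates(conn, last_token, limit=100):
    # Hypothetical analogue of a *Stream._update_function: rows after last_token.
    rows = conn.execute(
        "SELECT stream_id, user_id FROM push_rules_stream "
        "WHERE stream_id > ? ORDER BY stream_id LIMIT ?",
        (last_token, limit),
    ).fetchall()
    new_token = rows[-1][0] if rows else last_token
    return rows, new_token


def notifier_pass(conn, last_token, publish):
    # One iteration of a polling loop: publish every row not yet sent.
    rows, new_token = get_updates(conn, last_token)
    for stream_id, user_id in rows:
        publish(("push_rules", stream_id, user_id))
    return new_token


conn = make_db()
sent = []  # stand-in for messages published over Redis

write_push_rule(conn, "@alice:example.org", "notify")
token = notifier_pass(conn, 0, sent.append)

# Suppose the process dies after this write but before notifying anyone.
write_push_rule(conn, "@bob:example.org", "dont_notify")

# After a "restart", polling from the last sent token replays the missed row.
token = notifier_pass(conn, token, sent.append)
```

The second notifier_pass call illustrates the replay property: the row written just before the simulated crash is still in the stream table, so it is picked up on the next poll. Any replacement for the stream tables would need some other way to provide that guarantee.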