State groups relation schema #7156
Comments
Maybe have a look at this tool: https://github.com/matrix-org/rust-synapse-compress-state
I started creating a visualization here: #934 (comment)
Yep, that tool is well known, but it only compresses referenced data; I run it periodically. If the relations between the database tables were documented, we as a community would be able to find bugs in the database handling. In particular, we maintainers of old Matrix server instances would be able to clean up our databases. I have several large datasets in that giant state_groups_state table whose room_id is not referenced in the rooms table.
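A check like the one described above can be sketched as an anti-join. This is only an illustration against a toy SQLite database: the two tables are reduced to a few columns that I assume resemble the real ones, the room IDs are made up, and a real Synapse database would need the query adapted and run with care.

```python
import sqlite3

# Toy, simplified schema (an assumption): the real tables have more columns and indexes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rooms (room_id TEXT PRIMARY KEY);
    CREATE TABLE state_groups_state (
        state_group INTEGER, room_id TEXT, type TEXT, state_key TEXT, event_id TEXT
    );
    INSERT INTO rooms VALUES ('!known:example.org');
    INSERT INTO state_groups_state VALUES
        (1, '!known:example.org', 'm.room.create', '', '$a'),
        (2, '!gone:example.org',  'm.room.create', '', '$b'),
        (3, '!gone:example.org',  'm.room.name',   '', '$c');
""")


def orphaned_room_counts(conn):
    """Count state_groups_state rows per room_id that has no row in rooms."""
    return conn.execute("""
        SELECT sgs.room_id, COUNT(*) AS n
        FROM state_groups_state AS sgs
        LEFT JOIN rooms AS r ON r.room_id = sgs.room_id
        WHERE r.room_id IS NULL
        GROUP BY sgs.room_id
        ORDER BY n DESC
    """).fetchall()


print(orphaned_room_counts(conn))  # [('!gone:example.org', 2)]
```

The query only reads data, so it says nothing about whether the rows it finds are safe to delete.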
Continuing this process, I had a look at the purge process and re-implemented it by rewriting all delete queries into select equivalents, then built a script around that. I have uploaded the script and the corresponding results to a gist: https://gist.github.com/sargon/445ed23a471db609a165a816f2be6ce8 Looking at that data, it seems there are a lot of entries in state_groups that relate to events which no longer exist in the database, and those entries relate to an impressive number of entries in state_groups_state. But since my understanding of Synapse's database design is limited, I am not able to decide whether that data is garbage or still of some value.
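The idea of turning delete queries into select equivalents can be sketched roughly as follows. This is not the script from the gist: `delete_to_count` is a hypothetical helper, the DELETE statement is made up for illustration rather than taken from Synapse's purge code, and the two tables are toy versions with assumed columns.

```python
import re
import sqlite3


def delete_to_count(sql: str) -> str:
    """Rewrite 'DELETE FROM t WHERE ...' as 'SELECT COUNT(*) FROM t WHERE ...'."""
    rewritten, n = re.subn(
        r"^\s*DELETE\s+FROM\b", "SELECT COUNT(*) FROM", sql,
        count=1, flags=re.IGNORECASE,
    )
    if n != 1:
        raise ValueError("not a plain DELETE FROM statement")
    return rewritten


# Toy tables (assumed columns): one state group points at an existing event,
# two point at events that are no longer present.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (event_id TEXT PRIMARY KEY);
    CREATE TABLE state_groups (id INTEGER PRIMARY KEY, room_id TEXT, event_id TEXT);
    INSERT INTO events VALUES ('$a');
    INSERT INTO state_groups VALUES
        (1, '!r:example.org', '$a'),
        (2, '!r:example.org', '$gone1'),
        (3, '!r:example.org', '$gone2');
""")

# Illustrative statement only.
purge_sql = """
    DELETE FROM state_groups
    WHERE NOT EXISTS (
        SELECT 1 FROM events WHERE events.event_id = state_groups.event_id
    )
"""

would_delete = conn.execute(delete_to_count(purge_sql)).fetchone()[0]
remaining = conn.execute("SELECT COUNT(*) FROM state_groups").fetchone()[0]
print(would_delete, remaining)  # 2 3
```

Because only the SELECT form is executed, the tables are left untouched and the count shows how many rows the original statement would have removed.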
I'd also very much like this to get resolved or at least have instructions to clean up the db. I have a single user HS. The
I've measured this even after leaving all of the big rooms that I had previously joined and running the
As a general rule, I would encourage people who want to understand the deepest darkest secrets of the database schema to drop by

The particular question here isn't as simple as you might think; to answer it briefly:

We need to be able to relatively quickly calculate the state of a room at any point in that room's history. In other words, we need to know the state of the room at each event in that room. This is done as follows:

A sequence of events where the state is the same are grouped together into a
Now, if we stored all the room state for each So, most state groups have an entry in A full state group just records the event id for each piece of state in the room at that point.

If anybody wants to write that up and submit it as a PR, feel free. I'm unlikely to be able to spend the time polishing it myself, and I don't think it should be a priority for the core team.
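To make the explanation above a little more concrete, here is a minimal in-memory sketch. It assumes that a state group which is not a full group stores only the entries that differ from a previous group, so the state at a group is found by walking back to a full group and replaying the differences. The dictionaries `edges` and `rows` and the function `resolve_state` are hypothetical stand-ins, not Synapse code.

```python
# edges: state group -> previous group it is a delta against (absent = full group)
edges = {2: 1, 3: 2}

# rows: state group -> {(event type, state_key): event_id} stored for that group
rows = {
    1: {("m.room.create", ""): "$create", ("m.room.name", ""): "$name1"},
    2: {("m.room.member", "@alice:example.org"): "$join"},
    3: {("m.room.name", ""): "$name2"},
}


def resolve_state(group, edges, rows):
    """Return the full state at `group` by replaying deltas from the oldest ancestor."""
    chain = []
    seen = set()
    while group is not None:
        if group in seen:
            raise ValueError(f"cycle in state group chain at {group}")
        seen.add(group)
        chain.append(group)
        group = edges.get(group)  # None once a full group is reached

    state = {}
    for g in reversed(chain):  # oldest first, so newer entries win
        state.update(rows.get(g, {}))
    return state


state = resolve_state(3, edges, rows)
print(state[("m.room.name", "")])  # $name2
print(len(state))                  # 3
```

Under this assumption, rows belonging to an older group can still be needed to resolve the state at a newer group, which is one reason why deciding what is garbage is not straightforward.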
(Again: if you have further questions on the database schema, please bring them to #synapse-dev.)
I've copied most of that stuff to https://github.com/matrix-org/synapse/wiki/State-Groups.
We have several Homeservers whose state_groups_state table is filling up the database server. One of the servers I take care of has reached ~180GB in table size, see below
That size has nearly doubled since I last reported the size in #3364. To my knowledge there have been bugs contributing to this growth, e.g. #6566.
So, to the question at hand: can someone from the core dev team please provide a relational schema of the state tables, or point me to the correct documentation if I have overlooked it? Then I can devise and propose a query to clean up the mess, even if we have to turn off the homeserver during that operation. We have to get rid of this garbage data sooner or later.