get_auth_chain_difference is super expensive #8622
Comments
Another thought here is that we may be able to change the database schema to make it more efficient to query. I'm not sure what this looks like in practice, but it could be something like splitting up the auth chains into larger blocks to make it easier to reduce the search space? Or representing the graph in a different format? |
This was done in #8672 |
I've been thinking and experimenting a bit with this, having found a couple of examples of really slow calls to get_auth_chain_difference, so I'm going to jot down a few notes here.

The current algorithm works by walking the auth event graph in depth order, keeping track of which events are reachable from the different state sets. Looking at the implementation and some sample runs, I don't see any obvious dramatic performance improvements, so we will likely need to change the algorithm itself to get the necessary improvements.

Having searched around a bit, I can't see anything in the literature that is directly applicable here, though there is quite a lot about efficient reachability lookups that is relevant. What we're likely looking at is adding some form of "index" that will help us compute the auth chain difference more quickly. Some of the key properties we're looking for here are:
http://www.cse.ust.hk/~dimitris/6311/L24-RI-Parchas.pdf has an interesting table comparing various algorithms and their performance vs index size properties, with rough overviews of each. It's worth noting that a lot of the performance characteristics depend on the "shape" of the graph, e.g. how sparse it is.

My current avenue of attack is to note that the transitive reduction of the auth chain graphs appears to mostly be very "spikey", i.e. we have long streaks of events which only have a single incoming and a single outgoing edge. This naively feels like something that should be easy to index along the lines of "this event is part of the streak that starts at X and ends at Y", which would allow efficiently skipping to the end (rather than iteratively walking down the chain).

In fact, this is very similar to the idea of a "chain cover" of a graph, where you partition the graph into "chains", like so (taken from the PDF above):

By labelling the chains and adding sequence numbers you can:
The above can be used to generate the auth chain difference: for each state set, compute the maximum sequence number in each chain that is reachable from that state set. The auth chain difference is then all events in each chain whose sequence number lies above the lowest of those per-state-set maxima and at or below the highest. This is easy in a world where we have an index as described in 4 above, but is a bit harder if we only have the index from 3 above. It's not clear to me right now what the practical cost of 3 vs 4 would be.

A simple way of computing a chain cover "online" would be to assign a chain to each type/state key pair (noting that we may need multiple chains for a type/state key if there is a fork, but that should hopefully be rare). While it's impossible to generate the most efficient chain cover "online" (since it depends on the shape of the graph, which can theoretically change drastically as we get new events), we do know that long chains in the auth graph will most likely come from events of the same type/state key, so basing the chains on type/state key should hopefully give us a decent chain cover (though we may be able to refine that a bit).

Building up the reachability index when inserting a new state event should be a case of:
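
To make the difference computation described earlier in this comment concrete (not the index-building steps), here is a minimal in-memory sketch. It is an illustration only, not the Synapse implementation: the names `auth_chain_difference`, `REACHABLE` and `CHAIN_EVENTS` are hypothetical, and it assumes a precomputed index that gives, for every event, the maximum sequence number it can reach in each chain, including its own chain at its own sequence number, so the state events themselves count as reachable. Sequence numbers are assumed to start at 1, with later events in a chain able to reach all earlier ones.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy index. Chain "A" is a1 <- a2 <- a3 and chain "B" is b1 <- b2,
# where b1 also depends on a1 and b2 also depends on a2.
CHAIN_EVENTS: Dict[str, List[str]] = {
    "A": ["a1", "a2", "a3"],  # list index i holds sequence number i + 1
    "B": ["b1", "b2"],
}

# For each event: chain ID -> maximum sequence number reachable in that chain.
REACHABLE: Dict[str, Dict[str, int]] = {
    "a1": {"A": 1},
    "a2": {"A": 2},
    "a3": {"A": 3},
    "b1": {"B": 1, "A": 1},
    "b2": {"B": 2, "A": 2},
}


def auth_chain_difference(
    state_sets: Iterable[Set[str]],
    reachable: Dict[str, Dict[str, int]],
    chain_events: Dict[str, List[str]],
) -> Set[str]:
    """Return events reachable from some, but not all, of the state sets."""
    # Step 1: per state set, the maximum reachable sequence number per chain.
    per_set: List[Dict[str, int]] = []
    for state_set in state_sets:
        maxes: Dict[str, int] = {}
        for event_id in state_set:
            for chain_id, seq in reachable[event_id].items():
                if seq > maxes.get(chain_id, 0):
                    maxes[chain_id] = seq
        per_set.append(maxes)

    # Step 2: in each chain, everything above the lowest maximum and up to
    # the highest maximum is reachable from at least one set but not all.
    difference: Set[str] = set()
    for chain_id in set().union(*per_set):
        seqs = [maxes.get(chain_id, 0) for maxes in per_set]  # 0 = unreachable
        lowest, highest = min(seqs), max(seqs)
        difference.update(chain_events[chain_id][lowest:highest])
    return difference


print(sorted(auth_chain_difference([{"a3", "b1"}, {"a2", "b2"}], REACHABLE, CHAIN_EVENTS)))
# ['a3', 'b2']
```

In the example, the first state set reaches sequence 3 in chain A and 1 in chain B, while the second reaches 2 in each, so the difference is a3 and b2. A database-backed version would need to look the per-event maxima up from index tables rather than hold them in a dict.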
|
An implementation that uses the index from 4 takes about 100ms on my laptop with postgres, compared to about 1s with the old algorithm, so roughly a 10x improvement. The index tables combined are about a third the size of |
The next step here is to figure out how to handle existing rooms that don't have the index. A naive solution would be to default to the old algorithm for existing rooms, but that would suck for existing servers and rooms. |
The get_auth_chain_difference transaction can often take hundreds of seconds of database time (during which all other traffic to the room is queued). This is a large component of database load from state resolution.

Caching the results of this transaction wouldn't be useful as each result is typically used only once (to populate the cache of state-resolution results), but could we cache some of the inputs which could get reused across multiple transactions (for example, the list of auth event ids for each event)?
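
As a rough illustration of the input-caching idea, here is a minimal sketch of a small LRU cache keyed by event ID. It is not existing Synapse code: `AuthEventIdCache` and the `fetch_from_db` callable are hypothetical stand-ins for whatever database lookup returns an event's auth event IDs. It relies on events being immutable, so a cached list never needs invalidating, only evicting.

```python
from collections import OrderedDict
from typing import Callable, Iterable, Tuple


class AuthEventIdCache:
    """LRU cache mapping an event ID to the IDs of its auth events."""

    def __init__(
        self,
        fetch_from_db: Callable[[str], Iterable[str]],  # hypothetical DB lookup
        max_entries: int = 10_000,
    ) -> None:
        self._fetch = fetch_from_db
        self._max_entries = max_entries
        self._entries: "OrderedDict[str, Tuple[str, ...]]" = OrderedDict()

    def get(self, event_id: str) -> Tuple[str, ...]:
        if event_id in self._entries:
            # Cache hit: mark as most recently used and skip the database.
            self._entries.move_to_end(event_id)
            return self._entries[event_id]

        # Cache miss: fetch once and remember the result.
        auth_ids = tuple(self._fetch(event_id))
        self._entries[event_id] = auth_ids
        if len(self._entries) > self._max_entries:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return auth_ids
```

Whether this helps in practice depends on how often the same events are walked by different transactions, which would need measuring.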