/_matrix/federation/v1/state_ids/ is heavy #7893
Comments
see #6597: it's not obvious that this data is even useful.
It seems now that we do want to keep this behaviour, but optimize the performance of it as per #6597 (comment)
sorry, yes. This issue is about the database/CPU time taken on the server side; I've actually got a couple of patches which I need to PR to improve it. #6597 is about handling the response on the client side.
If we send out an event which refers to `prev_events` which other servers in the federation are missing, then (after a round or two of backfill attempts), they will end up asking us for `/state_ids` at a particular point in the DAG. As per #7893, this is quite expensive, and we tend to see lots of very similar requests around the same time. We can therefore handle this much more efficiently by using a cache, which (a) ensures that if we see the same request from multiple servers (or even the same server, multiple times), then they share the result, and (b) any other servers that miss the initial excitement can also benefit from the work. [It's interesting to note that `/state` has a cache for exactly this reason. `/state` is now essentially unused and replaced with `/state_ids`, but evidently when we replaced it we forgot to add a cache to the new endpoint.]
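The caching idea described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: `ResponseCache`, `wrap`, `compute_state_ids`, the 0.05 s retention time, and the room/event IDs are all hypothetical names and values chosen for the example. Concurrent requests for the same key share one in-flight computation (point a), and the finished result is kept for a short time so late arrivals also benefit (point b).

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Hashable


class ResponseCache:
    """Hypothetical sketch: one shared computation per key, kept briefly after it completes."""

    def __init__(self, timeout_s: float = 0.0) -> None:
        self._timeout_s = timeout_s
        self._entries: Dict[Hashable, "asyncio.Future[Any]"] = {}

    async def wrap(
        self, key: Hashable, func: Callable[..., Awaitable[Any]], *args: Any
    ) -> Any:
        existing = self._entries.get(key)
        if existing is not None:
            # Someone is already computing (or has recently computed) this
            # result: wait on the shared task. shield() stops one cancelled
            # waiter from cancelling the computation for everyone else.
            return await asyncio.shield(existing)

        task = asyncio.ensure_future(func(*args))
        self._entries[key] = task
        task.add_done_callback(lambda t: self._on_done(key, t))
        return await asyncio.shield(task)

    def _on_done(self, key: Hashable, task: "asyncio.Future[Any]") -> None:
        # Do not cache failures: drop the entry so the next request retries.
        if task.cancelled() or task.exception() is not None:
            self._entries.pop(key, None)
            return
        # Keep the successful result around for late arrivals, then evict it.
        task.get_loop().call_later(self._timeout_s, self._entries.pop, key, None)


async def demo() -> tuple:
    calls = 0

    async def compute_state_ids(room_id: str, event_id: str) -> dict:
        # Stand-in for the expensive database work behind /state_ids.
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)
        return {"pdu_ids": [event_id], "auth_chain_ids": []}

    cache = ResponseCache(timeout_s=0.05)
    room_id, event_id = "!room:example.org", "$event"
    # Five "servers" ask for the same point in the DAG at the same time.
    results = await asyncio.gather(
        *(
            cache.wrap((room_id, event_id), compute_state_ids, room_id, event_id)
            for _ in range(5)
        )
    )
    return calls, results


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Keying on the (room, event) pair means requests from different servers collapse onto the same entry, which is the property that matters when many servers ask for the same state at once.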
this is still a problem; matrix.org's federation readers are still pinning the CPU and hammering the database regularly.
This is exacerbated by rooms which have complex graphs. A single back-pagination request in |
(the above also highlights the fact that state_ids is returning way too much data: there will be a lot of duplication in the auth chains of those 51 backwards-extremities, which suggests we should spec an updated state_ids that returns less data.)
or follow construct's example, and send the request with an |
@erikjohnston also suggests adding a cache to the middle of |
#9482 might help with this. I still feel like a lot of the problem here is that we tend to return a huge amount of redundant information, since the requesting server will have most of the |
The comment above stated:
#9482 improved CPU and DB utilization by an order of magnitude, and is shipping in 1.30.0rc1. Closing as resolved. 🎉
In a large room with long auth chains, each request to /_matrix/federation/v1/state_ids/ takes a lot of database time. The requesting servers tend to time out and then retry, compounding the problem.