/_matrix/federation/v1/state_ids/ is heavy #7893

richvdh · 2020-07-18T17:28:51Z

In a large room with long auth chains, each request to /_matrix/federation/v1/state_ids/ takes a lot of database time. Requesting servers tend to time out and then retry, which compounds the problem.

richvdh · 2020-07-18T17:46:25Z

see #6597: it's not obvious that this data is even useful.

anoadragon453 · 2020-07-20T18:01:35Z

It now seems that we do want to keep this behaviour, but optimise its performance, as per #6597 (comment).

anoadragon453 · 2020-07-20T18:01:55Z

@richvdh Do you think it's worth keeping this issue open if we have #6597?

richvdh · 2020-07-21T12:01:37Z

Sorry, yes. This issue is about the database/CPU time taken on the server side; I've actually got a couple of patches to improve it, which I need to PR.

#6597 is about handling the response on the client side.

If we send out an event which refers to `prev_events` which other servers in the federation are missing, then (after a round or two of backfill attempts) they will end up asking us for `/state_ids` at a particular point in the DAG. As per #7893, this is quite expensive, and we tend to see lots of very similar requests around the same time. We can therefore handle this much more efficiently by using a cache, which (a) ensures that if we see the same request from multiple servers (or even the same server, multiple times), they share the result, and (b) lets any other servers that miss the initial excitement benefit from the work as well. [It's interesting to note that `/state` has a cache for exactly this reason. `/state` is now essentially unused, having been replaced with `/state_ids`, but evidently when we replaced it we forgot to add a cache to the new endpoint.]
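To illustrate the idea of sharing one result between near-simultaneous identical requests, here is a minimal sketch. It is not Synapse's actual cache: `InFlightCache`, the `(room_id, event_id)` key, and `expensive` are hypothetical names for illustration, and the sketch assumes an asyncio-style server with no expiry of successful entries.

```python
import asyncio
import functools
from typing import Awaitable, Callable, Dict, List, Tuple

# Hypothetical cache key: (room_id, event_id) names a point in the DAG.
Key = Tuple[str, str]


class InFlightCache:
    """Shares one computation between all callers asking for the same key."""

    def __init__(self) -> None:
        self._results: Dict[Key, "asyncio.Future[List[str]]"] = {}

    def _evict_on_failure(self, key: Key, fut: "asyncio.Future[List[str]]") -> None:
        # Do not let a failed or cancelled computation poison later requests.
        if fut.cancelled() or fut.exception() is not None:
            self._results.pop(key, None)

    async def get(
        self, key: Key, compute: Callable[[], Awaitable[List[str]]]
    ) -> List[str]:
        fut = self._results.get(key)
        if fut is None:
            # First caller: start the expensive work and publish the future
            # so that concurrent and later callers can wait on the same one.
            fut = asyncio.ensure_future(compute())
            fut.add_done_callback(functools.partial(self._evict_on_failure, key))
            self._results[key] = fut
        # shield() stops one caller's timeout from cancelling the shared work.
        return await asyncio.shield(fut)


async def demo() -> int:
    """Fires five identical requests plus a late one; returns the call count."""
    calls = 0

    async def expensive() -> List[str]:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)  # stand-in for the slow database query
        return ["$state_event_1", "$state_event_2"]

    cache = InFlightCache()
    key = ("!room:example.org", "$event")
    results = await asyncio.gather(*(cache.get(key, expensive) for _ in range(5)))
    late = await cache.get(key, expensive)  # arrives after the work finished
    assert all(r == late for r in results)
    return calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The shielding matters for the retry pattern described in this issue: if a requester gives up and its request is cancelled, the shared computation still completes, so the retry can pick up the finished result instead of starting the query again. A real cache would also need a time or size limit on retained entries.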

richvdh · 2020-07-29T16:58:35Z

Hopefully this is improved by #7931 and #7930.

richvdh · 2020-08-28T09:21:49Z

This is still a problem; matrix.org's federation readers are still pinning the CPU and hammering the database regularly.

richvdh · 2020-09-28T21:22:28Z

This is exacerbated by rooms which have complex graphs. A single back-pagination request in !YynUnYHpqlHuoTAjsp:matrix.org on matrix.org introduced 51 new backwards-extremities, each of which required a state_ids request to kde.org, each of which takes 3 minutes of database time.

richvdh · 2020-09-28T21:35:56Z

(the above also highlights the fact that state_ids is returning way too much data: there will be a lot of duplication in the auth chains of those 51 backwards-extremities, which suggests we should spec an updated state_ids that returns less data.)
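As a rough way to put a number on that duplication, the sketch below compares how many auth chain IDs are sent in total across several responses with how many distinct IDs they contain. The function name and the toy event IDs are made up for illustration; no real room data is used.

```python
from typing import Dict, Set


def redundancy(auth_chains: Dict[str, Set[str]]) -> float:
    """Ratio of auth chain IDs sent in total to distinct IDs actually needed.

    `auth_chains` maps each requested event ID to the auth chain IDs that a
    /state_ids response for that event would carry.
    """
    total_sent = sum(len(chain) for chain in auth_chains.values())
    distinct = len(set().union(*auth_chains.values()))
    return total_sent / distinct if distinct else 1.0


# Toy example: three extremities whose auth chains overlap heavily.
toy = {
    "$extremity_a": {"$create", "$power_levels", "$member_1"},
    "$extremity_b": {"$create", "$power_levels", "$member_2"},
    "$extremity_c": {"$create", "$power_levels", "$member_1"},
}
ratio = redundancy(toy)
```

A ratio well above 1 means most of the transferred IDs repeat what an earlier response already delivered.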

richvdh · 2020-09-28T23:26:00Z

> we should spec an updated state_ids that returns less data

or follow construct's example, and send the request with an auth_chain_ids=0 query param, to indicate we don't need any of the auth chain ids at all.
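For concreteness, a request built that way might look like the sketch below. The `auth_chain_ids=0` parameter is the non-spec convention mentioned above, so a remote server may simply ignore it; `state_ids_path` is a hypothetical helper, not an existing function.

```python
from urllib.parse import quote, urlencode


def state_ids_path(room_id: str, event_id: str, include_auth_chain: bool = True) -> str:
    """Builds the path and query string for a /state_ids federation request."""
    params = {"event_id": event_id}
    if not include_auth_chain:
        # Assumption: the remote honours this non-standard parameter.
        params["auth_chain_ids"] = "0"
    # Room and event IDs contain characters such as '!', ':' and '$',
    # so they are percent-encoded.
    return f"/_matrix/federation/v1/state_ids/{quote(room_id, safe='')}?{urlencode(params)}"


path = state_ids_path("!abc:example.org", "$ev", include_auth_chain=False)
```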

richvdh · 2020-09-29T09:43:32Z

@erikjohnston also suggests adding a cache to the middle of get_auth_chain_ids and caching intermediate results.
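A minimal in-memory sketch of what caching intermediate results could look like follows. It is not the real `get_auth_chain_ids`, which works against the database; `AuthChainCalculator` and its fields are hypothetical. It assumes the auth graph is acyclic and small enough to hold one set per event in memory.

```python
from typing import Dict, FrozenSet, List


class AuthChainCalculator:
    """Computes auth chains, remembering the chain of every event it visits."""

    def __init__(self, auth_events: Dict[str, List[str]]) -> None:
        # event_id -> IDs of the events it cites directly as auth events
        self._auth_events = auth_events
        self._cache: Dict[str, FrozenSet[str]] = {}
        self.computed = 0  # how many chains were actually built (not cache hits)

    def chain_for(self, event_id: str) -> FrozenSet[str]:
        # Iterative post-order walk, so long chains cannot hit the recursion limit.
        stack = [event_id]
        while stack:
            current = stack[-1]
            if current in self._cache:
                stack.pop()
                continue
            parents = self._auth_events.get(current, [])
            pending = [p for p in parents if p not in self._cache]
            if pending:
                # Resolve the uncached auth events first, then revisit `current`.
                stack.extend(pending)
                continue
            chain = set(parents)
            for parent in parents:
                chain |= self._cache[parent]
            self._cache[current] = frozenset(chain)
            self.computed += 1
            stack.pop()
        return self._cache[event_id]


# Toy graph: two messages that share the same auth events.
graph = {
    "$create": [],
    "$power_levels": ["$create"],
    "$member": ["$create", "$power_levels"],
    "$msg_1": ["$create", "$power_levels", "$member"],
    "$msg_2": ["$create", "$power_levels", "$member"],
}
calc = AuthChainCalculator(graph)
first = calc.chain_for("$msg_1")
after_first = calc.computed
second = calc.chain_for("$msg_2")
after_second = calc.computed
```

The second lookup only has to build one new chain, because the chains of the shared auth events are already cached. The trade-off is memory: storing a full set per event can grow quadratically in the worst case.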

richvdh · 2021-02-24T19:48:12Z

#9482 might help with this. I still feel like a lot of the problem here is that we tend to return a huge amount of redundant information, since the requesting server will have most of the auth_events already.

callahad · 2021-03-19T16:00:04Z

The comment above stated:

> This issue is about the database/CPU time taken on the server side

#9482 improved CPU and DB utilization by an order of magnitude, and is shipping in 1.30.0rc1. Closing as resolved. 🎉

anoadragon453 added the p1 and A-Performance (Performance, both client-facing and admin-facing) labels Jul 20, 2020

richvdh mentioned this issue Jul 21, 2020

Put a cache on /state_ids #7931

Merged

richvdh closed this as completed Jul 29, 2020

richvdh reopened this Aug 28, 2020

richvdh mentioned this issue Aug 28, 2020

Unrejectable invite from oftc room #8091

Closed

erikjohnston mentioned this issue Oct 15, 2020

Federation /event is slow for large rooms with non-shared history #8554

Open

This was referenced Feb 24, 2021

federation catchup is slow and network-heavy #9492

Open

Use chain cover index in get_auth_chain_ids #9482

Closed

callahad closed this as completed Mar 19, 2021

clokep mentioned this issue Nov 22, 2022

/state_ids is slow to respond #13620

Open

MadLittleMods added the A-Federation label Jan 31, 2023

matrixbot mentioned this issue Dec 21, 2023

federation catchup is slow and network-heavy element-hq/synapse#9492

Open
