Optimise push action processing #13448
Comments
I was looking at this for some threading work and came to a similar conclusion: this would simplify a lot of queries by avoiding the join between `events` and `receipts_linearized`. It could be interesting to see the query plan for the query you want to optimize, though. Is there an index missing on it?
I don't think we're missing any indices; the query plan seems sensible to me, and during normal running it's not a problem. We have one particularly large room that all users belong to, and posting in there immediately triggers thousands of this query in parallel, which causes heavy CPU & IO load. The plan:

```sql
synapse=> explain SELECT ep.event_id, ep.room_id, ep.stream_ordering, ep.actions,
    ep.highlight
FROM (
    SELECT room_id,
        MAX(stream_ordering) as stream_ordering
    FROM events
    INNER JOIN receipts_linearized USING (room_id, event_id)
    WHERE receipt_type = 'm.read' AND user_id = '@fizzadar:beeper.com'
    GROUP BY room_id
) AS rl,
event_push_actions AS ep
WHERE
    ep.room_id = rl.room_id
    AND ep.stream_ordering > rl.stream_ordering
    AND ep.user_id = '@fizzadar:beeper.com'
    AND ep.notif = 1
ORDER BY ep.stream_ordering ASC LIMIT 100;

                                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=170.18..170.19 rows=1 width=97)
   ->  Sort  (cost=170.18..170.19 rows=1 width=97)
         Sort Key: ep.stream_ordering
         ->  Nested Loop  (cost=166.94..170.17 rows=1 width=97)
               ->  GroupAggregate  (cost=166.52..166.54 rows=1 width=39)
                     Group Key: events.room_id
                     ->  Sort  (cost=166.52..166.52 rows=1 width=39)
                           Sort Key: events.room_id
                           ->  Nested Loop  (cost=1.25..166.51 rows=1 width=39)
                                 ->  Index Scan using receipts_linearized_user on receipts_linearized  (cost=0.56..49.88 rows=30 width=75)
                                       Index Cond: (user_id = '@fizzadar:beeper.com'::text)
                                       Filter: (receipt_type = 'm.read'::text)
                                 ->  Index Scan using events_event_id_key on events  (cost=0.70..3.88 rows=1 width=83)
                                       Index Cond: (event_id = receipts_linearized.event_id)
                                       Filter: (receipts_linearized.room_id = room_id)
               ->  Index Scan using event_push_actions_room_id_user_id on event_push_actions ep  (cost=0.42..3.61 rows=1 width=97)
                     Index Cond: ((room_id = events.room_id) AND (user_id = '@fizzadar:beeper.com'::text))
                     Filter: ((notif = 1) AND (stream_ordering > (max(events.stream_ordering))))
(18 rows)
```

I think adding the extra column is worthwhile; it opens the door for a few different optimisations.
The other optimisation that comes to mind, particularly for rooms like this, is to process by room rather than by user, but that's a huge change in how things work and probably not worth it!
Been poking around this some more; one potential issue is that read receipts don't always have a matching event in the DB. I suspect that over federation a read receipt can arrive before the underlying event. This makes the above approach tricky, maybe impossible...
Adding this to to-discuss for the team to consider as another method to optimise event push action processing.
That query plan looks like it's pulling out all receipts for a user. We should be able to refactor it to only check receipts for rooms that have push actions in the range, which I think should speed it up a bunch?
I think it still needs all the receipts, because we need to check for the absence of a receipt for rooms the user is not (yet) a member of but should receive notifications for. The initial refactor in #13597 has simplified the queries quite significantly, but still relies on pulling all the receipts for the user. If this were limited to the stream ordering being processed, it would be impossible to know whether a missing receipt means the user has read beyond this point or has yet to join the room, I think.
Having now thought about this part, this is incorrect: if a client is posting a read receipt, the event must have come from the client's own homeserver. So unless the client is sending in dodgy event IDs, the event should exist on the HS side. #13703 implements adding the event stream ordering into the receipts table.
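To make the denormalisation concrete, here is a rough, hypothetical sketch of the idea (not the actual #13703 implementation; the table shapes and function name are invented for illustration, using SQLite as a stand-in): since the receipt's event must already exist on the homeserver, its stream ordering can be resolved once at receipt-insert time, so later unread-notification queries never need to join against `events`.

```python
import sqlite3

# Toy stand-in schema, not Synapse's real one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (event_id TEXT PRIMARY KEY, room_id TEXT, stream_ordering INTEGER);
CREATE TABLE receipts_linearized (
    user_id TEXT, room_id TEXT, receipt_type TEXT,
    event_id TEXT, event_stream_ordering INTEGER
);
""")
conn.execute("INSERT INTO events VALUES ('$ev', '!room', 42)")

def insert_read_receipt(conn, user_id, room_id, event_id):
    # Resolve the event's stream ordering at write time; for locally
    # submitted receipts the event is guaranteed to be present.
    (ordering,) = conn.execute(
        "SELECT stream_ordering FROM events WHERE event_id = ?", (event_id,)
    ).fetchone()
    conn.execute(
        "INSERT INTO receipts_linearized VALUES (?, ?, 'm.read', ?, ?)",
        (user_id, room_id, event_id, ordering),
    )

insert_read_receipt(conn, "@u:hs", "!room", "$ev")
print(conn.execute(
    "SELECT event_stream_ordering FROM receipts_linearized"
).fetchone()[0])  # 42
```

The write-time lookup trades one extra indexed read per receipt for removing the `events` join from every subsequent push-action query, which is a good trade given how much more often the read path runs.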
This query is responsible for a significant amount of load on our instance and I'd like to optimise it :)

My first (currently only :)) suggestion is to store the stream ordering of an event in the `receipts_linearized` table (probably as `event_stream_ordering`). I believe (will confirm at a later time) that this means both queries can be combined into a much simpler one along the lines of:

This would also optimise a bunch of other queries such as here and here.