Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Optimise push action processing #13448

Open
Fizzadar opened this issue Aug 3, 2022 · 10 comments
Open

Optimise push action processing #13448

Fizzadar opened this issue Aug 3, 2022 · 10 comments
Labels
A-Performance Performance, both client-facing and admin-facing A-Push Issues related to push/notifications T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.

Comments

@Fizzadar
Copy link
Contributor

Fizzadar commented Aug 3, 2022

This query is responsible for a significant amount of load on our instance and I'd like to optimise it :)

My first (currently only :)) suggestion is to store the stream ordering of an event in the receipts_linearized table (probably as event_stream_ordering). I believe (will confirm at a later time) that this means both queries can be combined into a much simpler one along the lines of:

SELECT ep.event_id, ep.room_id, ep.stream_ordering, ep.actions,
   ep.highlight
 FROM event_push_actions AS ep
 LEFT JOIN receipts_linearized AS rl USING (room_id, event_id)
 WHERE
   ep.stream_ordering > rl.stream_ordering
   AND ep.user_id = ?
   AND ep.stream_ordering > ?
   AND ep.stream_ordering <= ?
   AND ep.notif = 1
 ORDER BY ep.stream_ordering ASC LIMIT ?

This would also optimise a bunch of other queries such as here and here.

@clokep
Copy link
Member

clokep commented Aug 3, 2022

My first (currently only :)) suggestion is to store the stream ordering of an event in the receipts_linearized table (probably as event_stream_ordering). I believe (will confirm at a later time) that this means both queries can be combined into a much simpler one along the lines of:

I was looking at this for some threading work and came to a similar conclusion that this would simplify a lot of queries by avoiding the join between receipts_linearized and events.

It could be interesting to see the query plan for the query you want to optimize though. Is there an index missing on it?

@Fizzadar
Copy link
Contributor Author

Fizzadar commented Aug 4, 2022

I don't think we're missing any indices, the query plan seems sensible to me - and during usual running it's not a problem. We have one particular large room that all users belong to and posting in there immediately triggers thousands of this query in parallel which causes heavy CPU & IO load. The plan:

synapse=> explain SELECT ep.event_id, ep.room_id, ep.stream_ordering, ep.actions,
   ep.highlight
 FROM (
   SELECT room_id,
       MAX(stream_ordering) as stream_ordering
   FROM events
   INNER JOIN receipts_linearized USING (room_id, event_id)
   WHERE receipt_type = 'm.read' AND user_id = '@fizzadar:beeper.com'
   GROUP BY room_id
) AS rl,
 event_push_actions AS ep
 WHERE
   ep.room_id = rl.room_id
   AND ep.stream_ordering > rl.stream_ordering
   AND ep.user_id = '@fizzadar:beeper.com'
   AND ep.notif = 1
 ORDER BY ep.stream_ordering ASC LIMIT 100;
                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=170.18..170.19 rows=1 width=97)
   ->  Sort  (cost=170.18..170.19 rows=1 width=97)
         Sort Key: ep.stream_ordering
         ->  Nested Loop  (cost=166.94..170.17 rows=1 width=97)
               ->  GroupAggregate  (cost=166.52..166.54 rows=1 width=39)
                     Group Key: events.room_id
                     ->  Sort  (cost=166.52..166.52 rows=1 width=39)
                           Sort Key: events.room_id
                           ->  Nested Loop  (cost=1.25..166.51 rows=1 width=39)
                                 ->  Index Scan using receipts_linearized_user on receipts_linearized  (cost=0.56..49.88 rows=30 width=75)
                                       Index Cond: (user_id = '@fizzadar:beeper.com'::text)
                                       Filter: (receipt_type = 'm.read'::text)
                                 ->  Index Scan using events_event_id_key on events  (cost=0.70..3.88 rows=1 width=83)
                                       Index Cond: (event_id = receipts_linearized.event_id)
                                       Filter: (receipts_linearized.room_id = room_id)
               ->  Index Scan using event_push_actions_room_id_user_id on event_push_actions ep  (cost=0.42..3.61 rows=1 width=97)
                     Index Cond: ((room_id = events.room_id) AND (user_id = '@fizzadar:beeper.com'::text))
                     Filter: ((notif = 1) AND (stream_ordering > (max(events.stream_ordering))))
(18 rows)

I think adding the extra column is worthwhile, it opens the door for a few different optimisations.

@Fizzadar
Copy link
Contributor Author

Fizzadar commented Aug 4, 2022

The other optimisation that comes to mind, particularly for rooms like this, is process by room not by user but that’s a huge change in how things work and probably not worth it!

@H-Shay H-Shay added the T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. label Aug 10, 2022
@Fizzadar
Copy link
Contributor Author

Been poking around this some more, one potential issue is read receipts don't always have a matching event in the DB, I suspect this is possible over federation that the read receipt arrives before the underlying event. This makes the above approach tricky, maybe impossible...

@anoadragon453
Copy link
Member

Adding to to-discuss for the team to consider as another method to optimise event push action processing.

@erikjohnston
Copy link
Member

That query plan looks like its pulling out all receipts for a user. We should be able to refactor it to only check receipts for rooms that have push actions in the rage, which I think should speed it up a bunch?

@Fizzadar
Copy link
Contributor Author

That query plan looks like its pulling out all receipts for a user. We should be able to refactor it to only check receipts for rooms that have push actions in the rage, which I think should speed it up a bunch?

I think it still needs all the receipts because we need to check for the absense of a receipt for rooms the user is not (yet) a member of but should receive notifications for?

The initial refactor in #13597 has simplified the queries quite significantly, but still relies on pulling all the receipts for the user. If this were limited to the stream ordering being processed it would be impossible to know if a missing receipt was because the user has read beyond this point or has yet to join the room I think.

@Fizzadar
Copy link
Contributor Author

Fizzadar commented Sep 2, 2022

Been poking around this some more, one potential issue is read receipts don't always have a matching event in the DB, I suspect this is possible over federation that the read receipt arrives before the underlying event. This makes the above approach tricky, maybe impossible...

Having now thought about this part, this is incorrect - if a client is posting a read receipt the event must have come from the clients own homeserver. So unless the client is sending in dodgy event IDs the event should exist on the HS side. #13703 implements adding the event stream ordering into the receipts table.

@MadLittleMods MadLittleMods added A-Push Issues related to push/notifications A-Performance Performance, both client-facing and admin-facing labels Sep 20, 2022
@Fizzadar
Copy link
Contributor Author

Progress update: we are using #13918 in our fork and this has yielded significant improvement to the IOPs at all times and in particular during posts in large rooms.

Next up to this is improving the counting of unread actions used when sending pushes: #13846

@clokep
Copy link
Member

clokep commented Feb 9, 2023

@Fizzadar Is this a duplicate of #13846? Edit: I guess not since you say that's "next" up.

Anyway, what's left to do here? Land #13918?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Performance Performance, both client-facing and admin-facing A-Push Issues related to push/notifications T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
Projects
None yet
Development

No branches or pull requests

6 participants