Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Improve performance of remove_hidden_devices_from_device_inbox #11420

Closed
wants to merge 5 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/11420.bugfix
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Improve performance of various background updates related to device inboxes.
23 changes: 10 additions & 13 deletions synapse/storage/databases/main/deviceinbox.py
Original file line number Diff line number Diff line change
Expand Up @@ -737,31 +737,28 @@ async def _remove_hidden_devices_from_device_inbox(
def _remove_hidden_devices_from_device_inbox_txn(
txn: LoggingTransaction,
) -> int:
"""stream_id is not unique
we need to use an inclusive `stream_id >= ?` clause,
since we might not have deleted all hidden device messages for the stream_id
returned from the previous query
"""Retrieve the list of hidden devices that appear in the device_inbox table,
ordered by user_id so the query uses the index on the devices table.

Then delete only rows matching the `(user_id, device_id, stream_id)` tuple,
to avoid problems of deleting a large number of rows all at once
due to a single device having lots of device messages.
"""

last_stream_id = progress.get("stream_id", 0)
last_user_id = progress.get("mxid", "")

sql = """
SELECT device_id, user_id, stream_id
FROM device_inbox
FROM devices
INNER JOIN device_inbox USING (device_id, user_id)
WHERE
stream_id >= ?
AND (device_id, user_id) IN (
SELECT device_id, user_id FROM devices WHERE hidden = ?
)
ORDER BY stream_id
user_id >= ?
AND hidden = ?
ORDER BY user_id
LIMIT ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this limit will mean that we may not find all the device_inbox rows for a given user - we may only end up deleting half of them before deciding to move onto the next user.

I think a better strategy is to remove the join against device_inbox, and just look for hidden devices, without worrying about whether they have device_inbox rows. Then, do a DELETE FROM device_inbox for each such device.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to do this initially, but the original docstring mentions that it might be a bit heavy if the device has tons of pending messages in its inbox. For example, on abolivier.bzh's database, I've got devices with over 60k rows in device_inbox (which aren't hidden devices but that's probably because I've already run the previous incarnation of this update), which sounds like a lot to delete in one go.

On top of that, I don't see how we might not delete entries for all devices for a given user, given the condition in the query is user_id >= ?, so if we don't do every device from a user we should just do the rest in the next run?

Copy link
Member

@richvdh richvdh Nov 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On top of that, I don't see how we might not delete entries for all devices for a given user, given the condition in the query is user_id >= ?, so if we don't do every device from a user we should just do the rest in the next run?

oh sorry, you're right.

However, from your analysis in the PR comment:

         ->  Index Only Scan using device_inbox_user_stream_id on device_inbox  (cost=0.54..100217.89 rows=109879 width=42) (actual time=0.141..92.477 rows=101229 loops=1)
               Heap Fetches: 6968

this is no good. It's a scan of the entirety of the device_inbox_user_stream_id index: there is no index cond here. It's acceptable for your local db with 100k rows, but won't be for a larger db.

Honestly, looking at this again, I think we're better off rewriting it again (sorry!) to do the same as #11421 (ie, walk through the device_inbox table for a sequence of stream_ids. Hell, why not combine it with #11421?

other ideas...

  • we can probably get away with deleting all the messages for a device at once, even if its in the 10s of thousands, though it gets really nasty given we'll be considering 100 devices on the first pass.
  • maybe a two-stage loop: first find a device, then work through a range of stream_ids for that device on each run of the bg update. That gets annoyingly fiddly in terms of tracking state, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think you're right here, let's combine it with #11421

"""

txn.execute(sql, (last_stream_id, True, batch_size))
txn.execute(sql, (last_user_id, True, batch_size))
rows = txn.fetchall()

num_deleted = 0
Expand All @@ -782,7 +779,7 @@ def _remove_hidden_devices_from_device_inbox_txn(
self.REMOVE_HIDDEN_DEVICES,
{
"device_id": rows[-1][0],
"user_id": rows[-1][1],
"mxid": rows[-1][1],
"stream_id": rows[-1][2],
},
)
Expand Down