This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

device_inbox never gets emptied #3599

Open
ara4n opened this issue Jul 24, 2018 · 20 comments
Labels
A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db A-Disk-Space things which fill up the disk S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@ara4n
Member

ara4n commented Jul 24, 2018

It's up to 85M rows or so on matrix.org, which seems insane. Similarly, the *_stream tables never seem to get purged, nor does device_federation_inbox/outbox.

@richvdh
Member

richvdh commented Jul 25, 2018

The main problem is that it's hard to tell when we can delete to-device messages. They don't have an expiry time on them.

@ukcb

ukcb commented Jul 27, 2018

I made a small test script for private use with PostgreSQL. Let's see if it works. :-)

#!/usr/bin/perl

use strict;
use warnings;

use DBI;
use DateTime;
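# Approach: record, in a helper table, the date on which each device_inbox
# stream_id was first seen by this script, then delete any device_inbox row
# whose stream_id was first seen more than $time_to_keep_in_days days ago.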

my $db_name = 'synapse';
my $db_host = '127.0.0.1';
my $db_port = '5432';
my $db_user = 'synapse';
my $db_pass = 'synapse';

my $time_to_keep_in_days = 7;

my $table_device_inbox = 'device_inbox';
my $table_time_indexer = 'device_inbox_timestamp_indexer';


my $datetime        = DateTime->today;
my $timestamp_today = $datetime->epoch;
$datetime->subtract( days => $time_to_keep_in_days );
my $timestamp_obsolete = $datetime->epoch;


my $dbh = DBI->connect( "dbi:Pg:dbname=$db_name;host=$db_host;port=$db_port",
    $db_user, $db_pass,
    { AutoCommit => 1, RaiseError => 1, PrintError => 1, PrintWarn => 0 } )
  or die $DBI::errstr;

$dbh->do("CREATE TABLE IF NOT EXISTS $table_time_indexer
            ( stream_id BIGINT,
              timestamp BIGINT,
              PRIMARY KEY (stream_id)
            )") or die $dbh->errstr;

my $sth = $dbh->prepare("SELECT stream_id FROM $table_device_inbox")
  or warn $dbh->errstr;
$sth->execute() or warn $dbh->errstr;

my $pool = {};
while ( my $ref = $sth->fetchrow_hashref() ) {
    $pool->{ $ref->{'stream_id'} } = 1;
}

$sth = $dbh->prepare("SELECT stream_id, timestamp FROM $table_time_indexer")
  or warn $dbh->errstr;
$sth->execute() or warn $dbh->errstr;

my $index = {};
while ( my $res = $sth->fetchrow_hashref() ) {
    $index->{ $res->{'stream_id'} } = $res->{'timestamp'};
}


# Walk every stream_id currently in device_inbox: remember new ones with
# today's timestamp, and delete those first seen before the cutoff.
foreach my $key ( keys %{$pool} ) {
    if ( exists $index->{$key} ) {
        if ( $index->{$key} < $timestamp_obsolete ) {
            delete_key_from_tables($key);
        }
    }
    else {
        insert_key_into_indexer($key);
    }
}


sub insert_key_into_indexer {
    my $key = shift;

    return if ( $key !~ / \A \d+ \z /smx );

    # insert key into table $table_time_indexer
    $sth = $dbh->prepare("INSERT INTO $table_time_indexer
                            ( stream_id, timestamp )
                          VALUES
                            ( ?, ? )
                         ") or warn $dbh->errstr;

    $sth->execute( $key, $timestamp_today ) or warn $dbh->errstr;
}


sub delete_key_from_tables {
    my $key = shift;

    return if ( $key !~ / \A \d+ \z /smx );

    # delete key in table $table_time_indexer
    $sth = $dbh->prepare("DELETE FROM $table_time_indexer
                          WHERE stream_id = ?
                         ") or warn $dbh->errstr;

    $sth->execute($key) or warn $dbh->errstr;

    # delete key in table $table_device_inbox
    $sth = $dbh->prepare("DELETE FROM $table_device_inbox
                          WHERE stream_id = ?
                         ") or warn $dbh->errstr;

    $sth->execute($key) or warn $dbh->errstr;
}

# eof

@ukcb

ukcb commented Jul 28, 2018

It would be better if the table device_inbox already had a timestamp. That would make purging a lot easier. Technically, that should not be a problem.
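Purely as an illustration of that idea: if device_inbox carried a receive timestamp, a periodic purge would be a single statement. The column name received_ts and the 7-day cutoff below are hypothetical, not part of the actual schema (which is the whole problem).

-- Hypothetical sketch only: Synapse would have to populate received_ts itself
-- when it inserts each to-device message.
ALTER TABLE device_inbox ADD COLUMN received_ts BIGINT;

DELETE FROM device_inbox
 WHERE received_ts IS NOT NULL
   AND received_ts < (EXTRACT(EPOCH FROM now()) - 7 * 24 * 3600);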

@Midek

Midek commented Dec 13, 2021

Is there currently any official solution to this?
device_inbox is currently the largest table in my DB, taking up 82GB

synapse=# SELECT nspname || '.' || relname AS "relation",
    pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
  FROM pg_class C
  LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
  WHERE nspname NOT IN ('pg_catalog', 'information_schema')
    AND C.relkind <> 'i'
    AND nspname !~ '^pg_toast'
  ORDER BY pg_total_relation_size(C.oid) DESC
  LIMIT 20;
               relation               | total_size 
--------------------------------------+------------
 public.device_inbox                  | 82 GB

@aaronraimist
Contributor

@Midek what version of Synapse are you on? Recent versions of Synapse should have deleted quite a bit.

@Midek

Midek commented Dec 13, 2021

My synapse version is:

"version": "1.48.0"

If anything, I have noticed it started growing more rapidly quite recently (so perhaps after one of the more recent updates), but I cannot pinpoint exactly when, since the table was always quite large.

//edit
Is there some other info I can provide to help debug this?
I have 3770 users, not counting the ones from the IRC and Discord bridges:

synapse=# select count(*) from users where name not like '%irc%'  and name  not like '%discord' and deactivated='0';
 count 
-------
  3770
(1 row)

And device_inbox has this many records:

synapse=# select count(*) from device_inbox;
  count   
----------
 66963488
(1 row)

And my synapse instance is using workers.

@DMRobertson
Contributor

@Midek there is a background update job in 1.47 (whose performance was improved in 1.48) which removes to_device messages for hidden and deleted devices. You could check to see what background updates are running with select * from background_updates;. Perhaps something else, e.g. the event_arbitrary_relations job, is running.

FWIW, device_inbox messages should get deleted once they're:

  • delivered to the client in a /sync, and
  • acknowledged by that client requesting another /sync with the next_batch from the first /sync response.

I would guess that to_device messages are accumulating for devices that aren't active at the moment (possibly in a large encrypted room?).

You could try looking to see which devices have all these unread messages. Something like this, perhaps, but beware: I expect this query will take a while if it's got to scan 80GB of records.

select devices.user_id, count(*) 
from device_inbox join devices using(device_id) 
group by devices.user_id
order by count desc;

I would only start investigating down this line if there are no rows in background_updates, though.

@Midek

Midek commented Dec 14, 2021

Oh, I do have event_arbitrary_relations and remove_dead_devices_from_device_inbox in background_updates.
I guess that might be why my database has been causing 3x higher load than usual for the past month.

synapse=# select * from background_updates;
              update_name              |                          progress_json                           | depends_on | ordering 
---------------------------------------+------------------------------------------------------------------+------------+----------
 remove_dead_devices_from_device_inbox | {}                                                               |            |     6508
 event_arbitrary_relations             | {"last_event_id":"$J31N5_iiqU9cEnXL5qXGbQsSop-X3uHAA1-5CSbWCEM"} |            |     6507
(2 rows)

So this is normal and should just fix itself after the background jobs are done, I guess?
Thanks for your response!

@DMRobertson
Contributor

So this is normal and should just fix itself after the background jobs are done, I guess?

I think things will get better after the remove_dead_devices_from_device_inbox job is done. But maybe not perfect: after all, to_device messages are kept on the HS so we can guarantee their delivery. Maybe there are lots of clients out there who haven't synced in a while and have a large inbox.

As @richvdh notes, there's no inherent mechanism for expiring these messages if the client doesn't acknowledge their receipt. Perhaps there ought to be? Unsure: there may be ramifications for E2EE.

@Midek

Midek commented Dec 22, 2021

The remove_dead_devices_from_device_inbox background job has finished, and instead of 66963488 there are now 47965078 entries in the device_inbox table.
That's better, but still disproportionately large compared to the other tables in the DB.
After running the query

select devices.user_id, count(*) 
from device_inbox join devices using(device_id) 
group by devices.user_id
order by count desc;

I have noticed that there are multiple users with over 1 million rows each.
Many of them seem inactive.
What would be the consequence of deleting those entries? Would that result in them simply being unable to decrypt some messages? Or could it lead to more serious breakage?

@clokep clokep added T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. S-Minor Blocks non-critical functionality, workarounds exist. and removed z-p2 (Deprecated Label) z-minor (Deprecated Label) labels Feb 1, 2022
@acheng-floyd

This comment was marked as off-topic.

@dklimpel

This comment was marked as off-topic.

@acheng-floyd

This comment was marked as off-topic.

@acheng-floyd

acheng-floyd commented Jun 4, 2022

[screenshot]
I've noticed that there are always DELETE operations for two particular users on my server. These two accounts are shared by many people, which means many people may log in at the same time; I've also noticed that these people log in with the Element web client. Is that the reason there are so many DELETE operations for different devices on the device_inbox table?

Even if these DELETE operations are taking effect, that doesn't explain why the table still grows.

[screenshot]
The table has now grown to 78M rows. Could I just delete all the data in this table and restart the Synapse server? What would happen if I did that?

@richvdh
Member

richvdh commented Jun 6, 2022

@acheng-floyd your questions are unrelated to this issue, but in short, those DELETE queries look like normal behaviour.

Deleting entries from device_inbox isn't particularly recommended; at the very least, it has the potential to cause problems with encrypted messaging for any currently-logged-in devices.

@anoadragon453
Member

anoadragon453 commented Nov 17, 2022

I have noticed that there are multiple users with over 1 million rows each.
Many of them seem inactive.
What would be the consequence of deleting those entries? Would that result in them simply being unable to decrypt some messages? Or could it lead to more serious breakage?

If the recipient devices are deleted (there is an Admin API for doing so) then the corresponding to-device messages will be deleted as well. Likewise if the user is completely deactivated - which in turn erases all of their devices.

On just removing the entries from the table and leaving the devices in place however - there shouldn't be much that breaks other than the ability to decrypt past messages, no. If the ability to decrypt past messages is definitely not a concern for the accounts in question (know that sometimes people come back to their accounts after years of inactivity!), then feel free to remove them to save space.

While the rows in device_inbox do not have a timestamp, they do have a stream_id. This is a value that increments for every to-device message sent to all users. If you are cautious about preserving new messages and only want to delete older ones to save space, you could delete entries by partitioning them at the median stream_id for that device_id.
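A rough, untested sketch of that idea for a single device (the :target_user and :target_device placeholders are mine; it assumes the device_inbox columns user_id, device_id and stream_id referred to above, and you should back up the database before running anything like this):

-- Delete the older half of one device's queued to-device messages,
-- partitioning at the median stream_id for that device.
WITH median AS (
    SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY stream_id) AS mid
    FROM device_inbox
    WHERE user_id = :target_user AND device_id = :target_device
)
DELETE FROM device_inbox
 WHERE user_id = :target_user
   AND device_id = :target_device
   AND stream_id < (SELECT mid FROM median);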

One last note: @DMRobertson is right in that to-device messages can accumulate for accounts that just sit in encrypted rooms all day, don't /sync and have to-device messages with session keys continuously sent to them. If you have a bot that is doing this and doesn't need to read (encrypted) messages, consider removing any encryption-enabled devices from that account (log out any devices that were created by logging in through an E2EE matrix client). Other clients will then no longer send you keys via to-device messages to decrypt room messages with. Or, just call /sync periodically while your bot is running to clear out the backlog.

And finally: encourage your users to log out old devices that they are no longer using. Chances are these are the ones that are racking up a backlog of to-device messages to begin with.

@MadLittleMods MadLittleMods added A-Disk-Space things which fill up the disk A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db labels Dec 8, 2022
@ara4n
Member Author

ara4n commented Jan 4, 2023

It's now up to 3.6 billion rows...

@kegsay
Member

kegsay commented Jan 4, 2023

From my understanding of to-device messages for sliding sync, the vast majority (94%) of these messages are m.room_key_request, which come in two flavours: request and request cancellation. These events, AIUI, are sent when a new device for the same user logs in and then requests keys for encrypted messages as a form of gossiping. It should be fine to delete these events and cause minimal disruption in the process. Best case, the request was satisfied by another device and has since been cancelled, which would cause zero impact and is what MSC3944 is advocating. Worst case, the key isn't sent when requested and the new device cannot read that particular message, though if users actually used a proper key management solution, e.g. backing up keys, this would not be needed.
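If you want to check the breakdown on your own instance, something like this should work on PostgreSQL (a sketch that assumes message_json stores the event JSON with a top-level "type" field; expect a slow sequential scan on a table this size):

-- Count queued to-device messages by event type.
SELECT message_json::jsonb ->> 'type' AS type,
       count(*) AS n
  FROM device_inbox
 GROUP BY 1
 ORDER BY n DESC;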

@ara4n
Member Author

ara4n commented Jan 8, 2023

The disk wastage here is crazy. Can I suggest we add a bg job to:

  • delete all key reqs/cancels older than a week or so (rough sketch after this list)
  • log out devices which have been idle for more than 3 months or so, for accts where the user is backing up their keys (to avoid risk of losing their keys) and keeping a single device around (the newest one) so they still have a hope of receiving msgs while they were gone?
    • this would be easier if we had dehydrated devices properly rolled out, as all the non-dehydrated devices could be logged out, without fear of losing msgs
    • this would also be easier with SS everywhere, given login wouldn't be frustratingly slow.
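For the first bullet, a rough, untested sketch of what such a cleanup could look like (it assumes message_json holds the event JSON with a top-level "type" field; device_inbox rows carry no timestamp, so an "older than a week" cutoff would have to be approximated with a stream_id threshold instead):

-- Drop queued m.room_key_request to-device messages (both key requests
-- and their cancellations share this event type).
DELETE FROM device_inbox
 WHERE message_json::jsonb ->> 'type' = 'm.room_key_request';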

@RhysRdm

RhysRdm commented Mar 13, 2023

Was there any resolution to this problem? I see the same thing on my install.
Anything I can help with?
