Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Messages from certain remote users missing / being rejected #13716

Closed
tonkku107 opened this issue Sep 5, 2022 · 26 comments
Closed

Messages from certain remote users missing / being rejected #13716

tonkku107 opened this issue Sep 5, 2022 · 26 comments
Labels
O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Needs-Info This issue is blocked awaiting information from the reporter

Comments

@tonkku107
Copy link

tonkku107 commented Sep 5, 2022

Description

I've noticed people replying to events that don't exist for me and looking at the event source they are replying to specific users from the matrix.org homeserver. I am not receiving messages from those members but I do see them typing and their read receipts.
In the case of the log output below, they have been in the room for close to a month and the referenced m.room.member event can be found via element's dev tools, yet at the time of receiving the event synapse fails to calculate that as an auth_event.

Steps to reproduce

  • (it is currently unknown what causes users to get in this state)
  • receive a message from an affected user through federation
  • be confused when people are talking to others that you cannot see

Homeserver

tonkku.me

Synapse Version

1.66.0

Installation Method

Docker (matrixdotorg/synapse)

Platform

Running in docker containers on a server with Ubuntu 20.04.3, AMD Ryzen 5 3600, 64GB RAM.

Relevant log output

2022-09-04 16:12:40,328 - synapse.federation.federation_server - 1197 - INFO - _process_incoming_pdus_in_room_inner-1875 - handling received PDU in room !GtIfdsfQtQIgbQSxwJ:archlinux.org: <FrozenEventV3 event_id=$9nEfMJ48sVmR_5zO1fjmNyaXWNVHSw1Go6CwAtSH3EU, type=m.room.message, state_key=None, outlier=False>
2022-09-04 16:12:40,350 - synapse.handlers.federation_event - 1806 - INFO - _process_incoming_pdus_in_room_inner-1875-$9nEfMJ48sVmR_5zO1fjmNyaXWNVHSw1Go6CwAtSH3EU - event's auth_events are different to our calculated auth_events. Claimed but not calculated: [<FrozenEventV3 event_id=$jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU, type=m.room.member, state_key=@rajabinkz:matrix.org, outlier=False>]. Calculated but not claimed: []
2022-09-04 16:12:40,350 - synapse.handlers.federation_event - 1826 - WARNING - _process_incoming_pdus_in_room_inner-1875-$9nEfMJ48sVmR_5zO1fjmNyaXWNVHSw1Go6CwAtSH3EU - While checking auth of <FrozenEventV3 event_id=$9nEfMJ48sVmR_5zO1fjmNyaXWNVHSw1Go6CwAtSH3EU, type=m.room.message, state_key=None, outlier=False> against room state before the event: 403: User @rajabinkz:matrix.org not in room !GtIfdsfQtQIgbQSxwJ:archlinux.org (None)
2022-09-04 16:12:40,368 - synapse.http.client - 446 - INFO - _process_incoming_pdus_in_room_inner-1875-$9nEfMJ48sVmR_5zO1fjmNyaXWNVHSw1Go6CwAtSH3EU - Received response to POST http://generic-worker4:9114/_synapse/replication/fed_send_events/XtTHDlfEIH: 200

Anything else that would be useful to know?

No response

@DMRobertson
Copy link
Contributor

The key log line:

2022-09-04 16:12:40,350 - synapse.handlers.federation_event - 1826 - WARNING - _process_incoming_pdus_in_room_inner-1875-$9nEfMJ48sVmR_5zO1fjmNyaXWNVHSw1Go6CwAtSH3EU - While checking auth of <FrozenEventV3 event_id=$9nEfMJ48sVmR_5zO1fjmNyaXWNVHSw1Go6CwAtSH3EU, type=m.room.message, state_key=None, outlier=False> against room state before the event: 403: User @rajabinkz:matrix.org not in room !GtIfdsfQtQIgbQSxwJ:archlinux.org (None)

Can you see any logs that mention $jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU? (That event was created on the 12th of August this year.)

Additionally, what do the following database queries return?

SELECT COUNT(*) FROM events WHERE event_id = '$jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU';
SELECT COUNT(*) FROM event_json WHERE event_id = '$jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU';

@DMRobertson DMRobertson added T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Needs-Info This issue is blocked awaiting information from the reporter O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Major functionality / product severely impaired, no satisfactory workaround. labels Sep 5, 2022
@tonkku107
Copy link
Author

Can you see any logs that mention $jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU? (That event was created on the 12th of August this year.)

No, only that one log line above. I believe my log history doesn't go that far back since the containers have been recreated to update synapse.

Additionally, what do the following database queries return?

synapse=# SELECT COUNT(*) FROM events WHERE event_id = '$jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU';
 count 
-------
     1
(1 row)

synapse=# SELECT COUNT(*) FROM event_json WHERE event_id = '$jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU';
 count 
-------
     1
(1 row)

@squahtx
Copy link
Contributor

squahtx commented Sep 5, 2022

It's possible that the user's membership, $jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU, has been marked as rejected for some reason or another, and so their messages are being marked as rejected too.

Can you run the following query to confirm?

SELECT event_id, type, outlier, rejection_reason FROM events WHERE event_id IN (SELECT '$jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU' UNION SELECT auth_id FROM event_auth WHERE event_id = '$jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU');

@tonkku107
Copy link
Author

It's possible that the user's membership, $jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU, has been marked as rejected for some reason or another, and so their messages are being marked as rejected too.

Can you run the following query to confirm?

synapse=# SELECT event_id, type, outlier, rejection_reason FROM events WHERE event_id IN (SELECT '$jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU' UNION SELECT auth_id FROM event_auth WHERE event_id = '$jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU');
                   event_id                   |        type         | outlier | rejection_reason 
----------------------------------------------+---------------------+---------+------------------
 $XhK6dzOmcZ3bWdeIY5dgndPYynv81kV1KC3tAc4Mwqc | m.room.power_levels | f       | 
 $lYTZFGTzuDxSU0Hr6Jn6UhzA2UaBja2fiCbR7Y_YpNg | m.room.join_rules   | f       | 
 $jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU | m.room.member       | f       | 
 $o-qWJk4LlSnwSr_N0WXu9z-kE4-je1H0i7p1I2scPP0 | m.room.create       | t       | 
(4 rows)

There seems to be no rejection reason

@DMRobertson
Copy link
Contributor

Odd. I wonder if you've been hit by a state reset (#8629) due to the weird corner (#12943). I'd like to do a little more investigation---let me concoct some queries for you to run.

In the worst case scenario, I'd suggest purging the room with the admin API and rejoining it.

@DMRobertson
Copy link
Contributor

DMRobertson commented Sep 8, 2022

@tonkku107, could you run this query and paste its output please?

SELECT current_state_delta_stream.* FROM events JOIN current_state_delta_stream ON events.stream_ordering = current_state_delta_stream.stream_id WHERE events.event_id = '$jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU';
SELECT * FROM current_state_events WHERE event_id = '$jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU';

@tonkku107
Copy link
Author

synapse=# SELECT current_state_delta_stream.* FROM events JOIN current_state_delta_stream ON events.stream_ordering = current_state_delta_stream.stream_id WHERE events.event_id = '$jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU';
 stream_id |              room_id              |     type      |       state_key       |                   event_id                   | prev_event_id | instance_name 
-----------+-----------------------------------+---------------+-----------------------+----------------------------------------------+---------------+---------------
    628852 | !GtIfdsfQtQIgbQSxwJ:archlinux.org | m.room.member | @rajabinkz:matrix.org | $jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU |               | master
(1 row)

synapse=# SELECT * FROM current_state_events WHERE event_id = '$jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU';
 event_id | room_id | type | state_key | membership 
----------+---------+------+-----------+------------
(0 rows)

I'm guessing that last one isn't supposed to return nothing

@DMRobertson
Copy link
Contributor

I'm guessing that last one isn't supposed to return nothing

Yeah. To help understand why, can you run this query?

WITH groups AS (
    SELECT state_group_edges.*
    FROM state_group_edges
        JOIN event_to_state_groups USING(state_group)
        JOIN events USING(event_id)
    WHERE events.event_id = '$jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU'
)
SELECT state_groups_state.*
FROM state_groups_state
    JOIN groups ON state_groups_state.state_group IN (groups.state_group, groups.prev_state_group)
;

@tonkku107
Copy link
Author

synapse=# WITH groups AS (
synapse(#     SELECT state_group_edges.*
synapse(#     FROM state_group_edges
synapse(#         JOIN event_to_state_groups USING(state_group)
synapse(#         JOIN events USING(event_id)
synapse(#     WHERE events.event_id = '$jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU'
synapse(# )
synapse-# SELECT state_groups_state.*
synapse-# FROM state_groups_state
synapse-#     JOIN groups ON state_groups_state.state_group IN (groups.state_group, groups.prev_state_group)
synapse-# ;
 state_group |              room_id              |     type      |        state_key        |                   event_id                   
-------------+-----------------------------------+---------------+-------------------------+----------------------------------------------
       64427 | !GtIfdsfQtQIgbQSxwJ:archlinux.org | m.room.member | @codingotaku:matrix.org | $LpwIBBp8Nnz7rXcsDno9cJztUuE13MtNT65JMV7TCPg
       64439 | !GtIfdsfQtQIgbQSxwJ:archlinux.org | m.room.member | @rajabinkz:matrix.org   | $jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU
(2 rows)

@DMRobertson
Copy link
Contributor

Okay, looks like the event was accepted when it was received. Can you run this query to see if there any changes since then? (May take a while)

SELECT * FROM current_state_delta_stream WHERE room_id='!GtIfdsfQtQIgbQSxwJ:archlinux.org' AND type=' m.room.member' AND '@rajabinkz:matrix.org' ORDER BY stream_id ASC;

@tonkku107
Copy link
Author

synapse=# SELECT * FROM current_state_delta_stream WHERE room_id='!GtIfdsfQtQIgbQSxwJ:archlinux.org' AND type='m.room.member' AND state_key='@rajabinkz:matrix.org' ORDER BY stream_id ASC;
 stream_id |              room_id              |     type      |       state_key       |                   event_id                   |                prev_event_id                 |  instance_name  
-----------+-----------------------------------+---------------+-----------------------+----------------------------------------------+----------------------------------------------+-----------------
    628852 | !GtIfdsfQtQIgbQSxwJ:archlinux.org | m.room.member | @rajabinkz:matrix.org | $jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU |                                              | master
    653360 | !GtIfdsfQtQIgbQSxwJ:archlinux.org | m.room.member | @rajabinkz:matrix.org |                                              | $jWCsQ39xw6rhTIPuRjHqh6KspCEmWzbycTKjqBfcEtU | generic_worker4
(2 rows)

I had to do some edits to the query, I believe this is what you wanted

@DMRobertson
Copy link
Contributor

DMRobertson commented Sep 9, 2022

Ahh yes, sorry. I think I screwed up my copying and pasting. (Thanks, the query is just as I should have typed it!)

What happened at stream_id 653360?

WITH groups AS (
    SELECT state_group_edges.*
    FROM state_group_edges
    WHERE state_group =  653360
)
SELECT state_groups_state.*
FROM state_groups_state
    JOIN groups ON state_groups_state.state_group IN (groups.state_group, groups.prev_state_group)
;

@tonkku107
Copy link
Author

synapse=# WITH groups AS (
synapse(#     SELECT state_group_edges.*
synapse(#     FROM state_group_edges
synapse(#     WHERE state_group =  653360
synapse(# )
synapse-# SELECT state_groups_state.*
synapse-# FROM state_groups_state
synapse-#     JOIN groups ON state_groups_state.state_group IN (groups.state_group, groups.prev_state_group)
synapse-# ;
 state_group | room_id | type | state_key | event_id 
-------------+---------+------+-----------+----------
(0 rows)

nothing 🤔

@DMRobertson
Copy link
Contributor

Sorry, I think I missed a JOIN and got the first WHERE clause wrong. What about:

WITH groups AS (
    SELECT state_group_edges.*
    FROM events
        JOIN event_to_state_groups USING(event_id)
        JOIN state_group_edges USING (state_group)
    WHERE stream_ordering=653360
)
SELECT state_groups_state.*
FROM state_groups_state
    JOIN groups ON state_groups_state.state_group IN (groups.state_group, groups.prev_state_group)
;

@tonkku107
Copy link
Author

Still nothing...

@DMRobertson
Copy link
Contributor

One last try then.

WITH RECURSIVE state(state_group) AS (
                    (SELECT state_group FROM events NATURAL JOIN event_to_state_groups WHERE stream_ordering = 653360)
                    UNION ALL
                    SELECT prev_state_group FROM state_group_edges e, state s
                    WHERE s.state_group = e.state_group
                )
                SELECT state_group, type, state_key, event_id
                FROM state_groups_state
                WHERE state_group IN (
                    SELECT state_group FROM state
                )
                ORDER BY state_group DESC, type, state_key 
;

@tonkku107
Copy link
Author

This one returned 16536 rows. Gonna upload the results as a file
query.txt
This looks like it's pretty much the entire room state

@erikjohnston
Copy link
Member

Thanks for bearing with us @tonkku107

Hmm, that query.txt looks to be showing a full state snapshot (rather than a delta, which it normally is). That makes sense actually, as (for reasons) we can't do deltas when a state entry is dropped (rather than added or replaced). But it does mean its much harder to confirm exactly what has happened.

Like @DMRobertson says, I think this will be due to either the "weird corner" and/or a state reset. The former has been fixed, and we're working on the latter.

Having all users on your server leave the room and rejoin should "fix" this for you. I'm going to close this issue for now in favour of the issues @DMRobertson linked.

@erikjohnston
Copy link
Member

(Obviously shout if leaving/rejoining doesn't work!)

@tonkku107
Copy link
Author

tonkku107 commented Sep 12, 2022

I am getting an error Current state for room !GtIfdsfQtQIgbQSxwJ:archlinux.org is empty while trying to rejoin. This could just be slow join stuff but the error is new to me.
I'm going to delete the room with the admin API but I am worried the bug #12821 will cause more problems
Update: did that and it seems to be having trouble getting missing prev_events and the room is missing a bunch of state, including name

@tonkku107
Copy link
Author

@erikjohnston things just got worse. I can't even leave or delete the room anymore... Leaving errors with No state group for unknown or outlier event $1TYkfbUXi0kFuHBqWO0gCpMBLMbbHLykLEoABwWXEqc

@DMRobertson
Copy link
Contributor

I'm going to delete the room with the admin API but I am worried the bug #12821 will cause more problems
Update: did that and it seems to be having trouble getting missing prev_events and the room is missing a bunch of state, including name

When you say "it seems to be having trouble getting missing prev_events", what is "it"?

Can you paste the curl command line you used to invoke the admin API? Make sure to omit or redact access tokens.

@tonkku107
Copy link
Author

tonkku107 commented Sep 16, 2022

When you say "it seems to be having trouble getting missing prev_events", what is "it"?

"it" is referring to synapse

Can you paste the curl command line you used to invoke the admin API? Make sure to omit or redact access tokens.

I used the synapse admin ui to delete the room

Edit: This was due to not restarting synapse before rejoining.

@tonkku107
Copy link
Author

This has just happened again and in the same room as well.
@erikjohnston @DMRobertson Would you like to investigate further or do I just keep purging and rejoining the room every couple months?

@DMRobertson
Copy link
Contributor

If you have logs illustrating the problem (e.g. describing an event id from matrix.org that's referred to, but isn't visible to you) we can take a quick look.

@DMRobertson DMRobertson reopened this Nov 29, 2022
@tonkku107
Copy link
Author

I should've saved a copy of the logs for this issue... Went and lost them this morning with a restart.
I can still get an event ID while I wait for new log lines to appear. The logs were equivalent to the ones in the original post though.
event ID: $W_-JJMIFONxHomK_8TrnTjFtaK5SEwebO_Ttz8YAARA

I also tried checking those SQL queries again yesterday and this time the response to the last query was empty (using the stream_ordering from this query: #13716 (comment) with the sender of the message above)

@DMRobertson DMRobertson removed their assignment Dec 5, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. X-Needs-Info This issue is blocked awaiting information from the reporter
Projects
None yet
Development

No branches or pull requests

4 participants