Federated room cannot be joined again when left and deleted before #9481

waclaw66 · 2021-02-24T12:57:09Z

Description

When a federated room (e.g. #synapse:matrix.org) is joined from self hosted federated homeserver, then left, deleted via API from self homeserver, then rejoin fails.

Steps to reproduce

Join some room (e.g. #synapse:matrix.org) from self hosted homeserver (same version of Synapse 1.27.0).
Leave that room
Delete that room using API /_synapse/admin/v1/rooms/[room_id]/delete
Try to join that room again
It fails with following error...

synapse.http.server: [POST-10040] Failed handle request via 'JoinRoomAliasServlet': <XForwardedForRequest at 0x7f57646fa970 method='POST' uri='/_matrix/client/r0/join/%23synapse%3Amatrix.org' clientproto='HTTP/1.1' site='8008'>
    Traceback (most recent call last):
      File "/usr/lib/python3.9/site-packages/synapse/http/server.py", line 252, in _async_render_wrapper
        callback_return = await self._async_render(request)
      File "/usr/lib/python3.9/site-packages/synapse/http/server.py", line 430, in _async_render
        callback_return = await raw_callback_return
      File "/usr/lib/python3.9/site-packages/synapse/rest/client/v1/room.py", line 301, in on_POST
        await self.room_member_handler.update_membership(
      File "/usr/lib/python3.9/site-packages/synapse/handlers/room_member.py", line 333, in update_membership
        result = await self.update_membership_locked(
      File "/usr/lib/python3.9/site-packages/synapse/handlers/room_member.py", line 549, in update_membership_locked
        remote_join_response = await self._remote_join(
      File "/usr/lib/python3.9/site-packages/synapse/handlers/room_member.py", line 1091, in _remote_join
        event_id, stream_id = await self.federation_handler.do_invite_join(
      File "/usr/lib/python3.9/site-packages/synapse/handlers/federation.py", line 1400, in do_invite_join
        max_stream_id = await self._persist_auth_tree(
      File "/usr/lib/python3.9/site-packages/synapse/handlers/federation.py", line 2050, in _persist_auth_tree
        await self.persist_events_and_notify(
      File "/usr/lib/python3.9/site-packages/synapse/handlers/federation.py", line 2925, in persist_events_and_notify
        events, max_stream_token = await self.storage.persistence.persist_events(
      File "/usr/lib/python3.9/site-packages/synapse/storage/persist_events.py", line 262, in persist_events
        ret_vals = await make_deferred_yieldable(
    twisted.internet.defer.FirstError: FirstError[#0, [Failure instance: Traceback: <class 'psycopg2.errors.UniqueViolation'>: duplicate key value violates unique constraint "event_auth_chains_pkey"
    DETAIL:  Key (event_id)=($e9U026auDHIgaZPAqlblvPupACjl7jcZDblP970dJPs) already exists.

    /usr/lib/python3.9/site-packages/synapse/metrics/background_process_metrics.py:208:run
    --- <exception caught here> ---
    /usr/lib/python3.9/site-packages/synapse/storage/persist_events.py:172:handle_queue_loop
    /usr/lib/python3.9/site-packages/synapse/storage/persist_events.py:324:persisting_queue
    /usr/lib/python3.9/site-packages/synapse/storage/persist_events.py:532:_persist_events
    /usr/lib/python3.9/site-packages/synapse/storage/databases/main/events.py:171:_persist_events_and_state_updates
    /usr/lib/python3.9/site-packages/synapse/storage/database.py:661:runInteraction
    /usr/lib/python3.9/site-packages/synapse/storage/database.py:744:runWithConnection
    /usr/lib64/python3.9/site-packages/twisted/python/threadpool.py:250:inContext
    /usr/lib64/python3.9/site-packages/twisted/python/threadpool.py:266:<lambda>
    /usr/lib64/python3.9/site-packages/twisted/python/context.py:122:callWithContext
    /usr/lib64/python3.9/site-packages/twisted/python/context.py:85:callWithContext
    /usr/lib64/python3.9/site-packages/twisted/enterprise/adbapi.py:306:_runWithConnection
    /usr/lib64/python3.9/site-packages/twisted/python/compat.py:464:reraise
    /usr/lib64/python3.9/site-packages/twisted/enterprise/adbapi.py:297:_runWithConnection
    /usr/lib/python3.9/site-packages/synapse/storage/database.py:739:inner_func
    /usr/lib/python3.9/site-packages/synapse/storage/database.py:539:new_transaction
    /usr/lib/python3.9/site-packages/synapse/logging/utils.py:71:wrapped
    /usr/lib/python3.9/site-packages/synapse/storage/databases/main/events.py:379:_persist_events_txn
    /usr/lib/python3.9/site-packages/synapse/storage/databases/main/events.py:472:_persist_event_auth_chain_txn
    /usr/lib/python3.9/site-packages/synapse/storage/databases/main/events.py:630:_add_chain_cover_index
    /usr/lib/python3.9/site-packages/synapse/storage/database.py:896:simple_insert_many_txn
    /usr/lib/python3.9/site-packages/synapse/storage/database.py:274:execute_batch
    /usr/lib/python3.9/site-packages/synapse/storage/database.py:319:_do_execute
    /usr/lib/python3.9/site-packages/synapse/storage/database.py:274:<lambda>
    /usr/lib64/python3.9/site-packages/psycopg2/extras.py:1209:execute_batch
    ]]

The room should be rejoined without any problem.

Version information

Homeserver: matrix.bolesiny.net
Version: 1.27.0
Install method: Fedora rpm package
Platform: Fedora 33

The text was updated successfully, but these errors were encountered:

waclaw66 · 2021-02-24T12:58:35Z

Related #9418

clokep · 2021-02-24T12:59:39Z

@waclaw66 This seems to be a duplicate of #9418. I'm going to close it unless you feel there's a reason that it is not.

waclaw66 · 2021-02-24T14:11:54Z

Maybe it's the same bug causing the problem, but those are different cases, which could lead to it.
You can close it in favour of #9418, there is a reference, in case the fix wouldn't work for my problem I can reopen it.

@waclaw66 This seems to be a duplicate of #9418. I'm going to close it unless you feel there's a reason that it is not.

clokep · 2021-02-24T14:42:09Z

Ah, I see that this is due to leaving while the other is due to a failure to join. 👍 Hopefully the fix is the same!

waclaw66 · 2021-02-25T06:32:16Z

Is there any hotfix? Leaving and joining the room again works properly, but when a room is deleted via API and joined again it crashes as shown. So there supposed to be some remain of that deleted room in database which causes that crash after joining it again.

clokep · 2021-02-25T16:00:52Z

When you deleted the room you also purged it I'm guessing?

I suspect that purge_room_state needs to be updated to remove from event_auth_chains also?

waclaw66 · 2021-02-25T16:06:15Z

Right, I purged the history, then I left and finally deleted it.

When you deleted the room you also purged it I'm guessing?

I suspect that purge_room_state needs to be updated to remove from event_auth_chains also?

waclaw66 · 2021-02-25T16:31:32Z

That's strange
select * from state_groups_state where room_id='!ehXvUhWNASUkSLvAGP:matrix.org'
gives me more than 300k records, but
POST http://localhost:8008/_synapse/admin/v1/purge_history/!ehXvUhWNASUkSLvAGP:matrix.org
with data
{ "delete_local_events": true, "purge_up_to_ts": 1614270664000 }
returns
{"errcode":"M_NOT_FOUND","error":"there is no event to be purged"}

clokep · 2021-02-25T16:54:24Z

purge_history is different than purging the room (it purges the events before a certain time). I was asking whether you gave {"purge": true} when calling /_synapse/admin/v1/rooms/[room_id]/delete.

waclaw66 · 2021-02-25T17:15:27Z

Understood, I used synapse-admin to delete the room finally.
It seems, it doesn't purge...
https://github.com/Awesome-Technologies/synapse-admin/blob/684c44e47057e42fb217688e08706e9f2e6767a7/src/synapse/dataProvider.js#L69

purge_history is different than purging the room (it purges the events before a certain time). I was asking whether you gave {"purge": true} when calling /_synapse/admin/v1/rooms/[room_id]/delete.

Bubu · 2021-02-25T17:26:20Z

Oh, interesting. FWIW, in #9418, I probably purged/deleted the room at some point in the past before trying to rejoin it (and then lrunning into this issue).

waclaw66 · 2021-02-25T17:42:53Z

The question is how to fix it. I suppose the patch from #9498 will fix it for the future. Do I need to run some sql commands to fix current state?

clokep · 2021-02-25T17:44:13Z

The question is how to fix it. I suppose the patch from #9498 will fix it for the future. Do I need to run some sql commands to fix current state?

My goal in there would be to fix it for previously purged rooms too, if possible. If not you can hopefully run some of the commands manually.

Oh, interesting. FWIW, in #9418, I probably purged/deleted the room at some point in the past before trying to rejoin it (and then lrunning into this issue).

Sounds like these are probably duplicates then.

waclaw66 · 2021-02-26T06:28:08Z

I have tested all those sql commands (delete replaced with select) from _purge_room_txn function. But no rows were found. Joining #synapse:matrix.org still fails with the same exception.
Then I replaced the whole purge_events.py in my Synapse server, still the same.
There are some other tables with reference to that room_id left, e.g. state_groups_state as I mentioned, then current_state_delta_stream, receipts_graph, receipts_linearized, state_groups, stream_ordering_to_exterm.

What causes this?

File "/usr/lib/python3.9/site-packages/synapse/storage/persist_events.py", line 262, in persist_events
        ret_vals = await make_deferred_yieldable(
    twisted.internet.defer.FirstError: FirstError[#0, [Failure instance: Traceback: <class 'psycopg2.errors.UniqueViolation'>: duplicate key value violates unique constraint "event_auth_chains_pkey"
    DETAIL:  Key (event_id)=($e9U026auDHIgaZPAqlblvPupACjl7jcZDblP970dJPs) already exists.

clokep · 2021-02-26T13:29:03Z

state_groups_state, state_groups, receipts_graph, receipts_linearized, and stream_ordering_to_exterm should have been deleted previously unless an error occurred when you deleted the room. 😕 I believe those have all been part of the purge for at least the last couple of years.

The code currently does not remove data from current_state_delta_stream, however and notes that we might want to.

waclaw66 · 2021-02-26T13:33:04Z

So can I try to purge those tables you mentioned manually with that room_id without any harm?

I think the exception

DETAIL:  Key (event_id)=($e9U026auDHIgaZPAqlblvPupACjl7jcZDblP970dJPs) already exists.

is the key to the problem. It seems primary key is created multiple times within table event_auth_chains for some reason.

state_groups_state, state_groups, receipts_graph, receipts_linearized, and stream_ordering_to_exterm should have been deleted previously unless an error occurred when you deleted the room. 😕 I believe those have all been part of the purge for at least the last couple of years.

The code currently does not remove data from current_state_delta_stream, however and notes that we might want to.

waclaw66 · 2021-02-26T13:54:07Z

There occured no error when the room was deleted, it happens for every federated room you leave, delete and re-join. It's not random.
I think the problem is that purge doesn't clean everything, and re-join doesn't count with some already existing database entries for that room. I tested it for more than 5 rooms.

should have been deleted previously unless an error occurred when you deleted the room

tleydxdy · 2021-02-28T15:11:36Z

Same problem here, I encountered this problem when using the irc bridge, I failed to join the #artix room, but my irc side is in the room. now all the irc rooms are broken because I tried a !quit, bad idea :(

clokep · 2021-03-03T16:04:57Z

Oops, we'll need the background update to fully fix this.

clokep · 2021-03-03T16:05:39Z

So can I try to purge those tables you mentioned manually with that room_id without any harm?

I think that should be OK. Note that some of them don't have a room_id column. You should definitely make a database backup before doing this.

waclaw66 · 2021-03-08T19:54:02Z

I can confirm that problem with leaving and deleting rooms is solved by Synapse 1.29.0. Tested on a room, that wasn't previously deleted by API.
I would really appriteated also a script to purge remains of those rooms deleted with Synapse < 1.28.0. Thanks.

MacLemon · 2021-03-08T22:30:07Z

We're seeing the same issue and have already updated to 1.29.0. Since it is cause by events in the past before running 1.29.0 the problem persists.

Documentation of a safe way to fix problems with existing rooms would be much appreciated. (If that requires direct access to synapse, the PostGreSQL database, or Admin-API, that would be a viable solution for us.)

clokep · 2021-03-09T12:17:58Z

I would really appriteated also a script to purge remains of those rooms deleted with Synapse < 1.28.0. Thanks.

This will happen automatically once #9542 is merged -- a background update should clean-up the data. Unfortunately this didn't make it into v1.29.0 and will be in v1.30.0.

clokep added the X-Needs-Info This issue is blocked awaiting information from the reporter label Feb 24, 2021

clokep added S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. and removed X-Needs-Info This issue is blocked awaiting information from the reporter labels Feb 25, 2021

clokep mentioned this issue Feb 25, 2021

Purge chain cover tables when purging events. #9498

Merged

clokep mentioned this issue Feb 25, 2021

psycopg2.errors.UniqueViolation: duplicate key value violates unique constraint "event_auth_chains_pkey" #9418

Closed

clokep self-assigned this Feb 25, 2021

clokep closed this as completed in #9498 Mar 3, 2021

clokep reopened this Mar 3, 2021

clokep mentioned this issue Mar 3, 2021

Add a background task to purge unused chain IDs. #9542

Merged

clokep mentioned this issue Mar 3, 2021

Add a basic test for purging rooms #9541

Merged

clokep closed this as completed in #9542 Mar 9, 2021

MadLittleMods added the A-Federated-Join joins over federation generally suck label Dec 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Federated room cannot be joined again when left and deleted before #9481

Federated room cannot be joined again when left and deleted before #9481

waclaw66 commented Feb 24, 2021

waclaw66 commented Feb 24, 2021

clokep commented Feb 24, 2021

waclaw66 commented Feb 24, 2021

clokep commented Feb 24, 2021

waclaw66 commented Feb 25, 2021 •

edited

Loading

clokep commented Feb 25, 2021

waclaw66 commented Feb 25, 2021

waclaw66 commented Feb 25, 2021

clokep commented Feb 25, 2021

waclaw66 commented Feb 25, 2021

Bubu commented Feb 25, 2021

waclaw66 commented Feb 25, 2021

clokep commented Feb 25, 2021

waclaw66 commented Feb 26, 2021 •

edited

Loading

clokep commented Feb 26, 2021

waclaw66 commented Feb 26, 2021 •

edited

Loading

waclaw66 commented Feb 26, 2021 •

edited

Loading

tleydxdy commented Feb 28, 2021

clokep commented Mar 3, 2021

clokep commented Mar 3, 2021

waclaw66 commented Mar 8, 2021

MacLemon commented Mar 8, 2021

clokep commented Mar 9, 2021

Federated room cannot be joined again when left and deleted before #9481

Federated room cannot be joined again when left and deleted before #9481

Comments

waclaw66 commented Feb 24, 2021

Description

Steps to reproduce

Version information

waclaw66 commented Feb 24, 2021

clokep commented Feb 24, 2021

waclaw66 commented Feb 24, 2021

clokep commented Feb 24, 2021

waclaw66 commented Feb 25, 2021 • edited Loading

clokep commented Feb 25, 2021

waclaw66 commented Feb 25, 2021

waclaw66 commented Feb 25, 2021

clokep commented Feb 25, 2021

waclaw66 commented Feb 25, 2021

Bubu commented Feb 25, 2021

waclaw66 commented Feb 25, 2021

clokep commented Feb 25, 2021

waclaw66 commented Feb 26, 2021 • edited Loading

clokep commented Feb 26, 2021

waclaw66 commented Feb 26, 2021 • edited Loading

waclaw66 commented Feb 26, 2021 • edited Loading

tleydxdy commented Feb 28, 2021

clokep commented Mar 3, 2021

clokep commented Mar 3, 2021

waclaw66 commented Mar 8, 2021

MacLemon commented Mar 8, 2021

clokep commented Mar 9, 2021

waclaw66 commented Feb 25, 2021 •

edited

Loading

waclaw66 commented Feb 26, 2021 •

edited

Loading

waclaw66 commented Feb 26, 2021 •

edited

Loading

waclaw66 commented Feb 26, 2021 •

edited

Loading