Initial /sync is returning 404 due to "unknown room" when processing an event. #7065
Comments
I've got back in by /ignore-ing 5 users who had sent me invites which were stuck (meaning I couldn't reject them, as the server didn't consider me to be in the room, as the inviting user had previously parted in). The rooms in question had a NULL in the room_version column of the rooms table. However, rooms with null room_versions surely shouldn't cause the whole /sync to fail. It's also worth noting that rejecting other invites (e.g. !qkoxVSPwURByZOmIPQ:matrix.org) fails with: "No create event in auth events"
I think we've figured out what is happening here. Let me emphasise first of all that

As part of erik's work on fixing UISIs (no really, this all starts there), he wanted to arrange things so that we would clear out

However, at that time, the way we figured out the version of a particular room was to look up its create event via CSE. We still need to know the versions of rooms where there are no active members (as this example shows), so #6729 added a

A later background update, added in #6802, then clears out CSE for rooms such as this one where there are no active members.

Later, we needed to know the room version for each event when we read it from the database (in case we later needed to redact it following the room-specific event-redaction algorithm): #6874, #6875, etc. This also meant that we wanted to ensure that

However, the fly in this ointment is that there doesn't seem to be anything that ensures the

So, any homeserver which went straight from pre-#6729 to post-#6847 should be ok, but any servers which ran the code between #6802 and #6847, which includes matrix.org (and anyone else who ran synapse 1.10 or any of its RCs) may have suffered this data corruption.

A final irony here: it's not the first time that

Fixing it

We still have the create events, so it should be easy enough to construct yet another database update to try to populate

We should probably also add the missing constraint between
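Since the fix hinges on reading the room version back out of the create event, here is a minimal sketch of that one step. The function name `room_version_from_create_event` is hypothetical and this is not Synapse's code; the only fact it relies on is that, per the Matrix spec, an `m.room.create` event with no `room_version` key in its content means version "1".

```python
import json

# Matrix spec: a create event without a `room_version` key implies version "1".
DEFAULT_ROOM_VERSION = "1"


def room_version_from_create_event(event_json: str) -> str:
    """Return the room version declared by a serialised m.room.create event.

    Hypothetical helper for illustration only.
    """
    event = json.loads(event_json)
    if event.get("type") != "m.room.create":
        raise ValueError("not an m.room.create event")

    version = event.get("content", {}).get("room_version", DEFAULT_ROOM_VERSION)
    if not isinstance(version, str):
        # Room versions are opaque strings; anything else is bad data.
        raise ValueError("room_version must be a string")
    return version
```

Raising on a malformed event, rather than guessing a version, is in keeping with the fail-early preference voiced elsewhere in this thread.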
well, we could maybe code around this somehow, to make the room be ignored instead of throwing an error, but I'm generally sceptical of such measures. It's easy to clutter the code with defensive mechanisms which obfuscate how it is meant to work, and furthermore it can hide bugs until they come and bite us in the backside later on. In short: I'm a fan of failing early wherever possible rather than struggling on in the face of bad data.
(incidentally: if you need a workaround in the meantime, poking the right room_version into the table is probably a much easier one than ignoring users and whathaveyou.)
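For anyone wanting to picture that workaround, a rough sketch follows. It runs against an in-memory SQLite database with a cut-down `rooms(room_id, room_version)` table, which is an assumption for illustration and not the full Synapse schema; the helper names and room IDs are made up. On a real homeserver you would need to know the correct version for each room and take a backup first.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a toy database: one healthy room, one with a NULL room_version."""
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for the `rooms` table (assumption, not the real schema).
    conn.execute("CREATE TABLE rooms (room_id TEXT PRIMARY KEY, room_version TEXT)")
    conn.executemany(
        "INSERT INTO rooms (room_id, room_version) VALUES (?, ?)",
        [("!good:example.org", "5"), ("!stuck:example.org", None)],
    )
    conn.commit()
    return conn


def rooms_missing_version(conn: sqlite3.Connection) -> list:
    """List the rooms whose room_version is NULL."""
    rows = conn.execute(
        "SELECT room_id FROM rooms WHERE room_version IS NULL ORDER BY room_id"
    )
    return [row[0] for row in rows]


def set_room_version(conn: sqlite3.Connection, room_id: str, version: str) -> int:
    """Fill in a missing room_version; never overwrites an existing value."""
    cur = conn.execute(
        "UPDATE rooms SET room_version = ? WHERE room_id = ? AND room_version IS NULL",
        (version, room_id),
    )
    conn.commit()
    return cur.rowcount  # number of rows actually changed
```

The `AND room_version IS NULL` guard means a mistyped room ID or a repeated run cannot clobber a version that is already set.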
yeah. i only confirmed the missing room_version pattern after ignoring the stuck invite users. worth noting that i have other stuck invites i can’t reject which do have room_version entries, which fail with missing auth events.
fixed by #7070
Fixes matrix-org#7065
This is basically the same as matrix-org#6847 except it tries to populate events from `state_events` rather than `current_state_events`, since the latter might have been cleared from the state of some rooms too early, leaving them with a `NULL` room version.
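To illustrate why reading from `state_events` helps, the sketch below backfills `NULL` room versions in a toy SQLite database where `current_state_events` has already been emptied. The table layouts are heavily simplified assumptions, `backfill_room_versions` is a hypothetical name, and none of this is the actual update from matrix-org#7070.

```python
import json
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Toy database: a room with a NULL version whose current state was cleared."""
    conn = sqlite3.connect(":memory:")
    # Simplified stand-ins for the real tables (assumption for illustration).
    conn.executescript(
        """
        CREATE TABLE rooms (room_id TEXT PRIMARY KEY, room_version TEXT);
        CREATE TABLE state_events (event_id TEXT, room_id TEXT, type TEXT, state_key TEXT);
        CREATE TABLE current_state_events (event_id TEXT, room_id TEXT, type TEXT, state_key TEXT);
        CREATE TABLE event_json (event_id TEXT, json TEXT);
        """
    )
    create_event = {"type": "m.room.create", "content": {"room_version": "5"}}
    conn.execute("INSERT INTO rooms VALUES ('!stuck:example.org', NULL)")
    conn.execute(
        "INSERT INTO state_events VALUES ('$create', '!stuck:example.org', 'm.room.create', '')"
    )
    conn.execute("INSERT INTO event_json VALUES ('$create', ?)", (json.dumps(create_event),))
    # current_state_events is deliberately left empty, as if cleared too early.
    conn.commit()
    return conn


def backfill_room_versions(conn: sqlite3.Connection) -> int:
    """Set rooms.room_version from each room's create event in state_events."""
    rows = conn.execute(
        """
        SELECT r.room_id, ej.json
        FROM rooms AS r
        JOIN state_events AS se
          ON se.room_id = r.room_id AND se.type = 'm.room.create' AND se.state_key = ''
        JOIN event_json AS ej ON ej.event_id = se.event_id
        WHERE r.room_version IS NULL
        """
    ).fetchall()

    updated = 0
    for room_id, raw in rows:
        # A create event with no room_version key means version "1" per the spec.
        version = json.loads(raw).get("content", {}).get("room_version", "1")
        conn.execute(
            "UPDATE rooms SET room_version = ? WHERE room_id = ?", (version, room_id)
        )
        updated += 1
    conn.commit()
    return updated
```

Running the same join against `current_state_events` in this toy database would match nothing, which is the situation the affected rooms were in.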
My account is currently unable to initial /sync and is broken with:
The $...-vJc event is an invite event for my user to a room I hadn't yet accepted.
This circumstantially looks related to the new redaction stuff in #6875 - perhaps in conjunction with stuck invite bugs, or failing to set room_version on unaccepted invites, or something else...