Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

members API returns M_NOT_FOUND: MatrixError: [404] Can't find event for token #14814

Closed
andybalaam opened this issue Jan 11, 2023 · 4 comments · Fixed by #14817
Closed

members API returns M_NOT_FOUND: MatrixError: [404] Can't find event for token #14814

andybalaam opened this issue Jan 11, 2023 · 4 comments · Fixed by #14817
Assignees
Labels
A-Create-Room A-Membership O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. Z-Read-After-Write A lack of read-after-write consistency, usually due to cache invalidation races with workers

Comments

@andybalaam
Copy link
Member

andybalaam commented Jan 11, 2023

Description

See element-hq/element-web#24179.

The user is creating a new DM room.

Element Web creates the room, waits for a sync, then calls the members API with the latest sync token:

https://matrix-client.matrix.org/_matrix/client/r0/rooms/!lFfrlZXUrMAOXpzuxp%3Amatrix.org/members?not_membership=leave&at=m3595246325%7E2.3595246467%7E3.3595246468%7E36.3595246466_757284974_52668403_1826042582_1854740460_3893944_721765062_6284542923_0

Steps to reproduce

Since we're using a sync token given to us by Synapse from a sync that contains the new room's events, we would expect it to point to a valid event in the room.

Homeserver

matrix.org

Synapse Version

1.75.0rc1 (b=matrix-org-hotfixes,04aa6a9707)

But note that this was reported on 2023-01-06T11:49 UTC

Installation Method

I don't know

Database

PostgreSQL

Workers

Multiple workers

Platform

.

Configuration

.

Relevant log output

Element web logs: https://github.com/matrix-org/element-web-rageshakes/issues/18940

Anything else that would be useful to know?

@richvdh is already investigating.

@richvdh
Copy link
Member

richvdh commented Jan 11, 2023

Initial investigation based on an example call:

2023-01-11 10:03:24,057 - synapse.http.server - 108 - INFO - GET-787cd28348aa75cf-LHR - <XForwardedForRequest at 0x7f77ee504be0 method='GET' uri='/_matrix/client/r0/rooms/!uvUhHnEbAYZNVcRWbx%3Amatrix.org/members?not_membership=leave&at=m3608765164%7E1.3608765719%7E2.3608765719%7E3.3608765719_757284974_3566946_1836222772_1864166937_3907868_728498125_6331086177_0' clientproto='HTTP/1.1' site='11105'> SynapseError: 404 - Can't find event for token StreamToken(room_key=RoomStreamToken(topological=None, stream=3608765164, instance_map=frozendict.frozendict({'event_persister-1': 3608765719, 'event_persister-2': 3608765719, 'event_persister-3': 3608765719})), presence_key=757284974, typing_key=3566946, receipt_key=1836222772, account_data_key=1864166937, push_rules_key=3907868, to_device_key=728498125, device_list_key=6331086177, groups_key=0)

The stream token in this request is:

m3608765164~1.3608765719~2.3608765719~3.3608765719_757284974_3566946_1836222772_1864166937_3907868_728498125_6331086177_0

Within that token, the interesting part is before the first _ (corresponding to the RoomStreamToken - ie, in-room events). That part tells us that all event persisters have persisted up to at least stream_id 3608765164, while persisters 1, 2, and 3 have persisted right up to 3608765719. (See here and here for documentation of the token format). I'll call these two numbers the "minimal" and "maximal" stream orderings.

Looking at the content of this room:

matrix=> select stream_ordering, instance_name, sender, type from events where room_id='!uvUhHnEbAYZNVcRWbx:matrix.org' order by stream_ordering;
 stream_ordering |   instance_name   |         sender         |           type            
-----------------+-------------------+------------------------+---------------------------
      3608765557 | event_persister-2 | @dave:matrix.org       | m.room.create
      3608765563 | event_persister-2 | @dave:matrix.org       | m.room.member
      3608765577 | event_persister-2 | @dave:matrix.org       | m.room.power_levels
      3608765578 | event_persister-2 | @dave:matrix.org       | m.room.join_rules
      3608765579 | event_persister-2 | @dave:matrix.org       | m.room.history_visibility
      3608765580 | event_persister-2 | @dave:matrix.org       | m.room.guest_access
      3608765581 | event_persister-2 | @dave:matrix.org       | m.room.encryption
      3608765585 | event_persister-2 | @dave:matrix.org       | m.room.member
      3608769619 | event_persister-2 | @dave:matrix.org       | m.room.encrypted
      3608769807 | event_persister-2 | @davetest10:matrix.org | m.room.member
      3608770356 | event_persister-2 | @davetest10:matrix.org | m.room.encrypted
      3608786832 | event_persister-2 | @dave:matrix.org       | m.room.member
      3608791211 | event_persister-2 | @davetest10:matrix.org | m.room.member

... we can see that the room was created on event persister 2 more recently the minimal stream ordering 3608765164. This is the cause of the error: the /members handler calls get_last_event_in_room_before_stream_ordering, which only considers the minimal stream ordering.

This seems to be in contrast to the behaviour of /sync, which will return all events (and rooms) which happened before the maximal stream ordering (https://github.com/matrix-org/synapse/blob/v1.74.0/synapse/storage/databases/main/stream.py#L572).

I have to admit, the sharded event stream logic tends to be indistinguishable from magic to me, but presumably get_last_event_in_room_before_stream_ordering ought to be considering these extra events.

@richvdh
Copy link
Member

richvdh commented Jan 11, 2023

Thinking about this further, the whole reason for providing a token to /members is to avoid races between /sync and /members (in a way whose details currently escape me: there may be more info in the design docs for membership lazy-loading). Surely this is completely obviated if /sync and /members then use different logic for deciding how to interpret the token?

Incidentally: it seems possible that this might have been made more obvious by recent work to speed up room creation?

@erikjohnston erikjohnston self-assigned this Jan 11, 2023
erikjohnston added a commit that referenced this issue Jan 11, 2023
erikjohnston added a commit that referenced this issue Jan 11, 2023
erikjohnston added a commit that referenced this issue Jan 11, 2023
@DMRobertson
Copy link
Contributor

To summarise: by making a request with the token m3608765164~1.3608765719~2.3608765719~3.3608765719, the client is telling us that in its most recent incremental sync:

  • it has seen all relevant events with stream ordering <= 3608765164
  • it has seen all relevant events with 3608765164 < stream ordering <= 3608765719 that were persisted by event persister 1
  • ditto for event persisters 2 and 3. (In this case, the upper bound happens to be the same across all three named persisters).

Calling /members?at=... with this token means we should consider all events described above when determining the members in the room.

@DMRobertson
Copy link
Contributor

Having said that: I thought events were sharded to different event persisters by room id (in Rich's example, event_persister2). Does it suffice to look only for events created by event_persister2? Oh, I suppose not---the sharding pattern may have changed over time.

@DMRobertson DMRobertson added S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. A-Create-Room A-Membership Z-Read-After-Write A lack of read-after-write consistency, usually due to cache invalidation races with workers O-Occasional Affects or can be seen by some users regularly or most users rarely labels Jan 11, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Create-Room A-Membership O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. Z-Read-After-Write A lack of read-after-write consistency, usually due to cache invalidation races with workers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants