diff --git a/proposals/3871-gappy-timelines.md b/proposals/3871-gappy-timelines.md new file mode 100644 index 00000000000..cde60e5198a --- /dev/null +++ b/proposals/3871-gappy-timelines.md @@ -0,0 +1,438 @@ +# MSC3871: Gappy timeline + +`/messages` returns a linearized version of the event DAG. From any given +homeservers perspective of the room, the DAG can have gaps where they're missing +events. This could be because the homeserver hasn't fetched them yet or because +it failed to fetch the events because those homeservers are unreachable and no +one else knows about the event. + +Currently, there is an unwritten rule between the server and client that the +server will always return all contiguous events in that part of the timeline. +But the server has to break this rule sometimes when it doesn't have the event +and is unable to get the event from anyone else. This MSC aims to change the +dynamic so the server can give the client feedback and an indication of where +the gaps are. + +This way, clients know where they are missing events and can even retry fetching +by perhaps adding some UI to the timeline like "We failed to get some messages +in this gap, try again." + +This can also make servers faster to respond to `/messages`. For example, +currently, Synapse always tries to backfill and fill in the gap (even when it +has enough messages locally to respond). In big rooms like `#matrix:matrix.org` +(Matrix HQ), almost every place you ask for has gaps in it (thousands of +backwards extremities) and lots of those events are unreachable so we try the +same thing over and over hoping the response will be different this time but +instead, we just make the `/messages` response time slow. With this MSC, we can +instead be more intelligent about backfilling in the background and just tell +the client about the gap that they can retry fetching a little later. + + +## Proposal + +Add a `gaps` field to the response of [`GET +/_matrix/client/v3/rooms/{roomId}/messages`](https://spec.matrix.org/v1.1/client-server-api/#get_matrixclientv3roomsroomidmessages). +This field is an array of `GapEntry` indicating where missing events in the +timeline are as defined below. + + +### 200 response + +This describes the new `gaps` response field being added to the `200 response` +of `/messages`: + +Name | Type | Description | required +--- | --- | --- | --- +`gaps` | `[GapEntry]` | A list of gaps indicating where events are missing in the `chunk` | no + + +#### `GapEntry` + +key | type | value | description | required +--- | --- | --- | --- | --- +`event_id` | string | Event ID | The event ID indicating the position in the `/messages` `chunk` response | yes +`prev_pagination_token` | string | Pagination token | A pagination token that represents the spot in the DAG before the given `event_id` in the `chunk`. Omitting this field just means there is no gap on this side. | no +`next_pagination_token` | string | Pagination token | A pagination token that represents the spot in the DAG after the given `event_id` in the `chunk`. Omitting this field just means there is no gap on this side. | no + + +### `/messages` response examples + +The following mermaid diagram represents the room DAG snapshot used for the following +`/messages` responses. The slightly transparent events with no background are events +that the homeserver does not have and are in the gap. + +Pagination tokens are positions between events. This already an established concept but +to illustrate this better, see the following `tX` pagination tokens in the following +diagram. + +```mermaid +flowchart RL + after[newest events...]:::gap-event -->|t10| fred -->|t9| waldo:::gap-event -->|t8| garply -->|t7| grault:::gap-event -->|t6| corge -->|t5| qux:::gap-event -->|t4| baz -->|t3| bar:::gap-event -->|t2| foo -->|t1| before[oldest events...]:::gap-event + + classDef gap-event opacity:0.8,fill:transparent; +``` + +The idea is to be able to keep paginating from +`prev_pagination_token`/`next_pagination_token` in the respective direction to fill in +the gap. + + +#### `/messages?dir=b` + +`/messages?dir=b` response example with gaps (`chunk` has events in +reverse-chronoligcal order since we're paginating backwards): + +`/messages?dir=b&from=t6` +```json5 +{ + "chunk": [ + { + "event_id": "$corge", + "type": "m.room.message", + "content": { + "body": "corge", + } + }, + { + "event_id": "$baz", + "type": "m.room.message", + "content": { + "body": "baz", + } + }, + { + "event_id": "$foo", + "type": "m.room.message", + "content": { + "body": "foo", + } + } + ] + "gaps": [ + { + "prev_pagination_token": "t6", + "event_id": "$corge", + "next_pagination_token": "t5", + }, + { + "prev_pagination_token": "t4", + "event_id": "$baz", + "next_pagination_token": "t3", + }, + { + "prev_pagination_token": "t2", + "event_id": "$foo", + "next_pagination_token": "t1", + } + ] +} +``` + + +#### `/messages?dir=f` + +`/messages?dir=f` response example with gaps (`chunk` has events in +chronoligcal order since we're paginating forwards): + +`/messages?dir=f&from=t6` +```json5 +{ + "chunk": [ + { + "event_id": "$garply", + "type": "m.room.message", + "content": { + "body": "garply", + } + }, + { + "event_id": "$fred", + "type": "m.room.message", + "content": { + "body": "fred", + } + }, + ] + "gaps": [ + { + "prev_pagination_token": "t7", + "event_id": "$garply", + "next_pagination_token": "t8", + }, + { + "prev_pagination_token": "t9", + "event_id": "$fred", + "next_pagination_token": "t10", + } + ] +} +``` + + + +## Potential issues + +Lots of gaps/extremities are generated when a spam attack occurs and federation +falls behind. If clients start showing gaps with retry links, we might just be +exposing the spam more. + + +## Alternatives + +As an alternative, we can continue to do nothing as we do today and not worry +about the occasional missing events. People seem not to notice any missing +messages anyway but they do probably see our slow `/messages` pagination. + + +### Expose `prev_events` to the client + +One alternative is including the `prev_events` in the events that the client sees so +they can figure out the DAG chain themselves and see if there is an missing event in the +middle. + +There is an [unspecced `/messages?raw=true` query parameter in +Synapse](https://github.com/matrix-org/synapse/blob/20c76cecb9eb84dadfa7b2d25b436d3ab9218a1a/synapse/rest/client/room.py#L653) +that returns the full raw event as seen over federation which means it will include the +`prev_events`. + +You can also specify `event_format: federation` directly in that JSON `filter` parameter +of `/messages` -> +`/_matrix/client/v3/rooms/{room_id}}/messages?dir=b&filter=%7B%22event_format%22%3A%20%22federation%22%7D` + +Related to: + + - https://github.com/matrix-org/matrix-spec/issues/859 + - https://github.com/matrix-org/matrix-spec/issues/1047 + + +### Synthetic `m.timeline.gap` event alternative + +Another alternative is using synthetic events (thing that looks like an event +without an `event_id`) which the server inserts alongside other events in the +`chunk` to indicate where the gap is. But this has detractors since it's harder +to implement in strongly typed SDK's and easy for a client to naively display +every "event" in the `chunk`. + +`/messages` response example with a gap: + +```json +{ + "chunk": [ + { + "type": "m.room.message", + "content": { + "body": "foo", + } + }, + { + "type": "m.timeline.gap", + "content": { + "gap_start_event_id": "$12345", + "pagination_token": "t47409-4357353_219380_26003_2265", + } + }, + { + "type": "m.room.message", + "content": { + "body": "baz", + } + }, + ] +} +``` + + +### `GapEntry` alternative only indicating a gap `next_to_event_id` (only one side) + +Same concept as the existing `GapEntry` proposal but we only indicate the gap on one +side of an event `next_to_event_id` according to the direction that `/messages` is going +already. + +The problem with this alternative is that clients store events differently and it's +valid to want to paginate in either direction from a given event. This alternative works +fine in the Element Web case where you always paginate backwards in the scrollback and +store events as a whole timeline list but another client like the [Trixinity +SDK](https://github.com/benkuly/trixnity), where events are stored individually in a +linked list, where each event could have a gap before and after, and where a gap could +be 100's, 1000's of events wide, it would be useful to paginate from both ends to fill +the gap faster. + +
+ + Details for the GapEntry alternative only indicating a gap next_to_event_id + + +#### `GapEntry` + +key | type | value | description | required +--- | --- | --- | --- | --- +`next_to_event_id` | string | Event ID | The event ID indicating the position in the `/messages` `"chunk"` response where the gap starts after that position. This field can be `null` or completely omitted to indicate that the gap is at the start of the `/messages` `"chunk"` | no +`pagination_token` | string | Pagination token | A pagination token that represents the spot in the DAG to be able to continue paginating in the same direction as the request and fill in the gap from `next_to_event_id` to the next known event. | yes + + +### `/messages` response examples + +The following mermaid diagram represents the room DAG snapshot used for the following +`/messages` responses. The slightly transparent events with no background are events +that the homeserver does not have and are in the gap. + +Pagination tokens are positions between events. This already an established concept but +to illustrate this better, see the following `tX` pagination tokens in the following +diagram. + +```mermaid +flowchart RL + after[newest events...]:::gap-event -->|t10| fred -->|t9| waldo:::gap-event -->|t8| garply -->|t7| grault:::gap-event -->|t6| corge -->|t5| qux:::gap-event -->|t4| baz -->|t3| bar:::gap-event -->|t2| foo -->|t1| before[oldest events...]:::gap-event + + classDef gap-event opacity:0.8,fill:transparent; +``` + +The idea is to be able to keep paginating from `pagination_token` in the same +direction of the request to fill in the gap. + + +#### `/messages?dir=b` + +`/messages?dir=b` response example with gaps (`chunk` has events in +reverse-chronoligcal order since we're paginating backwards): + +`/messages?dir=b&from=t6` +```json5 +{ + "chunk": [ + // there is no gap from `t6` to `$corge` as expected + { + "event_id": "$corge", + "type": "m.room.message", + "content": { + "body": "corge", + } + }, + // + { + "event_id": "$baz", + "type": "m.room.message", + "content": { + "body": "baz", + } + }, + // + { + "event_id": "$foo", + "type": "m.room.message", + "content": { + "body": "foo", + } + } + // + ] + "gaps": [ + { + "next_to_event_id": "$corge", + "pagination_token": "t5", + }, + { + "next_to_event_id": "$baz", + "pagination_token": "t3", + }, + { + "next_to_event_id": "$foo", + "pagination_token": "t1", + } + ] +} +``` + + +#### `/messages?dir=f` + +`/messages?dir=f` response example with gaps (`chunk` has events in +chronoligcal order since we're paginating forwards): + +`/messages?dir=f&from=t6` +```json5 +{ + "chunk": [ + // + { + "event_id": "$garply", + "type": "m.room.message", + "content": { + "body": "garply", + } + }, + // + { + "event_id": "$fred", + "type": "m.room.message", + "content": { + "body": "fred", + } + }, + // + ] + "gaps": [ + { + "next_to_event_id": null, + "pagination_token": "t6", + }, + { + "next_to_event_id": "$garply", + "pagination_token": "t8", + }, + { + "next_to_event_id": "$fred", + "pagination_token": "t10", + } + ] +} +``` + +
+ + +## Future considerations + +In the future, we should consider adding the same `gaps` field to `/context` because +it's another endpoint that returns a linearized version of the DAG. + +It could make sense to roll this into this MSC but it might make the proposal less clear +if we have to bulk it up by specifying the same details for `/context`. Leaving it to be +follow-up MSC for now. + + +## Security considerations + +Only your own homeserver controls whether a gap is added to the `/messages` +response so there shouldn't be any weird edge case where someone else can +control whether you to fetch something. + + +## Unstable prefix + +While this feature is in development, the `gaps` field can be used as +`org.matrix.msc3871.gaps` + +### While the MSC is unstable + +During this period, to detect server support clients should check for the +presence of the `org.matrix.msc3871` flag in `unstable_features` on `/versions`. +Clients are also required to use the unstable prefixes (see [unstable +prefix](#unstable-prefix)) during this time. + +### Once the MSC is merged but not in a spec version + +Once this MSC is merged, but is not yet part of the spec, clients should rely on +the presence of the `org.matrix.msc3871.stable` flag in `unstable_features` to +determine server support. If the flag is present, clients are required to use +stable prefixes (see [unstable prefix](#unstable-prefix)). + +### Once the MSC is in a spec version + +Once this MSC becomes a part of a spec version, clients should rely on the +presence of the spec version, that supports the MSC, in `versions` on +`/versions`, to determine support. Servers are encouraged to keep the +`org.matrix.msc3871.stable` flag around for a reasonable amount of time +to help smooth over the transition for clients. "Reasonable" is intentionally +left as an implementation detail, however the MSC process currently recommends +*at most* 2 months from the date of spec release.