Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC4186: Simplified Sliding Sync #4186

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
384 changes: 384 additions & 0 deletions proposals/4186-simplified-sliding-sync.md
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation requirements:

  • Client (ideally multiple)
  • Server

Original file line number Diff line number Diff line change
@@ -0,0 +1,384 @@
# MSC4186: Simplified Sliding Sync

The current `/sync` endpoint scales badly as the number of rooms on an account increases. It scales badly because all
rooms are returned to the client, incremental syncs are unbounded and slow down based on how long the user has been
offline, and clients cannot opt-out of a large amount of extraneous data such as receipts. On large accounts with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't data be removed with filters? (See RoomFilter)

thousands of rooms, the initial sync operation can take tens of minutes to perform. This significantly delays the
initial login to Matrix clients, and also makes incremental sync very heavy when resuming after any significant pause in
usage.

Note: this is a “simplified” version of the sliding sync API proposed in
[MSC3575](https://github.com/matrix-org/matrix-spec-proposals/pull/3575), based on paring back that API based
on real world use cases and usages.


# Goals

This improved `/sync` mechanism has a number of goals:

- Sync time should be independent of the number of rooms you are in.
- Time from launch to confident usability should be as low as possible.
- Time from login on existing accounts to usability should be as low as possible.
Comment on lines +20 to +21
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really get the difference here. What is "launch"? Does this just mean an initial login vs an already logged in user?

- Bandwidth should be minimized.
- Support lazy-loading of things like read receipts (and avoid sending unnecessary data to the client)
- Support informing the client when room state changes from under it, due to state resolution.
- Clients should be able to work correctly without ever syncing in the full set of rooms they’re in.
- Don’t incremental sync rooms you don’t care about.
- Servers should not need to store all past since tokens. If a since token has been discarded we should gracefully
degrade to initial sync.

These goals shaped the design of this proposal.


# Proposal

The core differences between sync v2 and simplified sliding sync are:

- The server initially only sends the most recent N rooms to the client (where N is specified by the lcient), which then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- The server initially only sends the most recent N rooms to the client (where N is specified by the lcient), which then
- The server initially only sends the most recent N rooms to the client (where N is specified by the client), which then

can paginate in older rooms in subsequent requests
- The client can configure which information the server will return for different sets of rooms (e.g. a smaller timeline
limit for older rooms).
- The client can filter what rooms it is interested in
- The client can maintain multiple sync loops (with some caveats)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this useful? When do clients want to use this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Primarily for iOS which has 2 processes (a push process (NSE) and the main app process).


The basic operation is similar between sync v2 and simplified sliding sync: both use long-polling with tokens to fetch
updates from the server. I.e., the basic operation of both APIs is to do an “initial” request and then repeatedly call
the API supplying the token returned in the previous response in the subsequent “incremental” request.


## Lists and room subscriptions

The core component of a sliding sync request is “lists”, which specify what information to return about which rooms.
Each list specifies some filters on rooms (e.g. ignore spaces), the range of filtered rooms to select (e.g. the most
recent 20 filtered rooms), and the config for the data to return for those rooms (e.g. the required state, timeline
limit, etc). The order of rooms is always done based on when the server received the most recent event for the room.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Stream order" in synapse parlance?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, should standardize this type of language.

Suggested change
limit, etc). The order of rooms is always done based on when the server received the most recent event for the room.
limit, etc). The order of rooms is always done based on the arrival time of the most recent event for the room on the homeserver.


The client can also specify config for specific rooms if it has their room ID, these are known as room subscriptions.

Multiple lists and subscriptions can be specified in a request. If a room matches multiple lists/subscriptions then the
config is “combined” to be the superset of all configs (e.g. take the maximum timeline limit). See below for the exact
algorithm.
Comment on lines +58 to +60
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So lists + subscriptions replace filters?


The server tracks what data has been sent to the client in which rooms. If a room matches a list or subscription that
hasn’t been sent down before, then the server will respond with the full metadata about the room indicated by `initial:
true`. If a room stops matching a list (i.e. it falls out of range) then no further updates will be sent until it starts
matching a list again, at which point the missing updates (limited by the `timeline_limit`) will be sent down. However,
as clients are now expected to paginate all rooms in the room list in the background (in order to correctly order and
search them), the act of a room falling out of range is a temporary edge-case.


## Pagination

Pagination is achieved by the client increasing the ranges of one (or more) lists.

For example an initial request might have a list called `all_rooms` specifying a range of `0..20` in the initial
request, and the server will respond with the top 20 rooms (by most recently updated). On the second request the client
may change the range to `0..100`, at which point the server will respond with the top 100 rooms that either a) weren’t
sent down in the first request, or b) have updates since the first request.

Clients can increase and decrease the ranges as they see fit. A common approach would be to start with a small window
and grow that until the range covers all the rooms. After some threshold of the app being offline it may reduce the
range back down and incrementally grow it again. This allows for ensuring that a limited amount of data is requested at
once, to improve response times.


## Connections

Clients can have multiple “connections” (i.e. sync loops) with the server, so long as each connection has a different
`conn_id` set in the request.

Clients must only have a single request in-flight at any time per connection (clients can have multiple connections by
specifying a unique `conn_id`). If a client needs to send another request before receiving a response to an in-flight
request (e.g. for retries or to change parameters) the client *must* cancel the in-flight request and *not* process any
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Cancel" referring to HTTP level cancelling?

response it receives for it.

In particular, a client must use the returned `pos` value in a response as the `since` param in exactly one request that
the client will process the response for. Clients must be careful to ensure that when processing a response any new
requests use the new `pos`, and any in-flight requests using an old `pos` are canceled.

The server cannot assume that a client has received a response until it receives a new request with the `since` token
set to the `pos` it returned in the response. The server must ensure that any per-connection state it tracks correctly
handles receiving multiple requests with the same `since` token (e.g. the client retries the request or decides to
cancel and resend a request with different parameters).

A server may decide to “expire” connections, either to free resources or because the server thinks it would be faster
for the client to start from scratch (e.g. because there are many updates to send down). This is done by responding with
a 400 HTTP status and an error code of `M_UNKNOWN_POS`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a new error code(?) I'd prefer spelling out POSITION here. The abbreviation doesn't help much.



## List configuration

**TODO**, these are the same as in [MSC3575](https://github.com/matrix-org/matrix-spec-proposals/pull/3575):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please copy them here so this is self contained.


- Required state format
- The filters
- Lazy loading of members
- Combining room config


## Room config changes

When a room comes in and out of different lists or subscriptions, the effective `timeline_limit` and `required_state`
parameters may change. This section outlines how the server should handle these cases.

If the `timeline_limit` *increases* then the server *may* choose to send down more historic data. This is to support the
ability to get more history for certain rooms, e.g. when subscribing to the currently visible rooms in the list to
precache their history. This is done by setting `unstable_expanded_timeline` to true and sending down the last N events
(this may include events that have already been sent down). The server may choose not to do this if it believes it has
already sent down the appropriate number of events.
Comment on lines +124 to +128
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really get this entire section. Why is it unstable?


If new entries are added to `required_state` then the server must send down matching current state events.


## Extensions

We anticipate that as more features land in Matrix, different kinds of data will also want to be synced to clients. Sync
v2 did not have any first-class support to opt-in to new data. Sliding Sync does have support for this via "extensions".
Extensions also allow this proposal to be broken up into more manageable sections. Extensions are requested by the
client in a dedicated extensions block.

In an effort to reduce the size of this proposal, extensions will be done in separate MSCs. There will be extensions
for:

- To Device Messaging \- MSC3885
- End-to-End Encryption \- MSC3884
- Typing Notifications \- MSC3961
- Receipts \- MSC3960
- Presence \- presence in sync v2: spec
- Account Data \- account\_data in sync v2: MSC3959
- Threads

**TODO** explain how these interact with the room lists, this is the same as in
[MSC3575](https://github.com/matrix-org/matrix-spec-proposals/pull/3575)

## Request format

```javascript
{
"conn_id": "<conn_id>", // Client chosen ID of the connection, c.f. "Connections"

// The set of room lists
"lists": {
// An arbitrary string which the client is using to refer to this list for this connection.
"<list-name>": {

// Sliding window ranges, c.f. Lists and room subscriptions
"ranges": [[0, 10]],
Comment on lines +165 to +166
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a list of lists really helpful here? Are these numbers inclusive?


// Filters to apply to the list.
"filters": {
// Flag which only returns rooms present (or not) in the DM section of account data.
// If unset, both DM rooms and non-DM rooms are returned. If false, only non-DM rooms
// are returned. If true, only DM rooms are returned.
"is_dm": true|false|null,

// Flag which only returns rooms which have an `m.room.encryption` state event. If unset,
// both encrypted and unencrypted rooms are returned. If false, only unencrypted rooms
// are returned. If true, only encrypted rooms are returned.
"is_encrypted": true|false|null,

// Flag which only returns rooms which have an `m.room.encryption` state event. If unset,
// both encrypted and unencrypted rooms are returned. If false, only unencrypted rooms
// are returned. If true, only encrypted rooms are returned.
"is_invite": true|false|null,
Comment on lines +180 to +183
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

documentation is copied across from is_encrypted

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will knocks and left rooms be handled? Maybe this should be a string and be the current membership of the user?


// If specified, only rooms where the `m.room.create` event has a `type` matching one
// of the strings in this array will be returned. If this field is unset, all rooms are
// returned regardless of type. This can be used to get the initial set of spaces for an account.
// For rooms which do not have a room type, use 'null' to include them.
"room_types": [ ... ],

// Same as "room_types" but inverted. This can be used to filter out spaces from the room list.
// If a type is in both room_types and not_room_types, then not_room_types wins and they are
// not included in the result.
"not_room_types": [ ... ],
},

// The maximum number of timeline events to return per response.
"timeline_limit": 10,

// Required state for each room returned. An array of event type and state key tuples.
// Elements in this array are ORd together to produce the final set of state events
// to return. One unique exception is when you request all state events via ["*", "*"]. When used,
// all state events are returned by default, and additional entries FILTER OUT the returned set
// of state events. These additional entries cannot use '*' themselves.
// For example, ["*", "*"], ["m.room.member", "@alice:example.com"] will _exclude_ every m.room.member
// event _except_ for @alice:example.com, and include every other state event.
// In addition, ["*", "*"], ["m.space.child", "*"] is an error, the m.space.child filter is not
// required as it would have been returned anyway.
Comment on lines +200 to +208
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not state and not_state? That sounds a lot simpler.

"required_state": [ ... ],
}
},

// The set of room subscriptions
"room_subscriptions": {
// The key is the room to subscribe to.
"!foo:example.com": {
Comment on lines +214 to +216
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-adding this discussion from the other MSC3575:

What should happen if you try to subscribe to a room you are not part of (or not allowed)?

The easy option would be to ignore it but then clients have no feedback about what they did wrong.

Perhaps entries in the rooms map could also have an errcode/error field. We could also just blow up the whole request with an error and point out which room_id is wrong although that isn't very machine-readable.


Can you subscribe to a public world_readable room that you're not part of?

// These have the same meaning as in `lists` section
"timeline_limit": 10,
"required_state": [ ... ],
}
},

// c.f. "Extensions"
"extensions": {
}
}
```


## Response format

```javascript
{
// The position to use as the `since` token in the next sliding sync request.
// c.f. Connections.
"pos": "<opaque string>",
Comment on lines +234 to +236
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there was an effort to standardize pagination terminology?


// Information about the lists supplied in the request.
"lists": {
// Matches the list name supplied by the client in the request
"<list-name>" {
// The total number of rooms that match the list's filter.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the total in the request, correct? If a room appears in multiple lists is it double counted?

"count": 1234,
}
},

// Aggregated rooms from lists and room subscriptions. There will be one entry per room, even if
// the room appears in multiple lists and/or room subscriptions.
"rooms": {
Comment on lines +247 to +249
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify which rooms should be showing up.

We want to return every room where the user has membership except rooms that the user has left themselves; unless "newly left" (left since the last incremental sync pos token) because we want everything that's still relevant to the user. We include "newly left" rooms because the last event that the user should see is their own leave event.

A leave != kick (leave events where the sender is not the same user). Kicks should always be included.

To say it another way, here is a list:

  • join
  • invite
  • knock
  • ban
  • Kicks (leave where the sender doesn't match the state_key)
  • Newly left rooms (leave membership that happened since the last incremental sync pos token)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you only include ban if there is a previous join, invite, or knock? :) (Maybe scope creep.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also had that in mind but I'm not sure (could go either way). It's kinda the same as an unsolicited invite (optionally followed by a kick/ban).

Ideally, we probably wouldn't show those kind of ban

"!foo:example.com": {
// The room name, if one exists. Only sent initially and when it changes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth stating that this is strictly the value from the m.room.name event given the previous MSC3575 was previously trying to do a bunch of room name calculation stuff.

Suggested change
// The room name, if one exists. Only sent initially and when it changes.
// The room name (`m.room.name` -> `content.name`), if one exists. Only sent initially and when it changes.

"name": str|null,
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
// The room avatar, if one exists. Only sent initially and when it changes.
"avatar_url": str|null,
// The "heroes" for the room, if there is no room name. Only sent initially and when it changes.
"heroes": [
{"user_id":"@alice:example.com","displayname":"Alice","avatar_url":"mxc://..."},
],
Copy link
Contributor

@MadLittleMods MadLittleMods Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the heroes membership state events need to be included in required_state response when requesting required_state: ["m.room.member", "$LAZY"] (lazy-loading room members)?

Sync v2 says this:

When lazy-loading room members is enabled, the membership events for the heroes MUST be included in the state, unless they are redundant. When the list of users changes, the server notifies the client by sending a fresh list of heroes. If there are no changes since the last sync, this field may be omitted.

But I think that's because m.heroes in Sync v2 is only a list of user ID's. Whereas, in the sliding sync response here, we already have all of the info necessary in these stripped events.

One alternative is to match Sync v2 and only list the user ID's in heroes and include the membership events in required_state.

Comment on lines +256 to +258
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One weird corner with the current spec: If the room doesn't have a name set and the user is invited/knock to the room, we don't have any heroes to provide here. This is because heroes membership isn't included in the stripped state that the server receives when they are invited/knocked over federation.

Related to matrix-org/matrix-spec#380

This was solved for partial-joins via MSC3943

Not necessarily a blocker but just calling out something that won't work completely until that spec issue is solved.


// Flag which is set when this is the first time the server is sending this data on this connection.
// When set the client must replace any stored metadata for the room with the new data. In
// particular, the state must be replaced with the state in `required_state`.
"initial": true|null,

// Same as in sync v2. Indicates whether there are more events to fetch than those in the timeline.
"limited:" true|null,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the options in the request and response supposed to be in a union with null?

Isn't it the convention in the spec that the property is omitted entirely rather than given the value null?

(For clarity, this is something that jumps out at me since frequently people don't realise omitting a property and giving the value null are distinct things in JSON. Again, I'm not sure what the convention is for the CS-API though.)

Copy link
Contributor

@Gnuxie Gnuxie Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, seems like limited is already optional in the timeline portion of sync? https://github.com/matrix-org/matrix-spec/blob/7f2f100420a355cf8ea50d08329e6f98989fc975/data/api/client-server/definitions/timeline_batch.yaml#L15-L36

But this does not imply a union with null (it's not explicit). It could if there is another section of the specification that says something to the effect "all unrequired properties in responses and requests should be treated as though their values are in a union with null" somewhere?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to rewrite this section using JSONSchema so that the intent is clearer in either case.

Comment on lines +265 to +266
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify what limited means when complicating it with gaps, newly joined, newly joined a room where you can't see the history, etc

Sync v2 in Synapse is a bit crazy with the logic probably because the spec isn't clear.

The existing definition seems accurate "Indicates whether there are more events to fetch than those in the timeline." Could maybe clarify that you can be limited if there is timeline gap and you stop at the gap and say limited: true so the client paginates over the gap themselves. And maybe also need to say that being newly joined doesn't factor in.

Comment on lines +265 to +266
Copy link
Contributor

@MadLittleMods MadLittleMods Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarify when to add m.relations in unsigned (bundled_aggregation logic).

Currently, it's done in Sync v2 and Sliding Sync in Synapse when limited: true but @erikjohnston probably has the right inclination to not do it given that sync is returning new events and any relations to those events would come after it by their nature (and already be in the response or the next one).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erikjohnston unearthed some more context in MSC2675 and relevant Synapse PR

// Indicates if we have "expanded" the timeline due to the timeline_limit changing, c.f. Room config
// changes above.
"unstable_expanded_timeline": true|null,
// The list of events, sorted least to most recent.
"timeline": [ ... ],
// The current state of the room as a list of events
"required_state": [ ... ],
Comment on lines +272 to +273
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this always sent as the full room state or can partial state be sent as well?

Copy link
Contributor

@MadLittleMods MadLittleMods Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On initial sync or the first time the room is sent down the connection (when the room indicates initial: true), this is the full set of state requested in required_state.

On incremental sync (when the room indicates initial: null), this will only be the state that has changed since the last time you synced (since the pos token in the request).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think this could be clearer.

Comment on lines +270 to +273
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should call out that the required_state, timeline, limited, num_live, prev_batch fields will be omitted for invite/knock rooms. Only invite_state/knock_state is provided.

The reason these can't be included is because we don't have any of that information for remote invites and the user isn't participating in the room yet so we shouldn't leak anything to them. We can only rely on the information in the invite event.

Related previous conversation: #3575 (comment)

// The number of timeline events which have just occurred and are not historical.
// The last N events are 'live' and should be treated as such.
"num_live": 1,
// Same as sync v2, passed to `/messages` to fetch more past events.
"prev_batch": "...",

// For invites this is the stripped state of the room at the time of invite
"invite_state": [ .. ],

// For knocks this is the stripped state of the room at time of knock
"knock_state": [ .. ],

// Whether the room is a DM room.
"is_dm": true|null,

// An opaque integer that can be used to sort the rooms by "Bump Stamp"
"bump_stamp": 1,
Comment on lines +289 to +290
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this defined?


// These are the same as sync v2.
"joined_count": 1,
"invited_count": 1,
"notification_count": 1,
"highlight_count": 1,
}
},

"extensions": {
},
}
```

# Example usage

This section gives an example of how a client can use this API (roughly based on how Element X currently uses the API).

When the app starts up it configures a single list with a range of `[0, 19]` (to get the top 20 rooms) and a
`timeline_limit` of 1. This returns quickly with the top 20 rooms (or just the changes in the top 20 rooms if a token
was specified).

The client then increases the range (in the next request) to `[0, 99]`, which will return the next 80 rooms. The server
may sort the rooms differently than they are returned by the server (e.g. they may ignore reactions for sorting
purposes).
Comment on lines +313 to +315
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused at how the server now only gives 80 rooms and not 100. The definition above seems to imply it would be 100 rooms?


The client can use room subscriptions, with a `timeline_limit` of 20, to preload history for the top rooms. This means
that if the user clicks on one of the top rooms the app can immediately display a screens worth of history. (An
alternative would be to have a second list with a static range of `[0, 19]` and a `timeline_limit` of 20. The downside)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this sentence was unfinished.


The client can keep increasing the list range in increments to pull in the full list of rooms. The client uses the
returned `count` for the list to know when to stop expanding the list.

The client *may* decided to reduce the range back to `[0, 19]` (and then subsequently incrementally expand the range),
this can be done.

When the client is expecting a fast response (e.g. while expanding the lists), it should set the `timeout` parameter to
0 to ensure the server doesn't block waiting for new data. This can easily happen if the app starts and sends the first
request with a `since` parameter, if the client shows a spinner but doesn't set a timeout then the request may take a
long time to return (if there were no updates to return).


# Alternatives / changes

There are a number of potential changes that we could make.

## Pagination

In practice, having the client specify the ranges to use for the lists is often sub-optimal. The client generally wants
to have the sync request return as quickly as possible, but it doesn't know how much data the server has to return and
so whether to increase or decrease the range.

An alternative is for the client to specify a `page_size`, where the server sends down at most `page_size` number of
rooms. If there are more rooms to send to the client (beyond `page_size`), then the client can request to "paginate" in
these missed updates in subsequent updates.

Since this would require client side changes, this should be explored in a separate MSC.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't all of this require client side changes?


## Timeline event trickling

If the `timeline_limit` is increased then the server will send down historic data (c.f. "Room config changes"), which
allows the clients to easily preload more history in recent rooms.

This mechanism is fiddly to implement, and ends up resending down events that we have previously sent to the client.

A simpler alternative is to use `/messages` to fetch the history. This has two main problems: 1) clients generally want
to preload history for multiple rooms at once, and 2) `/messages` can be slow if it tries to backfill over federation.

We could implement a bulk `/messages` endpoint, where the client would specify multiple rooms and `prev_batch` tokens.
We can also add a flag to disable attempting to backfill over pagination (to match the behaviour of the sync timeline).
Comment on lines +349 to +360
Copy link
Contributor

@MadLittleMods MadLittleMods Sep 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please kill unstable_expanded_timeline with one of these alternatives. It was implemented in Simplified Sliding Sync solely to match what the Sliding Sync proxy did. And the proxy behavior was a bug not a feature (self-described)

Previous conversation:


## `required_state` response format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section seems to contradict the opening that this MSC solves the issue of clients having incorrect state.


The format of returned state in `required_state` is a list of events. This does now allow the server to indicate if a
"state reset" has happened which removed an entry from the state entirely (rather than it being replaced with another
event).

This is particularly problematic if the user gets "state reset" out of the room, where the server has no mechanism to
indicate to the client that the user has effectively left the room (the server has no leave event to return).

We may want to allow special entries in the `required_state` list of the form
`{"type": .., "state_key": .., content: null}` to indicate that the state entry has been removed.
Comment on lines +364 to +372
Copy link
Contributor

@Gnuxie Gnuxie Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty serious and I don't think we can leave it up to clients to figure out somehow. Since that requires client authors to both be extremely knowledgeable about how room state works and also write perfect code. The chances of both of those happening is slim. For example, I tried to write some code to track room state from legacy /sync (by also using /state) in Draupnir and it seems like I haven't even been able to do that right. https://github.com/Gnuxie/matrix-protection-suite/blob/main/src/StateTracking/StateChangeType.ts#L13-L64, https://github.com/Gnuxie/matrix-protection-suite/blob/main/src/StateTracking/StateChangeType.ts#L83-L123. The situation where a state event can be removed entirely via a state reset is not even accounted for. Partially because I was under the impression that this was not something that could happen, as I was a dumbass.

You can read more about the way Draupnir collects state here https://marewolf.me/posts/draupnir/2402.html#development-room-state-tracking. But it would be good if clients didn't need to calculate state deltas at all and there was more specialized API support for that. There'd need to be consideration for how such an API would interact with timeline gaps if it was done alongside /sync.



# Security considerations

Care must be taken, as with sync v2, to ensure that only the data that the user is authorized to see is returned in the
response.
Comment on lines +377 to +378
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be expanded to explain what this means.

In terms of state:

  • join: Access to all of the room state.
  • invite/knock: Can only derive state from the stripped state events in the event itself (invite_room_state, knock_room_state)
  • leave/ban: Depends on the previous membership.
    • If the previous membership was join, access to the state at the time of the leave/ban (historical state)
    • If the previous membership was invite, fallback to the stripped state on the invite event
    • If the previous membership was leave/ban, repeat this logic
    • If no previous membership, no access to anything.

In terms of timeline events, this probably depends solely on m.room.history_visibility



# Unstable prefix

The unstable URL for simplified sliding sync is `/org.matrix.simplified_msc3575/sync`. The flag in `/versions` is
`org.matrix.simplified_msc3575`.