Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC4033: Explicit ordering of events for receipts #4033

Open
wants to merge 51 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
Show all changes
51 commits
Select commit Hold shift + click to select a range
3c50306
Initial draft
andybalaam Jul 4, 2023
e7983f2
Update number to MSC4033
andybalaam Jul 4, 2023
a75a8e5
Fix mis-spelling
andybalaam Jul 4, 2023
6841e75
Fix another mis-spelling
andybalaam Jul 4, 2023
2521212
Add some more alternatives
andybalaam Jul 4, 2023
fe83ad9
Add TODO list
andybalaam Jul 4, 2023
3ae4995
Add missing bracket
andybalaam Jul 4, 2023
8414c2e
Add an acknowledgements section
andybalaam Jul 4, 2023
7e26924
Make thread roots not in their thread
andybalaam Jul 5, 2023
ee19047
Add a changelog
andybalaam Jul 5, 2023
0e17420
Clarify stream order and handle redactions
andybalaam Jul 5, 2023
a32a3f9
Small clarifications
andybalaam Jul 5, 2023
58102cf
Include updated defintion of read/unread event
andybalaam Jul 5, 2023
8ecb0bb
Provide JSON examples
andybalaam Jul 5, 2023
c452342
Remove TODO list
andybalaam Jul 5, 2023
2d6a03e
Wrap code blocks in pre
andybalaam Jul 5, 2023
fba3bbe
Format code with pre only
andybalaam Jul 5, 2023
f4899a0
Fix misplaced space
andybalaam Jul 5, 2023
afa9629
Add missing "in"
andybalaam Jul 5, 2023
90e5798
Explain what the spec issue is about
andybalaam Jul 5, 2023
76217a0
Remove redundant part of main thread definition
andybalaam Jul 5, 2023
c9bdac8
Reflect the spec saying thread roots are not in main
andybalaam Jul 5, 2023
501bfac
Restrict to just redacted events being always read
andybalaam Jul 5, 2023
17bfe74
Add missing "a"
andybalaam Jul 5, 2023
c02285b
Servers probably also think thread roots are in main
andybalaam Jul 5, 2023
a48c1d0
Move thread_id into content thanks to conversation with Nico
andybalaam Jul 5, 2023
aeb0650
Mention 3051 as an alternative to thread_id
andybalaam Jul 5, 2023
d58758f
Move thread_id to cleartext of content
andybalaam Jul 6, 2023
bbf1c94
Cut down to just ordering
andybalaam Jul 6, 2023
36a28e5
Small brevity improvements
andybalaam Jul 7, 2023
be77a58
Note that order should be inserted by servers
andybalaam Jul 7, 2023
04c4606
Move order to be a sibling of ts in receipts
andybalaam Jul 7, 2023
6e027e2
Reword motivation paragraph
andybalaam Jul 13, 2023
5499b04
Formatting
andybalaam Jul 13, 2023
7493391
Mention the specific API used to get older messages
andybalaam Jul 13, 2023
7d3df44
Make clear the order only needs to increase relative to messages in t…
andybalaam Jul 13, 2023
c45a5e4
Formatting
andybalaam Jul 13, 2023
1b668ce
Clarify the consistency guarantees and meaning of order
andybalaam Jul 13, 2023
d2fe0f4
Include order in redacted events too
andybalaam Jul 13, 2023
694317e
Emphasise the room level when talking about server calculation
andybalaam Jul 13, 2023
be0f7ac
Add a note that this applies to other users' receipts too
andybalaam Jul 17, 2023
e650a11
Acknowledge that we actually are changing the definition of read/unread
andybalaam Jul 17, 2023
7d1728e
Fix typo
andybalaam Aug 3, 2023
12cde91
Fix missing e in mxid
andybalaam Aug 3, 2023
6950a59
Make order on a receipt mandatory
andybalaam Aug 4, 2023
2543428
Document the idea of just never sending receipts that don't have orde…
andybalaam Aug 4, 2023
605eadd
Add a note about fully-read markers being out of scope
andybalaam Nov 30, 2023
d749fb1
Add notes about negative event order
andybalaam Nov 30, 2023
6859b8d
Add a note about negative order meaning event is read
andybalaam Nov 30, 2023
d2cc49d
Add a note about order not being identical
andybalaam Nov 30, 2023
c7a8192
Fix spelling error
andybalaam Dec 1, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
386 changes: 386 additions & 0 deletions proposals/4033-event-thread-and-order.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,386 @@
# MSC4033: Explicit ordering of events for receipts
andybalaam marked this conversation as resolved.
Show resolved Hide resolved
andybalaam marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a quick note: I have strong suspicions that: a) this is not possible to do in a way that doesn't limit scalability of servers, b) this isn't necessarily the order in which clients should render events, and c) trying to handle filling in gaps is complicated as you need to deal with edge cases.

I'm sure we can do something close here, but needs investigation from server teams to figure out viability.

One change that might help a lot here is: instead of inserting an ordering on all events, instead have an opaque "receipt_key" field for events coming down /sync or /messages, which you can pass to the /receipt API. This is a lot less powerful than a full linearized history, but feels immediately more viable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, also: the server sometimes doesn't know the correct ordering of two historical events immediately (e.g. it has two disconnected chunks of DAG). This is less of a problem for /sync and read receipts (as historical events don't go down /sync or count as unread, server side at least), but may be more of one for other use cases.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, also: the server sometimes doesn't know the correct ordering of two historical events immediately (e.g. it has two disconnected chunks of DAG).

this is easily handled by my binary tree idea, the server just picks an arbitrary spot in the tree and then changes it when it knows the correct spot. since the tree nodes' order can be changed without changing message/node ids, this makes it more flexible than using a numeric order.


The [spec](https://spec.matrix.org/unstable/client-server-api/#receipts) states
that receipts are "read-up-to" without explaining what order the events are in,
so it is difficult to decide whether an event is before or after a receipt.

We propose adding an explicit order number to all events, so that it is clear
which events are read.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with numbers is that you can't fill gaps between them, so if you already have 15 and 16, there's no place inbetween them. Instead, you could treat the order as a string of digits instead with the property that
15 < 151 < 16

For easier understanding, you can compare this to decimal numbers: 0.15 < 0.151 < 0.16

Copy link

@programmerjake programmerjake Jan 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you need to be able to insert a new value between any existing pair, you can use Dyadic Rationals, basically a value of the form a * 2^b where a and b are integers.
e.g. if you have 23 and 24, you can use 47 * 2^-1 = 23.5

Copy link

@toger5 toger5 Feb 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using an array as the index is also a concept that can be found in CRDT's.
[1], [1,1], [1,2], [1,2,1],[2]
(very similar to what timokoesters is proposing but strings bring more typing ambiguity/issues.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In terms of storage/memory used per event (thus bandwidth usage), dyadic rationals sound optimal: sufficiently large space, only require two u32 or two u64. Strings or array representations would require bigger allocations. (Nobody really proposed it, but FP numbers would be a waste in terms of storage space b/o all the NaNs + IEEE754 is hard to get right.)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dyadic rationals only require one byte for the exponent, unless you want to support a value with >256 bits (imo excessive), so they only need 5 bytes (32-bit mantissa) or 9 bytes (64-bit mantissa), though you can definitely use more for alignment or convenience.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that we will need to be able to handle edge cases where we run out of exponents, e.g. you have an event A and you keep inserting events just after it (which may very well end up a common case). If you only have a one byte exponent then you quickly run out of room.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you run out of exponent bits, that also means the mantissa has to be at least 256-bits wide (or something like that), so you will need big-integer arithmetic for that...i think that's likely excessive.

maybe a better idea is to maintain a binary tree with one node per message and have the server send the list of changed nodes (if the message's contents doesn't change, all that needs to be sent is the node ids of the tree's children, since that's all that changes during tree balancing), this allows tree balancing to avoid the tree getting too deep.

messages would then be ordered by their position in an in-order traversal of that tree.


This proposal covers receipts, and not fully-read markers. Fully-read markers
have the same issue in terms of ordering, and should probably be fixed in a
similar way, but they are not addressed here.

## Motivation

To decide whether a room is unread, a Matrix client must decide whether it
contains any unread messages.

Similarly, to decide whether a room has notifications, we must decide whether
any of its potentially-notifying messages is unread.

Both of these tasks require us to decide whether a message is read or unread.

To make this decision we have receipts. We use the following rule:

> An event is read if the room contains an unthreaded receipt pointing at an
> event which is *after* the event, or a threaded receipt pointing at an event
> that is in the same thread as the event, and is *after* or the same as the
> event.
>
> Otherwise, it is unread.

(In both cases we only consider receipts sent by the current user, obviously. We
consider either private or public read receipts.)

To perform this calculation we need a clear definition of *after*.

### Current definition of *after*

The current spec (see
[11.6 Receipts](https://spec.matrix.org/latest/client-server-api/#receipts)) is not clear
about what it calls "read up to" means.

Clients like Element Web make the assumption that *after* means "after in Sync
Order", where "Sync Order" means "the order in which I (the client) received the
events from the server via sync", so if a client received an event and another
event for which it has a receipt via sync, then the event that was later in the
sync or received in a later sync, is after the other one.

See
[room-dag-concepts](https://github.com/matrix-org/synapse/blob/develop/docs/development/room-dag-concepts.md#depth-and-stream-ordering)
for some Synapse-specific information on Stream Order. In Synapse, Sync Order is
expected to be identical to its concept of Stream Order.

See also [Spec Issue #1167](https://github.com/matrix-org/matrix-spec/issues/1167),
which calls out this ambiguity about the meaning of "read up to".

### Problems with the current definition

The current definition of *after* is ambiguous, and difficult for clients to
calculate. It depends on only receiving events via sync, which is impossible
since we sometimes want messages that did not arrive via sync, so we use
different APIs such as `messages` or `relations`.

The current definition also makes it needlessly complex for clients to determine
whether an event is read because the receipt itself does not hold enough
information: the referenced event must be fetched and correctly ordered.

Note: these problems actually apply to all receipts, not just those of the
current user. The symptoms are much more visible and impactful when the current
user's receipts are misinterpreted than for other users, but this proposal
covers both cases.

## Proposal

We propose to add an explicit order number to events and receipts, so we can
easily compare whether an event is before or after a receipt.

This order should be a number that is attached to an event by the server before
it sends it to any client, and it should never change. It should,
loosely-speaking, increase for "newer" messages within the same room.

The order of an event may be negative, and if so it is understood that this
event is always read. The order included with a receipt should never be
negative.

The ordering must be consistent between a user's homeserver and all of that
user's connected clients. There are no guarantees it is consistent across
different users or rooms. It will be inconsistent across federation as there is
no mechanism to sync order between homeservers. For this reason, we propose that
`order` be included in an event's `unsigned` property.

This proposal attaches no particular meaning to the rate at which the ordering
increments. (Although we can imagine that some future proposal might want to
expand this idea to include some meaning.)

### Examples

Example event (changes are highlighted in bold):

<pre>{
"type": "m.room.message",
"content": {
"body": "This is an example text message",
"format": "org.matrix.custom.html",
"formatted_body": "&lt;b&gt;This is an example text message&lt;/b&gt;",
"msgtype": "m.text"
},
"event_id": "$143273582443PhrSn:example.org",
"origin_server_ts": 1432735824653,
"room_id": "!jEsUZKDJdhlrceRyVU:example.org",
"sender": "@example:example.org",
"unsigned": {
"age": 1234,
<b>"order": 56764334543</b>
andybalaam marked this conversation as resolved.
Show resolved Hide resolved
}
}</pre>

Example encrypted event (changes are highlighted in bold):

<pre>{
"type": "m.room.encrypted",
"content": {
"algorithm": "m.megolm.v1.aes-sha2",
"sender_key": "<sender_curve25519_key>",
"device_id": "<sender_device_id>",
"session_id": "<outbound_group_session_id>",
"ciphertext": "<encrypted_payload_base_64>"
}
"event_id": "$143273582443PhrSn:example.org",
"origin_server_ts": 1432735824653,
"room_id": "!jEsUZKDJdhlrceRyVU:example.org",
"sender": "@example:example.org",
"unsigned": {
"age": 1234,
<b>"order": 56764334543</b>
}
}</pre>

Example receipt (changes are highlighted in bold):

<pre>{
"content": {
"$1435641916114394fHBLK:matrix.org": {
"m.read": {
"@erikj:jki.re": {
"ts": 1436451550453,
<b>"order": 56764334544,</b>
}
},
}
},
"type": "m.receipt"
}</pre>

We propose:

* all events should contain an `order` property inside `unsigned`.
* all receipts should contain an `order` property alongside `ts` inside the
information about an event, which is a cache of the `order` property within
the referred-to event.
andybalaam marked this conversation as resolved.
Show resolved Hide resolved

The `order` property in receipts should be inserted by servers when they are
creating the aggregated receipt event.

If the server is not able to provide the order of a receipt (e.g. because it
does not have the relevant event) it should not send the receipt. If a server
later receives an event, allowing it to provide an order for this receipt, it
should send the receipt at that time. Rationale: without the order, a receipt is
not useful to the client since it is not able to use it to determine which
events are read. If a receipt points at an unknown event, the safest assumption
is that other events in the room are unread i.e. there is no receipt.

If a receipt is received for an event with negative order, the server should set
the order in the receipt to zero. All events with negative order are understood
to be read.

Note that the `order` property for a particular event will probably be the same
for every user, so will be repeated multiple times in an aggregated receipt
event. This structure was chosen to reduce the chance of breaking existing
clients by introducing `order` at a higher level.

### Proposed definition of *after*

We propose that the definition of *after* should be:

* Event A is after event B if its order is larger.
andybalaam marked this conversation as resolved.
Show resolved Hide resolved

We propose updating the spec around receipts
([11.6 Receipts](https://spec.matrix.org/latest/client-server-api/#receipts))
to be explicit about what "read up to" means, using the above definition.

### Definition of read and unread events

We propose that the definition of whether an event is read should include the
original definition plus the above definition of *after*, and also include this
clarification:

> (Because the receipt itself contains the `order` of the pointed-to event,
> there is no need to examine the pointed-to event: it is sufficient to compare
> the `order` of the event in question with the `order` in the receipt.)

Further, it should be stated that events with negative order are always read,
even if no receipt exists.

### Order does not have to be unique
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a conversation with @erikjohnston about not wanting to constrain server implementations, I added this section. Comments welcome.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think the problem with this is where the client gets a bunch of messages with an ordering much lower than the current ordering of the room. Do you a) treat them as read immediately, or b) make it hard to correctly mark them as read?


If this proposal required the `order` property to be unique within a room, it
might inadvertently put constraints on the implementation of servers since some
linearised process would need to be involved.

So, we do not require that `order` should be unique within a room. Instead, if
two events have the same `order`, they are both marked as read by a receipt with
that order.

Events with identical order introduce some imprecision into the process of
marking events as read, so they should be minimised where possible, but some
overlap is tolerable where the server implementation requires it.

So, a server might choose to use the epoch millisecond at which it received a
message as its order. However, if a server receives a large batch of messages in
the same millisecond, this might cause undesirable behaviour, so a refinement
might be the millisecond as the integer part and a fractional part that
increases as the batch is processed, preserving the order in which the server
receives the messages in the batch.

If a server were processing multiple batches in parallel, it could implement
this in each process separately, and accept that some events would receive
identical orders, but this would be rare in practice and have little effect on
end users' experience of unread markers.

### Redacted events

Existing servers already include an `unsigned` section with redacted events,
despite `unsigned` not being mentioned in the [redaction
rules](https://spec.matrix.org/unstable/rooms/v10/#redactions).

Therefore we propose that redacted events should include `order` in exactly the
same way as all room events.

## Discussion

### What order to display events in the UI?

It is desirable that the order property should match the order of events
displayed in the client as closely as possible, so that receipts behave
consistently with the displayed timeline. However, clients may have different
ideas about where to display late-arriving messages, so it is impossible to
define an order that works for all clients. Instead we agree that a consistent
answer is the best we can do, and rely on clients to provide the best UX they
can for late-arriving messages.

### Stream order or Topological Order?

The two orders that we might choose to populate the `order` property are "stream
order" where late-arriving messages tend to receive higher numbers, or
"Topological Order" where late-arriving message tend to receive lower numbers.

We believe that it is better to consider late-arriving messages as unread,
meaning the client has the information that these newly arrived messages have
not been read and can choose how to display it (or not). This is what leads us
to suggest Stream Order as the correct choice.

However, if servers choose Topological Order, this proposal still works - we
just have what the authors consider undesirable behaviour regarding
andybalaam marked this conversation as resolved.
Show resolved Hide resolved
late-arriving events (they are seen as read even though they are not).

### Inconsistency across federation

Because order may be inconsistent across federation[^1], one user may
occasionally see a different unread status for another user from what that user
themselves see. We regard this as impossible to avoid, and expect that in most
cases it will be unnoticeable, since home servers with good connectivity will
normally see events in similar orders. When servers have long network splits,
there will be a noticeable difference at first, but once messages start flowing
normally and users start reading them, the differences will disappear as new
events will have higher Stream order than the older ones on both servers.

[^1]: In fact, order could also be inconsistent across different users on the
same home server, although we expect in practice this will not happen.

The focus of this proposal is that a single user sees consistent behaviour
around their own read receipts, and we consider that much more important that
the edge case of inconsistent behaviour across federation after a network split.

## Implementation Notes

Some home servers such as Synapse already have a concept of Stream Order. We
expect that the order defined here could be implemented using Stream Order.

## Potential issues

This explicitly allows receipts to be inconsistent across federation. In
practice this is already the case in the wild, and is impossible to solve using
Stream Order. The problems with using Topological Order (and Sync Order) have
already been outlined.
clokep marked this conversation as resolved.
Show resolved Hide resolved

## Alternatives

### Solves the same problem MSC3981 Relations Recursion tried to solve

This proposal would not replace
[MSC3981: /relations recursion](https://github.com/matrix-org/matrix-spec-proposals/pull/3981)
but would make it less important, because we would no longer depend on the
server providing messages in Sync Order, so we could happily fetch messages
recursively and still be able to slot them into the right thread and ordering.

Note that the expectation (from some client devs e.g. me @andybalaam) was that
MSC3981 would solve many problems for clients because the events in a thread
would be returned in Sync Order, but this is not true: the proposal will return
events in Topological Order, which is useless for determining which events are
read.

### The server could report which rooms are unread

We could use the definitions within this proposal but avoid calculating what was
unread on the client. Instead we could ask the server to figure out which rooms
are unread.
andybalaam marked this conversation as resolved.
Show resolved Hide resolved

The client will still need to know which events are unread in order to process
notifications that are encrypted when they pass through the server, so this
proposal would probably be unaltered even if we added the capability for servers
to surface which rooms are unread.

### Location of order property in receipts

Initially, we included `order` as a sibling of `m.read` inside the content of a
receipt:

<pre>{
"content": {
"$1435641916114394fHBLK:matrix.org": {
<b>"order": 56764334544,</b>
"m.read": { "@rikj:jki.re": { "ts": 1436451550453, "thread_id": "$x" } },
"m.read.private": { "@self:example.org": { "ts": 1661384801651 } }
}
},
"type": "m.receipt"
}</pre>

We moved it inside the content, as a sibling to `ts`, because multiple existing
clients (mautrix-go, mautrix-python and matrix-rust-sdk) would have failed to
parse the above JSON if they encountered it without first being updated.

### Drop receipts with missing order information

In the case where a server has a receipt to send to the client, but does not
have the event to which it refers, and therefore cannot find its order, we
proposed above that the server should hold the receipt until it has the relevant
event, and send it then.

Alternatively, we could simply never send the receipt under these circumstances.
We believe that this is reasonable because it is not expected to happen for the
user's own events, which are the most critical to provide accurate read
receipts, and implementing the "hold and send later" strategy may cause extra
work for the server for little practical gain.

## Security considerations

None highlighted so far.

## Unstable prefix

TODO

## Dependencies

None at this time.

## Acknowledgements

Formed from a discussion with @ara4n, with early review from @clokep. Built on
ideas from @t3chguy, @justjanne, @germain-gg and @weeman1337.

## Changelog

* 2023-07-04 Initial draft by @andybalaam after conversation with @ara4n.
* 2023-07-05 Remove thread roots from their thread after conversation with @clokep.
* 2023-07-05 Make redactions never unread after conversation with @t3chguy
* 2023-07-05 Give a definition of Stream Order
* 2023-07-05 Be explicit about Stream Order not going over federation
* 2023-07-05 Mention disagreeing about what another user has read
* 2023-07-05 Move thread_id into content after talking to @deepbluev7
* 2023-07-06 Reduced to just order. Thread IDs will be a separate MSC
* 2023-07-06 Moved order deeper within receipts to reduce existing client impact
* 2023-07-13 Include order with redacted events after comments from @clokep
Loading