[WIP] MSC4244: RFC 9420 MLS for Matrix #4244
Draft: turt2live wants to merge 6 commits into main from travis/msc/mls/00-core
Commits:
* e23079f MSC: RFC 9420 MLS for Matrix (turt2live)
* 07e8d6f Assign number (turt2live)
* 55b0616 Use MSC numbers (turt2live)
* fbf0c67 the correct words help more than the wrong syntax (turt2live)
* 3891900 Add keypackage stuff (turt2live)
* fd9844c add an issue (turt2live)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,307 @@ | ||
# MSC4244: RFC 9420 MLS for Matrix | ||
|
||
[RFC 9420](https://datatracker.ietf.org/doc/rfc9420/) defines the Messaging Layer Security (MLS) | ||
protocol for secure, end-to-end encrypted, group conversations. Matrix in its current design is capable | ||
of supporting multiple different types of encryption, including custom crypto, through the | ||
[`m.room.encryption`](https://spec.matrix.org/v1.13/client-server-api/#mroomencryption) state event. | ||
Much of the existing architecture is built around a Double Ratchet algorithm at its core (like Olm), | ||
from which MLS differs substantially. Therefore, architectural additions are needed to support | ||
MLS in an everyday Matrix room. | ||
|
||
Critically, MLS requires group changes ("Commits") to be strictly ordered across all participating | ||
servers, and requires the resulting history to be append-only. This rules out traditional eventual consistency mechanisms, | ||
as the key material will have already rolled forward by the time a server attempts to resolve a conflict. | ||
|
||
Meanwhile, the SCT has experimented with approaches such as | ||
[Decentralised MLS (DMLS)](https://gitlab.matrix.org/matrix-org/mls-ts/-/blob/decentralised2/decentralised.org) | ||
([MSC](https://github.com/matrix-org/matrix-spec-proposals/pull/2883)) | ||
and Linearized Matrix ([I-D](https://datatracker.ietf.org/doc/draft-ralston-mimi-linearized-matrix/), | ||
[MSC](https://github.com/matrix-org/matrix-spec-proposals/pull/3995)). These are valid approaches | ||
which come with different architectural tradeoffs. In the case of DMLS, forward secrecy is reduced | ||
(as key material must be retained in case of network partitions) and higher complexity is needed on | ||
the client-side to resolve conflicts. With Linearized Matrix, rooms become uncomfortably centralized | ||
on a server in the room. | ||
|
||
Continuing that work, we circulated some untested theories about how to restore those valuable | ||
properties in [these slides](https://conference.matrix.org/documents/talk_slides/LAB3%202024-09-20%2016_15%20Travis%20Ralston%20-%20DMLS,%20MIMI,%20etc.pdf) | ||
at Matrix Conference 2024. At the time, the idea was to lean a little less on Linearized Matrix, but | ||
still retain relative centrality in the room (participants could only continue using the room if they | ||
were on the 'good' side of a network partition). | ||
|
||
This still leaves some centralization, with a 'hub' server required per room in order to sequence and | ||
authenticate MLS commits, synchronizing them with Matrix membership state. This centralization could | ||
be mitigated by server elections, which would help to bifurcate the room with hubs on each side of a | ||
network partition, leading to those branches needing conflict resolution upon the partition being | ||
(partially) healed. Consensus mechanisms are thought to be the best approach to healing those conflicts, | ||
particularly when considering MLS's append-only requirement - the servers, in healing their connection, | ||
would reach a conclusion on what the MLS and room state is, and that result would become fact. This | ||
is notably different from the [existing state resolution algorithm](https://spec.matrix.org/v1.13/rooms/v11/#state-resolution) | ||
where participating servers are instructed on how to reach a factual representation of the room. | ||
Consensus mechanisms and leader election are deliberately left as concepts for a future MSC to discuss, | ||
though some non-normative ideas are discussed here to kickstart those future MSCs. The remainder of | ||
federation traffic (power level changes, message events, topic changes, etc) is sent over the normal | ||
full mesh federation in Matrix today, unaffected by the hub server. | ||
|
||
This proposal's changes, namely the partial centralization, are designed to be an opt-in feature at | ||
room creation time. It is therefore left to the room creators to decide what is best for their | ||
communities, particularly when it comes to encryption and the room model that it imposes. | ||
|
||
By using RFC 9420 MLS, this proposal also naturally brings cryptographically-constrained room membership | ||
to Matrix: users may only participate in the room if their devices are legally added to the MLS group | ||
state, rather than if their `m.room.member` event is successfully accepted to the room. This proposal | ||
still retains `m.room.member` for primarily communicating which users are "in" the room, even when | ||
they have no devices, but ties this state to the MLS group state to ensure authenticity. | ||
|
||
|
||
## Background | ||
|
||
In MLS, there is a concept of a Delivery Service (DS) which is responsible for tracking MLS group | ||
changes and membership. This DS can be a physical or logical server, and can have its role (theoretically) | ||
spread over multiple other servers. In Matrix, we'd ideally call the set of participating servers in | ||
a room the DS, however because of the linear append-only group state requirements, we assign this | ||
role to a single Matrix homeserver. | ||
|
||
Note that the group state is independent of the Matrix room state: room state tracks room configuration, | ||
as it always has, while group state tracks which devices are participating in that room. Which *users* | ||
are joined is tracked in room state, while their *devices* (if any) are tracked in group state. Group | ||
state is otherwise an encrypted binary blob passed around between devices, and stored on the DS. The | ||
DS has configurable visibility on the group state, up to and including zero visibility. | ||
|
||
|
||
## Proposal | ||
|
||
MLS can be enabled in a room only at creation time due to the room's underlying algorithms, like the | ||
authorization rules, changing behaviour depending on whether MLS is enabled. This is achieved using | ||
[MSC4245](https://github.com/matrix-org/matrix-spec-proposals/pull/4245)'s `encryption_algorithm` in | ||
the create event for the room. The initial Matrix-namespaced MLS encryption algorithm is `m.mls.10` | ||
to mirror the `mls10` `ProtocolVersion` defined by RFC 9420. `m.mls.*` encryption algorithms are | ||
*illegal* in `m.room.encryption` events, and clients MUST treat such configurations as though the | ||
room has an unknown encryption algorithm (unless of course `encryption_algorithm` is set, in which | ||
case `m.room.encryption`'s `algorithm` is meaningless under MSC4245). | ||
|
||
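The precedence described above can be sketched as follows. This is a non-normative illustration: the function name and the `UNKNOWN` sentinel are hypothetical, and it assumes the create event's `content` carries `encryption_algorithm` as described by MSC4245.

```python
from typing import Optional

# Hypothetical sentinel meaning "treat the room as using an unknown algorithm".
UNKNOWN = "unknown"


def effective_encryption_algorithm(
    create_content: dict, encryption_content: Optional[dict]
) -> Optional[str]:
    """Pick the algorithm a client should act on for a room.

    create_content: `content` of the m.room.create event.
    encryption_content: `content` of m.room.encryption, or None if absent.
    """
    # MSC4245: a create-time algorithm makes m.room.encryption's value meaningless.
    create_alg = create_content.get("encryption_algorithm")
    if create_alg is not None:
        return create_alg

    if encryption_content is None:
        return None  # room is not encrypted

    alg = encryption_content.get("algorithm")
    # m.mls.* is illegal in m.room.encryption: treat as an unknown algorithm.
    if isinstance(alg, str) and alg.startswith("m.mls."):
        return UNKNOWN
    return alg
```

A client could call this once per room when deciding which crypto stack to hand events to.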
**Note:** The `m.mls.10` algorithm does not define primitives, against the specification's | ||
[request](https://spec.matrix.org/v1.13/client-server-api/#messaging-algorithm-names). This is because | ||
MLS is capable of changing its underlying ciphersuite and therefore primitives. Which ciphersuite is | ||
recommended, and how to figure out which one is in use, is discussed later in this proposal. | ||
|
||
**Note:** A new room version will be required due to the conditional behaviour of the underlying room | ||
algorithms. This is discussed in more detail later in this proposal. | ||
|
||
MLS Commits (and technically Proposals) run through a designated DS homeserver in the room. By default, | ||
this is the server which created the room. All other operations, like power level changes, messages, | ||
etc, transit the normal full mesh of Matrix. Transferring this role to another server is an unsolved | ||
problem in this version of the MSC (**TODO:** solve this). | ||
|
||
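For experimentation, the default DS can be derived from the create event. The sketch below assumes the creating server is the server name in the create event's `sender`; the helper names are hypothetical, and role transfer (still a TODO above) is not modelled.

```python
def server_name_of(user_id: str) -> str:
    """Return the server name portion of a Matrix user ID (@localpart:server)."""
    if not user_id.startswith("@") or ":" not in user_id:
        raise ValueError(f"not a user ID: {user_id!r}")
    # Split on the first colon only: server names may contain a port or IPv6 literal.
    return user_id.split(":", 1)[1]


def default_ds_server(create_event: dict) -> str:
    """Assumption: the DS defaults to the server of the m.room.create sender."""
    if create_event.get("type") != "m.room.create":
        raise ValueError("expected an m.room.create event")
    return server_name_of(create_event["sender"])
```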
When a client wishes to send a Commit, Proposal, or other update to the MLS group state, it uses the | ||
to-device semantics defined by [MSC4246](https://github.com/matrix-org/matrix-spec-proposals/pull/4246), | ||
sending an `m.mls.message` event with the following respective contents: | ||
|
||
```jsonc | ||
{ | ||
"message": "<unpadded base64 MLSMessage>", // unpadded base64 per Matrix, MLSMessage per RFC 9420 | ||
|
||
// If a commit... | ||
"commit_info": { | ||
"welcome": "<unpadded base64 Welcome>", // optional | ||
} | ||
} | ||
``` | ||
|
||
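As a rough illustration of the content above, a client might assemble it like this. The function names are hypothetical, the serialized `MLSMessage` and `Welcome` bytes are assumed to come from an MLS library, and including an empty `commit_info` for commits without a Welcome is an assumption rather than something this proposal states.

```python
import base64
from typing import Optional


def unpadded_b64(data: bytes) -> str:
    """Matrix-style unpadded base64: standard alphabet, trailing '=' removed."""
    return base64.b64encode(data).decode("ascii").rstrip("=")


def build_mls_message_content(
    mls_message: bytes, is_commit: bool = False, welcome: Optional[bytes] = None
) -> dict:
    """Build the `content` of an m.mls.message to-device event."""
    content: dict = {"message": unpadded_b64(mls_message)}
    if is_commit:
        commit_info: dict = {}
        if welcome is not None:
            # The Welcome is optional: only commits that add members carry one.
            commit_info["welcome"] = unpadded_b64(welcome)
        content["commit_info"] = commit_info
    return content
```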
**TODO:** It may be worth considering a GroupInfoOption similar to the MIMI concept at https://datatracker.ietf.org/doc/html/draft-ietf-mimi-protocol-02#name-update-room-state | ||
|
||
If the DS *rejects* the update, the client is informed of that with an `m.mls.rejected` to-device | ||
message, sent by the DS. Accepted updates are communicated to all joined devices in the MLS group | ||
over to-device from the DS as `m.mls.commit` events. | ||
|
||
**TODO:** Event/message shapes for `m.mls.rejected` and `m.mls.commit`. | ||
|
||
Providing deltas in this way prevents having to transfer large blobs of binary around servers and | ||
clients. Later in this proposal is a description of how a client (or server) can restore its MLS | ||
state if it were to lose it. | ||
|
||
**TODO:** Update/add auth rules to designate the DS server, including transfers. Implementations | ||
should meanwhile note it's the room creator, for experimentation purposes. | ||
|
||
**TODO:** KeyPackage exchange and device discovery (use existing Device Lists and OTK claim mechanisms) | ||
|
||
**TODO:** Ciphersuite negotiation to ensure devices can actually participate in the room (subset of | ||
OTK/KeyPackage exchange) | ||
|
||
MLS has its own notions of "membership", consisting of the *devices* in the room, which belong to | ||
users. This is different from Olm/Megolm which rely on the existing user-level membership denoted by | ||
`m.room.member` events - the devices simply belong to the room at the point in time an encrypted | ||
message is sent. Maintaining the concept of user-level membership is important to match the expected, | ||
and established, user experience where Alice joins a room, not Alice's laptop, and to keep as much | ||
compatibility with existing clients as possible - clients already show member lists as `m.room.member` | ||
events, and a large number of Matrix APIs do the same. | ||
|
||
Maintaining the `m.room.member` events structure also allows Matrix rooms to have a concept of being | ||
a member of the room with zero devices. A pure MLS room would consider the user to have left in this | ||
case, making recovery of the user's room list a feat. Instead, by keeping the user joined through | ||
`m.room.member`, the user can restore their room list when they go from zero to one (or more) devices, | ||
and use MLS to participate in the encrypted conversation again. Most notably however, the default | ||
operation mode of MLS prevents the user and their new devices from decrypting messages sent while | ||
they had zero devices. Clients SHOULD help their users understand this through explainer text or | ||
similar instead of showing a bunch of "unable to decrypt" errors. Alternatively, clients can make | ||
use of [MSC3814](https://github.com/matrix-org/matrix-spec-proposals/pull/3814)-style dehydrated | ||
devices to always consider the user as having 1 usable device. | ||
|
||
With those considerations, it's still important to have cryptographically-constrained membership, | ||
where the crypto layer (in this case, RFC 9420 MLS) has authority over the membership of the room, | ||
and proofs exist so other participants can verify that membership changes are legitimate. This is | ||
achieved by including the auth events which allow a user to perform a given action in the Additional | ||
Authenticated Data (AAD) of the MLS commit, and having the DS produce an `m.room.member` event which | ||
references that commit. Together, these layers allow downstream servers to authorize the event (and | ||
thus the commit), and clients SHOULD further verify the event by requesting the auth events individually | ||
and performing the subset of the [auth rules](https://spec.matrix.org/v1.13/rooms/v11/#authorization-rules) | ||
which apply to them (namely rule 4 and onwards, minus any server signature verification). | ||
|
||
After the user's initial device is added through MLS commits, their other devices may be added with | ||
minimal overhead. Likewise for device removals (as they get logged out, lost, etc). These still must | ||
be coordinated with the DS, but don't cause `m.room.member` events to be generated. | ||
|
||
The [membership transitions](https://spec.matrix.org/v1.13/client-server-api/#room-membership) change | ||
slightly to account for the changes actually happening within the MLS layer, though only in rooms | ||
where MLS is used. In other (possibly encrypted) rooms, the [existing auth rules](https://spec.matrix.org/v1.13/rooms/v11/#authorization-rules) | ||
and related algorithms apply unchanged. | ||
|
||
|
||
### Knocks | ||
|
||
Sending a knock does not exchange key material, and requires no changes. The knocking server coordinates | ||
with an already-joined server, which may be the DS, to send an `m.room.member` state event with | ||
`membership: knock` to the room, provided the auth rules permit such an action. | ||
|
||
If the knock is rejected, it's rejected using an MLS-independent `leave` event. If it's accepted, | ||
the invite sequence described below takes effect. | ||
|
||
|
||
### Invites | ||
|
||
From a cryptography point of view, an invite is essentially just adding the user to the room. In some | ||
cases this could appear as a force-join, especially when the sending user intends for the receiving | ||
user to see history immediately after the invitation. In other cases, the invite is more notional, | ||
like knocks above. To determine which case we're operating under, we rely on the [history visibility](https://spec.matrix.org/v1.13/client-server-api/#mroomhistory_visibility) | ||
for the room. | ||
|
||
If the room's history visibility is `joined`, the invite is notional and has no particular meaning to | ||
the MLS group state. The invite is sent as a regular `m.room.member` state event to the room, using | ||
the existing invite mechanisms. | ||
|
||
Otherwise, when the history visibility is `shared`, `world_readable`, or (critically) `invited`, the | ||
sending device must first retrieve a KeyPackage from one of the target user's devices. The first to | ||
respond with a suitable KeyPackage is the device which will act as the 'invited' device, and can add | ||
the user's other devices once fully added to the MLS group state. The sending device then prepares a | ||
Welcome message for the invited device, and asks their server for the `m.room.power_levels` and | ||
`m.room.history_visibility` event IDs. The sending device prepares an Add commit with the event IDs | ||
in the AAD (**TODO:** CSV?), and sends the combination of Add commit and Welcome message to the DS | ||
for inclusion in the MLS group state. | ||
|
||
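The history visibility branch above reduces to a small decision, sketched here with a hypothetical function name. Raising on unrecognised values is a choice made for the sketch, not behaviour defined by this proposal.

```python
# Visibilities for which the invite must be backed by an MLS Add commit.
MLS_BACKED_VISIBILITIES = {"shared", "world_readable", "invited"}


def invite_needs_mls_add(history_visibility: str) -> bool:
    """Return True if inviting requires a KeyPackage fetch, Add commit, and Welcome."""
    if history_visibility == "joined":
        # Notional invite: a plain m.room.member event, no MLS group change.
        return False
    if history_visibility in MLS_BACKED_VISIBILITIES:
        return True
    raise ValueError(f"unrecognised history visibility: {history_visibility!r}")
```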
The DS takes the Add and Welcome, runs the normal MLS-required checks, and further verifies that the | ||
auth events in the AAD are representative of current state for the room, and that they permit such an | ||
invite to happen. If any of these checks fail, the commit is rejected (and the client is informed of | ||
that). If they all pass, the DS forwards the Welcome message to the invited device (**TODO:** Define | ||
to-device message shape), sends the Add commit to all other joined devices (from the MLS group state | ||
perspective), and sends an `m.room.member` state event with `membership: invite` for the invited user. | ||
This event is signed by the target and sending servers (**TODO:** Using the existing /invite API, or | ||
a new one?), and includes a copy of the Add commit under a new top-level `mls_commit` field as | ||
[unpadded base64](https://spec.matrix.org/v1.13/appendices/#unpadded-base64). | ||
|
||
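A minimal sketch of the membership event the DS would emit once its checks pass. Only `mls_commit` is new; the function name is hypothetical, and fields a real PDU needs (`auth_events`, `prev_events`, `hashes`, `signatures`, and so on) are deliberately omitted because how they are populated is outside what this proposal has defined so far.

```python
import base64


def unpadded_b64(data: bytes) -> str:
    """Matrix-style unpadded base64: standard alphabet, trailing '=' removed."""
    return base64.b64encode(data).decode("ascii").rstrip("=")


def build_invite_event(
    room_id: str, sender: str, invited_user_id: str, add_commit: bytes
) -> dict:
    """Partial m.room.member invite carrying a copy of the Add commit."""
    return {
        "type": "m.room.member",
        "room_id": room_id,
        "sender": sender,
        "state_key": invited_user_id,
        "content": {"membership": "invite"},
        # New top-level field from this proposal, not part of `content`.
        "mls_commit": unpadded_b64(add_commit),
    }
```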
When a server receives the `m.room.member` event, the normal [auth rules](https://spec.matrix.org/v1.13/rooms/v11/#authorization-rules) | ||
apply with an added condition that the `mls_commit` MUST have AAD which references the same auth | ||
events as the membership event. Otherwise, the event is rejected. | ||
|
||
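The added condition could look roughly like the check below. It assumes the AAD has already been decoded into a list of event IDs (the encoding is still a TODO in this proposal) and interprets "references the same auth events" as set equality with the event's `auth_events`; both are assumptions, and the function name is hypothetical.

```python
from typing import Iterable


def passes_mls_commit_condition(event: dict, aad_event_ids: Iterable[str]) -> bool:
    """Extra check applied alongside the normal auth rules.

    event: the m.room.member PDU, with `auth_events` as a list of event IDs.
    aad_event_ids: event IDs decoded from the AAD of the commit in `mls_commit`.
    """
    if "mls_commit" not in event:
        return False  # nothing ties this membership change to the MLS group
    # Order is irrelevant; duplicates are collapsed by the set comparison.
    return set(event.get("auth_events", [])) == set(aad_event_ids)
```

An event failing this check would be rejected just as if it had failed one of the existing auth rules.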
Servers MUST NOT inspect to-device messages, particularly those sent between the DS and local users. | ||
This is to ensure that the client receives the Add commit even if their server rejects the | ||
`m.room.member` event with `mls_commit`. A server which does intercept to-device messages would | ||
corrupt the encryption state for its users, making the room unusable for those devices. (**TODO:** | ||
Can we encrypt to-device messages between DS and users, to prevent inspection in the general case?) | ||
|
||
If a client receives the commit but no membership event, the client should assume the event was | ||
rejected by the server for some reason. The client can use the AAD to request the events and further | ||
verify if the membership event was supposed to be rejected, and choose to apply the Add commit | ||
accordingly. Clients SHOULD perform this check regardless of the membership event being accepted by | ||
their server. | ||
|
||
**Security consideration:** A server *could* lie about the `content` for an event when the client | ||
requests it by ID only. Perhaps the AAD should include the full federation-formatted (PDU) event JSON, | ||
so that the client can check `content` itself: event IDs are hashes of the event, which covers | ||
the content hash, which (naturally) covers `content`. This would all be verifiable by the client | ||
without needing to know the intricacies of DAGs or state resolution - they can verify the event ID | ||
is correct, and compare `content` against what their server gave them for that same event ID. | ||
|
||
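If the AAD did carry the full PDU, the content half of that check might be sketched as below. This only recomputes the content hash (SHA-256 over the canonical JSON of the event without `unsigned`, `signatures`, and `hashes`) and compares `content`; verifying the event ID itself also requires the room version's redaction algorithm, which is omitted. The function names are hypothetical, and `json.dumps` with sorted keys is only an approximation of Matrix canonical JSON.

```python
import base64
import hashlib
import json


def canonical_json(value) -> bytes:
    """Approximate Matrix canonical JSON: sorted keys, no whitespace, UTF-8."""
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def content_hash(pdu: dict) -> str:
    """Unpadded base64 SHA-256 content hash of a PDU."""
    stripped = {
        k: v for k, v in pdu.items() if k not in ("unsigned", "signatures", "hashes")
    }
    digest = hashlib.sha256(canonical_json(stripped)).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def content_matches_aad(pdu_from_aad: dict, content_from_server: dict) -> bool:
    """Check the AAD's PDU is self-consistent and agrees with the server's copy."""
    claimed = pdu_from_aad.get("hashes", {}).get("sha256")
    if claimed != content_hash(pdu_from_aad):
        return False  # PDU in the AAD does not match its own content hash
    return pdu_from_aad.get("content") == content_from_server
```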
Clients receiving invites would receive key material for events they haven't seen yet, until they | ||
accept the invite and join the room (see below). If they reject the invite, they SHOULD discard key | ||
material they've collected. | ||
|
||
|
||
### Joins | ||
|
||
**TODO:** Similar to invites and knocks. Where the user is accepting an invite, just a normal | ||
membership change. Where joining from scratch, use external joins assisted by the DS. | ||
|
||
|
||
### Leaves | ||
|
||
**TODO:** DS or parting user spams Remove proposals, DS requires removal before it'll accept any | ||
other changes. | ||
|
||
|
||
### Kicks/Bans | ||
|
||
**TODO:** Similar to leaves, but with some added "you really need to remove these devices". | ||
|
||
|
||
### Adding/removing devices while joined to the room | ||
|
||
**TODO:** Happens purely in MLS, no Matrix state events required. | ||
|
||
|
||
## Potential issues | ||
|
||
Unresolved: | ||
* What happens if the DS no longer has any users in the room? | ||
* What if the DS doesn't transfer its role to another server? | ||
* The DS is effectively required to fully resolve the room state, and state res will need to be | ||
modified to rely on MLS group state (or its effects) as definitive truth. | ||
* How to actually specify the ciphersuite, etc. in the room? (probably just copy LM) | ||
* How to determine if your local server can behave as a DS? (try to create room with encryption_algorithm?) | ||
|
||
This proposal centralizes room membership operations onto a single server within the room (not across | ||
the federation), which may be undesirable to room operators. Rooms which want to retain full | ||
decentralization should not use this proposal's mechanism for encryption, instead relying on the | ||
existing Megolm standard. In future, it may be possible to retain the security properties of MLS in | ||
a fully decentralized environment. | ||
|
||
Adoption of off-the-shelf MLS also limits the ability to decrypt history from before the MLS Add | ||
Commit. Room operators should be aware of this limitation when deciding what encryption algorithm to | ||
use when creating the room. | ||
|
||
|
||
## Security considerations | ||
|
||
**TODO:** Improve this section. | ||
|
||
Keeping centralization to the absolute bare essentials is a strong consideration for this proposal. | ||
|
||
|
||
## Dependencies | ||
|
||
This proposal is dependent on [MSC4245](https://github.com/matrix-org/matrix-spec-proposals/pull/4245) | ||
and [MSC4246](https://github.com/matrix-org/matrix-spec-proposals/pull/4246). | ||
|
||
|
||
## Future considerations | ||
|
||
**TODO:** Discuss consensus from MXCONF2024 | ||
|
||
|
||
## Unstable prefix | ||
|
||
**TODO** | ||
|
||
|
||
## Credits | ||
|
||
Many thanks to Franziskus Kiefer and Karthikeyan Bhargavan at Cryspen for their thoughtful feedback | ||
on how best to integrate RFC 9420-standard MLS into Matrix. |
any abuse potential?
there's always abuse potential :p
what concerns are you thinking of here?
I'm not too firm on the details of comparable parts in current Matrix at this time, but regardless I was thinking of spam/DoS potential if this isn't implemented carefully enough.
spam/DOS to who though? KeyPackages are essentially just one-time keys, and treated accordingly.