Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC4256: RFC 9420 MLS mode Matrix #4256

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

dklimpel
Copy link
Contributor

@dklimpel dklimpel commented Jan 24, 2025

@ara4n ara4n added the proposal A matrix spec change proposal label Jan 24, 2025
@ara4n ara4n changed the title MSC: RFC 9420 MLS mode Matrix MSC4256: RFC 9420 MLS mode Matrix Jan 24, 2025
@tulir tulir added e2e client-server Client-Server API kind:core MSC which is critical to the protocol's success labels Jan 24, 2025
@turt2live turt2live added the implementation-needs-checking The MSC has an implementation, but the SCT has not yet checked it. label Jan 25, 2025

![Invite flow][invite-flow]

## Potential issues
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about display name and avatar url of members? Are they stored in the MLS group context too? And if so, how to prevent malicious users from changing them.

less strict validation, those rules tend to favour stricter validation. Some of those rules could be
made more lenient to allow more extensibility if such a need is expected.

1. if state_key is not empty, reject
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe splitting hairs but empty and missing are considered equivalent here, right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it probably needs to be reworded. Events with a missing state_key have to be able to pass this rule.

9. If the `can_propose` has entries not in `servers`, reject
10. Otherwise, allow
10. If the event type is `m.mls.pending_commit`:
1. If the epoch of this event is not exactly the epoch of the previous `m.mls.commit` event + 1, reject
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the current m.mls.commit event" might be clearer here since m.mls.pending_commit doesn't directly replace that event. This could maybe also be applied to 9.2.

7. If the `powers` has entries not in `servers`, reject
8. If the `powers` start not with exactly the same entries in the same order as the subset of
entries in the previous `powers` above the `origin` , reject
9. If the `can_propose` has entries not in `servers`, reject
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we need to enforce that entries from powers also appear in can_propose to enable the process described in "Commit handling with multiple servers" below?

{
"type": "m.room.encrypted",
"content": {
"algorithm": "m.megolm.v1.aes-sha2",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not be something starting with m.mls.v1?

Currently redactions are not supported in the outer protocol layer. This has the benefit of making
redactions invisible to the server, but also prevents any server side aggregation. This also means
content can’t be deleted. As we require all events to be encrypted, this might not be a major
problem, but policy makers might disagree. Redactions could be supported in various ways and we
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redactions are not only relevant in the context of data privacy but also for moderation. Not being able to delete malicious or illegal content seems like a significant problem. 😕

the expense of some of the security benefits of MLS. At its core having a coherent set of members in
a room and the ability to change membership at any time without the ability to broadcast these
changes to other servers in the room are at conflict. If any server can modify historical membership
of a room, this compromises Post-Compromise Security (PFS). In an earlier proposal, decentralised
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The acronym here is a little confusing because it doesn't fit the written-out form:

Is it suppposed to be "Perfect Forward Secrecy (PFS)" or "Post-Compromise Security (PCS)"?

Copy link
Member

@uhoreg uhoreg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only read through part of this, but here are some (mostly nitpicky) comments so far. Generally, there's inconsistency between using "mls" and "MLS". It would be nice to use "MLS" everywhere.


Every valid state event in this room version is required to have an empty state key.

The `m.room.create` event has a mandatory `algorithm` field now to set the encryption algorithm
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bikeshedding: I'd suggest using encryption_algorithm as the property name, as defined in MSC2425, to make it clear that it's encryption-related.

Every valid state event in this room version is required to have an empty state key.

The `m.room.create` event has a mandatory `algorithm` field now to set the encryption algorithm
for the room. This would be the MLS cipherset prefixed with `m.mls.v1.` for example
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for the room. This would be the MLS cipherset prefixed with `m.mls.v1.` for example
for the room. This would be the MLS ciphersuite prefixed with `m.mls.v1.` for example

encrypted events with only an `algorithm` and `ciphertext` field. It contains the current MLS
commit as a MLS private message.

The MLS commit contains a CBOR encoded field in its unencrypted authenticated data of the MLS
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give some rationale here for why you're using CBOR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is this CBOR-encoded data the only thing in the authenticated data, or are there other things that are in there alongside it?

less strict validation, those rules tend to favour stricter validation. Some of those rules could be
made more lenient to allow more extensibility if such a need is expected.

1. if state_key is not empty, reject
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it probably needs to be reworded. Events with a missing state_key have to be able to pass this rule.

4. Considering the events auth events:
1. If there is not exactly one `m.room.create` event (with implicitly an empty state key), reject
2. If the event is not the `m.room.create` event (we never get here) or an `m.mls.commit` event
with epoch == 0 (we do get here), then if there is no `m.mls.commit` event in the auth events,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see anywhere in this MSC that mentions where the epoch comes from. I think it may be helpful to mention something about that somewhere, since this MSC will be read by people not completely familiar with MLS. (It comes from the ciphertext, which despite being called "ciphertext", contains some cleartext bits, including the epoch. Which means that servers will be expected to be able to parse the MLS ciphertext, at least enough to read the epoch, group ID, and anything else that's needed.)

5. If the `content` of the `m.room.create` event in the room state has the property `m.federate` set
to `false`, and the `origin` of the event does not match the `origin` domain of the create event,
reject.
6. If the `origin` is not listed in the `servers` of the previous `m.mls.commit` event (unless it is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to explain what "previous" means here (presumably, the m.mls.commit event that is listed in the event's auth_events).

6. If the `powers` has entries not in `servers`, reject
7. If the `can_propose` has entries not in `servers`, reject
8. Otherwise, allow
11. If the epoch of the `ciphertext` is not the epoch of the current `m.mls.commit` event + 1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the epoch of the ciphertext be equal to the epoch of the current m.mls.commit event (which I assume means the m.mls.commit event that is referenced in the event's auth_events)? Otherwise you're encrypting with an epoch that hasn't been created yet?

the first commit), reject
7. If the room_id or the MLS group_id do not match the room_id of the create event, reject
8. If the algorithm of the event does not match the create event, reject
9. If the event type is `m.mls.commit`:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reject if no state_key?

should ignore the `m.mls.proposed_commit`.

Clients can know their `m.mls.proposed_commit` failed to be applied if they haven’t received a
`m.mls.commit` event matching their `m.mls.proposed_commit` after "len(powers)*5" minutes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can clients also infer that their commit will fail to be applied if they receive a different m.mls.commit with the same epoch as their commit?


State resolution might result in some clients losing access to messages as they threw their private
state away immediately to provide PCS and forward secrecy. In that case the clients dropped by the
state event will have to be sent another welcome to rejoin them into the room.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way for clients to ask to be re-added? How do others know that the client needs to be re-added?

Having conflicting commits will lead to some undecryptable messages. For short durations that can be
mitigated by keeping the old state around for seconds to minutes. This is the common case when
people invite new users at the same time or do otherwise racing modifications on different servers.
For long durations no automatic recovery is possible and manual communication is necessary. We
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain what this "manual communication" would entail? Are users going to have to talk to each other out-of-band and poke at the cryptographic state in their clients?


As no member events are used in this room version, the direct /state endpoint can not be used to
change membership in a room

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are user's new devices/deleted devices handled? Who is in charge of adding/removing them from the group? How do people leave the group?

not provide the security guarantees we are looking for and is incompatible with MLS. If state
resolution allows an arbitrary long amount of time between one server changing membership and other
servers being made aware of that change, secret keys to resolve those commits have to be kept around
for the same amount of time.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be noted that the plan for MSC2883 was to try to apply https://eprint.iacr.org/2023/394 to address the issue of needing to keep private keys, so that epochs could not be derived a second time. However, we did not manage to work on it, so it is absolutely fair to say that the MSC as it currently stands has that weakness. (And, of course, applying that would be a further deviation from standard MLS.)


### Room state

Currently all state is stored in one GroupContext extension. This has obvious size limitations.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be clear, I believe that this means that having the state in the GroupContext means that the entire state will be encoded in every single commit.


There are several options for how a client can send a new commit into a room.

* Users on the first server listed in the "powers" list can send commit state events directly. They
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if multiple people on that first server send a commit state event at the same time? Does he server just select one to accept and return an error to the others? If so, what error code does it use?


![Invite flow][invite-flow]

## Potential issues
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the possible delay in applying a commit due to "powers" servers being offline is a significant potential issue that should be listed here. For example, if the first server listed in "powers" is offline, then when you invite someone, it will take 5 minutes before they can actually be in the room. (Could they end up being halfway in the room, if the inviter closes their client after they try to invite, but before they see the commit event?)

This, combined with encrypted state, may also cause problems for MatrixRTC, which IIRC uses various state events. If it takes over 5 minutes to set up a call, or for someone to join a call, that could be problematic.

people invite new users at the same time or do otherwise racing modifications on different servers.
For long durations no automatic recovery is possible and manual communication is necessary. We
expect those cases to be rare even in larger federations. Long term server outages usually don’t
include membership modifications at the same time as usually one server is completely offline. Even
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Membership modifications aren't the only reasons for sending commits, though. Group members are expected to periodically send commits (or Updates which are committed by others), which will probably be more frequent than membership changes.

* Servers not listed as the first entry in the "powers" list should send a "m.mls.pending_commit"
as a normal message. This will then be applied according to the sorting rules of that server, or
according to the rules in the following section, and a commit state event will be sent by one of the
homeservers listed in the "powers" list. Only users on servers in the "can_propose" list can send
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This nearly tripped me the first time I read it, so a note for future readers: in the current existing room versions, if a server wants to send an event to a room, it needs to use a user ID of a room member that is on that server to send the event. However, in rooms covered by this MSC, there is no sender field, so servers can send events on their own, without needing to invoke a user ID.

After this the commit in the room is updated. Once this commit is accepted by the delivery service,
the inviting user should send out the welcome message as a to_device message and send an invite
using the /invite endpoint. To distribute the ratchet tree the ratchet tree MLS extension is
currently required.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In large groups, the Welcome message can get quite large, so it may not be appropriate to send the whole thing as a to-device message. In my MSC2883 work, I put large Welcome messages in the media repository, and sent a reference to the media repository as a to-device message. This has some advantages (e.g. the client can download the large message when it wants to, rather than have it clog up the /sync, and the message doesn't need to be base64-encoded). But it has some other problems (e.g. we don't have a way to delete the media when it's no longer needed, the sender's server could be down when the recipient wants to download it). We may need to look into some better way of sending large blobs between devices, as this is an issue that affects all MLS-related MSCs, as well as possibly some other things that we're working on.

user to the MLS group. The key packages should be validated and should be cross-signed as configured
in the additional authenticated data of the current MLS commit.
After this the commit in the room is updated. Once this commit is accepted by the delivery service,
the inviting user should send out the welcome message as a to_device message and send an invite
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any definition of how the Welcome message is sent as a to-device event (e.g. what event type it uses, what properties the event has). Can you add that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
client-server Client-Server API e2e implementation-needs-checking The MSC has an implementation, but the SCT has not yet checked it. kind:core MSC which is critical to the protocol's success proposal A matrix spec change proposal
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants