MSC4080: Cryptographic Identities (Client-Owned Identities) #4080

Draft
@devonh devonh commented Nov 15, 2023

@devonh devonh changed the title MSC XXXX: Cryptographic Identities (Client-Owned Identities) MSC 4080: Cryptographic Identities (Client-Owned Identities) Nov 15, 2023
@devonh devonh changed the title MSC 4080: Cryptographic Identities (Client-Owned Identities) MSC4080: Cryptographic Identities (Client-Owned Identities) Nov 15, 2023
@turt2live turt2live added the e2e, requires-room-version, proposal, unassigned-room-version, kind:core, and needs-implementation labels Nov 15, 2023
### Additional Attack Vectors

Clients can modify events prior to signing them and sending them to the server for processing. This can lead to
issues if the client were to change something such as the `prev_events` which could lead to further problems.

Contributor

@neilalexander neilalexander Nov 18, 2023

This is more than just an attack vector — it's a fundamental part of how events are created and it cannot be an afterthought.

As it stands today, clients generally know nothing of "prev_events" or forward extremities. When the client wants to send an event into a room, it sends a "client event" containing only client-controlled data to the server, which the server then pads out and populates with additional fields and signatures. The "prev_events" form the DAG relationships between events and populating the "prev_events" with as many real forward extremities as possible is what keeps the graph moving in a forward direction and prevents long-lived forks in the room graph (ideally).

It's also worth noting that only servers are tracking forward extremities at this stage, partly because they need to do so in order to work out the current state of the room and enforce things like soft-fail, and partly because clients simply can't be expected to do that work with the information that they have. Clients often only know a subset of what the server knows about the room history, and /sync doesn't make any guarantees that the responses won't contain gaps. Different servers in the room might have different ideas about what the forward extremities are depending on history, but if all servers are participating actively in the room by sending events and populating "prev_events" correctly, this should eventually reconcile.

As I see it there are only really two workable options:

  1. The client signature should only cover certain fields of the event, such as the "content" and "type" etc, and a server-added signature would need to cover additional things such as "prev_events" fields — anyone receiving the event then needs to verify both the client and the server signatures
  2. The client needs to be informed of the forward extremities before every single event is created, at the risk that multiple clients can and will race in creating events — this would increase the room graph complexity which in turn increases the processing/state resolution costs on the servers (whereas right now this race can only happen between servers sending at the same time)

This is largely why the room join dance over federation has the additional /make_join step, as a server already participating in the room is required to populate things like "prev_events" into the template event before the joining server signs it and sends it to /send_join.

Contributor Author

Everything you said is correct. And we came to the same conclusion of those being the 2 options.
We chose to go with option 2 because option 1 leads to issues where you need to delegate which server currently has signing authority for a user. For example, if a user moves to a new homeserver, then the old homeserver should no longer be able to sign and create events on behalf of the user. This becomes an issue since the old homeserver can replay old events from the client with different auth_events & prev_events, making it possible for a malicious homeserver to do bad things.

The bulk of this proposal outlines the endpoint changes necessary to make option 2 work such that the client is able to sign full versions of every PDU.

I left out some words around how to prevent a client from changing event fields (such as prev_events) since I didn't think it was relevant to the spec and should only be an implementation detail. But the gist of it is that a server should store the hash (eventID) sent to the client and only accept signed events from a client that have a matching hash. This doesn't prevent a malicious homeserver from colluding with a malicious client. But that's also true of today's Matrix network.
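
A minimal Python sketch of the hash check described in this comment, not the MSC's actual algorithm. The `PendingEvents` store and `proto_event_hash` helper are hypothetical names, and the hash is simplified to SHA-256 over an approximation of canonical JSON with `signatures` and `unsigned` removed (real event IDs are derived from the redacted event). Verifying the client's signature itself is left out.

```python
import base64
import hashlib
import json


def canonical_json(obj) -> bytes:
    # Approximation of canonical JSON: sorted keys, no insignificant whitespace, UTF-8.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def proto_event_hash(pdu: dict) -> str:
    # Assumption: hash every field except `signatures` and `unsigned`, so that
    # adding the client signature does not change the identifier.
    stripped = {k: v for k, v in pdu.items() if k not in ("signatures", "unsigned")}
    digest = hashlib.sha256(canonical_json(stripped)).digest()
    return "$" + base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")


class PendingEvents:
    """Hypothetical server-side record of proto-events handed out to clients."""

    def __init__(self):
        self._pending = set()

    def issue(self, pdu: dict) -> str:
        # Remember the hash of the proto-event before sending it to the client.
        event_id = proto_event_hash(pdu)
        self._pending.add(event_id)
        return event_id

    def accept(self, signed_pdu: dict) -> bool:
        # Accept only events whose hash matches one that was issued, and only once.
        event_id = proto_event_hash(signed_pdu)
        if event_id not in self._pending:
            return False
        self._pending.discard(event_id)
        return True


store = PendingEvents()
proto = {
    "type": "m.room.message",
    "room_id": "!room:example.org",
    "sender": "hypothetical-cryptoid",
    "content": {"body": "hi"},
    "prev_events": ["$a"],
    "auth_events": ["$b"],
}
issued_id = store.issue(proto)
signed = dict(proto, signatures={"hypothetical-cryptoid": {"ed25519:1": "sig"}})
tampered = dict(signed, prev_events=["$somewhere_else"])
```

With this sketch, `store.accept(tampered)` is rejected because changing `prev_events` changes the hash, while `store.accept(signed)` succeeds once and is rejected on a second submission.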

Contributor

Preventing a client from changing the "prev_events" is a pretty important detail and needs some explanation in the MSC in that case, as that is a critical security component.

I also think that serious consideration needs to be made for the fact that multiple clients could end up requesting proto-events with the same "prev_events" while other clients are signing/uploading their signed events, which increases the number of forward extremities and increases the state resolution frequency & cost on the server. What will the server do to protect itself from lots of users in the same room creating a mass amount of forward extremities at the same time, either by dumb chance or by collusion?

Contributor Author

I can add words describing how a homeserver should protect against modified events.

RE: increased forward extremities - this issue is already present today with federated homeservers.

A new room version will be required to account for the modifications to the auth rules.

Invite events no longer require a signature from the invited user’s homeserver. This signature requirement does not
appear to have an obvious benefit and would make invite events overly onerous with the new room invite process.

Contributor

It's not entirely without benefit. The invited server signature is what allows participating servers to prove that the invited server was online at the time of the invite and that the invited server didn't have good reason to reject the invite, i.e. because the user didn't exist or similar. Otherwise it becomes considerably easier to just fill the room state with masses of obviously fake invite events for obviously fake users which servers then need to resolve forever more without ever knowing if they have a genuine use or not.

Contributor Author

Ah, thanks for enlightening me!

If it is a matter of keeping a signature specifically from the invited user's homeserver, then that should be easy enough to keep in place. It only becomes a problem if the signature needed to come from the client. (And since myself and others weren't aware of this benefit, we figured we could just remove the "unnecessary" homeserver signature)

We are trying to avoid homeserver signatures as much as possible on events to remove the burden of fetching server keys. The only other server signature location with pseudoIDs/cryptoIDs is in the mxid_mapping field of the m.room.member event to verify the mxid for the user belonging to that homeserver.

Contributor

@neilalexander neilalexander Nov 20, 2023

Yeah, for all intents and purposes, the signature from the remote server where the user is known to live (in a way that can be proven) is what proves that the invite was delivered at all. Otherwise you could end up in a situation where you have an invite state event for a remote user but the remote server was never notified (i.e. because it was unreachable/offline at the time), so the invited user doesn't know they were invited and the existing participants think the remote user was invited but really have no way of knowing if it really went through.

This also creates somewhat of a chicken-and-egg problem with your mxid_mapping field though as being able to send an invite over federation would require you not just to know the user identity but also which server they are resident on and to have some kind of attestation from the remote side claiming that the residency is valid.

So there's a few things that might break the federated invite flow here:

  1. What if you know the cryptographic identity but nothing else?
  2. What if you know the cryptographic identity and the server the user is resident on — how can the server prove that the user is truly resident there if the user isn't online at the time to prove it themselves?
  3. What if the invited user has ported to a new server in the meantime, leaving invites behind on an old server with an old server mapping? What happens when they try to accept the invite or join and existing mappings are no longer correct?
  4. What if someone is lying about where the invited user lives and the remote side refuses to handle the invite at all?

Contributor Author

I have added a diagram of the proposed federated invite flow to help clarify things.

The proposed addition of one-time cryptoIDs involved when inviting users to a room provides an opportunity for the invited user's homeserver to ensure that user is resident on that homeserver at the time. Otherwise the homeserver should fail the invite during the call to /make_invite.

Invites still use the mxid in the proposal so you don't (and can't) know a user's cryptoID until you have sent the invite and the user's homeserver has allocated one of the one-time cryptoIDs as the user's cryptoID for that room. Until that time the only thing that can be known is the user's mxid.

You could run into the case where the user moves to a new homeserver in the middle of the invite flow. This only becomes an issue if the user never receives the invite event from any homeserver. As long as they receive the event at some point, they should also have the matching private key necessary in order to accept the invite, regardless of which homeserver they are currently resident on. There might need to be something in place when designing account portability to help minimize how often this happens. (such as a grace period to forward such incoming events from the old homeserver to the new).

I may not be understanding your fourth point fully. If someone lies about where an invited user lives, this is no different than trying to invite a user with the wrong mxid in today's matrix.

Contributor

I'm not really sure I'm following — in the "one-time" crypto ID case, who owns the private key for that identity? If the client isn't the specific owner of that key and isn't online at the time to sign the proposal, then how can the client trust an invite that it learns about later, i.e. after a migration?

Contributor Author

The client owns the private keys for all their cryptoIDs (one-time or otherwise).


**Advantages**: This has the advantage of events being fully signed by the cryptoID and avoiding a second round trip.

**Disadvantages**: This has the disadvantage of requiring clients to do state resolution.

Contributor

Clients cannot do state resolution, because they don't have the necessary information about the state before/after each event in order to do so.

In any case, the problem here is not state resolution, but that it requires the clients to track forward extremities, which they also cannot really do given the fact that /sync and friends do not return breadth-first scans of the graph but rather a linearisation of it. It's also likely that new forward extremities can build up further back in history from newly arriving events over federation and clients may not have any reason to find out about these until a new event is sent that refers to one of them.
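
To illustrate the point about gaps, here is a small Python sketch under stated assumptions: events are modelled as a hypothetical mapping from event ID to `prev_events`, and the client is assumed to see `prev_events` at all, which /sync does not guarantee. Forward extremities are computed as the known events that no other known event points at.

```python
from typing import Dict, List, Set


def forward_extremities(events: Dict[str, List[str]]) -> Set[str]:
    """Return the known events that no other known event lists in prev_events."""
    referenced = {prev for prevs in events.values() for prev in prevs}
    return set(events) - referenced


# Server's view: a chain A <- B <- C <- D, plus E, which forked from A and
# arrived late over federation.
server_view = {
    "$A": [],
    "$B": ["$A"],
    "$C": ["$B"],
    "$D": ["$C"],
    "$E": ["$A"],
}

# Client's view: a gappy timeline that is missing C and never learned about E.
client_view = {
    "$A": [],
    "$B": ["$A"],
    "$D": ["$C"],
}

print(sorted(forward_extremities(server_view)))  # ['$D', '$E']
print(sorted(forward_extremities(client_view)))  # ['$B', '$D']
```

In this example the client wrongly treats `$B` as an extremity because it never saw `$C`, and misses `$E` entirely, so any `prev_events` it picked on its own would differ from what the server would choose.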

Contributor Author

Thank you for further detailing why this alternative is not viable.
I'll update this section to better explain why we don't recommend it.

prevents attacks such as changing the `membership` state of another user. Signing `content` prevents a malicious
homeserver from generating arbitrary `content` on behalf of a client.

Even with the above mitigation, a malicious homeserver could still replay an event in the same room, with the same

Contributor

The natural solution here is to include the "prev_events" in the signature, which anchors that event to a specific position in the graph and effectively prevents replays elsewhere, but this would require rethinking the signature scheme. This still also assumes that the client knows what the "prev_events" even are.

Contributor Author

I agree. That would be the natural solution if a client could somehow know what the prev_events are.
This is the main reason we stopped investigating the client-delegated signing alternative and have instead proposed the changes outlined in this MSC.
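
The replay concern in this thread can be sketched in Python. This is an illustration only: HMAC-SHA256 with a hypothetical shared secret stands in for the client's real Ed25519 signature, and the field lists `PARTIAL_FIELDS` and `FULL_FIELDS` are assumptions chosen for the example rather than anything the MSC defines.

```python
import hashlib
import hmac
import json

# Assumption: a shared secret stands in for the client's Ed25519 private key.
CLIENT_KEY = b"hypothetical-client-secret"

PARTIAL_FIELDS = ("type", "state_key", "content")
FULL_FIELDS = PARTIAL_FIELDS + ("room_id", "prev_events", "auth_events")


def sign(event: dict, fields: tuple) -> str:
    """Sign only the listed top-level fields of the event."""
    covered = {name: event[name] for name in fields}
    payload = json.dumps(covered, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(CLIENT_KEY, payload, hashlib.sha256).hexdigest()


def verify(event: dict, fields: tuple, signature: str) -> bool:
    """Recompute the signature over the same fields and compare in constant time."""
    return hmac.compare_digest(sign(event, fields), signature)


original = {
    "type": "m.room.member",
    "state_key": "hypothetical-cryptoid",
    "content": {"membership": "leave"},
    "room_id": "!room:example.org",
    "prev_events": ["$early"],
    "auth_events": ["$create", "$power"],
}

# A malicious homeserver re-anchors the same signed payload elsewhere in the DAG.
replayed = dict(original, prev_events=["$much_later"])

partial_sig = sign(original, PARTIAL_FIELDS)
full_sig = sign(original, FULL_FIELDS)

print(verify(replayed, PARTIAL_FIELDS, partial_sig))  # True: the replay goes unnoticed
print(verify(replayed, FULL_FIELDS, full_sig))        # False: prev_events is covered
```

Covering `prev_events` is what pins the event to one position in the graph, which is only possible if the client is given the `prev_events` before it signs.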

event without anyone knowing. This could include fields such as `type`, `state_key`, `prev_events`, and `room_id`.
In order to minimize the effects of a replay attack, the client should sign the combination of `type`, `state_key`,
and `content`. Signing `type` prevents reusing the contents in an event of another event `type`. Signing `state_key`
prevents attacks such as changing the `membership` state of another user. Signing `content` prevents a malicious

Contributor

Some consideration needs to be made for power events, i.e. kicks and bans, where another user or the server itself needs to effectively take action on behalf of another user.


**Problem**: How can a client generate a usable nonce?

**Problem**: How could a homeserver validate a nonce as being unique without requiring them to know the entire room DAG?

Contributor

In short, this isn't feasible, and it may well open up a possible attack vector whereby multiple events for the same "type" and "state_key" can be sent with the same "nonce" and then it creates a race where different servers might accept one or the other depending on which one they learn about first. If this happened with power events then that would have knock-on effects with event auth and you'd fracture the room.

Contributor Author

I'm glad you agree 😄
If someone can come up with a feasible approach to protecting against such attacks then this alternative may be a better approach to what is being proposed in this MSC.


Servers should also check that the full event was signed by one of the keys present in the `allowed_signing_keys`
field, or by the cryptoID itself. If the event was not signed by one of these keys, the server should reject the
event. Allowing events to be signed by the cryptoID keeps the possibility of clients to perform state resolution

Contributor

I already made mention of this in another comment but clients shouldn't and can't be expected to do state resolution. Not least because in order for them to do so, they need to track the state before/after each event and to track forward extremities, and they need the ability to contact other servers over federation to fill in those knowledge gaps, by which point you've effectively built a homeserver. That is why P2P was predicated on portable homeservers. There's no way forward where clients running state res is the right answer.

Contributor

Also, some clients fundamentally can't talk to other HSes over federation since they have no way to properly do DNS, which is a fundamental requirement for the S-S API endpoints.

Contributor Author

This alternative proposal of client-delegated event signing isn't predicated on being able to run state resolution. The comment was added only as an aside to outline that it wasn't being completely designed out of the picture if this solution were to be accepted.

`signatures` have been added (including the new `nonce`, and `allowed_signing_keys` fields). The `hash` is required in
order to be able to verify the event `signature` if the event `content` is ever redacted.

All events are now required to possess the above mentioned fields inside of `content`.

Contributor

Duplicating many of the top-level event keys and putting them under client control inside "content" instead of properly rethinking the top-level event format feels like it will create technical debt that will never be repaid.

Contributor Author

If the replay attack issue present in the client-delegated event signing alternative proposal can be solved then this could be looked at further. This approach of modifying content was documented to make it clear where the problem arises and why this alternative isn't being proposed in this MSC.

Contributor

@MTRNord MTRNord left a comment

I strongly believe that, in its current form, the direction this MSC proposes is not a good one.

@@ -0,0 +1,553 @@
# MSC4080: Cryptographic Identities (Client-Owned Identities)

Contributor

While @neilalexander already mentioned lots of technical flaws, I think that in its current form it doesn't hold up even from a purely logical standpoint.

A) It feels like you want a HS in the client. I strongly doubt a Client dev would implement this even if it became merged as is
B) It feels like this would be what you expect from P2P clients, but at that point you are writing a HS anyway, so the whole MSC would be obsolete at its core since the HS is already the client.

Also on phones something like this feels like a burden if not even impossible due to the common battery optimizations by the OS.

Contributor Author

Thanks for the comment 😄

It seems as though these opinions are based around the alternative proposals of either having clients delegate event signing to a homeserver or generating prev_events themselves somehow.

I agree with you that we should avoid spec changes that require clients to behave like homeservers. The alternative proposals have been listed to provide context as to how we arrived at the actual proposal.

The proposal presented in this MSC is only asking that clients now sign events. No other homeserver behaviour is expected of clients.

@benjistokman

Hi. I wrote a small article about this: https://benstokman.me/blog/my-thoughts-on-msc-4080/

@devonh
Contributor Author

devonh commented Dec 12, 2023

> Hi. I wrote a small article about this: https://benstokman.me/blog/my-thoughts-on-msc-4080/

Thank you for taking the time to review the MSC.

I agree that things would be simpler if we could just use a user's Master Signing Key.

The problem with this is solely around deniability. Currently, Matrix's encryption protocol provides users with plausible deniability. If all events are now signed client-side with a user's Master Signing Key (or PGP key), this deniability aspect of encryption will be greatly diminished. The system proposed in this MSC was designed to try and preserve the existing features of the protocol.
CryptoIDs are not cryptographically connected to a user's Master Signing Key or device keys. A cryptoID could be forged on a user's behalf, and in combination with a colluding or compromised homeserver, could generate all of the same events.

If there is some mechanism to cut down on the number of keys while maintaining plausible deniability, then that would be the preferred approach. But this MSC is exploring what it would look like without that.

Also, a system that allows changing signing keys may be possible, but it is also full of potential issues due to federation and state resolution. The old keys could still be used to send events with prev_events that refer to a point in the room DAG before the signing keys were changed. In the current matrix, these events would be valid because they could have just been received late.
This gets especially dangerous when considering state events.

@benjistokman

benjistokman commented Dec 22, 2023

Edit: article version: https://benstokman.me/blog/my-new-thoughts-on-msc-4080/

I found a middle ground:

  • Devon's design is better. Any kind of self-signed events introduces the risk of communication failure after key loss. However, a way to back up/restore key data would have to be made anyway to support portable accounts. Periodic reminders to a user to back up their keys would be useful.
    • Perhaps a way to procedurally generate the cryptoIDs could be implemented so an older backup can restore newer keys. [Single use] encryption keys aren't as important. I am not sure if this is feasible or possible within the limitation of plausible deniability.
  • Only events that deal with invites, power level changes, etc. should require a user-signed event. If a user loses their key setup but still has their account password, having the ability to send out a message saying something like, "Hey I lost my keys we'll need to re-verify when we meet next" would be very useful, even if unencrypted. The other users' clients should still show a warning the same way as any unsigned and/or unencrypted event.
    • Valid server-signed events that are properly encrypted/signed should be treated the same way as a user-signed message of the same event type.
    • There should be a client-side option to "hide server-signed events" should a homeserver start sending spam.
  • There still needs to be a way to change signing keys. Being entirely unable to expire/replace cryptographic keys is plain and simple bad security practice.
    • In order to get around the race condition, there should be separate events to add keys and to remove keys. To do a key migration, a client first adds the new key, then prompts the user to open all their devices/applications. It then removes the old key some reasonable time in the future, say, 4 hours later.
  • There should be a setting per room that requires users to acknowledge account migrations before they are shown new messages. Many users will be confused by account migrations. Imagine if you're in an email chain with someone and you're suddenly composing a reply to an unrecognized address.
    • Putting a "this user has migrated their account" in the chat feed is not sufficient. There's no guarantee whatsoever that every user would always notice this. Requiring an action is better.
    • This would be a per-room setting similar to the "never send encrypted messages to unverified sessions" setting.
    • This should be the default in encrypted rooms only. Public+unencrypted rooms don't really need it.
    • Keys should not be sent to sessions before the user acknowledges the migration.
    • This may also help limit consequences from key leaks.

@erlend-sh

https://matrix.org/blog/2024/03/why-matrix-org/

There are alternatives to a prominent onboarding homeserver like the Matrix.org homeserver. We don't find any of them satisfactory:

  • Picking a server randomly when a user wants to sign up in the app is confusing and has a lot of security implications. This is not acceptable.
  • Forcing the user to choose a server when they sign up is a highly cumbersome process that does not respect the user's time and resources.
  • Portable identities are very desirable but will take time to materialise.
  • P2P Matrix depends on a portable identity mechanism and is further away in the future.

It’s painfully obvious that portable identities are the best solution to a whole range of problems in the federated paradigm.

Since lack of funding is the main issue, I’d like to point @devonh & co. to https://summerofprotocols.com/research/sop2024

Specifically the Protocol Improvement Grant:

Protocol Improvement Grants: This track is aimed at discovering the general principles of improving protocols. The track will award grants of $90,000 each to 5 two-person teams to prototype and field-test improvements to a real-world protocol.

Comment on lines +230 to +235
```
{
event_id: string,
pdu: PDU
}
```

Member

To be clear, is the "event_id" field included in the PDU that needs signing? I assume it is, given that events currently have "event_id" outside the "unsigned" field, making it subject to being included in event signatures.

### Event Signing

Events are required to be signed by the cryptoID. In order for this to work with client-owned keys, clients need to
obtain the full version of events before they can be signed. This proposal introduces a few changes to the C-S API

@andrewzhurov andrewzhurov May 29, 2024

Could the client craft the full version of an event on its own?
afaik, the only field that the server fills in is prev_events.
The client does have Room events locally. They may be stale, but I'd argue we want to bake in that "stale" causality: that is indeed what the client sees, and baking in the server's causality is wrong, as it's not the one observed when creating an event. Probably a better word than "stale" is "true" :)

Further, this would allow clients to issue events while being offline.

UPD: I see it's been considered down below.


### Identity Sharing Between Devices

The cryptoIDs of a user are shared between devices using secret storage similar to the way encryption keys are shared.

Seems we could make use of Profile Room to manage authorized devices & their currently used keys, as they are rotated over time.
This way keys are non-extractable, giving increased security.
And devices help each other to maintain user's identity. One device gets lost - unlink it.
Also, Profile Room can be used to anchor "interaction" (e.g., message) events, proving their authenticity even across key rotations.


**Disadvantages**: This has the disadvantage of giving over full event control to the delegated homeserver. It also has
the disadvantage of trying to resolve `allowed_signing_keys` if a client wants to remove authority from a homeserver
or there are conflicts in the room DAG. Revocation of a delegated key is known to be extremely problematic.

@andrewzhurov andrewzhurov May 29, 2024

Good remark. Bummer we don't have prev_events with sync.
Serving clients full events would make it possible (Synapse allows this). This approach is mentioned in MSC3871.

Comment on lines +52 to +53
Fully formed PDUs are sent to this endpoint to be committed to a room DAG. Clients are expected to have signed the
events sent to this endpoint. Homeservers should reject any event which isn’t properly signed by the client.

Member

Since among a PDU's required fields is the timestamp of when the event was created (origin_server_ts), would it be an issue if there were a large delay between the time an event is created & when it's actually inserted into the DAG (i.e. if a client waits a long time between calling /send and /send_pdus)?

AIUI that would at least affect time-based syncing, such that events would be synced in timestamp order rather than DAG order (which would be different only if other events got inserted between /send and /send_pdus).

But given that timestamp/DAG discrepancy is allowed for timestamp massaging by appservices, perhaps this is nothing out of the ordinary.

9 participants