Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSC3984: Sending key queries to appservices #3984

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
201 changes: 201 additions & 0 deletions proposals/3984-sending-key-queries-to-appservices.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,201 @@
# MSC3984: Sending key queries to appservices

As mentioned in [MSC3983](https://github.com/matrix-org/matrix-spec-proposals/pull/3983), appservices
can save some data by accepting OTK claims directly, however they would still need to upload device
keys for their users. If an appservice wanted to be completely independent from the homeserver and
avoid uploading any keys, it would use this MSC to proxy the `/keys/query` endpoint alongside MSC3983.

*Note*: Readers are encouraged to read MSC3983 for background on this problem space.

Appservices which implement this MSC would generally be expected to *not* upload fallback keys or
one-time keys - implementations can make decisions to optimize for this usecase if they choose.

## Proposal

Similar to MSC3983, the homeserver *always* proxies the following APIs for an appservice's exclusive users
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reivilibre pointed out during the Synapse review that we might want to shout a bit louder that this only applies to exclusive users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(And maybe some rationale as to why bad things would happen if you allow non-exclusive users?)

to the appservice using the new API described below:
* [`/_matrix/client/v3/keys/query`](https://spec.matrix.org/v1.6/client-server-api/#post_matrixclientv3keysquery)
* [`/_matrix/federation/v1/user/keys/query`](https://spec.matrix.org/v1.6/server-server-api/#post_matrixfederationv1userkeysquery)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also a /user/devices query which returns device information in a slightly different format (and some other information that we don't talk about in this MSC, I think).

Synapse uses this to "sync device lists" when it does not know the devices of a user (or if it thinks devices are out of sync). In order to properly pipe this information back and forth I think we need to somehow proxy this to appservices too.

In matrix-org/synapse#15539 I was able to "proxy" it by actually making the /keys/query request to the appservice and reformatting the returned data. I'm not sure this is the ideal way though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


**`POST /_matrix/app/v1/keys/query`**
```jsonc
// Request
{
"@alice:example.org": ["DEVICEID"],
"@bob:example.org": [], // indicates "all device IDs"
// ...
}
```
```jsonc
// Response
// This is the same response format as `/_matrix/client/v3/keys/query`
{
"device_keys": {
"@alice:example.org": {
"DEVICEID": {
"algorithms": [
"m.olm.v1.curve25519-aes-sha2",
"m.megolm.v1.aes-sha2"
],
"device_id": "DEVICEID",
"keys": {
"curve25519:DEVICEID": "3C5BFWi2Y8MaVvjM8M22DBmh24PmgR0nPvJOIArzgyI",
"ed25519:DEVICEID": "lEuiRJBit0IG6nUf5pUzWTUEsRVVe/HJkoKuEww9ULI"
},
"signatures": {
"@alice:example.org": {
"ed25519:DEVICEID": "dSO80A01XiigH3uBiDVx/EjzaoycHcjq9lfQX0uWsqxl2giMIiSPR8a4d291W1ihKJL/a+myXS367WT6NAIcBA"
}
},
"unsigned": {
"device_display_name": "Alice's mobile phone"
},
"user_id": "@alice:example.org"
}
},
// ... also a @bob:example.org record if appropriate
},
"master_keys": {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the cross-signing fields probably don't work as easily as we're portraying them here.

TODO: Figure out how cross-signing works, and represent it accordingly.

"@alice:example.org": {
"keys": {
"ed25519:base64+master+public+key": "base64+master+public+key"
},
"usage": [
"master"
],
"user_id": "@alice:example.org"
}
// ... also a @bob:example.org record if appropriate
},
"self_signing_keys": {
"@alice:example.org": {
"keys": {
"ed25519:base64+self+signing+public+key": "base64+self+signing+master+public+key"
},
"signatures": {
"@alice:example.org": {
"ed25519:base64+master+public+key": "signature+of+self+signing+key"
}
},
"usage": [
"self_signing"
],
"user_id": "@alice:example.org"
}
// ... also a @bob:example.org record if appropriate
},
"user_signing_keys": {
"@alice:example.org": {
"keys": {
"ed25519:base64+user+signing+public+key": "base64+user+signing+master+public+key"
},
"signatures": {
"@alice:example.org": {
"ed25519:base64+master+public+key": "signature+of+user+signing+key"
}
},
"usage": [
"user_signing"
],
"user_id": "@alice:example.org"
}
// ... also a @bob:example.org record if appropriate
}
}
```

*Note*: Like other appservice endpoints, this endpoint should *not* be ratelimited and *does* require
normal [authentication](https://spec.matrix.org/v1.6/application-service-api/#authorization).

If the appservice responds with an error of any kind (including timeout), the homeserver should treat
the appservice's response as `{}`. If the appservice indicates that the route is
Comment on lines +110 to +111
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we have some explanation for why this is safe to do?

This doesn't seem very convincing to me — it doesn't feel right that this functionality is so unimportant that we can just shrug off errors silently.
Given our pain with encryption bugs and the area this MSC surrounds, it'd be good to have this sort of stuff figured out before it lands in the spec and we have to try and work backwards

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in absence of having an editor open, the rationale is there's a lot which can go wrong so it's on the appservice to either implement proper error handling or not. For example, it may be possible to complete the request for one user/device, but not another: the appservice should return a partial response in this case, but it can also decide to throw a normal error instead.

The homeserver could retry the request if it wanted to, but ultimately for speed it's relatively important that the request fails fast. The vast majority of errors will be application errors that prevent any keys from being served, or the appservice being offline (in which case this API isn't able to be used).

The MSC acknowledges that this is a high risk area. I'd even go as far as recommending people don't use it unless they're willing to take on the implied operational stability requirements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I slightly see what you're saying, but it'd be good to have this written out and explained in the MSC, up to the point of what the actual impact on clients/users this is.

If the user triggers this to be queried and the AS is down, the response is assumed to be {} and then .... what? What happens? Does anything break?

I'm not super enthusiastic about having this, even with as much discouragement as possible, because eventually it's probably going to come down to one of us to dig into this and find out why things are buggy. However, if we must have it, let us at least understand what happens when it goes wrong.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The short version is encryption as a whole breaks down: when clients don't know the device keys, they can't send to those devices. The appservice effectively needs higher uptime than the homeserver in this case.

It's similar to what would happen if you contact a new user while that user's homeserver is down (regardless of appservices being involved).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breakdown of encryption sounds like it deserves a mention in the 'major problems' section.

To be frank, I'm not keen — I don't really like the idea of having to debug this when it goes wrong. On balance, it seems like a large cost to pay for what seems like a fairly small benefit (if I understand the MSC properly). Is there really no alternative here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the potential issues section should say "No major issues not already mentioned" (with the assumption that this thread's context lands in the MSC).

As far as I'm aware, there's no viable alternative, though the MSC is less than a week old.

[unknown](https://spec.matrix.org/v1.6/application-service-api/#unknown-routes), homeservers can back
off to avoid slowing down requests. It would be assumed that if the endpoint is unknown to the appservice
that the appservice is uploading keys rather than aiming to proxy them.

The appservice's response is then merged on top of any details the homeserver currently holds, up to the
device level. For example, if the homeserver believes the following for Alice:

```jsonc
{
"device_keys": {
"@alice:example.org": {
"DEVICEID_1": { /* ... */ },
"DEVICEID_2": { /* ... */ }
}
},
"master_keys": {
"@alice:example.org": { /* ... */ }
},
"self_signing_keys": {
"@alice:example.org": { /* ... */ }
},
"user_signing_keys": {
"@alice:example.org": { /* ... */ }
}
}
```

... and the appservice responds with:

```jsonc
{
"device_keys": {
"@alice:example.org": {
"DEVICEID_1": { /* ... */ }
// note lack of DEVICEID_2
}
},
"master_keys": {
"@alice:example.org": { /* ... */ }
}
// note lack of self/user signing keys (an appservice doing this is unlikely, but here for illustrative purposes)
}
```

... the homeserver would merge the two, prioritizing the appservice's details.

```jsonc
{
"device_keys": {
"@alice:example.org": {
"DEVICEID_1": { /* ... */ }, // as per appservice response
"DEVICEID_2": { /* ... */ } // as per homeserver's existing knowledge
}
},
"master_keys": {
"@alice:example.org": { /* ... */ } // as per appservice response
},
"self_signing_keys": {
"@alice:example.org": { /* ... */ } // as per homeserver's existing knowledge
},
"user_signing_keys": {
"@alice:example.org": { /* ... */ } // as per homeserver's existing knowledge
}
}
```

## Potential issues
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't provide a way for the appservice to communicate to the homeserver that there's a new device (or that a device has changed). This would need to be used to apply the device management section of the federation spec so that remote servers actually pull new device information.

(This is akin to what needs to happen when /keys/upload is called on the Client-Server API.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it is fully as speced or not, but for synapse at least it seems you can do a PUT /_matrix/client/v3/devices/<device ID> to force Synapse to send a m.device_list_update EDU for that device.


No major issues.

## Alternatives

No major alternatives.

## Security considerations

No major considerations.

## Unstable prefix

While this MSC is not considered stable, implementations should use
`/_matrix/app/unstable/org.matrix.msc3984/keys/query` as the endpoint instead. There is no version
compatibility check: homeservers implementing this functionality would receive an error from appservices
which don't support the endpoint and thus engage in the behaviour described by the MSC.

## Dependencies

This MSC has no direct dependencies, however is of little use without being partnered with something
like [MSC3202](https://github.com/matrix-org/matrix-spec-proposals/pull/3202) and
[MSC3983](https://github.com/matrix-org/matrix-spec-proposals/pull/3983).