-
Notifications
You must be signed in to change notification settings - Fork 380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MSC3202: Encrypted appservices #3202
base: old_master
Are you sure you want to change the base?
Changes from 10 commits
d56527b
efabfe4
3f394a8
f551ae5
4f6e256
9001a67
472226b
937e4c5
9893a5f
208decb
e586c39
f875ffe
47d2080
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,158 @@ | ||
# MSC3202: Encrypted Appservices | ||
|
||
Presently, appservices in Matrix are capable of attaching themselves to a homeserver for high-traffic | ||
bot-like usecases, such as bridging and operationally expensive bots. Traditionally, these appservices | ||
only work in unencrypted rooms due to not having enough context on the encryption state to actually | ||
function properly. | ||
|
||
This MSC targets the missing bits to support encryption at the appservice level: other MSCs, such as | ||
[MSC2409](https://github.com/matrix-org/matrix-doc/pull/2409) and [MSC2778](https://github.com/matrix-org/matrix-doc/pull/2778) | ||
give appservices foundational pieces to get device IDs and to-device messages, as required by encryption. | ||
|
||
## Proposal | ||
|
||
This proposal takes inspiration from [MSC2409](https://github.com/matrix-org/matrix-doc/pull/2409) by | ||
defining a new set of keys on the appservice `/transactions` endpoint, similar to sync: | ||
|
||
```json5 | ||
{ | ||
"events": [ | ||
// as defined today | ||
], | ||
"ephemeral": [ | ||
// MSC2409 | ||
], | ||
"to_device": [ | ||
// MSC2409 | ||
], | ||
"device_lists": { | ||
"changed": ["@alice:example.org"], | ||
"left": ["@bob:example.com"] | ||
}, | ||
"device_one_time_keys_count": { | ||
turt2live marked this conversation as resolved.
Show resolved
Hide resolved
|
||
"@_irc_bob:example.org": { | ||
turt2live marked this conversation as resolved.
Show resolved
Hide resolved
|
||
"DEVICEID": { | ||
"curve25519": 10, | ||
"signed_curve25519": 20 | ||
} | ||
} | ||
}, | ||
"device_unused_fallback_key_types": { | ||
"@_irc_bob:example.org": { | ||
"DEVICEID": ["signed_curve25519"] | ||
} | ||
} | ||
} | ||
``` | ||
|
||
These fields are heavily inspired by [the extensions to /sync](https://matrix.org/docs/spec/client_server/r0.6.1#id84) | ||
in the client-server API. | ||
|
||
All the new fields can be omitted if there are no changes for the appservice to handle. For | ||
`device_one_time_keys_count` and `device_unused_fallback_key_types`, the format is slightly different | ||
from the client-server API to better map the appservice's user namespace users to the counts. Users | ||
in the namespace without keys or which have unchanged keys since the last transaction can be omitted | ||
(more details on this later on). Note that fallback keys are described in | ||
Comment on lines
+53
to
+55
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How should the AS distinguish between users without keys and users with unchanged keys if you can omit them for both cases? In my implementation thus far I've been sending empty dicts for the 'without keys' case. I'm now reading that this is unnecessary*, but would be interested in the answer to the above. *I'm especially sceptical of 'no unused fallback keys' and 'no change in your unused fallback keys' having the same representation over the wire (and thus being indistinguishable). I personally think it's 'wrong' :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. MSC2732 has this to say about including the field in /sync: The device_unused_fallback_key_types parameter must be present if the server supports fallback keys. Clients can thus treat this field as an indication that the server supports fallback keys, and so only upload fallback keys to servers that support them. My personal opinion here is that the per-user entries should be omissible if and only if there have been no changes for that user. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agh, after reading my code, I notice that I also include empty lists for all the devices if there are no unused fallback keys. I think this is in keeping with what There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In theory the appservice is doing its own state tracking for whether or not it has uploaded keys (of any variety). If for some reason it loses this state, it can do two things to recover:
So it should be perfectly safe to exclude fields in both cases, though the MSC is also assuming the server is working off a transaction model under the hood and over the wire: it's a lot easier to see that the appservice was already informed of a change when there's a transaction log to say that was sent. Whether the server supports fallback keys or not is somewhat irrelevant for appservices: if an appservice requires it, it can just mark it in documentation for server compatibility. Plus, with the new Matrix versioning, the appservice can declare a minimum supported spec version that operators can compare against their homeserver installs. Clients are a bit different because it's a lot easier to accidentally/on purpose use a modern client against an old server, so the feature check becomes a bit more important (at least while fallback keys weren't in a spec release). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @turt2live Mostly reasonable, but I'm left with the question of: how is the appservice meant to distinguish between 'no difference in your fallback keys' and 'all your fallback keys have been consumed' if both of them are represented by omission? My suggestion for how this should work would be: if a device entry is specified, then the list (UFBKs)/dict (OTK counts) is the whole new state for that device and replaces the AS's current state for that device. If a device entry is omitted, then there's no change. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The state change of being unused to used would be in a transaction, but further transactions don't need to include it. This is represented by There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Perfect; I think the wording of the '...users without keys can be omitted...' should be altered to clarity that |
||
[MSC2732](https://github.com/matrix-org/matrix-doc/pull/2732) as of writing. | ||
|
||
Like MSC2409, any user the appservice would be considered "interested" in (user in the appservice's | ||
namespace, or sharing a room with an appservice user/namespaced room) would qualify for the device | ||
list changes section. | ||
turt2live marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
Note that it's typical for clients to pause sync loops when processing device list changes to avoid | ||
a scenario where they are unable to decrypt/encrypt a message from/to a particular device. Appservices | ||
are expected to mirror this by ensuring the transaction request does not complete until processing | ||
is complete. In the worst case, the server will time out the request and retry it verbatim, so | ||
appservices might wish to track which device list changes in which transaction they already processed | ||
or keep processing transactions in the background while retries are attempted. | ||
|
||
In order to allow the appservice to masquerade as its users, an extension to the existing | ||
[identity assertion](https://matrix.org/docs/spec/application_service/r0.1.2#identity-assertion) | ||
ability is proposed. To compliment the (optional) `user_id` when using an `as_token` as an access | ||
token, a similarly optional `device_id` query parameter is proposed. When provided, the server asserts | ||
that the device ID is valid for the user, and that the appservice is able to masquerade as that user. | ||
Comment on lines
+72
to
+73
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what should be the HTTP and matrix-level error code if it's not valid? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. for lack of a better answer, let's reuse 400 |
||
If valid, that device ID should be assumed as being used for that request. For many requests, this | ||
means updating the "last seen IP" and "last seen timestamp" for the device, however for some endpoints | ||
it means interacting with that device (such as when uploading keys). | ||
|
||
### Optimization: when to send OTKs/fallback keys | ||
|
||
As mentioned above, in order to keep the transaction byte size down the server can (and should) exclude | ||
OTK counts and unused fallback keys when they haven't changed since the last transaction. Appservices | ||
however should be tolerable of the server over-communicating the counts as a quick/cheap approach would | ||
be to *always* include the OTK counts/unused fallback keys for all known users rather than trying to | ||
detect changes. | ||
|
||
As a middle ground, servers might be interested in an algorithm which doesn't detect changes between | ||
transactions but does attempt to reduce traffic. If the appservice is about to receive an event or | ||
message typically associated with encryption, the counts for the affected users could be included. This | ||
would result in the following rules: | ||
* If an `m.room.encrypted` event is being included in the transaction's `events`, include OTK counts and | ||
unused fallback key types for all appservice users which reside in that room. | ||
* If an appservice user is receiving a to-device message in the transaction's `to_device` array, include | ||
OTK counts and unused fallback key types for that user. | ||
|
||
This approach has the advantage of typically minimal changes to the internals of the homeserver, works | ||
similar to `/sync`, and reduces noisy traffic in the transaction sending. This additionally still honours | ||
the "when they change, send the counts" requirement to a reasonable degree: typically a use of an OTK will | ||
be followed by a to-device message. It is theoretically possible for an appservice to run out of OTKs if | ||
a remote user claims all OTKs without actually using them. Implementations may be interested in | ||
[MSC2732: Fallback keys](https://github.com/matrix-org/matrix-doc/pull/2732) which will avoid a scenario | ||
where the appservice can no longer decrypt messages. | ||
|
||
However, as mentioned, servers are free to include this information as little or often as they'd like, | ||
provided they send it at least as often as when it changes. | ||
|
||
### Optimization: Don't encrypt as often | ||
|
||
Appservices theoretically do not need to establish Olm sessions with other appservice users as the appservice | ||
will typically be managing the devices in one place. In short, this means that a room with 10k appservice | ||
users and only 1 non-appservice user can be sped up by only encrypting from the appservice's users to the | ||
non-appservice user. The appservice would not need to set up 10k * 10k Olm sessions given the encryption | ||
and decryption all happens in the same place. As an added bonus, this improves performance of the appservice | ||
as it doesn't have to handle to-device messages sent to itself. | ||
|
||
Some implementations might not be able to support this sort of optimization though, so it is still permitted | ||
to establish sessions and such between appservice users. | ||
|
||
## Potential issues | ||
|
||
Servers would have to track and send this information to appservices, however this is still perceived | ||
to be more performant than appservices using potentionally thousands of `/sync` streams. | ||
|
||
Appservices additionally cannot opt-in (or out) of this functionality unlike with MSC2409. It is | ||
expected that servers will optimize for not including/calculating the fields if the appservice has | ||
no interest in the information. Specifically, appservices which don't have any keys under their user | ||
namespace can be assumed to not need device list changes and thus can be optimized out. | ||
|
||
## Alternatives | ||
|
||
An endpoint for appservices to poll could work, though this is extra work for the appservice and would | ||
likely need pagination and such, which is all heavyweight for the server. Instead, having the server | ||
batch up updates and send them to the appservice is likely faster. | ||
|
||
## Security considerations | ||
|
||
None relevant - this is the same information the appservice would get if it spawned `/sync` streams for | ||
all the users in its namespace. | ||
|
||
## Unstable prefix | ||
|
||
While this MSC is not considered stable for implementation, implementations should use `org.matrix.msc3202.` | ||
as a prefix to the fields on the `/transactions` endpoint. For example: | ||
* `device_lists` becomes `org.matrix.msc3202.device_lists` | ||
* `device_one_time_keys_count` becomes `org.matrix.msc3202.device_one_time_keys_count` | ||
* `device_unused_fallback_key_types` becomes `org.matrix.msc3202.device_unused_fallback_key_types` | ||
|
||
Appservices which support encryption but never see these fields (ie: server is not implementing this in an | ||
unstable capacity) should be fine, though encryption might not function properly for them. It is the | ||
responsibility of the appservice to try and recover safely and sanely, if desired, when the server is not | ||
implementing this in an unstable capacity. This is not a concern once the MSC becomes stable in a released | ||
version of the specification, as servers will be required to implement it. | ||
|
||
For servers wishing to force appservices to opt-in to this behaviour, they may use `org.matrix.msc3202: true` | ||
in the registration file. Servers will be able to check for "opt-in" behaviour once this MSC is stable by | ||
seeing whether or not the appservice has an encryption-capable device recorded in its users namespaces. | ||
|
||
To use device ID masquerading, implementations should use `org.matrix.msc3202.device_id` instead of `device_id` | ||
in the query string while this MSC is considered unstable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.