bindings/crypto-nodejs: first message sent by OlmMachine has wrong message index #802

turt2live · 2022-06-29T21:34:34Z

in a test to see if all the bits of encryption are working:

The sequence diagram for this would be the following (roughly). ExampleUser is a bot and TravisR is a human using Element Web.

sequenceDiagram
    participant TravisR
    participant ExampleUser
    ExampleUser->>TravisR: Create DM room
    TravisR->>ExampleUser: Establish Olm session (claim key, etc)
    TravisR->>ExampleUser: Send Megolm key
    TravisR->>ExampleUser: Send encrypted message
    ExampleUser->>ExampleUser: Query keys for joined members of the room (`updateTrackedUsers`)
    ExampleUser->>ExampleUser: Create room key & send it (`shareRoomKey` followed by a loop over `outgoingRequests`)
    ExampleUser->>ExampleUser: `encryptRoomEvent`
    ExampleUser->>TravisR: Send encrypted message

Hywan · 2022-07-05T07:10:16Z

Can you provide a minimal working example please? Or at least a source code where I can understand what's happening? Thanks.

poljar · 2022-07-05T11:04:14Z

Let me chime in here, since I suspect I know what's going on.

The step called Query keys for joined members of the room (updateTrackedUsers) in the sequence diagram is not actually querying the keys of the joined members, in other words a /keys/query request will not be sent out immediately after this method has been called.

It's only marking the member's devices to be tracked. Next time OlmMachine::outgoing_requests() is called, a /keys/query requests will be part of this list. Until this request is sent out, the device list will not be complete and thus the OlmMachine won't try to share the room key with the missing devices.

By the time the second message from the image has been received OlmMachine::outgoing_requests() has been processed and the device list has been populated. The OlmMachine will now share the room key with the previously missing devices but the room key is now in a ratcheted state and the first message won't be decryptable (this is not a bug, this is a feature, new devices don't have access to message history).

There are nowadays ways to wait for a /keys/query to be done, if some given users are marked to be part of such a requests, but since this is a manual test I suspect that the bot-sdk is marking users as tracked (calling updateTrackedUsers) way too late. From the sequence diagram it seems that it's doing so right before it's trying to send the room message.

Marking users as tracked should be done as part of the sync response processing.

poljar · 2022-07-05T11:45:55Z

Actually looking at the sequence diagram a bit more, this step seems to be a bit misguided:

Create room key & send it (shareRoomKey followed by a loop over outgoingRequests)

Processing outgoingRequests should happen after a sync response has been processed, which does include calls to updateTrackedUsers.

If you reverse this step, first process outgoingRequests and then call shareRoomKey this issue should go away, shareRoomKey doesn't produce any outgoingRequests anyways, it gives you the requests you need to send out directly as the return value.

Also, if you do end up calling outgoingRequests() in other places, instead just after a sync has been handled, remember that you need a lock to not send out duplicate requests.

turt2live · 2022-07-05T15:33:08Z

it gives you the requests you need to send out directly as the return value.

but it also doesn't give enough information to mark those requests as sent, so the client ends up sending them out as part of the next outgoingRequests loop.

There's very little documentation around when and how to call outgoingRequests - calling it only on sync feels far too slow. Because of how the bot-sdk works, the tracked users are being updated during sync, just a few hundred lines of code between the start of sync processing and the tracked users update. It's not entirely feasible to move the tracking up in the code without unreasonable performance penalties at the moment.

poljar · 2022-07-06T10:04:23Z

it gives you the requests you need to send out directly as the return value.

but it also doesn't give enough information to mark those requests as sent, so the client ends up sending them out as part of the next outgoingRequests loop.

It does not give you enough information? What is missing? You need three things to mark a request as sent:

The unique id
The request type
The response

shareRoomKey returns a list of /sendToDevice requests, i think you could mark the request as sent even if you never get a response:

/sendToDevice requests have a transactionId, this is used as the unique request ID
Well, you know the type, it's a RequestType::ToDevice
/sendToDevice responses are always an empty {}

As for those requests ending up as part of outgoingRequests, are you sure about that? AFAIK requests from shareRoomKey never end up in the list that outgoingRequests returns.

There's very little documentation around when and how to call outgoingRequests - calling it only on sync feels far too slow. Because of how the bot-sdk works, the tracked users are being updated during sync, just a few hundred lines of code between the start of sync processing and the tracked users update. It's not entirely feasible to move the tracking up in the code without unreasonable performance penalties at the moment.

Only receiveSyncChanges() will produce any changes in the state machine that will queue up requests for outgoingRequests(). That's a bit of an oversimplification, because sending out an /keys/claim request might do so as well.

So the way other implementations use outgoingRequests() is:

Receive a sync response from the server
Call receiveSyncChanges() as the first part of the sync handling
Do your usual sync handling, with two important E2EE steps:
1. Call updateTrackedUsers() when there's a new user in an E2EE enabled room
2. Call decryptRoomEvent() if you want to decrypt room events
Optionally, call getMissingSessions() with an empty user list.
Call outgoingRequests()
Go back to 1.

You can bundle 4 and 5 into a method and call the bundle once or twice per sync.

A couple more rules:

Do not call outgoingRequests() in a loop until you have no requests anymore.
You must ensure that you don't send out the same request multiple times out.
Either use a lock to make sure that you don't call outgiongRequests() multiple times at the same time, or deduplicate the requests that you get from outgiongRequests().

The same rule applies to getMissingSessions() and shareRoomKey(), you can have a lock per room for shareRoomKey(). Those calls will cache and persist requests that you need to send out, the requests will be part of the return value until you have marked them as sent (there are exceptions to this, i.e. if you discard a room key manually but it's generally true).

If you have more questions like these please feel free to ask them, I'll try to update the relevant docs with this information.

turt2live · 2022-07-06T16:42:49Z

/sendToDevice requests have a transactionId, this is used as the unique request ID

this part definitely wasn't clear :p

I was seeing the requests show up in outgoingRequests and didn't see a request ID/type so assumed that it would be safe. For documentation, it would be really nice to have it spelled out how you're supposed to use the functions as there's lots of room for assumption/interpretation to go sideways (as it has in my case).

I'll rework my code to use the approach mentioned in your comment as best as I can and report back. For clarity: updateTrackedUsers doesn't need to be called with a complete set of all known users? (how does one stop tracking a user?)

poljar · 2022-07-06T16:57:32Z

/sendToDevice requests have a transactionId, this is used as the unique request ID

this part definitely wasn't clear :p

I think the Rust side has a request_id() method which makes this somewhat clearer but still sub-optimal.

I'll rework my code to use the approach mentioned in your comment as best as I can and report back. For clarity: updateTrackedUsers doesn't need to be called with a complete set of all known users? (how does one stop tracking a user?)

So the way device tracking works in Matrix land is that the server is supposed to give you the list of users you share encrypted rooms with which have new or changed devices.

Synapse doesn't really respect the whole "users you share an encrypted room with", instead it seems to do so for any user you share a room with. It also doesn't give you any users in this list when you initially sync.

To avoid fetching devices of users you don't care about, and to initially populate the list of devices you do care about, we'll need to tell the Rust side which users we are sharing an encrypted room with.

This is what updateTrackedUsers does, the intention is to call this with the subset of users you share an encrypted room with. You don't need to stop tracking users as, presumably, the server won't notify you about any changes from an user you don't need to track anymore.

If you fail to mark users for tracking we won't have the list of devices for this user and the user won't in return receive any room keys.

turt2live · 2022-07-06T17:04:01Z

sorry, I'm still unclear with how I'm supposed to use updateTrackedUsers: can I call it with just the users who had their devices changed, or do I somehow need to keep track of all the user IDs I've supplied to that function and keep calling it with that?

turt2live · 2022-07-06T17:09:58Z

having re-read your comment a few times though, it sounds like the client needs to keep track itself and somehow make it work. This is a bit harder for the bot-sdk, but not impossible.

poljar · 2022-07-06T17:10:06Z

You need to inspect m.room.member events, those are not passed to the Rust side since it would duplicate the membership information.

If the m.room.member events if for an encrypted room, and the membership state of the member is join or invite, you pass it to updateTrackedUsers.

Once a user has been marked for tracking you don't need to mark it again, though if you do, it'll be a noop.

Users who have their devices changed are passed to the Rust side with the receiveSyncChanges() method, so you don't need to keep track of who has a changed device.

In other words you only need to call this method with:

Users that become part of an encrypted room.
All users of a room if it has become encrypted.

turt2live · 2022-07-06T19:25:58Z

right, using the library correctly seems to fix it - thanks :)

A worked example would cover a lot of the traps I fell into, I think. At least something to denote how to handle the combination of updateTrackedUsers, shareRoomKey, and outgoingRequests.

turt2live mentioned this issue Jun 29, 2022

Replace rust-sdk bindings with ones that work turt2live/matrix-bot-sdk#236

Merged

14 tasks

turt2live changed the title ~~feat(bindings/crypto-nodejs): first message sent by OlmMachine has wrong message index~~ bindings/crypto-nodejs: first message sent by OlmMachine has wrong message index Jun 29, 2022

Hywan self-assigned this Jun 30, 2022

Hywan added bug Something isn't working bindings labels Jun 30, 2022

turt2live closed this as not planned Won't fix, can't repro, duplicate, stale Jul 6, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bindings/crypto-nodejs: first message sent by OlmMachine has wrong message index #802

bindings/crypto-nodejs: first message sent by OlmMachine has wrong message index #802

turt2live commented Jun 29, 2022

Hywan commented Jul 5, 2022

poljar commented Jul 5, 2022

poljar commented Jul 5, 2022

turt2live commented Jul 5, 2022

poljar commented Jul 6, 2022

turt2live commented Jul 6, 2022

poljar commented Jul 6, 2022

turt2live commented Jul 6, 2022

turt2live commented Jul 6, 2022

poljar commented Jul 6, 2022

turt2live commented Jul 6, 2022

bindings/crypto-nodejs: first message sent by OlmMachine has wrong message index #802

bindings/crypto-nodejs: first message sent by OlmMachine has wrong message index #802

Comments

turt2live commented Jun 29, 2022

Hywan commented Jul 5, 2022

poljar commented Jul 5, 2022

poljar commented Jul 5, 2022

turt2live commented Jul 5, 2022

poljar commented Jul 6, 2022

turt2live commented Jul 6, 2022

poljar commented Jul 6, 2022

turt2live commented Jul 6, 2022

turt2live commented Jul 6, 2022

poljar commented Jul 6, 2022

turt2live commented Jul 6, 2022