Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bindings/crypto-nodejs: first message sent by OlmMachine has wrong message index #802

Closed
Tracked by #236
turt2live opened this issue Jun 29, 2022 · 11 comments
Closed
Tracked by #236
Assignees
Labels
bindings bug Something isn't working

Comments

@turt2live
Copy link
Member

in a test to see if all the bits of encryption are working:
image

The sequence diagram for this would be the following (roughly). ExampleUser is a bot and TravisR is a human using Element Web.

sequenceDiagram
    participant TravisR
    participant ExampleUser
    ExampleUser->>TravisR: Create DM room
    TravisR->>ExampleUser: Establish Olm session (claim key, etc)
    TravisR->>ExampleUser: Send Megolm key
    TravisR->>ExampleUser: Send encrypted message
    ExampleUser->>ExampleUser: Query keys for joined members of the room (`updateTrackedUsers`)
    ExampleUser->>ExampleUser: Create room key & send it (`shareRoomKey` followed by a loop over `outgoingRequests`)
    ExampleUser->>ExampleUser: `encryptRoomEvent`
    ExampleUser->>TravisR: Send encrypted message
Loading
@turt2live turt2live changed the title feat(bindings/crypto-nodejs): first message sent by OlmMachine has wrong message index bindings/crypto-nodejs: first message sent by OlmMachine has wrong message index Jun 29, 2022
@Hywan Hywan self-assigned this Jun 30, 2022
@Hywan Hywan added bug Something isn't working bindings labels Jun 30, 2022
@Hywan
Copy link
Member

Hywan commented Jul 5, 2022

Can you provide a minimal working example please? Or at least a source code where I can understand what's happening? Thanks.

@poljar
Copy link
Contributor

poljar commented Jul 5, 2022

Let me chime in here, since I suspect I know what's going on.

The step called Query keys for joined members of the room (updateTrackedUsers) in the sequence diagram is not actually querying the keys of the joined members, in other words a /keys/query request will not be sent out immediately after this method has been called.

It's only marking the member's devices to be tracked. Next time OlmMachine::outgoing_requests() is called, a /keys/query requests will be part of this list. Until this request is sent out, the device list will not be complete and thus the OlmMachine won't try to share the room key with the missing devices.

By the time the second message from the image has been received OlmMachine::outgoing_requests() has been processed and the device list has been populated. The OlmMachine will now share the room key with the previously missing devices but the room key is now in a ratcheted state and the first message won't be decryptable (this is not a bug, this is a feature, new devices don't have access to message history).

There are nowadays ways to wait for a /keys/query to be done, if some given users are marked to be part of such a requests, but since this is a manual test I suspect that the bot-sdk is marking users as tracked (calling updateTrackedUsers) way too late. From the sequence diagram it seems that it's doing so right before it's trying to send the room message.

Marking users as tracked should be done as part of the sync response processing.

@poljar
Copy link
Contributor

poljar commented Jul 5, 2022

Actually looking at the sequence diagram a bit more, this step seems to be a bit misguided:

Create room key & send it (shareRoomKey followed by a loop over outgoingRequests)

Processing outgoingRequests should happen after a sync response has been processed, which does include calls to updateTrackedUsers.

If you reverse this step, first process outgoingRequests and then call shareRoomKey this issue should go away, shareRoomKey doesn't produce any outgoingRequests anyways, it gives you the requests you need to send out directly as the return value.

Also, if you do end up calling outgoingRequests() in other places, instead just after a sync has been handled, remember that you need a lock to not send out duplicate requests.

@turt2live
Copy link
Member Author

it gives you the requests you need to send out directly as the return value.

but it also doesn't give enough information to mark those requests as sent, so the client ends up sending them out as part of the next outgoingRequests loop.

There's very little documentation around when and how to call outgoingRequests - calling it only on sync feels far too slow. Because of how the bot-sdk works, the tracked users are being updated during sync, just a few hundred lines of code between the start of sync processing and the tracked users update. It's not entirely feasible to move the tracking up in the code without unreasonable performance penalties at the moment.

@poljar
Copy link
Contributor

poljar commented Jul 6, 2022

it gives you the requests you need to send out directly as the return value.

but it also doesn't give enough information to mark those requests as sent, so the client ends up sending them out as part of the next outgoingRequests loop.

It does not give you enough information? What is missing? You need three things to mark a request as sent:

  1. The unique id
  2. The request type
  3. The response

shareRoomKey returns a list of /sendToDevice requests, i think you could mark the request as sent even if you never get a response:

  1. /sendToDevice requests have a transactionId, this is used as the unique request ID
  2. Well, you know the type, it's a RequestType::ToDevice
  3. /sendToDevice responses are always an empty {}

As for those requests ending up as part of outgoingRequests, are you sure about that? AFAIK requests from shareRoomKey never end up in the list that outgoingRequests returns.

There's very little documentation around when and how to call outgoingRequests - calling it only on sync feels far too slow. Because of how the bot-sdk works, the tracked users are being updated during sync, just a few hundred lines of code between the start of sync processing and the tracked users update. It's not entirely feasible to move the tracking up in the code without unreasonable performance penalties at the moment.

Only receiveSyncChanges() will produce any changes in the state machine that will queue up requests for outgoingRequests(). That's a bit of an oversimplification, because sending out an /keys/claim request might do so as well.

So the way other implementations use outgoingRequests() is:

  1. Receive a sync response from the server
  2. Call receiveSyncChanges() as the first part of the sync handling
  3. Do your usual sync handling, with two important E2EE steps:
    1. Call updateTrackedUsers() when there's a new user in an E2EE enabled room
    2. Call decryptRoomEvent() if you want to decrypt room events
  4. Optionally, call getMissingSessions() with an empty user list.
  5. Call outgoingRequests()
  6. Go back to 1.

You can bundle 4 and 5 into a method and call the bundle once or twice per sync.

A couple more rules:

  • Do not call outgoingRequests() in a loop until you have no requests anymore.
  • You must ensure that you don't send out the same request multiple times out.
    Either use a lock to make sure that you don't call outgiongRequests() multiple times at the same time, or deduplicate the requests that you get from outgiongRequests().

The same rule applies to getMissingSessions() and shareRoomKey(), you can have a lock per room for shareRoomKey(). Those calls will cache and persist requests that you need to send out, the requests will be part of the return value until you have marked them as sent (there are exceptions to this, i.e. if you discard a room key manually but it's generally true).

If you have more questions like these please feel free to ask them, I'll try to update the relevant docs with this information.

@turt2live
Copy link
Member Author

/sendToDevice requests have a transactionId, this is used as the unique request ID

this part definitely wasn't clear :p

I was seeing the requests show up in outgoingRequests and didn't see a request ID/type so assumed that it would be safe. For documentation, it would be really nice to have it spelled out how you're supposed to use the functions as there's lots of room for assumption/interpretation to go sideways (as it has in my case).

I'll rework my code to use the approach mentioned in your comment as best as I can and report back. For clarity: updateTrackedUsers doesn't need to be called with a complete set of all known users? (how does one stop tracking a user?)

@poljar
Copy link
Contributor

poljar commented Jul 6, 2022

/sendToDevice requests have a transactionId, this is used as the unique request ID

this part definitely wasn't clear :p

I think the Rust side has a request_id() method which makes this somewhat clearer but still sub-optimal.

I'll rework my code to use the approach mentioned in your comment as best as I can and report back. For clarity: updateTrackedUsers doesn't need to be called with a complete set of all known users? (how does one stop tracking a user?)

So the way device tracking works in Matrix land is that the server is supposed to give you the list of users you share encrypted rooms with which have new or changed devices.

Synapse doesn't really respect the whole "users you share an encrypted room with", instead it seems to do so for any user you share a room with. It also doesn't give you any users in this list when you initially sync.

To avoid fetching devices of users you don't care about, and to initially populate the list of devices you do care about, we'll need to tell the Rust side which users we are sharing an encrypted room with.

This is what updateTrackedUsers does, the intention is to call this with the subset of users you share an encrypted room with. You don't need to stop tracking users as, presumably, the server won't notify you about any changes from an user you don't need to track anymore.

If you fail to mark users for tracking we won't have the list of devices for this user and the user won't in return receive any room keys.

@turt2live
Copy link
Member Author

sorry, I'm still unclear with how I'm supposed to use updateTrackedUsers: can I call it with just the users who had their devices changed, or do I somehow need to keep track of all the user IDs I've supplied to that function and keep calling it with that?

@turt2live
Copy link
Member Author

having re-read your comment a few times though, it sounds like the client needs to keep track itself and somehow make it work. This is a bit harder for the bot-sdk, but not impossible.

@poljar
Copy link
Contributor

poljar commented Jul 6, 2022

You need to inspect m.room.member events, those are not passed to the Rust side since it would duplicate the membership information.

If the m.room.member events if for an encrypted room, and the membership state of the member is join or invite, you pass it to updateTrackedUsers.

Once a user has been marked for tracking you don't need to mark it again, though if you do, it'll be a noop.

Users who have their devices changed are passed to the Rust side with the receiveSyncChanges() method, so you don't need to keep track of who has a changed device.

In other words you only need to call this method with:

  1. Users that become part of an encrypted room.
  2. All users of a room if it has become encrypted.

@turt2live
Copy link
Member Author

right, using the library correctly seems to fix it - thanks :)

A worked example would cover a lot of the traps I fell into, I think. At least something to denote how to handle the combination of updateTrackedUsers, shareRoomKey, and outgoingRequests.

@turt2live turt2live closed this as not planned Won't fix, can't repro, duplicate, stale Jul 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bindings bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants