Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

To-device messages may be lost if /keys/query fails #24682

Closed
richvdh opened this issue Feb 27, 2023 · 9 comments · Fixed by matrix-org/matrix-react-sdk#12630
Closed

To-device messages may be lost if /keys/query fails #24682

richvdh opened this issue Feb 27, 2023 · 9 comments · Fixed by matrix-org/matrix-react-sdk#12630
Labels
A-E2EE O-Uncommon Most users are unlikely to come across this or unexpected workflow S-Major Severely degrades major functionality or product features, with no satisfactory workaround T-Defect Z-Awaiting-Element-R Issues in code which will be replaced by the port of Element's crypto layer to Rust

Comments

@richvdh
Copy link
Member

richvdh commented Feb 27, 2023

Currently, after we have decrypted an Olm message, but before we process it, we check that our list of the sender's devices is up-to-date, and if not, wait for it to be updated. This was added in matrix-org/matrix-js-sdk@a587d7c.

The logs in https://github.com/matrix-org/element-web-rageshakes/issues/19889 show a key download taking 7.5 hours to get started. The reasons it takes so long aren't entirely clear, but the upshot is that we don't process a room key message for that long.

Even worse: Once the /keys/query request finally happens, it fails, which means that the Olm message is marked as a decryption failure, so the room-key is lost.

@richvdh richvdh added T-Defect S-Major Severely degrades major functionality or product features, with no satisfactory workaround A-E2EE O-Uncommon Most users are unlikely to come across this or unexpected workflow labels Feb 27, 2023
@richvdh
Copy link
Member Author

richvdh commented Feb 27, 2023

Likely we will fix this as part of #21972

@erikjohnston
Copy link
Member

Currently, after we have decrypted an Olm message, but before we process it, we check that our list of the sender's devices is up-to-date, and if not, wait for it to be updated. This was added in matrix-org/matrix-js-sdk@a587d7c.

Out of interest why do we do this? The linked Commit and PR don't seem to provide a rationale there.

@richvdh
Copy link
Member Author

richvdh commented Jun 19, 2023

An excellent question, and one which I had myself. I added some more context here: https://github.com/matrix-org/matrix-js-sdk/blob/99a15831196cf8050edd4f2871b318667a77ccfb/src/crypto/algorithms/olm.ts#L195-L210

@erikjohnston
Copy link
Member

erikjohnston commented Jun 19, 2023

Thanks! That mostly makes sense, but I'm not sure I see why the device lists have to always be up to date. If I'm understanding this correctly then could we put in a small mitigation by checking if we already know about the sender key, and only fetch the devices if we don't?

I know we're focusing on integrating with the rust SDK, but just checking in case its a two line change that might make things faster in the common case.

@BillCarsonFr
Copy link
Member

Actually in case of m.room.key we want to delay this check. So I'd skip this check for m.room.key

But we have then to do it when decrypting the room event (but yet without waiting on a download), so we would need to update getEventEncryptionInfo accordingly to warn on such cases.

ALso my understanding of downloadKeys([event.getSender()!], false) is that it was not doing any download at all. But looks like if there is one pending it will wait for it.

@erikjohnston
Copy link
Member

@BillCarsonFr Thanks. Then I guess the question is if that work will happen sufficiently soon (or is sufficiently easy) that there's no point doing the quick workaround. Adding the fast path looks to be easy enough that even I'd be able to PR it, whereas the "proper" fix sounds like more work.

@richvdh
Copy link
Member Author

richvdh commented Jun 19, 2023

@erikjohnston It sounds like the crypto team aren't going to get to this in the immediate future. If you want to PR a quickfix, I'd say be my guest.

@erikjohnston
Copy link
Member

Have created a PR at matrix-org/matrix-js-sdk#3486

@erikjohnston
Copy link
Member

Have created a PR at matrix-org/matrix-js-sdk#3486

That PR has now landed on develop, would be interesting to know if that affects the number of UTDs or not

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-E2EE O-Uncommon Most users are unlikely to come across this or unexpected workflow S-Major Severely degrades major functionality or product features, with no satisfactory workaround T-Defect Z-Awaiting-Element-R Issues in code which will be replaced by the port of Element's crypto layer to Rust
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants