-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Key backups: some backups have incorrect MAC causing UTDs on login #2338
Comments
@kegsay FWIW if you try to decrypt a room key with the wrong key you will get this same error So it could also be a problem with a client having an incorrect key in cache? |
We're going to return a more sensible error once matrix-org/matrix-rust-sdk#3250 is merged. |
@davidegirardi was also hit by this bug, so we went into a deep rabbit hole (mostly in this thread) of trying to figure out a root cause. We hit a dead end before being able to do so, but we did eliminate a bunch of possibilities. We first suspected that a client might be incorrectly calculating the padding (but otherwise encrypting the key entry correctly), but this was eliminated as an option by manually deriving the AES and MAC keys and then attempting to decrypt using low-level methods which don't error out on incorrect padding. If it was only the padding that was wrong, this should've gotten us a mostly reasonable plaintext with some incorrect padding. Instead, we got gibberish. The remaining possibilities are that the encrypted key entry ciphertext was somehow corrupted as a whole or that the wrong decryption key is being tried, which should only happen if some client had inserted the encrypted key entry into the wrong backup version. |
The error looks to be now
|
On Friday 2024-04-12, we spotted a client using two different backup keys for the same user and backup version which could also explain this bug. More generally, there seem to be several spots in the Rust SDK backup code where we could ostensibly desync on the backup public key due to using too few checks when enabling backups. One example is here, where the first
Another hypothetical way this could happen is due to a server bug. EDIT (2024-04-20): While the above is not wrong, we now know that (most?) occurrences of this bug were due to https://github.com/matrix-org/internal-config/issues/1518. |
Indeed, we know the root cause here and a fix is incoming. |
Fixed in the EXA and EXI releases from earlier this week. |
Since I don't see it explained here, the problem here was that we were using the secret key to encrypt backed up keys, instead of the public key. (The intention of key backup is that any device should be able to add keys to your key backup, whereas only a verified device should be able to fetch keys; hence the use of a public key to add new keys). The same problem caused GHSA-9ggc-845v-gcgv. The fix was matrix-org/matrix-rust-sdk@11de044 (backported into matrix-sdk-crypto 0.7.1 by matrix-org/matrix-rust-sdk#3402). |
Filing in element-meta as it's not clear which client is uploaded the bad backups currently. Part of #245
Context
We have had rageshakes on Element X where some messages are UTD. These messages are historical messages, which arrived prior to the device logging in. In this case, we expect the messages to be undecryptable if there is no key backup set up. However, these rageshakes do have key backup set up, and there is a key for the given room/session. Therefore, we expect the message to be decryptable but it isn't, hence the bug.
Problem
The key fails to decrypt with the following error:
2024-03-12T11:02:22.098722Z WARN matrix_sdk::encryption::backups: Couldn't decrypt a room key we downloaded from backups, session ID: ..., error: InvalidPadding(UnpadError) | crates/matrix-sdk/src/encryption/backups/mod.rs:469 | spans: root
The padding is in reference to PKCS Padding which is padding added to the plaintext prior to encryption. From a look at the underlying source this error can happen when:
Through some digging, it appears that the affected ciphertext is a multiple of the block size (16), indicating that the padding may be malformed. Needs confirmation.
If this is true, then some client somewhere is calculating padding incorrectly, causing EX to fail to decrypt the key backup, causing UTDs.
We have reason to believe libolm may be too lax as vdh puts it: "if the last byte is, say, 20, then libolm will happily decrypt it and strip off the last 20 bytes, but presumably another impl will reject it". This merely means that libolm based clients (e.g legacy Element-Web) will not return an error if the padding is malformed, rather than implying Element-Web is at fault for uploading a bad backup.
Frequency
The pad error has subsequently been seen on Element X iOS and Android clients on a range of homeservers, including matrix.org. There are at least 2 independent reports of this specifically mentioning the failure mode described e.g "Fresh login, unable to decrypt some of the latest messages". However, it clearly isn't prolific, as we have many many instances of users using key backup without issue, involving thousands of keys.
The text was updated successfully, but these errors were encountered: