Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

e2e: fetch cross-signing keys when device is signed but no key cached #10668

Conversation

TheJJ
Copy link
Contributor

@TheJJ TheJJ commented Aug 20, 2021

when for a remote user we don't have cached e2e_cross_signing_keys
but we do have cached the devices in device_lists_remote_cache
and the user has cross-signing set up, missing keys are not fetched
and cross-signing verification will fail.

with this patch, the missing cross-signing keys are requested
if the cached devices list does have cross-signing signatures.

this is a follow-up for #8455.

@TheJJ
Copy link
Contributor Author

TheJJ commented Aug 20, 2021

I'm not sure if the signature device being in the self_signing_keys is the best and correct way to check for a signature originating from a cross-signing key, but it was the most obvious way. Please advise :)

@TheJJ TheJJ force-pushed the federation-fetch-crosssigning-keys-when-not-cached branch 5 times, most recently from 8f8c6f2 to 23511b1 Compare August 23, 2021 15:15
when for a remote user we don't have cached `e2e_cross_signing_keys`
but we do have cached the devices in `device_lists_remote_cache`
and the user has cross-signing set up, missing keys are not fetched
and cross-signing verification will fail.

with this patch, the missing cross-signing keys are requested
if the cached devices list does have cross-signing signatures.

Signed-off-by: Jonas Jelten <jj@sft.lol>
@TheJJ TheJJ force-pushed the federation-fetch-crosssigning-keys-when-not-cached branch from 23511b1 to f24b399 Compare August 23, 2021 16:05
@erikjohnston erikjohnston requested a review from a team August 23, 2021 16:14
@richvdh richvdh self-assigned this Aug 26, 2021
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for the contribution!

This code is really hard to follow - a single function should not be over 250 lines long, and some of the code has 11 levels of indentation. That's not entirely your fault - it was pretty bad before - but we have to draw a line somewhere and I don't think I want to let it get any worse. In short, I'm afraid I think we need to refactor it a bit to split it up.

However, more to the point, I'm not convinced this is a correct change. As far as I can see, there is nothing in the spec that says that server implementations are expected to parse the signtatures field and fetch cross-signing keys - rather, it is left up to the clients to request the cross-signing keys if they are required. @uhoreg: do you have any memory of how this stuff is supposed to work?

[If this is a correct change, then we ought to add system tests to sytest or complement to ensure that other server implementations do it correctly.]

if not signatures:
continue

for sig_user, sigs in signatures.items():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that this is pulled over federation from a potentially untrusted remote server, which may itself not have validated the content of the key object, what guarantee do we have that signatures is actually a dict? We should handle invalid responses more gracefully than throwing no attribute 'items' exceptions). Likewise we should not make assumptions about the content of the dict.

@richvdh richvdh added the X-Awaiting-Changes A contributed PR which needs changes and re-review before it can be merged label Aug 26, 2021
@richvdh richvdh removed their assignment Aug 26, 2021
@TheJJ
Copy link
Contributor Author

TheJJ commented Aug 26, 2021

Thanks for your thoughts, I agree my patch is pretty hacky, but maybe we can come up with a better solution by the symptoms of the bug:

  • friend on homeserver A wants to verify me with cross-signing, I'm on B.
  • xsign-verification succeeds (not sure why that works)
  • friend on A now doesn't see me with green badge, for me on B he is green
  • homeserver of A doesn't have my cross-signing keys cached, but has my devices cached
  • when the A-client requests /keys/query on A, he gets back the cached devices, but not the cross-signing keys
    • A won't federate since all requested user's devices are cached
  • in the /keys/query result no master/selfsign-keys are returned, hence my guess that this is why there's no green badge
  • (my B-client request to /keys/query do return his xsign-keys, hence he's green?)

My solution approach now was to invalidate the cache when devices have xsign-sigs, but these keys are not cached yet.

I mean the critical point is to determine the cache is invalid - devices+keys are cached but xkeys are not.

@uhoreg
Copy link
Member

uhoreg commented Aug 26, 2021

The spec doesn't say that implementations should parse the signatures and see there's a cross-signing key, but I think it's allowable for servers to do so. IIRC, even if clients request the cross-signing keys, the server won't re-fetch them if it thinks that it's already up to date.

@TheJJ
Copy link
Contributor Author

TheJJ commented Aug 31, 2021

So what is your suggestion on how we solve this bug? :) I know at least 5 users affected by this, so I guess it's plenty more.

@richvdh
Copy link
Member

richvdh commented Aug 31, 2021

The spec doesn't say that implementations should parse the signatures and see there's a cross-signing key, but I think it's allowable for servers to do so.

@uhoreg: I'm not really following this, I'm afraid. Surely it must be up to either the client or the server to decide if it is missing a cross-signing key. If the spec doesn't say that the server needs to do it, then the clients can't rely on the server doing it, so the clients have to do it?

@richvdh
Copy link
Member

richvdh commented Aug 31, 2021

IIRC, even if clients request the cross-signing keys, the server won't re-fetch them if it thinks that it's already up to date.

If it's already up to date, why would the server need to re-fetch them?

@uhoreg
Copy link
Member

uhoreg commented Aug 31, 2021

IIRC, even if clients request the cross-signing keys, the server won't re-fetch them if it thinks that it's already up to date.

If it's already up to date, why would the server need to re-fetch them?

It may not be up to date, but it may think that it's up to date if it missed a message notifying it of new keys.

The spec doesn't say that implementations should parse the signatures and see there's a cross-signing key, but I think it's allowable for servers to do so.

@uhoreg: I'm not really following this, I'm afraid. Surely it must be up to either the client or the server to decide if it is missing a cross-signing key. If the spec doesn't say that the server needs to do it, then the clients can't rely on the server doing it, so the clients have to do it?

The current spec kind of assumes that EDUs don't get dropped, so it doesn't really define a way of dealing with that situation. (It has a way of dealing with dropped EDUs, but only after you've received another key update message -- it doesn't help in the situation where the dropped message is the last one that was sent.) This is similar to the situation where the device list gets out of sync and you fail to encrypt to new devices because you don't know they exist. IIRC Erik added a thing a while back where if you get a message claiming to be from a device you don't know about, Synapse will automagically re-fetch the user's devices from the remote server.

@richvdh
Copy link
Member

richvdh commented Aug 31, 2021

The current spec kind of assumes that EDUs don't get dropped, so it doesn't really define a way of dealing with that situation

wait? what? Which EDUs are getting dropped? Surely that's not right? And we should be fixing that rather than layering more and more unspecified hacks on top of it?

I still don't think we should be doing this unless it's specced, so if we should be doing it, then it needs to be specced.

@uhoreg
Copy link
Member

uhoreg commented Aug 31, 2021

The current spec kind of assumes that EDUs don't get dropped, so it doesn't really define a way of dealing with that situation

wait? what? Which EDUs are getting dropped? Surely that's not right? And we should be fixing that rather than layering more and more unspecified hacks on top of it?

I don't know the details of how EDUs can get dropped. Maybe Erik remembers something from investigating the device list?

@callahad
Copy link
Contributor

callahad commented Sep 2, 2021

@erikjohnston Could you please weigh in on this?

@erikjohnston
Copy link
Member

erikjohnston commented Sep 2, 2021

The current spec kind of assumes that EDUs don't get dropped, so it doesn't really define a way of dealing with that situation

wait? what? Which EDUs are getting dropped? Surely that's not right? And we should be fixing that rather than layering more and more unspecified hacks on top of it?

I don't know the details of how EDUs can get dropped. Maybe Erik remembers something from investigating the device list?

We have a bunch of logic to try and ensure that device list update EDUs don't get dropped and will be retried, though the cross signing stuff was tacked on at a later date so the logic may be subtly broken somehow. I.e. if we're seeing cross signing keys not being correctly cached then that's a bug we should investigate.

It's probably worth noting we do have some paranoia code when we receive encrypted events over federation to check if our cache for the sending device is stale: https://github.com/matrix-org/synapse/blob/develop/synapse/handlers/federation_event.py#L918

@erikjohnston erikjohnston removed their request for review September 2, 2021 15:00
@erikjohnston erikjohnston removed their assignment Sep 2, 2021
@richvdh
Copy link
Member

richvdh commented Sep 2, 2021

@erikjohnston

I.e. if we're seeing cross signing keys not being correctly cached then that's a bug we should investigate.

so to be clear - you don't think we should be taking this approach of inspecting the signatures, and instead should find out why the cross-signing keys are going missing?

@erikjohnston
Copy link
Member

@erikjohnston

I.e. if we're seeing cross signing keys not being correctly cached then that's a bug we should investigate.

so to be clear - you don't think we should be taking this approach of inspecting the signatures, and instead should find out why the cross-signing keys are going missing?

Yes. I'm not super against having extra checks to try and detect when things go wrong, but if we do that we should still investigate occurrences of it happening.

@richvdh
Copy link
Member

richvdh commented Sep 3, 2021

Thanks. I'm mostly opposed to adding yet more complication and magic to this already complicated, magical bit of code. If there is any way we can avoid doing so I think we should take it.

@TheJJ
Copy link
Contributor Author

TheJJ commented Sep 3, 2021

Sure, we should find the root cause, but still we need a way to clean up the out-of-sync state.

And my signature check was the most obvious thing to me :) But i agree that doesn't seem like a good solution.

What we should keep though is the bottom code hunk where newly fetched xsignkeys are actually returned, because currently only cached keys make it to the return.

@TheJJ
Copy link
Contributor Author

TheJJ commented Sep 14, 2021

So how should we proceed here? If we're sure now state is synched properly, just flushing the cache once in some db migration might do the trick already?

@erikjohnston
Copy link
Member

So how should we proceed here? If we're sure now state is synched properly, just flushing the cache once in some db migration might do the trick already?

In terms of hunting down the root cause: step 1 would be to try and reproduce this reliably (and produce a test case). If that doesn't work out then we resort to trying to add enough logging such that if it happens again we have some hope of at least narrowing down what's going on. Unfortunately, we're a bit swamped at the minute so we likely won't have a chance to work on that ourselves right now, even though flakey cross signing is a big concern.

What we should keep though is the bottom code hunk where newly fetched xsignkeys are actually returned, because currently only cached keys make it to the return.

If there are things in this PR that would be useful besides the inspection of signature stuff then it's probably easiest to open them in a separate PR? Naively that sounds like just a good bug fix that we should land regardless of the anything else (which I'd missed) 🙂

@TheJJ
Copy link
Contributor Author

TheJJ commented Sep 24, 2021

Ok, I've submitted the return-fetched-keys pull request in #10912.

About finding the root cause: I'm not sure this is the right approach. It may be that this bug may no longer occur, but there is said cache corruption for the keys in 3 homeservers i administer. It may be from the early days of the cross-signing implementation. Which is why we should try to detect the corruption and refetch the correct state, and that was my attempt in the patch.

@erikjohnston
Copy link
Member

I think all the changes we wanted to land in this PR have been landed separately, so I'm going to go ahead and close this (shout if that's wrong!). Thanks @TheJJ for spending the time investigating this and coming up with the various solutions 👍

@TheJJ
Copy link
Contributor Author

TheJJ commented Feb 23, 2022

Thanks! I'll test if it now works :)

@TheJJ
Copy link
Contributor Author

TheJJ commented May 7, 2022

No sorry, my cross-signing-keys have still not been stored in the other homeserver:

  • select * from e2e_cross_signing_keys where user_id ~* 'myyuser'; -> empty
    the e2exsignkeys of another user on my homeserver are cached there.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
X-Awaiting-Changes A contributed PR which needs changes and re-review before it can be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants