Cross-signing signatures not being always federated correctly #7418
Comments
Possibly related: #7350 |
Since this issue was opened, Andrew connected a new device to his account and verified it from Riot Web. I can now see ZYMXYYQZTP as verified, but not the new device. AFAIU, two things could be happening here:
|
Right, I think I've figured out what's happening here. I've noticed that Andrew's So I think the explanation is the following: when server A sends out a device list update to server B, server B will look at the update's stream IDs to see if it has missed any; if it has, it will try to get the missing keys via If that request fails, server B will:
In that specific issue it looks like abolivier.bzh's
Thus server B is stuck with an outdated device list that it will never try to refresh, unless server A sends another update and isn't under so much load that it would time out on a The correct fix here would either be to fail the whole transaction (which would also fail the PDUs, so might not be the right approach) or to have server B still send a 200 back and retry I believe the latter solution is the preferable one. I'm unfamiliar with the existing backoff/retry mechanism, so I might be forcing through an open door here, but we should absolutely persist that retry schedule in the database, otherwise just restarting Synapse could lead to device lists staying out of sync. |
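To make the second option concrete, here is a rough sketch of what the receiving side could do: detect the gap, try a resync, and if that fails record the user in a table so the retry survives a restart, while still answering 200. This is not Synapse's code. The `pending_resync` table, `ResyncError`, the `prev_ids` field and the function names are all assumptions made up for illustration, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3
import time


class ResyncError(Exception):
    """Hypothetical error raised when fetching the full device list fails."""


def open_store():
    # In-memory database standing in for the homeserver's real one.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE pending_resync ("
        "user_id TEXT PRIMARY KEY, added_ts INTEGER NOT NULL)"
    )
    return db


def handle_device_list_update(db, seen_stream_ids, update, resync):
    """Process one incoming update and return the HTTP status to send back."""
    user_id = update["user_id"]
    # Assumed shape: each update lists the stream IDs it directly follows.
    missed = [p for p in update["prev_ids"] if p not in seen_stream_ids]
    if missed:
        try:
            resync(user_id)
        except ResyncError:
            # Persist the pending retry so a restart does not lose it,
            # and acknowledge the transaction anyway.
            db.execute(
                "INSERT OR IGNORE INTO pending_resync VALUES (?, ?)",
                (user_id, int(time.time() * 1000)),
            )
            db.commit()
            return 200
        # The full list was fetched, so the user is no longer stale.
        db.execute("DELETE FROM pending_resync WHERE user_id = ?", (user_id,))
        db.commit()
    seen_stream_ids.add(update["stream_id"])
    return 200
```

The point of the design is that a failed resync no longer disappears silently: the sender gets its 200, and the stale user stays on record until a later resync succeeds.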
This issue would be fixed by #7453 |
woo, thanks for hunting this down. device list sync is the most wobbly bit of synapse, imo :( |
Agreed. Hopefully #7453 will make it way less wobbly. |
Not sure if this is related, but I recently set up a new homeserver B and am missing everyone's When I compared a sample user in device_lists_remote_extremeties I found that their stream_id was much larger (59005256) than on my long-running homeserver A (26308567). Unfortunately I can't check the other end (as it's matrix.org). I tested checking out #7453 and inserted their user_id into device_lists_remote_resync but after it was removed from the table I still didn't have
|
Not sure if your problem is the same issue - however thanks to your logs it's obvious that I didn't do the right thing wrt logging contexts in #7453, I'll try to fix that soon. |
Glad I could help. I am currently stuck in this situation of missing many people's signing keys, let me know if there's other things I can check. If you determine it's not the same we should reopen #7350. |
I checked with one of my friends running a homeserver. My server does not exist in their Is there any way to safely insert the keys into the database or force a refresh? |
Hey, so after thinking about it, I don't think your issue shares a common cause with this one (nor with #7350). I've opened #7504 to track its progress.
The only way I see of forcing a refresh would be to run #7453 and do the insertion mentioned at #7453 (comment), but if that doesn't solve it I can't think of anything else without more investigation. |
Thanks, sadly it didn't seem to solve the issue though I'll try again when this lands. I'm a little hesitant to run development versions of synapse against my regular HS since it seems to perform a non-reversible upgrade to the database (which I'm guessing could also get federation out of sync when I revert the db) and I'd like to be able to run the stable version ongoing, however if there are other things you'd like me to try I could set up a development HS. Feel free to reach out to me, @flackr:serializer.ca for higher bandwidth discussion. |
Will do when I get to investigating it 👍 |
When a call to `user_device_resync` fails, we don't currently mark the remote user's device list as out of sync, nor do we retry syncing it. #6776 introduced some code infrastructure to mark device lists as stale/out of sync. This commit uses that infrastructure to mark a device list as out of sync when processing an incoming device list update makes the device handler realise that the list is out of sync but we can't resync right now. It also adds a looping call to retry all failed resyncs every 30s. This shouldn't cause too much spam in the logs, as this commit also removes the "Failed to handle device list update for..." warning logs when catching `NotRetryingDestination`. Fixes #7418
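For readers who want a picture of the retry pass described in that commit message, a minimal sketch follows. It is not the code from the commit: `retry_stale_device_lists` and the `resync` callback are hypothetical names, and the `NotRetryingDestination` class below is a local stand-in for the exception mentioned above.

```python
class NotRetryingDestination(Exception):
    """Local stand-in for the exception named in the commit message."""


def retry_stale_device_lists(stale_users, resync):
    """Run one retry pass and return the users whose lists are still stale."""
    still_stale = set()
    for user_id in sorted(stale_users):
        try:
            resync(user_id)
        except NotRetryingDestination:
            # The destination is backing off: stay quiet, try again next pass.
            still_stale.add(user_id)
        except Exception as exc:
            # Any other failure is worth a log line, but the user stays stale.
            print(f"resync of {user_id} failed: {exc}")
            still_stale.add(user_id)
    return still_stale
```

A scheduler would call this once per interval (30s in the commit message) and feed the returned set into the next pass.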
For people getting bitten by this issue after 1.14.0 is out, or people seeing users with stale device lists from before 1.14.0, just run the following SQL in Synapse's database:
INSERT INTO device_lists_remote_resync
VALUES ('USER_ID', (EXTRACT(epoch FROM NOW()) * 1000)::BIGINT);
where |
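If several users need resyncing, the same two values can be prepared in Python and passed to a parameterised insert. This is only a sketch: `resync_rows` is a hypothetical helper, and the value order (user ID, then milliseconds since the Unix epoch) is taken from the SQL above.

```python
import time


def resync_rows(user_ids, now_s=None):
    """Build (user_id, timestamp_ms) tuples in the order the SQL above uses."""
    # EXTRACT(epoch FROM NOW()) * 1000 is the current time in milliseconds.
    now_ms = int((time.time() if now_s is None else now_s) * 1000)
    return [(user_id, now_ms) for user_id in user_ids]
```

Each tuple can then be handed to the database driver as the parameters of one insert, which avoids pasting user IDs into the SQL string by hand.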
Over the past couple of days I've seen a few occurrences of people on other servers verifying a new device, but my server not receiving the resulting signature, thus leaving them with a red shield from my pov. I've also seen people saying they were seeing others with a red shield whereas I would see them with a green one.
@bwindels had a look at my Riot logs upon failing to see one of @anoadragon453's devices as verified, and could see that it was indeed missing a signature, with the signatures on Andrew's device being:
and the ones my Riot would see for that same device being:
Looking at the `device_lists_remote_cache` on my homeserver's database, I can see that it's indeed missing the signature from Andrew's self-signing key (`ed25519:QjSD8srN17RiDzIBgzVbncj+NMdvDHRY4N2b8w+oq9Y`). https://github.com/matrix-org/riot-web-rageshakes/issues/2740#issuecomment-623992322 provides more info about that specific occurrence.