Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Remove error spam when users query the keys of departed remote users #13826

Merged
merged 1 commit into from
Sep 16, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/13826.bugfix
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Fix a long standing bug where device lists would remain cached when remote users left and rejoined the last room shared with the local homeserver.
21 changes: 12 additions & 9 deletions synapse/handlers/e2e_keys.py
Original file line number Diff line number Diff line change
Expand Up @@ -188,18 +188,21 @@ async def query_devices(
)
invalid_cached_users = cached_users - valid_cached_users
if invalid_cached_users:
# Fix up results. If we get here, there is either a bug in device
# list tracking, or we hit the race mentioned above.
# Fix up results. If we get here, it means there was either a bug in
# device list tracking, or we hit the race mentioned above.
# TODO: In practice, this path is hit fairly often in existing
# deployments when clients query the keys of departed remote
# users. A background update to mark the appropriate device
# lists as unsubscribed is needed.
# https://github.com/matrix-org/synapse/issues/13651
# Note that this currently introduces a failure mode when clients
# are trying to decrypt old messages from a remote user whose
# homeserver is no longer available. We may want to consider falling
# back to the cached data when we fail to retrieve a device list
# over federation for such remote users.
user_ids_not_in_cache.update(invalid_cached_users)
for invalid_user_id in invalid_cached_users:
remote_results.pop(invalid_user_id)
# This log message may be removed if it turns out it's almost
# entirely triggered by races.
logger.error(
"Devices for %s were cached, but the server no longer shares "
"any rooms with them. The cached device lists are stale.",
invalid_cached_users,
)
Comment on lines -196 to -202
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit sad to see the logging go on the off-chance it's useful for debugging in the future. Would it be worth keeping this around as logger.debug, maybe on its own dedicated logger?

If you're confident that the diagnosis on lines +193 and below explains the logspam, then that's fine and we should just drop this logger.

Copy link
Contributor Author

@squahtx squahtx Sep 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be useful in future, once we resolve the remaining bits of #13651. For now it's just not that useful and the diagnosis in the comments explains all the instances I looked at.


for user_id, devices in remote_results.items():
user_devices = results.setdefault(user_id, {})
Expand Down