-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Deduplicate redundant lazy-loaded members #3331
Conversation
as per the proposal; we can deduplicate redundant lazy-loaded members which are sent in the same sync sequence. we do this heuristically rather than requiring the client to somehow tell us which members it has chosen to cache, by instead caching the last N members sent to a client, and not sending them again. For now we hardcode N to 100. Each cache for a given (user,device) tuple is in turn cached for up to X minutes (to avoid the caches building up). For now we hardcode X to 30.
@matrixbot retest this please |
@matrixbot retest this please |
@@ -426,6 +432,9 @@ def limit(self): | |||
def lazy_load_members(self): | |||
return self.filter_json.get("lazy_load_members", False) | |||
|
|||
def include_redundant_members(self): | |||
return self.filter_json.get("include_redundant_members", False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as per https://docs.google.com/document/d/11yn-mAkYll10RJpN0mkYEVqraTbU3U4eQx9MNrzqX1U/edit?disco=AAAACEx0noo, we should consider validating the value passed by the client - presumably in the constructor rather than here.
(this applies to lazy_load_members too, of course; I just forgot it there.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i've fixed up the proposal doc to explicitly demand {true|false} there. This is however being strictly validated anyway via the JSON schema validation over at: https://github.com/matrix-org/synapse/pull/3331/files#diff-ed81002a2d319904392e1a6f871eb2edR121
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oooh I hadn't spotted that. well, yay!
synapse/handlers/sync.py
Outdated
# ExpiringCache((User, Device)) -> LruCache(membership event_id) | ||
self.lazy_loaded_members_cache = ExpiringCache( | ||
"lazy_loaded_members_cache", self.clock, | ||
max_len=0, expiry_ms=LAZY_LOADED_MEMBERS_CACHE_MAX_AGE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trailing comma please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
synapse/handlers/sync.py
Outdated
@@ -520,9 +540,24 @@ def compute_state_delta(self, room_id, batch, sync_config, since_token, now_toke | |||
) | |||
] | |||
|
|||
if not include_redundant_members: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we doing this here, rather than where the cache
is used below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hysterical reasons. fixed.
synapse/handlers/sync.py
Outdated
} | ||
logger.debug("...to %r", state_ids) | ||
|
||
# add any member IDs we are about to send into our LruCache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems problematic that we only populate the cache if lazy_load_members and not include_redundant_members
. what if the client calls sync with include_redundant_members=False
, and then later calls it with it True?
I can see an efficiency argument, but if we're going to say that's a thing that clients can't do, let's spell it out in the proposal, along with the steps they would need to take to change their mind (presumably a re-initial-sync?)
Relatedly, is there a danger of it breaking for people who switch between client versions that have support and those that don't? I can't think of a failure offhand, but it might be worth thinking a bit harder about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bwindels already hit this actually whilst implementing it on riot-web. we'll need to mandate that clients do a re-initialsync if they change their lazy-loading config (whether that's wrt redundancy or laziness). i'll add it to the prop.
synapse/handlers/sync.py
Outdated
state_ids = { | ||
t: state_id | ||
for t, state_id in state_ids.iteritems() | ||
if not cache.get(state_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've got a feeling this isn't going to be adequate. It's possible for state to revert to an earlier event thanks to state resolution: so for example Bob's member event might be A, then B, then back to A. In this case we won't tell clients it's gone back to A, because A is already in the cache.
(Admittedly there are probably other bugs in the sync code in this area, but let's not add more.)
I suspect you need to maintain the latest (type, state_key) => event_id
mapping in the cache, rather than just a list of event ids.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(although I maintain the cache as state_key => event_id
for simplicity and efficiency, as type is redundant)
synapse/handlers/sync.py
Outdated
logger.debug("filtering state from %r...", state_ids) | ||
state_ids = { | ||
t: state_id | ||
for t, state_id in state_ids.iteritems() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I won't ask you change the existing state_ids
var, but can you s/state_id/event_id/
if it's an event id? To me a state_id sounds more like a state group id than an event id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@richvdh ptal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
(It's worrying that the postgres tests are failing, though it looks unrelated :/) |
retest this please |
As per the proposal; we can deduplicate redundant lazy-loaded members which are sent in the same sync sequence (by assuming any client capable of persisting a sync token and understanding lazy-loaded members is also capable of persisting membership details).
We do this heuristically rather than requiring the client to somehow tell us which members it has chosen to cache, by instead caching the last N members sent to a client in a LruCache, and not sending them again. For now we hardcode N to 100.
Each such cache for a given
(user,device)
tuple is in turn cached for up to X minutes (to avoid the caches building up) in an ExpiringCache. For now we hardcode X to 30.Builds on #2970.
Sytest at matrix-org/sytest#467
It deliberately doesn't attempt to deduplicate redundant members in a limited sync response (for now).
To do: