
Deduplicate redundant lazy-loaded members #3331

Merged: 17 commits merged into develop on Jul 26, 2018

Conversation

@ara4n (Member) commented Jun 3, 2018

As per the proposal, we can deduplicate redundant lazy-loaded members which are sent in the same sync sequence (by assuming that any client capable of persisting a sync token and understanding lazy-loaded members is also capable of persisting membership details).

We do this heuristically rather than requiring the client to somehow tell us which members it has chosen to cache, by instead caching the last N members sent to a client in a LruCache, and not sending them again. For now we hardcode N to 100.

Each such cache for a given (user,device) tuple is in turn cached for up to X minutes (to avoid the caches building up) in an ExpiringCache. For now we hardcode X to 30.

Builds on #2970.
Sytest at matrix-org/sytest#467

It deliberately doesn't attempt to deduplicate redundant members in a limited sync response (for now).
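To make the heuristic concrete, here is a minimal, self-contained sketch (hypothetical names, not the merged Synapse code): a small per-(user, device) LRU maps each member's state_key to the event_id last sent, and members whose event_id is unchanged are dropped from the response. The merged code holds the per-device caches in Synapse's ExpiringCache/LruCache, as shown in the diffs further down.

from collections import OrderedDict

LAZY_LOADED_MEMBERS_CACHE_MAX_SIZE = 100  # "N" from the description above

class SentMembersCache(object):
    """Tiny LRU of state_key -> event_id of the member event last sent to a device."""

    def __init__(self, max_size=LAZY_LOADED_MEMBERS_CACHE_MAX_SIZE):
        self._cache = OrderedDict()
        self._max_size = max_size

    def get(self, state_key):
        event_id = self._cache.get(state_key)
        if event_id is not None:
            self._cache.move_to_end(state_key)
        return event_id

    def set(self, state_key, event_id):
        self._cache[state_key] = event_id
        self._cache.move_to_end(state_key)
        while len(self._cache) > self._max_size:
            self._cache.popitem(last=False)  # evict the least recently sent member

def filter_members_to_send(cache, member_event_ids):
    """Drop members whose event_id matches what was last sent to this device.

    member_event_ids maps (type, state_key) -> event_id, as in compute_state_delta.
    """
    to_send = {}
    for (typ, state_key), event_id in member_event_ids.items():
        if cache.get(state_key) != event_id:
            to_send[(typ, state_key)] = event_id
            cache.set(state_key, event_id)
    return to_send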

To do:

  • see if it works
  • sytest
  • make it optional so that thick clients like bots can opt out.

@ara4n ara4n changed the title WIP: deduplicate redundant lazy-loaded members Deduplicate redundant lazy-loaded members Jul 19, 2018
@ara4n (Member, Author) commented Jul 19, 2018

@matrixbot retest this please

@ara4n ara4n mentioned this pull request Jul 23, 2018
@ara4n ara4n changed the base branch from matthew/filter_members to develop July 25, 2018 23:16
@ara4n (Member, Author) commented Jul 25, 2018

@richvdh this is next up in the LL stack. however i suspect it's sufficiently decoupled from #2970 that someone else could take it if you're fed up with this.

@ara4n (Member, Author) commented Jul 26, 2018

@matrixbot retest this please

@richvdh richvdh requested a review from a team July 26, 2018 14:45
@@ -426,6 +432,9 @@ def limit(self):
    def lazy_load_members(self):
        return self.filter_json.get("lazy_load_members", False)

    def include_redundant_members(self):
        return self.filter_json.get("include_redundant_members", False)
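For context, a client opts into this behaviour via its /sync filter. Per the lazy-loading proposal these flags live in the room event filter; a hedged example, shown here as a Python dict:

# Hypothetical example filter: lazy-load members, and let the server drop
# members it has already sent ("include_redundant_members" defaults to False).
sync_filter = {
    "room": {
        "state": {
            "lazy_load_members": True,
            "include_redundant_members": False,  # a thick client or bot might set this to True
        },
    },
}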
Member:

as per https://docs.google.com/document/d/11yn-mAkYll10RJpN0mkYEVqraTbU3U4eQx9MNrzqX1U/edit?disco=AAAACEx0noo, we should consider validating the value passed by the client - presumably in the constructor rather than here.

(this applies to lazy_load_members too, of course; I just forgot it there.)

Member Author:

i've fixed up the proposal doc to explicitly demand {true|false} there. This is however being strictly validated anyway via the JSON schema validation over at: https://github.com/matrix-org/synapse/pull/3331/files#diff-ed81002a2d319904392e1a6f871eb2edR121

Member:

oooh I hadn't spotted that. well, yay!

# ExpiringCache((User, Device)) -> LruCache(membership event_id)
self.lazy_loaded_members_cache = ExpiringCache(
    "lazy_loaded_members_cache", self.clock,
    max_len=0, expiry_ms=LAZY_LOADED_MEMBERS_CACHE_MAX_AGE
Member:

trailing comma please

Member Author:

done
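For reference, the per-device LruCache is fetched from (and lazily created in) this ExpiringCache. A sketch with a hypothetical helper name, mirroring the hardcoded N=100 and 30-minute values from the description:

import logging

from synapse.util.caches.lrucache import LruCache

logger = logging.getLogger(__name__)

LAZY_LOADED_MEMBERS_CACHE_MAX_AGE = 30 * 60 * 1000  # "X": 30 minutes, in ms
LAZY_LOADED_MEMBERS_CACHE_MAX_SIZE = 100            # "N": members per device

def get_lazy_loaded_members_cache(lazy_loaded_members_cache, cache_key):
    """Return the LruCache for this (user_id, device_id), creating it if absent."""
    cache = lazy_loaded_members_cache.get(cache_key)
    if cache is None:
        logger.debug("creating LruCache for %r", cache_key)
        cache = LruCache(LAZY_LOADED_MEMBERS_CACHE_MAX_SIZE)
        lazy_loaded_members_cache[cache_key] = cache
    else:
        logger.debug("found LruCache for %r", cache_key)
    return cache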

@@ -520,9 +540,24 @@ def compute_state_delta(self, room_id, batch, sync_config, since_token, now_toke
)
]

if not include_redundant_members:
Member:

why are we doing this here, rather than where the cache is used below?

Member Author:

hysterical reasons. fixed.

}
logger.debug("...to %r", state_ids)

# add any member IDs we are about to send into our LruCache
Member:

it seems problematic that we only populate the cache if lazy_load_members and not include_redundant_members. What if the client calls sync with include_redundant_members=False, and then later calls it with it set to True?

I can see an efficiency argument, but if we're going to say that's a thing that clients can't do, let's spell it out in the proposal, along with the steps they would need to take to change their mind (presumably a re-initial-sync?)

Relatedly, is there a danger of it breaking for people who switch between client versions that have support and those that don't? I can't think of a failure offhand, but it might be worth thinking a bit harder about it?

Member Author:

@bwindels already hit this actually whilst implementing it on riot-web. we'll need to mandate that clients do a re-initialsync if they change their lazy-loading config (whether that's wrt redundancy or laziness). i'll add it to the prop.

state_ids = {
    t: state_id
    for t, state_id in state_ids.iteritems()
    if not cache.get(state_id)
Member:

I've got a feeling this isn't going to be adequate. It's possible for state to revert to an earlier event thanks to state resolution: so for example Bob's member event might be A, then B, then back to A. In this case we won't tell clients it's gone back to A, because A is already in the cache.

(Admittedly there are probably other bugs in the sync code in this area, but let's not add more.)

I suspect you need to maintain the latest (type, state_key) => event_id mapping in the cache, rather than just a list of event ids.

Member Author:

good point. fixed.

Member Author:

(although I maintain the cache as state_key => event_id for simplicity and efficiency, as type is redundant)
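To illustrate the A -> B -> A concern and the fix, a tiny hedged example (made-up event ids and user): with a bare set of already-sent event ids the reverted A looks redundant and is wrongly dropped, whereas keying the cache by state_key correctly re-sends it, because the cached value for Bob at that point is B.

# Bob's membership event goes A -> B -> A after state resolution.
sent_event_ids = {"$A", "$B"}          # naive "have we sent this event id?" set
assert "$A" in sent_event_ids          # the reverted A would be treated as redundant

cache = {"@bob:example.org": "$B"}     # state_key -> event_id last sent (the fix)
new_event_id = "$A"
if cache.get("@bob:example.org") != new_event_id:
    cache["@bob:example.org"] = new_event_id   # A is correctly re-sent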

logger.debug("filtering state from %r...", state_ids)
state_ids = {
    t: state_id
    for t, state_id in state_ids.iteritems()
Member:

I won't ask you change the existing state_ids var, but can you s/state_id/event_id/ if it's an event id? To me a state_id sounds more like a state group id than an event id.

Member Author:

done

@richvdh richvdh assigned ara4n and unassigned richvdh Jul 26, 2018
@ara4n (Member, Author) commented Jul 26, 2018

@richvdh ptal

@ara4n ara4n assigned richvdh and unassigned ara4n Jul 26, 2018
@richvdh (Member) left a comment

lgtm

@richvdh (Member) commented Jul 26, 2018

(It's worrying that the postgres tests are failing, though it looks unrelated :/)

@richvdh (Member) commented Jul 26, 2018

retest this please
