Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Implement the lazy_load_members room state filter parameter #2970

Merged
merged 58 commits into from
Jul 25, 2018

Conversation

ara4n
Copy link
Member

@ara4n ara4n commented Mar 12, 2018

This is a first cut at filtering out room_members from sync responses unless they're actually needed to render the timeline (as proposed at https://docs.google.com/document/d/11yn-mAkYll10RJpN0mkYEVqraTbU3U4eQx9MNrzqX1U/edit#)

My hope is to get this merged so that client developers can experiment with lazy_loading and see how much it speeds up their clients, and to check how badly clients handle members trickling in on demand.

Must-have todo:

  • Proactively add in new state events to responses as they're required.
  • pep8
  • pydoc
  • write sytest
  • Actually add API to turn the filtering on & off; atm it's hardcoded on
  • opportunistically deduplicate members sent to clients within the same series of sync tokens (if the client wants it) - now done in Deduplicate redundant lazy-loaded members #3331
  • calculate room names serverside, given the client no longer has the full list of members in order to do it itself. (We might also want to revisit what these names look like, as "Matthew and 2 others" is a lot less useful than "Matthew, Tom, Amand..." or similar
  • Handle disambiguating display names, given the client can't do it any more.
  • ...actually, the client /can/ disambiguate, as it should only ever need to disambiguate if it has actually seen some ambiguous members. So even if in total there are ambiguous users in a room, if the client isn't aware of this, it shouldn't need to disambiguate, i think?
  • supporting /messages -- we need this in order for historical displaynames & user_names to be correct at all.
  • Some API needed to provide historical profile details when doing /context etc?
  • Advise how to search the membership list. (do we just pull in /state without a filter for it?)
  • test under postgres

Lower priority:

  • being smarter about which members are needed for a given block of timeline (atm we just pull in those for the senders of the events in the timeline, but need to consider invite targets etc too)
  • supporting /context

@turt2live
Copy link
Member

Another thing that may need supporting is the context API if I'm not mistaken. Does this work as-is with Riot or should I hold off on deploying this to my test environment?

@ara4n
Copy link
Member Author

ara4n commented Mar 12, 2018

i haven’t tested it against riot yet, but in theory it should work. good call on /context. i would be very interested in a “this sped up initial sync from X to Y, and reduced the json from P to Q” stat.

Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks plausible to me

@richvdh richvdh assigned ara4n and unassigned richvdh Mar 12, 2018
@ara4n
Copy link
Member Author

ara4n commented Mar 13, 2018

now updated to try to always add in the necessary member events for a chunk of timeline (although sytest is reporting 500s so clearly needs more work)

@ara4n
Copy link
Member Author

ara4n commented Mar 14, 2018

So I just put this live with a custom sync worker against my @matthew2:matrix.org account. matthew2's vital stats are:

total state events: 57227
total member events: 56085
total timeline events: 2479
total rooms: 135
possible speedup factor: 32.859658778205834x

Before (matrix-org-hotfixes branch)

  • cold initial sync: 36.7s
  • warm initial sync: 8.5s
  • sync size: 7197kB
  • v8 heap in riot/web: 252MB

After (matthew/filter_members branch)

  • cold initial sync: 12.0s
  • warm initial sync: 5.2s
  • sync size: 1746k
  • v8 heap after: 110MB

So, it looks like (with this impl at least) we're seeing a roughly 2-3x improvement on various metrics.

(more datapoints from @turt2live at https://gist.github.com/turt2live/a689cdf3cb0f2ddf3c93aa20f2440c16)

@turt2live
Copy link
Member

It looks like this isn't sending the senders of events on incremental syncs. It's not the end of the world yet (as it doesn't actually break anything as far as I can tell), it just looks bad in Riot.

(I also realize this isn't near complete yet - just lodging the bug now for consideration)

turt2live added a commit to turt2live/evelium that referenced this pull request Mar 15, 2018
To counteract the behaviour currently being demonstrated in matrix-org/synapse#2970
@ara4n
Copy link
Member Author

ara4n commented Mar 15, 2018

yeah, incremental syncs are borked. my next step here is to write some sytests to try to get some level of confidence it actually works properly.

@ara4n ara4n assigned richvdh and unassigned ara4n Jul 24, 2018
@richvdh richvdh assigned ara4n and unassigned richvdh Jul 24, 2018
@ara4n
Copy link
Member Author

ara4n commented Jul 24, 2018

@richvdh ptal for a final time, i hope

@ara4n ara4n assigned richvdh and unassigned ara4n Jul 24, 2018
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we are proving with 15 rounds of back-and-forth here, this stuff is finicky and hard to get right by inspection. Please add some tests to check the holes I am identifying.


if (
state_key is None or
filtered_types is not None and typ not in filtered_types
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lucky for you, and has higher precedence than or. I had to go and look it up though. Parens please.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if valid_state_keys is None:
return True
if state_key in valid_state_keys:
return True
return False

got_all = is_all or not missing_types
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just write is_all or (not missing_types and filtered_types is not None) rather than special-casing over a particular bug elsewhere in the algorithm?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i find is_all or (not missing_types and filtered_types is not None) incredibly cryptic to reason about, to the extent i'm failing to convince myself it's even right.

For instance, if this is called with types=[] and filtered_types=[whatever], then is_all could well be false (if the cache is empty), and missing_types would be falsey, and filtered_types would not be None... so got_all would be true, which is the wrong answer.

Which is why I very deliberately spelt out the special case we're handling here where types=[], so we can't trust missing_types, which feels a lot easier to reason about.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, but the fact that you are special-casing the empty list makes me suspect that there are bugs elsewhere.

and sorry, it should have been is_all or (not missing_types and filtered_types is None). Maybe it should be:

got_all = is_all
if not got_all:
    # the cache is incomplete. We may still have got all the results we need, if 
    # we don't have any wildcards in the match list.
    if not missing_types and filtered_types is None:
        got_all = True

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay.

# filtered_types list. missing_types will always be empty, so we ignore it.
got_all = is_all
else:
got_all = is_all or not missing_types

return {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth noting that missing_types isn't actually used in the result. Suggest removing it; I think it might open some clearer options in how to implement this function

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it feels like a useful thing for the function to be returning tbh, even if it isn't used, but given the filtered_types stuff means that we can't easily enumerate the types which are missing from the result, i've removed it.

@@ -460,7 +511,7 @@ def _get_state_group_for_events(self, event_ids):

defer.returnValue({row["event_id"]: row["state_group"] for row in rows})

def _get_some_state_from_cache(self, group, types):
def _get_some_state_from_cache(self, group, types, filtered_types=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't think this is right; I don't think e22700c actually fixed the case I mentioned.

Prove me wrong with a test!

i strongly suspect i'd have missed it in a UT too.

Well, can you add one now, please.

@richvdh richvdh assigned ara4n and unassigned richvdh Jul 25, 2018
@ara4n
Copy link
Member Author

ara4n commented Jul 25, 2018

@richvdh ptal.

@ara4n ara4n assigned richvdh and unassigned ara4n Jul 25, 2018
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

57 commits later... \o/

for the love of god please don't forget to squash when merging.

if not got_all:
# the cache is incomplete. We may still have got all the results we need, if
# we don't have any wildcards in the match list.
if not missing_types and filtered_types is None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and now that we aren't returning missing_types, I wonder why we are bothering to build a set rather than just using a boolean. But let's just land the damn thing.

@richvdh richvdh assigned ara4n and unassigned richvdh Jul 25, 2018
@ara4n ara4n merged commit 1bcd049 into develop Jul 25, 2018
@ara4n
Copy link
Member Author

ara4n commented Jul 25, 2018

As per #synapse-dev:

* Matthew sighs
ironically GH didn't give me a squash merge button then
so i clicked merge anyway assuming it would prompt in a second phase
but it didn't
* Matthew wonders how to unpick that
given i assume a revert will make even more of a mess of the history
and force-pushing to develop will be even more antisocial.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants