This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Speed up fetching device lists changes in sync. #7423

Merged 4 commits on May 6, 2020
1 change: 1 addition & 0 deletions changelog.d/7423.misc
@@ -0,0 +1 @@
+Speed up fetching device lists changes in sync.
12 changes: 8 additions & 4 deletions synapse/handlers/sync.py
@@ -1143,10 +1143,14 @@ async def _generate_sync_entry_for_device_list(
                user_id
            )

-            tracked_users = set(users_who_share_room)
-
-            # Always tell the user about their own devices
-            tracked_users.add(user_id)
+            # Always tell the user about their own devices. We check first, as the
+            # user ID is almost certainly already included (unless they're not in
+            # any rooms) and taking a copy of the set is relatively expensive.
+            if user_id not in users_who_share_room:
+                users_who_share_room = set(users_who_share_room)
+                users_who_share_room.add(user_id)
+
+            tracked_users = users_who_share_room

            # Step 1a, check for changes in devices of users we share a room with
            users_that_have_changed = await self.store.get_users_whose_devices_changed(
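To illustrate why the new branch above is cheaper than the old unconditional copy, here is a rough, self-contained sketch (the set size and user IDs are hypothetical, not taken from the PR): a membership check on the existing set is constant-time, while set(users_who_share_room) has to copy every element.

import timeit

# Hypothetical stand-ins for the data in the diff above.
users_who_share_room = frozenset("@user%d:example.com" % i for i in range(100000))
user_id = "@user1:example.com"

def always_copy():
    # Old behaviour: copy the whole set just to add one (usually present) ID.
    tracked_users = set(users_who_share_room)
    tracked_users.add(user_id)
    return tracked_users

def copy_only_if_needed():
    # New behaviour: check membership first and only copy when the ID is missing.
    tracked_users = users_who_share_room
    if user_id not in tracked_users:
        tracked_users = set(tracked_users)
        tracked_users.add(user_id)
    return tracked_users

print("always copy:         %.4fs" % timeit.timeit(always_copy, number=100))
print("copy only if needed: %.4fs" % timeit.timeit(copy_only_if_needed, number=100))
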
4 changes: 2 additions & 2 deletions synapse/storage/data_stores/main/devices.py
@@ -541,8 +541,8 @@ def get_users_whose_devices_changed(self, from_key, user_ids):

        # Get set of users who *may* have changed. Users not in the returned
        # list have definitely not changed.
-        to_check = list(
-            self._device_list_stream_cache.get_entities_changed(user_ids, from_key)
+        to_check = self._device_list_stream_cache.get_entities_changed(
+            user_ids, from_key
        )

        if not to_check:
19 changes: 15 additions & 4 deletions synapse/util/caches/stream_change_cache.py
@@ -14,12 +14,13 @@
# limitations under the License.

import logging
-from typing import Dict, Iterable, List, Mapping, Optional, Set
+from typing import Dict, FrozenSet, List, Mapping, Optional, Set, Union

from six import integer_types

from sortedcontainers import SortedDict

+from synapse.types import Collection
from synapse.util import caches

logger = logging.getLogger(__name__)
@@ -85,16 +86,26 @@ def has_entity_changed(self, entity: EntityType, stream_pos: int) -> bool:
        return False

    def get_entities_changed(
-        self, entities: Iterable[EntityType], stream_pos: int
-    ) -> Set[EntityType]:
+        self, entities: Collection[EntityType], stream_pos: int

Member Author: I've checked all the callers and this matches their types fwiw

+    ) -> Union[Set[EntityType], FrozenSet[EntityType]]:
"""
Returns subset of entities that have had new things since the given
position. Entities unknown to the cache will be returned. If the
position is too old it will just return the given list.
"""
        changed_entities = self.get_all_entities_changed(stream_pos)
        if changed_entities is not None:
-            result = set(changed_entities).intersection(entities)
+            # We now do an intersection, trying to do so in the most efficient
+            # way possible (some of these sets are *large*). First check if the
+            # given iterable is already a set that we can reuse, otherwise we
+            # create a set of the *smaller* of the two iterables and call
+            # `intersection(..)` on it (this can be twice as fast as the reverse).
Comment on lines +101 to +102
Member: https://wiki.python.org/moin/TimeComplexity#set has some stuff to say about this:

    Average case: O(min(len(s), len(t)))
    Worst case: O(len(s) * len(t))

    replace "min" with "max" if t is not a set

Member Author: Well that is interesting, I did:

In [1]: from synapse.util.stringutils import random_string

In [2]: a = [random_string(10) for _ in range(10000)]

In [3]: b = [random_string(10) for _ in range(10)]

In [4]: %timeit set(a).intersection(b)
299 µs ± 3.41 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [5]: %timeit set(b).intersection(a)
141 µs ± 9.19 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

I'm slightly surprised that they're symmetric when t is not a set

(I'm going to wait for you to point out my embarrassing typo now....)

Member: This looks sane to me. The only thing I can think of is that this is partially bound by the creation of the set() and not by the intersection code.

Member Author: Ah, that'd make sense if they ignored the time requirements for creating the set s

Member (@clokep, May 5, 2020): Right, so it'd be interesting to see:

>>> sa = set(a)
>>> sb = set(b)
>>> %timeit sa.intersection(b)
>>> %timeit sb.intersection(a)

Member: I looked at this a bit more and I think your conclusion is sane, but the gist of my exploration is that converting things to sets is expensive. 👍
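
For anyone reproducing the comparison above outside IPython, here is a self-contained sketch using the stdlib timeit module instead of the %timeit magic; random_string is replaced with a hypothetical stand-in, so only the relative timings are meaningful.

import secrets
import timeit

def random_string(length):
    # Stand-in for synapse.util.stringutils.random_string.
    return secrets.token_urlsafe(length)[:length]

a = [random_string(10) for _ in range(10000)]  # large input
b = [random_string(10) for _ in range(10)]     # small input
sa, sb = set(a), set(b)

cases = {
    "set(a).intersection(b)": lambda: set(a).intersection(b),
    "set(b).intersection(a)": lambda: set(b).intersection(a),
    "sa.intersection(b), set pre-built": lambda: sa.intersection(b),
    "sb.intersection(a), set pre-built": lambda: sb.intersection(a),
}
for name, fn in cases.items():
    print("%-35s %.4fs" % (name, timeit.timeit(fn, number=1000)))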

+            if isinstance(entities, (set, frozenset)):
+                result = entities.intersection(changed_entities)
+            elif len(changed_entities) < len(entities):
+                result = set(changed_entities).intersection(entities)
+            else:
+                result = set(entities).intersection(changed_entities)
            self.metrics.inc_hits()
        else:
            result = set(entities)
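
For reference, the intersection strategy from the hunk above distilled into a standalone sketch; the helper name and the toy data are illustrative rather than Synapse APIs, and the real method additionally updates cache metrics and falls back to returning all the given entities when the position is too old.

from typing import Collection, FrozenSet, Set, TypeVar, Union

T = TypeVar("T")

def intersect_with_changed(
    entities: Collection[T], changed_entities: Collection[T]
) -> Union[Set[T], FrozenSet[T]]:
    """Intersect two collections while building as few temporary sets as possible."""
    if isinstance(entities, (set, frozenset)):
        # Reuse the caller's set directly: no copy needed at all.
        return entities.intersection(changed_entities)
    if len(changed_entities) < len(entities):
        # Otherwise build the temporary set from the smaller operand.
        return set(changed_entities).intersection(entities)
    return set(entities).intersection(changed_entities)

# Toy usage: a large list of user IDs against a small set of changed users.
users = ["@user%d:example.com" % i for i in range(1000)]
changed = {"@user3:example.com", "@not_tracked:example.com"}
print(intersect_with_changed(users, changed))  # {'@user3:example.com'}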