Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

caching on have_seen_event is somewhat broken #13865

Closed
richvdh opened this issue Sep 22, 2022 · 5 comments · Fixed by #13863
Closed

caching on have_seen_event is somewhat broken #13865

richvdh opened this issue Sep 22, 2022 · 5 comments · Fixed by #13863
Assignees
Labels
A-Performance Performance, both client-facing and admin-facing O-Occasional Affects or can be seen by some users regularly or most users rarely S-Tolerable Minor significance, cosmetic issues, low or no impact to users. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@richvdh
Copy link
Member

richvdh commented Sep 22, 2022

As discovered in #13863 (comment) and discussed in #synapse-dev. Part of #13856

The cache on have_seen_event uses a TreeCache, which means the cache is structured as {room_id: {event_id: result}}. However, _have_seen_events_dict tries to use it with @cachedList, which stores entries as {(room_id, event_id): result}. As a result, caching is inefficient, and cache invalidation is unreliable.

@richvdh
Copy link
Member Author

richvdh commented Sep 22, 2022

We suspect this has been the case ever since the cache was added to have_seen_event in #9953, but it would be interesting to test this to confirm.

Fixing this probably involves updating @cachedList to work correctly with TreeCaches?

@richvdh
Copy link
Member Author

richvdh commented Sep 22, 2022

Relatedly: have_seen_events calls _have_seen_events_dict, so the thing gets cached twice in the same cache, once as a room_id -> event_id -> result tree, and the other as a (room_id, event_id) -> result.

@richvdh
Copy link
Member Author

richvdh commented Sep 22, 2022

(Fortunately, have_seen_event is mostly an optimisation, so failure to invalidate is unlikely to have had too much visible effect other than inefficiency.)

@erikjohnston
Copy link
Member

I think that the usage of @cachedList is wrong. It expects to be given the same number of arguments as the underlying cached function, just with one of the arguments being an iterable

@cachedList(cached_method_name="have_seen_event", list_name="keys")
async def _have_seen_events_dict(
self, keys: Collection[Tuple[str, str]]
) -> Dict[Tuple[str, str], bool]:

I think that should actually be:

 @cachedList(cached_method_name="have_seen_event", list_name="event_ids") 
 async def _have_seen_events_dict( 
     self, room_id: str, event_ids: Collection[str],
 ) -> Dict[str, bool]: 

@richvdh
Copy link
Member Author

richvdh commented Sep 22, 2022

oh right, that's quite a lot easier.

@DMRobertson DMRobertson added A-Performance Performance, both client-facing and admin-facing S-Tolerable Minor significance, cosmetic issues, low or no impact to users. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. O-Occasional Affects or can be seen by some users regularly or most users rarely labels Sep 22, 2022
@MadLittleMods MadLittleMods self-assigned this Sep 22, 2022
MadLittleMods added a commit that referenced this issue Sep 27, 2022
Fix #13856
Fix #13865

> Discovered while trying to make Synapse fast enough for [this MSC2716 test for importing many batches](matrix-org/complement#214 (comment)). As an example, disabling the `have_seen_event` cache saves 10 seconds for each `/messages` request in that MSC2716 Complement test because we're not making as many federation requests for `/state` (speeding up `have_seen_event` itself is related to #13625) 
> 
> But this will also make `/messages` faster in general so we can include it in the [faster `/messages` milestone](https://github.com/matrix-org/synapse/milestone/11).
> 
> *-- #13856


### The problem

`_invalidate_caches_for_event` doesn't run in monolith mode which means we never even tried to clear the `have_seen_event` and other caches. And even in worker mode, it only runs on the workers, not the master (AFAICT).

Additionally there was bug with the key being wrong so `_invalidate_caches_for_event` never invalidates the `have_seen_event` cache even when it does run.

Because we were using the `@cachedList` wrong, it was putting items in the cache under keys like `((room_id, event_id),)` with a `set` in a `set` (ex. `(('!TnCIJPKzdQdUlIyXdQ:test', '$Iu0eqEBN7qcyF1S9B3oNB3I91v2o5YOgRNPwi_78s-k'),)`) and we we're trying to invalidate with just `(room_id, event_id)` which did nothing.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Performance Performance, both client-facing and admin-facing O-Occasional Affects or can be seen by some users regularly or most users rarely S-Tolerable Minor significance, cosmetic issues, low or no impact to users. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants