You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently there are a couple separate data structures related to caching event contents in memory in Synapse (see code for fetching an event from cache/db here):
EventsWorkerStore._get_event_cache - An instance of AsyncLruCache implemented as a map of EventID -> (EventBase, redacted EventBase | None).
This cache is populated after fetching an event from the database.
Entries in this cache are invalidated when an event is deleted from the database (in most cases, c.f. Dummy issue #11521), redacted or marked as rejected.
Entries in this cache are invalidated when the size limit of the LruCache are reached.
EventsWorkerStore._event_ref - A WeakValueDictionary which serves as a single point of reference for EventBase's in memory, ensuring that we don't end up with multiple, unnecessary copies of a single EventBase in memory.
This data structure is populated after fetched an event from the database.
Because this is a WeakValueDictionary, entries in this cache are invalidated when all other references to the EventBase in an entry are gone.
Entries in this cache are invalidated when an event is deleted from the database (in most cases, c.f. Dummy issue #11521), redacted or marked as rejected.
Entries in this cache are not invalidated when an entry is evicted from EventsWorkerStore._get_event_cache, as something else may still be processing the event, even if it's been removed from that cache.
What's the problem?
See matrix-org/synapse#11521; because each of these caches are keyed by EventID alone, it becomes tricky to invalidate them when all you have is a RoomID (i.e. when purging a room completely). We could query all known events for a room from the database, but that may result in millions of events. Ideally we'd have some map of RoomID -> EventID which only covers the events that are actually currently held in memory. We could then use that to invalidate all three of these caches.
Additionally, as get_event_cache contains mutable EventCacheEntrys (comprised of EventBase, redacted EventBase | None), invalidating them is necessary when an event is both redacted or marked as rejected. These can differ per-homeserver, so removing this component from the cache entries opens up avenues for multiple homeservers sharing the same, immutable event cache.
Proposal
After speaking with @erikjohnston we've (mostly Erik :) came up with the following idea:
EventsWorkerStore._get_event_cache would simply become a map of EventID -> EventBase.
We add a separate cache which is a nested map of RoomID -> EventID -> {rejected_status: bool, redacted_event_content: Optional[EventBase]}.
Entries are added to this map when an event is pulled from the database. We know the RoomID at this point.
Entries are not invalidated from this map when an entry is EventsWorkerStore._get_event_cache is invalidated due to hitting the cache size.
This does mean that we'll need to know the RoomID when querying for rejected/redacted status though... But we can get that from the event cache?
The beauty of this is that we no longer need to invalidate the _get_event_cache at all (unless the size limit is hit)! Even in the room purge use case! How? Here are some examples of using this system:
Fetch EventID A which is not in-memory
Some calling function asks for EventID A.
This does not exist in _get_event_cache (nor other caches) so we query from the database. The event and related metadata is fetched from the DB (event_json, redactions, rejections) and both the _get_event_cache and event metadata cache are populated.
Return information from the database.
Fetch EventID A which is in-memory
Some calling function asks for EventID A.
This already exists in _get_event_cache, and presumably the metadata cache. We take the RoomID from the EventBase in the _get_event_cache and query the event metadata cache.
Return information from both caches.
EventID A is not in-memory but the event has been purged
Some calling function asks for EventID A.
This already exists in _get_event_cache, and presumably the metadata cache. We take the RoomID from the EventBase in the cache and query the event metadata cache - but uh oh, there's no matching entry in the metadata cache! The event must have been purged.
We invalidate the entry in the event cache as well and return None.
Thus when purging a room, we only need to purge entries in the metadata cache (which we can easily do by RoomID due to the metadata cache's structure). Entries in the get_event_cache and event_ref will be invalidated as they are fetched.
I'm curious for thoughts on whether this sounds reasonable from other members of the Synapse team + cc @Fizzadar.
The text was updated successfully, but these errors were encountered:
This issue has been migrated from #13916.
In the hopes of fixing matrix-org/synapse#11521 and paving the way towards an immutable external event cache (matrix-org/synapse#2123), a new architecture for the event cache is proposed.
The current state
Currently there are a couple separate data structures related to caching event contents in memory in Synapse (see code for fetching an event from cache/db here):
EventsWorkerStore._get_event_cache
- An instance of AsyncLruCache implemented as a map ofEventID
-> (EventBase
, redactedEventBase
| None).EventsWorkerStore._event_ref
- A WeakValueDictionary which serves as a single point of reference for EventBase's in memory, ensuring that we don't end up with multiple, unnecessary copies of a single EventBase in memory.EventsWorkerStore._get_event_cache
, as something else may still be processing the event, even if it's been removed from that cache.What's the problem?
See matrix-org/synapse#11521; because each of these caches are keyed by EventID alone, it becomes tricky to invalidate them when all you have is a RoomID (i.e. when purging a room completely). We could query all known events for a room from the database, but that may result in millions of events. Ideally we'd have some map of RoomID -> EventID which only covers the events that are actually currently held in memory. We could then use that to invalidate all three of these caches.
Additionally, as
get_event_cache
contains mutableEventCacheEntry
s (comprised ofEventBase, redacted EventBase | None
), invalidating them is necessary when an event is both redacted or marked as rejected. These can differ per-homeserver, so removing this component from the cache entries opens up avenues for multiple homeservers sharing the same, immutable event cache.Proposal
After speaking with @erikjohnston we've (mostly Erik :) came up with the following idea:
EventsWorkerStore._get_event_cache
would simply become a map ofEventID
->EventBase
.RoomID
->EventID
->{rejected_status: bool, redacted_event_content: Optional[EventBase]}
.EventsWorkerStore._get_event_cache
is invalidated due to hitting the cache size.The beauty of this is that we no longer need to invalidate the
_get_event_cache
at all (unless the size limit is hit)! Even in the room purge use case! How? Here are some examples of using this system:Fetch EventID A which is not in-memory
_get_event_cache
(nor other caches) so we query from the database. The event and related metadata is fetched from the DB (event_json
,redactions
,rejections
) and both the_get_event_cache
and event metadata cache are populated.Fetch EventID A which is in-memory
_get_event_cache
, and presumably the metadata cache. We take the RoomID from the EventBase in the_get_event_cache
and query the event metadata cache.EventID A is not in-memory but the event has been purged
_get_event_cache
, and presumably the metadata cache. We take the RoomID from the EventBase in the cache and query the event metadata cache - but uh oh, there's no matching entry in the metadata cache! The event must have been purged.Thus when purging a room, we only need to purge entries in the metadata cache (which we can easily do by RoomID due to the metadata cache's structure). Entries in the
get_event_cache
andevent_ref
will be invalidated as they are fetched.I'm curious for thoughts on whether this sounds reasonable from other members of the Synapse team + cc @Fizzadar.
The text was updated successfully, but these errors were encountered: