This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Error backfilling in room for 10 minutes after purging history #9864

Open
erikjohnston opened this issue Apr 22, 2021 · 6 comments
Labels
A-Corruption Things that have led to unexpected state in Synapse or the database S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@erikjohnston
Member

The error in the logs is:

2021-04-21 12:48:14,592 - synapse.handlers.federation - 1239 - ERROR - GET-2201 - Failed to backfill from localhost:8875 because Trying to persist state with unpersisted prev_group: 3369
Traceback (most recent call last):
  File "/venv/lib/python3.7/site-packages/synapse/handlers/federation.py", line 1208, in try_backfill
    dom, room_id, limit=100, extremities=extremities
  File "/venv/lib/python3.7/site-packages/synapse/handlers/federation.py", line 1031, in backfill
    context = await self.state_handler.compute_event_context(event)
  File "/venv/lib/python3.7/site-packages/synapse/state/__init__.py", line 385, in compute_event_context
    current_state_ids=state_ids_after_event,
  File "/venv/lib/python3.7/site-packages/synapse/storage/state.py", line 592, in store_state_group
    event_id, room_id, prev_group, delta_ids, current_state_ids
  File "/venv/lib/python3.7/site-packages/synapse/storage/databases/state/store.py", line 492, in store_state_group
    "store_state_group", _store_state_group_txn
  File "/venv/lib/python3.7/site-packages/synapse/storage/database.py", line 667, in runInteraction
    **kwargs,
  File "/venv/lib/python3.7/site-packages/synapse/storage/database.py", line 743, in runWithConnection
    self._db_pool.runWithConnection(inner_func, *args, **kwargs)
  File "/venv/lib/python3.7/site-packages/twisted/python/threadpool.py", line 238, in inContext
    result = inContext.theWork()  # type: ignore[attr-defined]
  File "/venv/lib/python3.7/site-packages/twisted/python/threadpool.py", line 255, in <lambda>
    ctx, func, *args, **kw
  File "/venv/lib/python3.7/site-packages/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/venv/lib/python3.7/site-packages/twisted/python/context.py", line 83, in callWithContext
    return func(*args, **kw)
  File "/venv/lib/python3.7/site-packages/twisted/enterprise/adbapi.py", line 293, in _runWithConnection
    compat.reraise(excValue, excTraceback)
  File "/venv/lib/python3.7/site-packages/twisted/python/deprecate.py", line 298, in deprecatedFunction
    return function(*args, **kwargs)
  File "/venv/lib/python3.7/site-packages/twisted/python/compat.py", line 403, in reraise
    raise exception.with_traceback(traceback)
  File "/venv/lib/python3.7/site-packages/twisted/enterprise/adbapi.py", line 284, in _runWithConnection
    result = func(conn, *args, **kw)
  File "/venv/lib/python3.7/site-packages/synapse/storage/database.py", line 737, in inner_func
    return func(db_conn, *args, **kwargs)
  File "/venv/lib/python3.7/site-packages/synapse/storage/database.py", line 531, in new_transaction
    r = func(cursor, *args, **kwargs)
  File "/venv/lib/python3.7/site-packages/synapse/storage/databases/state/store.py", line 419, in _store_state_group_txn
    % (prev_group,)
Exception: Trying to persist state with unpersisted prev_group: 3369

This happens when something pulls the state group into the cache just before we purge history and delete the state group, at which point we fail to invalidate the state cache. If we backfill the purged history again, the worker hits the cache and gets the old, deleted state group.
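To make that sequence concrete, here is a toy, in-memory model of the race. Everything in it is hypothetical (`ToyStateStore`, `state_cache` and the event IDs are stand-ins, not Synapse classes), and it skips the real call path through `compute_event_context`. It only reproduces the shape of the failure: a cached state group outlives the database row it points at, and the next attempt to use it as a `prev_group` trips the same kind of check seen in the traceback.

```python
from typing import Dict, FrozenSet, Optional, Set


class ToyStateStore:
    """Hypothetical stand-in for the state-group tables in the database."""

    def __init__(self) -> None:
        self.state_groups: Set[int] = set()
        self._next_id = 3369  # chosen only to match the log line

    def store_state_group(self, prev_group: Optional[int]) -> int:
        # Same shape of sanity check as the one that raised in the traceback.
        if prev_group is not None and prev_group not in self.state_groups:
            raise Exception(
                "Trying to persist state with unpersisted prev_group: %s"
                % (prev_group,)
            )
        group = self._next_id
        self._next_id += 1
        self.state_groups.add(group)
        return group

    def purge(self, groups: Set[int]) -> None:
        # Deletes the rows; in this model nothing tells the cache about it.
        self.state_groups -= groups


store = ToyStateStore()
# In-memory cache keyed by a set of event IDs (stand-in for the state cache).
state_cache: Dict[FrozenSet[str], int] = {}
key = frozenset({"$event_a", "$event_b"})

# 1. Resolving state pulls a state group into the cache.
state_cache[key] = store.store_state_group(prev_group=None)

# 2. Purging history deletes that state group; the cache entry survives.
store.purge({state_cache[key]})

# 3. Backfilling the same events again hits the stale cache entry.
error_message: Optional[str] = None
try:
    store.store_state_group(prev_group=state_cache[key])
except Exception as e:
    error_message = str(e)

print(error_message)
# Trying to persist state with unpersisted prev_group: 3369
```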

This is likely not much of an issue outside of tests, as the state_cache expires after an hour.
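Expiry bounds how long a stale entry can cause trouble, but does not prevent it. As a sketch, here is a minimal time-to-live cache with an injectable clock. `TtlCache` is hypothetical, not Synapse's cache class; the one-hour TTL comes from the comment above, and measuring expiry from insertion time (rather than, say, resetting it on each read) is an assumption of this sketch.

```python
import time
from typing import Callable, Dict, Generic, Hashable, Optional, Tuple, TypeVar

V = TypeVar("V")


class TtlCache(Generic[V]):
    """Minimal time-to-live cache; expiry is measured from insertion time."""

    def __init__(
        self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, V]] = {}

    def set(self, key: Hashable, value: V) -> None:
        self._entries[key] = (self._clock(), value)

    def get(self, key: Hashable) -> Optional[V]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        inserted_at, value = entry
        if self._clock() - inserted_at >= self._ttl:
            # Expired: drop it so the next lookup recomputes from the database.
            del self._entries[key]
            return None
        return value


# Fake clock so the example runs instantly.
now = [0.0]
cache: TtlCache[int] = TtlCache(ttl_seconds=60 * 60, clock=lambda: now[0])
key = frozenset({"$event_a", "$event_b"})
cache.set(key, 3369)  # state group that is about to be purged

now[0] = 10 * 60  # ten minutes later: the deleted group is still served
after_ten_minutes = cache.get(key)

now[0] = 60 * 60  # one hour later: the entry has expired
after_one_hour = cache.get(key)

print(after_ten_minutes, after_one_hour)  # 3369 None
```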

@erikjohnston
Member Author

Annoyingly, the cache is not part of the data store, so we can't just use the data store's cache invalidation there.

@erikjohnston erikjohnston added S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. labels Apr 22, 2021
@erikjohnston erikjohnston changed the title We don't invalidate state cache when deleting state groups during room history purge Error backfilling in room for 10 minutes after purging history Apr 22, 2021
@erikjohnston
Member Author

The cache is also indexed by a set of event IDs, so there isn't even an easy way to invalidate it by state group.
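One possible way around the keying problem (a sketch, not what Synapse does) is to keep a reverse index from state group to cache keys alongside the cache, so that a purge which knows which state groups it deleted can evict the matching entries without scanning. This assumes each cached value records the single state group it resolved to; `StateGroupIndexedCache` and its methods are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, FrozenSet, Optional, Set

EventIds = FrozenSet[str]


class StateGroupIndexedCache:
    """Hypothetical cache keyed by event-ID sets, with a reverse index by group."""

    def __init__(self) -> None:
        self._entries: Dict[EventIds, int] = {}
        # Reverse index: state group -> every cache key resolving to it.
        self._keys_by_group: DefaultDict[int, Set[EventIds]] = defaultdict(set)

    def set(self, event_ids: EventIds, state_group: int) -> None:
        old_group = self._entries.get(event_ids)
        if old_group is not None:
            self._keys_by_group[old_group].discard(event_ids)
        self._entries[event_ids] = state_group
        self._keys_by_group[state_group].add(event_ids)

    def get(self, event_ids: EventIds) -> Optional[int]:
        return self._entries.get(event_ids)

    def invalidate_state_groups(self, groups: Set[int]) -> int:
        """Drop every entry pointing at a purged group; return how many."""
        removed = 0
        for group in groups:
            for event_ids in self._keys_by_group.pop(group, set()):
                del self._entries[event_ids]
                removed += 1
        return removed


cache = StateGroupIndexedCache()
cache.set(frozenset({"$event_a", "$event_b"}), 3369)
cache.set(frozenset({"$event_c"}), 3370)

removed = cache.invalidate_state_groups({3369})
print(
    removed,
    cache.get(frozenset({"$event_a", "$event_b"})),
    cache.get(frozenset({"$event_c"})),
)  # 1 None 3370
```

The trade-off is extra memory and bookkeeping on every insert; the alternative is a full scan over the cache at purge time.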

@richvdh
Member

richvdh commented May 19, 2022

Related: #11521

@anoadragon453
Member

anoadragon453 commented Mar 17, 2023

For clarity, the cache in question that is not being correctly invalidated while purging an event is _get_state_group_for_event (the _get_state_group_for_events method also makes use of this cache).

However, we do appear to invalidate this cache by event ID when we purge events from the database:

self._invalidate_cache_and_stream(
    txn, self._get_state_group_for_event, (event_id,)
)

That said, we were already doing this back in 2020, so empirically it is either not working or not what is needed to solve this issue. @erikjohnston could you give some more detail on which cache is not being invalidated correctly, and in what way?

See below.

@richvdh
Member

richvdh commented Mar 17, 2023

> For clarity, the cache in question that is not being correctly invalidated while purging an event is _get_state_group_for_event

I think it's StateResolutionHandler._state_cache which is not being invalidated?

@anoadragon453
Member

Ah, I'm glad I asked! Indeed, we don't appear to invalidate that cache anywhere, not even in SQLBaseStore._invalidate_state_caches.
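To illustrate the distinction drawn in the last few comments, here is a toy model with both caches as plain dicts. Nothing here is Synapse code: the dict names, event IDs and `purge_events` are hypothetical. It shows that invalidating the per-event cache by event ID leaves the set-keyed resolution cache untouched, and what a conservative, scan-based extra invalidation step could look like (it may over-evict, which is safe but costs recomputation).

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical stand-ins for the two caches discussed in this thread.
# Per-event cache, event ID -> state group (cf. _get_state_group_for_event).
group_for_event: Dict[str, int] = {"$event_a": 3369, "$event_b": 3369}
# Resolution cache, set of event IDs -> resolved state group
# (cf. StateResolutionHandler._state_cache).
resolution_cache: Dict[FrozenSet[str], int] = {
    frozenset({"$event_a", "$event_b"}): 3369,
}


def purge_events(
    event_ids: Iterable[str], clear_resolution_cache: bool = False
) -> None:
    purged = set(event_ids)
    for event_id in purged:
        # Per-event invalidation, mirroring the snippet quoted in this thread.
        group_for_event.pop(event_id, None)
    if clear_resolution_cache:
        # Hypothetical extra step: drop every resolution-cache entry whose
        # key mentions a purged event. This is a full scan, because the
        # cache is keyed by sets of event IDs rather than by state group.
        for key in [k for k in resolution_cache if k & purged]:
            del resolution_cache[key]


purge_events(["$event_a", "$event_b"])
stale_entry_survived = frozenset({"$event_a", "$event_b"}) in resolution_cache

purge_events(["$event_a", "$event_b"], clear_resolution_cache=True)
print(stale_entry_survived, resolution_cache)  # True {}
```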

@MadLittleMods MadLittleMods added the A-Corruption Things that have led to unexpected state in Synapse or the database label Jun 26, 2023

4 participants