Long running schema deltas are killed by systemd and lead to bootlooping #14401
Comments
Is it possible to make the sqlite schema deltas transactional too? (Were they not before? Did I break something in #13873?)
I forgot to mention that this understanding was based on the docstring for synapse/synapse/storage/engines/sqlite.py, lines 127 to 144 in 8c94dd3.
We could try injecting |
Probably a duplicate of #14100, also see @richvdh's comment in #13193, which I kind of agree with:
Use postgres or run update_synapse_database. Nothing wrong with preventing corruption in case this problem occurs, but I wouldn't try to support even larger sqlite instances through arbitrary timeouts (until someone hits even that limit).
It looks like this may have been introduced by #13873, which replaced the old hand-rolled code for parsing SQL files with something more conventional. That was probably a good idea, but it does break the transactional semantics we were relying on for our migrations...
I wonder if it would be better to split this into two issues:
The latter could probably be fixed by "just" making sure that the transaction gets committed after the update to |
Our server was affected by the problem.
With the help of @richvdh I was able to do the following and get the server running again:
|
Be aware that this is not a general solution to this issue: it is only appropriate if the upgrade fails at a very specific point. If your server is hanging somewhere else in the upgrade cycle, these commands could cause data loss and additional problems.
Sorry, that's my mistake. FWIW the old parser still exists, see e.g. synapse/synapse/storage/prepare_database.py Line 609 in 0a38c7e
The only other option would be to inject an explicit (I recall that when I was making the full schema dumps that parser couldn't handle something postgres-specific. Trigger definitions maybe?) |
It's not, but I think it would have the right semantics, and would avoid us maintaining our own SQL parser, with all the pitfalls that brings.
The non-transactional execution on sqlite has been split out into #14909.
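For anyone following along, here is a minimal sketch of the difference being discussed. It uses only Python's standard `sqlite3` module with its default (legacy) transaction handling. It is not Synapse's code: `run_delta`, `BROKEN_DELTA` and the `delta_demo` table are hypothetical names made up for illustration. A multi-statement script passed to `executescript` runs statement by statement, so a failure midway leaves the earlier statements applied unless the script itself opens a transaction that can be rolled back.

```python
import sqlite3


def run_delta(conn: sqlite3.Connection, script: str, transactional: bool) -> None:
    """Run a multi-statement SQL script, optionally wrapped in one transaction.

    Hypothetical helper for illustration only; not part of Synapse.
    """
    if transactional:
        # Wrap the whole script so every statement succeeds or none does.
        script = "BEGIN TRANSACTION;\n" + script + "\nCOMMIT;"
    try:
        conn.executescript(script)
    except sqlite3.Error:
        # If the script opened a transaction and failed midway, undo it.
        if conn.in_transaction:
            conn.rollback()
        raise


# Hypothetical delta whose second statement fails (the table does not exist).
BROKEN_DELTA = """
CREATE TABLE delta_demo (id INTEGER PRIMARY KEY);
INSERT INTO no_such_table VALUES (1);
"""


def tables_after_failed_delta(transactional: bool) -> list:
    """Apply the broken delta to a fresh in-memory database and list its tables."""
    conn = sqlite3.connect(":memory:")
    try:
        run_delta(conn, BROKEN_DELTA, transactional)
    except sqlite3.Error:
        pass  # the failure is expected; we only care about what was left behind
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return sorted(row[0] for row in rows)


print(tables_after_failed_delta(transactional=False))  # half-applied delta
print(tables_after_failed_delta(transactional=True))   # rolled back
```

Without the wrapper, the `CREATE TABLE` survives the failed run, which is the "half applied" state described in this issue. With the wrapper, the rollback removes it. Note that this naive wrapping would misbehave if a delta file contained its own `BEGIN` or `COMMIT` statements.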
Not sure if this is worth a separate issue.
Synapse 1.92.3. Postgres 16
Fresh database
---
synapse logs
Resolved. Wrong systemd config where |
@aceArt-GmbH The logs don't show that Synapse is getting killed during the schema migration:
You should look at systemd's logs to see what is happening. If they show evidence of a bug in Synapse, please open a new issue.
When a schema delta takes too long, Synapse will get restarted by systemd, possibly repeatedly.
~~When using postgres, schema deltas are transactional. Synapse will run the schema delta and time out again.~~
When using sqlite, schema deltas are not fully transactional (see docstring here). The schema delta will end up being half applied and the database will end up in an inconsistent state. Depending on the nature of the schema delta, the next run may fail. Tracked in #14909. This manifested as:
Things we may consider doing:
dmr: Make sqlite schema deltas transactional too? Tracked in "Schema deltas are no longer transactional on sqlite" (#14909).
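On the systemd side, one mechanism that could be explored is the `EXTEND_TIMEOUT_USEC` field of the sd_notify protocol, which lets a service ask the service manager for more time while it is still starting. The sketch below is an illustration under assumptions, not something Synapse is known to do. It assumes the unit uses `Type=notify` (so that `NOTIFY_SOCKET` is set in the environment), and `extend_start_timeout` is a hypothetical helper name. It would only stop the restart loop; it does not address the half-applied deltas on sqlite.

```python
import os
import socket


def extend_start_timeout(extra_seconds: int) -> bool:
    """Ask systemd for `extra_seconds` more before it declares a timeout.

    Hypothetical helper for illustration. Returns False when NOTIFY_SOCKET
    is not set, i.e. when not running under systemd's notify protocol.
    """
    address = os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False
    if address.startswith("@"):
        # A leading "@" denotes an abstract-namespace socket (NUL-prefixed).
        address = "\0" + address[1:]
    # The value is expressed in microseconds.
    message = f"EXTEND_TIMEOUT_USEC={extra_seconds * 1_000_000}".encode("ascii")
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, address)
    return True
```

A long-running delta could call something like this periodically. Each message only buys the requested interval, so the service has to keep sending them for as long as the work continues.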