disable federation if sqlite is used #2917

uhoreg · 2018-02-27T23:41:39Z

unless the admin set a yes_i_want_my_server_to_catch_on_fire option. This preserves the ability to get something up and running for testing, but reduce the surprise of peoples' servers melting when joining a large federated room like HQ. Of course, there will need to be a big warning somewhere so that people know that federation is disabled.

The text was updated successfully, but these errors were encountered:

neilisfragile · 2018-02-28T09:52:22Z

Agreed on need for big warning - we'll be revamping our docs and onboarding rsn so I would see this as a good thing to include.

(Not really docs, but tagging since it would need clear doc'ing)

ShellCode33 · 2018-02-28T19:39:59Z

This feature would be great even if using postgresql ! I love Synapse and the Matrix environment, but I don't have a big server and being able to disable federation just in order to be able to chat with my friends on my private homeserver would be wonderful ! Synapse is regularly killed by the OOM Killer because of its RAM consumption and having to restart it every two or three days is a real PITA...

ara4n · 2018-02-28T19:59:25Z

disabling federation in general landed on develop a few weeks ago and will be in synapse 0.27

krombel · 2018-03-13T00:52:26Z

Just as a reference: #2619

// EDIT as question got answered in the meantime

ara4n · 2018-04-05T20:37:53Z

see also #2889

ara4n · 2018-04-09T20:34:23Z

bumping priority as more and more people are getting a horrible experience due to this :(

neilisfragile · 2018-04-11T16:22:01Z

The problem with disabling/refusing to federate large rooms is that we end up breaking sytest and existing deployments. So conclusion is that we will

Investigate sqlite regressions to see if anything obvious exists
Add a health warning logs on synctl start up
Make set up guide default to postgres and extract installation notes into it's own page

Valodim · 2018-12-18T19:18:29Z

So. What's the actual point of even supporting sqlite? It's not testing - there are working tests with postgres. It's not for trying out matrix - no need to run a homeserver for that. If it's for "running small instances" - if running a postgres instance is the relevant hurdle for some sysadmin on whether to run a synapse homeserver or not, it is frankly a question of time until that goes sideways and they'll walk away burnt. Is there a good reason to keep sqlite support around?

neilisfragile · 2019-01-03T16:18:40Z

It's mainly there to help people get going quickly with a view to upgrading to postgres as soon as their usage gets more serious.

Having postgres in CI definitely reduces the case for retaining sqlite support and supporting two db engines doesn't come for free.

neilisfragile · 2019-02-11T15:51:11Z

To summarise.
Disable federation by default for sqlite, allow a i-know-this-is-slow flag in config, ensure that the reason that federation fails is clear to the admin.

Half-Shot · 2019-03-17T21:55:54Z

This needs to happen pre 1.0, and ideally now-ish as people get excited for 1.0 and start installing their homeservers.

richvdh · 2019-04-24T06:45:03Z

I am not sure I agree that it should be pre-1.0, for the reasons set out at #5078 (review).

this option is stressful & reduce server performance. ref: - matrix-org/synapse#2619 - matrix-org/synapse#2917

richvdh · 2020-02-12T11:01:28Z

My current thinking on this is:

there should be a configuration option federation_enabled, which is set to false in the generated config (I'd love it to be false by default, but that will break everybody's existing setup)
synapse ignores attempts to enable federation listeners (and make outbound federation connections) if federation_enabled is set to false. (It should instead warn. This approach means that federation can be temporarily disabled without editing half the config file).
when you enable federation_enabled, we can do some sanity checks including refusing to start if the database engine is sqlite and you haven't set i_want_my_server_to_catch_on_fire: true.

richvdh · 2020-07-30T16:46:11Z

there was some discussion about whether this should apply to existing servers, or not.

On the one hand, it does seem quite harsh to break existing, federating setups. On the other hand: how do we differentiate between:

"existing, federating setup"
"existing setup which has not yet done any federation"

Doing so is tricky, and I feel pretty strongly that we want to catch the latter class, particularly given that the recommended installation procedure is to get things working locally first and then worry about federation.

I think I'm more in favour of the nuclear option of just doing it. We can always tell people about i_want_my_server_to_catch_on_fire in UPGRADE.rst.

richvdh · 2020-07-30T16:50:14Z

I also wonder how important this is now that INSTALL.md has been clarified thanks to https://github.com/matrix-org/synapse/pull/7899/files#diff-7d442b7eb49f5fc377f51e74b291cfc1R32.

aaronraimist · 2020-07-31T02:16:11Z

I still believe this is very important for several reasons.

Admins of existing installs won't notice the change to the documentation but they will notice federation not working
Many people use random old tutorials from around the internet rather than using the official install documentation
Even if they do read the official documentation they might miss it, think it doesn't apply to them ("I'm sure my server will have a light workload"), or just not feel like setting up Postgres right now

If you want to lessen the impact on existing installs, the change could happen in a couple of phases. The first phase could be a mention in UPGRADE.rst that federation will be disabled for SQLite installs in an upcoming version and log a bunch of warnings/errors each time Synapse starts. Then a few versions later you actually disable it. This issue has been open for years so another couple of months delay in actually disabling it isn't a big deal.

aaronraimist · 2020-07-31T02:16:35Z

Also fixing #2317 / #4270 would be a good first step.

richvdh · 2020-08-17T17:00:39Z

it might (also) be worth investigating if fixing #8105 makes this less of a problem.

MadLittleMods · 2021-07-08T00:43:51Z

Related to #6401

thestinger · 2021-07-09T12:20:18Z

At the moment, the issue is not that SQLite is slow but that Synapse is using SQLite incorrectly. It's not simply a performance issue with SQLite and it won't be fixed simply by enabling WAL mode and allowing multiple threads. I do genuinely mean that it's using SQLite incorrectly rather than simply in a mode that has low performance. Until this is fixed, you should really not be supporting SQLite as a backend at all because it's broken even for small deployments right now.

First of all, WAL mode should be enabled unconditionally and should be the only supported way to use it. WAL mode supports N concurrent readers with 1 concurrent writer. Non-WAL mode only supports either N concurrent readers or 1 writer, so writes can indefinitely block reads which makes things drastically worse. However, you'll still get busy errors before the timeout elapses unless you fix your code to do transactions properly.

If you're using WAL mode, the database won't ever be busy for reads. It can still be busy for writes, but if you do things properly it will only report that the database is busy once the transaction attempting to perform the write has waited for the entire busy timeout.

You should configure the busy timeout appropriately so that the write transactions will wait for a potentially long time if a lot of writes are being done, particularly via long transactions.

A deferred write transaction will immediately fail as the database being busy when it enters write mode if a concurrent write occurred which could have invalidated earlier reads. You should completely avoid deferred write transactions to avoid this.

Any transaction where writes MAY be performed needs to be started with BEGIN IMMEDIATE rather than BEGIN in order to obtain the writer lock as part of starting the transaction. This makes sure that the transaction will be able to proceed. Using BEGIN implies using a deferred writer lock which means that the transaction needs to fail as the database being busy if a write occurred which may have modified the state of the database seen by a read in the transaction.

Once you fix these things by using WAL mode unconditionally and avoiding deferred write transactions, you should permit many concurrent connections rather than restricting it to 1 thread. You should really permit at least 64 concurrent connections to optimize for machines with an SSD. Using an HDD isn't viable once the working set of the database stops fitting in memory anyway.

ShadowJonathan · 2021-07-09T12:31:18Z

There is one caveat to using WAL which i ran into recently, which is that the WAL file itself can grow unconditionally if readers are constantly accessing it. The SQLite documentation says;

A checkpoint operation takes content from the WAL file and transfers it back into the original database file. A checkpoint can run concurrently with readers, however the checkpoint must stop when it reaches a page in the WAL that is past the end mark of any current reader. The checkpoint has to stop at that point because otherwise it might overwrite part of the database file that the reader is actively using. The checkpoint remembers (in the wal-index) how far it got and will resume transferring content from the WAL to the database from where it left off on the next invocation.

Thus a long-running read transaction can prevent a checkpointer from making progress. But presumably every read transaction will eventually end and the checkpointer will be able to continue.

Whenever a write operation occurs, the writer checks how much progress the checkpointer has made, and if the entire WAL has been transferred into the database and synced and if no readers are making use of the WAL, then the writer will rewind the WAL back to the beginning and start putting new transactions at the beginning of the WAL. This mechanism prevents a WAL file from growing without bound.

I ran into this issue when adding support for a sqlite backend for conduit, SQLite is - by default - in PRAGMA wal_checkpoint=PASSIVE;, meaning that it will try to do as much as it can, but unless nothing is reading from the file, it'll grow without bounds. (see also the PR description of above link)

SQLite documentation on wal_checkpoint=PASSIVE;

Checkpoint as many frames as possible without waiting for any database readers or writers to finish. Sync the db file if all frames in the log are checkpointed.

In conclusion, I think a core reason why WAL might've not been turned in the past (and why I can see the hesitance to enable it now) would be because, under load, that file can grow without bounds. The solution to this is to rwlock all database access and perform noop with wal_checkpoint=TRUNCATE, which'll force the WAL file to shrink, but that introduces extra logic to the storage part of the codebase.

richvdh · 2021-09-10T10:44:33Z

Honestly the only reason that we haven't investigated things like WAL (or indeed just using multiple threads) is that sqlite performance isn't really a priority for us - Postgres is unquestionably a more appropriate solution for deployment at scale (apart from anything else, it makes it much easier to separate your application servers from your database servers), so we've ended up focussing on that.

If any enthusiastic contributors would like to investigate supporting WALs and creating a PR to support them, that would be great, but it's unlikely the Synapse core team will find time to prioritise that sort of work in the near future.

asymmetric · 2022-09-24T08:26:15Z

I do genuinely mean that it's using SQLite incorrectly rather than simply in a mode that has low performance.

@thestinger we have been running a very small (~20 users), federated instance and have never had any issues, performance- or otherwise.

Could you or someone elucidate how specifically this incorrectness would manifest in an SQLite setup?

Because given our experience, trading off SQLite for PostgreSQL doesn't really seem worth the additional overhead.

neilisfragile added enhancement z-p2 (Deprecated Label) A-Docs things relating to the documentation labels Feb 28, 2018

ara4n removed the A-Docs things relating to the documentation label Apr 5, 2018

ara4n added p1 and removed z-p2 (Deprecated Label) labels Apr 9, 2018

This was referenced Dec 7, 2018

Use Postgres by default #4270

Closed

Use Postgres by default matrix-org/package-synapse-debian#40

Closed

hawkowl added the Synapse v1.0 label Apr 16, 2019

babolivier self-assigned this Apr 18, 2019

babolivier mentioned this issue Apr 18, 2019

Disable federation when using SQLite by default #5078

Closed

neilisfragile removed the Synapse v1.0 label May 15, 2019

d3m3vilurr added a commit to d3m3vilurr/matrix-dockers that referenced this issue Jun 26, 2019

synapse: Temporary disable federation

ac3e403

this option is stressful & reduce server performance. ref: - matrix-org/synapse#2619 - matrix-org/synapse#2917

richvdh mentioned this issue Mar 30, 2020

SQLite: too many SQL variables (federation) #7174

Closed

lunainvictum mentioned this issue Jan 12, 2021

Question about Synapse Server completly isolated #9094

Closed

MadLittleMods added the T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements. label Jul 7, 2021

richvdh mentioned this issue Sep 10, 2021

we only use a single thread with sqlite #8105

Open

richvdh mentioned this issue Nov 24, 2021

Consider using SQLite's write-ahead log (WAL) mode instead of the rollback journal mode matrix-org/sydent#472

Open

richvdh mentioned this issue Aug 12, 2022

Synapse won't start after upgrade attempt from 1.59.1 to 1.62: slow database migration means we get timed out by systemd #13193

Closed

DMRobertson removed the z-enhancement label Aug 25, 2022

erikjohnston unassigned babolivier Sep 7, 2022

erikjohnston added S-Minor Blocks non-critical functionality, workarounds exist. O-Occasional Affects or can be seen by some users regularly or most users rarely A-Performance Performance, both client-facing and admin-facing and removed z-p1 labels Sep 7, 2022

asymmetric mentioned this issue Sep 24, 2022

Enable WAL for SQLite #13897

Merged

4 tasks

matrixbot mentioned this issue Dec 21, 2023

disable federation if sqlite is used element-hq/synapse#2917

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

disable federation if sqlite is used #2917

disable federation if sqlite is used #2917

uhoreg commented Feb 27, 2018

neilisfragile commented Feb 28, 2018

ShellCode33 commented Feb 28, 2018

ara4n commented Feb 28, 2018

krombel commented Mar 13, 2018 •

edited

Loading

ara4n commented Apr 5, 2018

ara4n commented Apr 9, 2018

neilisfragile commented Apr 11, 2018

Valodim commented Dec 18, 2018

neilisfragile commented Jan 3, 2019

neilisfragile commented Feb 11, 2019

Half-Shot commented Mar 17, 2019

richvdh commented Apr 24, 2019

richvdh commented Feb 12, 2020

richvdh commented Jul 30, 2020

richvdh commented Jul 30, 2020

aaronraimist commented Jul 31, 2020 •

edited

Loading

aaronraimist commented Jul 31, 2020 •

edited

Loading

richvdh commented Aug 17, 2020

MadLittleMods commented Jul 8, 2021

thestinger commented Jul 9, 2021 •

edited

Loading

ShadowJonathan commented Jul 9, 2021

richvdh commented Sep 10, 2021

asymmetric commented Sep 24, 2022 •

edited

Loading

disable federation if sqlite is used #2917

disable federation if sqlite is used #2917

Comments

uhoreg commented Feb 27, 2018

neilisfragile commented Feb 28, 2018

ShellCode33 commented Feb 28, 2018

ara4n commented Feb 28, 2018

krombel commented Mar 13, 2018 • edited Loading

ara4n commented Apr 5, 2018

ara4n commented Apr 9, 2018

neilisfragile commented Apr 11, 2018

Valodim commented Dec 18, 2018

neilisfragile commented Jan 3, 2019

neilisfragile commented Feb 11, 2019

Half-Shot commented Mar 17, 2019

richvdh commented Apr 24, 2019

richvdh commented Feb 12, 2020

richvdh commented Jul 30, 2020

richvdh commented Jul 30, 2020

aaronraimist commented Jul 31, 2020 • edited Loading

aaronraimist commented Jul 31, 2020 • edited Loading

richvdh commented Aug 17, 2020

MadLittleMods commented Jul 8, 2021

thestinger commented Jul 9, 2021 • edited Loading

ShadowJonathan commented Jul 9, 2021

richvdh commented Sep 10, 2021

asymmetric commented Sep 24, 2022 • edited Loading

krombel commented Mar 13, 2018 •

edited

Loading

aaronraimist commented Jul 31, 2020 •

edited

Loading

aaronraimist commented Jul 31, 2020 •

edited

Loading

thestinger commented Jul 9, 2021 •

edited

Loading

asymmetric commented Sep 24, 2022 •

edited

Loading