disable federation if sqlite is used #2917
Comments
Agreed on need for big warning - we'll be revamping our docs and onboarding rsn so I would see this as a good thing to include. (Not really docs, but tagging since it would need clear doc'ing) |
This feature would be great even if using PostgreSQL! I love Synapse and the Matrix environment, but I don't have a big server, and being able to disable federation just so I can chat with my friends on my private homeserver would be wonderful! Synapse is regularly killed by the OOM killer because of its RAM consumption, and having to restart it every two or three days is a real PITA... |
disabling federation in general landed on develop a few weeks ago and will be in synapse 0.27 |
Just as a reference: #2619 // EDIT as question got answered in the meantime |
see also #2889 |
bumping priority as more and more people are getting a horrible experience due to this :( |
The problem with disabling/refusing to federate large rooms is that we end up breaking sytest and existing deployments. So conclusion is that we will
|
So. What's the actual point of even supporting sqlite? It's not testing - there are working tests with postgres. It's not for trying out matrix - no need to run a homeserver for that. If it's for "running small instances" - if running a postgres instance is the relevant hurdle for some sysadmin on whether to run a synapse homeserver or not, it is frankly a question of time until that goes sideways and they'll walk away burnt. Is there a good reason to keep sqlite support around? |
It's mainly there to help people get going quickly with a view to upgrading to postgres as soon as their usage gets more serious. Having postgres in CI definitely reduces the case for retaining sqlite support and supporting two db engines doesn't come for free. |
To summarise. |
This needs to happen pre 1.0, and ideally now-ish as people get excited for 1.0 and start installing their homeservers. |
I am not sure I agree that it should be pre-1.0, for the reasons set out at #5078 (review). |
This option is stressful and reduces server performance. ref: - matrix-org/synapse#2619 - matrix-org/synapse#2917
My current thinking on this is:
|
there was some discussion about whether this should apply to existing servers, or not. On the one hand, it does seem quite harsh to break existing, federating setups. On the other hand: how do we differentiate between:
Doing so is tricky, and I feel pretty strongly that we want to catch the latter class, particularly given that the recommended installation procedure is to get things working locally first and then worry about federation. I think I'm more in favour of the nuclear option of just doing it. We can always tell people about |
I also wonder how important this is now that INSTALL.md has been clarified thanks to https://github.com/matrix-org/synapse/pull/7899/files#diff-7d442b7eb49f5fc377f51e74b291cfc1R32. |
I still believe this is very important for several reasons.
If you want to lessen the impact on existing installs, the change could happen in a couple of phases. The first phase could be a mention in |
it might (also) be worth investigating if fixing #8105 makes this less of a problem. |
Related to #6401 |
At the moment, the issue is not that SQLite is slow but that Synapse is using SQLite incorrectly. It's not simply a performance issue with SQLite and it won't be fixed simply by enabling WAL mode and allowing multiple threads. I do genuinely mean that it's using SQLite incorrectly rather than simply in a mode that has low performance. Until this is fixed, you should really not be supporting SQLite as a backend at all because it's broken even for small deployments right now.

First of all, WAL mode should be enabled unconditionally and should be the only supported way to use it. WAL mode supports N concurrent readers with 1 concurrent writer. Non-WAL mode only supports either N concurrent readers or 1 writer, so writes can indefinitely block reads which makes things drastically worse. However, you'll still get busy errors before the timeout elapses unless you fix your code to do transactions properly.

If you're using WAL mode, the database won't ever be busy for reads. It can still be busy for writes, but if you do things properly it will only report that the database is busy once the transaction attempting to perform the write has waited for the entire busy timeout. You should configure the busy timeout appropriately so that the write transactions will wait for a potentially long time if a lot of writes are being done, particularly via long transactions.

A deferred write transaction will immediately fail as the database being busy when it enters write mode if a concurrent write occurred which could have invalidated earlier reads. You should completely avoid deferred write transactions to avoid this. Any transaction where writes MAY be performed needs to be started with

Once you fix these things by using WAL mode unconditionally and avoiding deferred write transactions, you should permit many concurrent connections rather than restricting it to 1 thread. You should really permit at least 64 concurrent connections to optimize for machines with an SSD. Using an HDD isn't viable once the working set of the database stops fitting in memory anyway. |
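For anyone who wants to try the advice above, here is a minimal sketch using Python's standard `sqlite3` module. It is not Synapse's code: the `events` table, the helper names and the 30 second timeout are all made up for illustration. It also assumes that the non-deferred transaction the comment has in mind is SQLite's `BEGIN IMMEDIATE`, which takes the write lock up front and waits for the busy timeout instead of failing partway through.

```python
import os
import sqlite3
import tempfile


def open_connection(path: str, busy_timeout_ms: int = 30_000) -> sqlite3.Connection:
    """Open a connection in WAL mode with an explicit busy timeout (illustrative helper)."""
    # isolation_level=None puts the module in autocommit mode, so it never
    # issues an implicit (deferred) BEGIN on our behalf.
    conn = sqlite3.connect(path, isolation_level=None)
    # The pragma returns the journal mode actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"could not enable WAL mode, got {mode!r}")
    # How long a writer waits for the write lock before reporting "busy".
    conn.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")
    return conn


def write_event(conn: sqlite3.Connection, body: str) -> None:
    """Insert one row inside a transaction that takes the write lock immediately."""
    conn.execute("BEGIN IMMEDIATE")  # assumption: this is the non-deferred BEGIN meant above
    try:
        conn.execute("INSERT INTO events (body) VALUES (?)", (body,))
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")
        raise


def demo() -> list:
    """Write through one connection and read through another."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.db")  # hypothetical database file
        writer = open_connection(path)
        reader = open_connection(path)
        writer.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, body TEXT)")
        write_event(writer, "hello")
        rows = reader.execute("SELECT body FROM events").fetchall()
        reader.close()
        writer.close()
        return rows
```

Read-only work can keep using plain `SELECT` statements on its own connection; only code paths that may write need the immediate transaction.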
There is one caveat to using WAL which I ran into recently: the WAL file itself can grow without bound if readers are constantly accessing it. The SQLite documentation says:
I ran into this issue when adding support for a sqlite backend for conduit, SQLite is - by default - in SQLite documentation on
In conclusion, I think a core reason why WAL might not have been turned on in the past (and why I can see the hesitance to enable it now) is that, under load, that file can grow without bounds. The solution to this is to rwlock all database access and perform noop with |
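A rough sketch of the "lock everything, then checkpoint" idea, again with Python's `sqlite3`. This is not what conduit or Synapse does: the `CheckpointedDb` class is hypothetical, a plain `threading.Lock` stands in for the rwlock mentioned above (the standard library has no reader-writer lock), and `PRAGMA wal_checkpoint(TRUNCATE)` is my assumption for how to reset the WAL file once no reader is holding it open.

```python
import os
import sqlite3
import tempfile
import threading


class CheckpointedDb:
    """Serialises access to one connection so a checkpoint can run undisturbed (hypothetical wrapper)."""

    def __init__(self, path: str) -> None:
        # A mutex instead of an rwlock: simpler, but readers also exclude each other.
        self._lock = threading.Lock()
        self._conn = sqlite3.connect(path, isolation_level=None, check_same_thread=False)
        self._conn.execute("PRAGMA journal_mode=WAL")

    def execute(self, sql: str, params: tuple = ()) -> list:
        with self._lock:
            # fetchall() drains the cursor so no read transaction stays open.
            return self._conn.execute(sql, params).fetchall()

    def checkpoint(self) -> bool:
        """Move WAL contents into the main database file and truncate the WAL."""
        with self._lock:
            # The pragma returns (busy, wal_frames, checkpointed_frames).
            busy, _wal_frames, _checkpointed = self._conn.execute(
                "PRAGMA wal_checkpoint(TRUNCATE)"
            ).fetchone()
        return busy == 0

    def close(self) -> None:
        self._conn.close()
```

A server could call `checkpoint()` from a periodic task; a `False` return means another connection blocked the checkpoint and it is worth retrying later.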
Honestly the only reason that we haven't investigated things like WAL (or indeed just using multiple threads) is that sqlite performance isn't really a priority for us - Postgres is unquestionably a more appropriate solution for deployment at scale (apart from anything else, it makes it much easier to separate your application servers from your database servers), so we've ended up focussing on that. If any enthusiastic contributors would like to investigate supporting WALs and creating a PR to support them, that would be great, but it's unlikely the Synapse core team will find time to prioritise that sort of work in the near future. |
@thestinger we have been running a very small (~20 users), federated instance and have never had any issues, performance- or otherwise. Could you or someone elucidate how specifically this incorrectness would manifest in an SQLite setup? Because given our experience, trading off SQLite for PostgreSQL doesn't really seem worth the additional overhead. |
unless the admin set a `yes_i_want_my_server_to_catch_on_fire` option. This preserves the ability to get something up and running for testing, but reduces the surprise of people's servers melting when joining a large federated room like HQ. Of course, there will need to be a big warning somewhere so that people know that federation is disabled.
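To make the proposal concrete, the decision could look roughly like the sketch below. Nothing here is Synapse's actual implementation: the function name, its parameters and the warning text are invented for illustration, and the `"sqlite3"` engine name is an assumption about how the database is identified in the config.

```python
import logging

logger = logging.getLogger("federation_policy_example")


def federation_allowed(database_engine: str, sqlite_override: bool = False) -> bool:
    """Decide whether to federate, given the database engine and an explicit opt-in.

    `sqlite_override` stands in for the tongue-in-cheek
    yes_i_want_my_server_to_catch_on_fire option from the proposal.
    """
    if database_engine != "sqlite3":  # assumed engine name
        return True
    if sqlite_override:
        logger.warning("Federating on SQLite: expect poor performance in large rooms.")
        return True
    # The "big warning" the proposal asks for, so admins know why federation is off.
    logger.warning("Federation is disabled because the database is SQLite.")
    return False
```

With this shape, a fresh SQLite install still starts and works locally, and the admin has to opt in knowingly before the server joins the wider federation.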