Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

postgres: create databases for all services #740

Merged
merged 83 commits into from
Dec 23, 2020

Conversation

jdreichmann
Copy link
Contributor

@jdreichmann jdreichmann commented Nov 29, 2020

If a service is enabled, a database for it is created in postgres with a uniqque password. The service can then use this database for data storage instead of relying on sqlite.

  • Point all bridges which support it to the postgres container (if matrix_postgres is enabled)
  • Roles should default to sqlite, so should the playbook as a whole if postgres is running externally
  • Roles should auto-detect existing SQLite DBs and warn/error/auto-migrate before they change the config to the postgres-default
  • Testing
  • Migration can be done using --tags=import-generic-sqlite-db (from the matrix-postgres role) to import sqlite DBs (we still have this tag, even though we now do automatic migration for everything and don't advertise it at all)

Implementation:

  • appservice_discord (done and tested; automatic migration supported)
  • appservice_irc (done and ~tested; automatic migration supported)
  • appservice_slack (done and ~tested; automatic migration supported)
  • mautrix_facebook (done and tested; automatic migration supported)
  • mautrix_whatsapp (done, but untested; automatic migration supported)
  • mautrix_telegram (done, but untested; automatic migration supported)
  • mautrix_hangouts (done, but untested; automatic migration supported)
  • mx_puppet_skype (done and ~tested; automatic migration supported, mx-puppet-* added by @jdreichmann )
  • mx_puppet_slack (done and ~tested; automatic migration supported))
  • mx_puppet_discord (done and ~tested; automatic migration supported)
  • mx_puppet_steam (done and ~tested; automatic migration supported)
  • mx_puppet_instagram (done and ~tested; automatic migration supported)
  • mx_puppet_twitter (done and ~tested; automatic migration supported)
  • dimension (done and ~tested; automatic migration supported)
  • ma1sd (done and tested; automatic migration supported)
  • matrix-reminder-bot (done and tested; automatic migration supported)
  • matrix-registration (done and tested; defaulting to Postgres blocked by Postgres not supported when using the container image (cannot import psycopg2) zeratax/matrix-registration#44; automatic migration supported)

Feel free to pick a bridge and implement the migration :)

@@ -162,3 +162,33 @@
- matrix-change-user-admin-status
- matrix-postgres-update-user-password-hash
when: "not matrix_postgres_enabled|bool"

# Create additional databases
- name: Retrieve IP of postgres container
Copy link
Contributor

@pushytoxin pushytoxin Dec 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the database server's host name? The Docker DNS server should resolve it for you - {{ matrix_synapse_database_host }}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you reach a docker container by it's hostname on the host? Last time i checked, a similar feature was only available for docker-on-mac - and ansible (i.e.: this module) runs on the host. The IP however should be routable from the host, regardless of docker version / OS distribution

@jdreichmann
Copy link
Contributor Author

jdreichmann commented Dec 6, 2020

Bridges compatible with postgres DB store:

  • matrix-appservice-discord
  • matrix-appservice-slack
  • matrix-appservice-irc
  • mautrix-facebook
  • mautrix-hangouts
  • mautrix-whatsapp
  • mautrix-telegram
  • mautrix-instagram
  • mx-puppet-*
  • matrix-dimension

(incomplete)

@jdreichmann jdreichmann force-pushed the postgres-per-default branch 2 times, most recently from 16145a1 to 4d3ed47 Compare December 10, 2020 16:53
If a service is enabled, a database for it is created in postgres with a uniqque password. The service can then use this database for data storage instead of relying on sqlite.
Moving it above the "uninstalling" set of tasks is better.
Extracting it out to another file at the same time, for readability,
especially given that it will probably have to become more complex in
the future (potentially installing `jq`, etc.)
We'll most likely use one that matches the database name, but
it's better to have it configurable.
Quotes are necessary around dictionary field names.

There was a missing `}` as well.
To avoid needing to have `jq` installed on the machine, we could:
- try to run jq in a Docker container using some small image providing
that
- better yet, avoid `jq` altogether
> Invalid data passed to 'loop', it requires a list, got this instead: matrix_postgres_additional_databases. Hint: If you passed a list/dict of just one element, try adding wantlist=True to your lookup invocation or use q/query instead of lookup.

Well, or working around it, as I've done in this commit (which seems
more sane than `wantlist=True` stuff).
The tasks in `create_additional_databases.yml` will likely
ensure `matrix-postgres.service` is started, etc.

If no additional databases are defined, we'd rather not execute that
file and all these tasks that it may do in the future.
While these modules are really nice and helpful, we can't use them
for at least 2 reasons:

- for us, Postgres runs in a container on a private Docker network
(`--network=matrix`) without usually being exposed to the host.
These modules execute on the host so they won't be able to reach it.

- these modules require `psycopg2`, so we need to install it before
using it. This might or might not be its own can of worms.
People can toggle between them now. The playbook also defaults
to using SQLite if an external Postgres server is used.

Ideally, we'd be able to create databases/users in external Postgres
servers as well, but our initialization logic (and `docker run` command,
etc.) hardcode too many things right now.
Using the result of `password_hash` works for creating them,
but authentication seems to be failing with some tools like pgloader.

It's possible that we're not escaping things properly somewhere.
Ideally, it'd be nice to solve that. But the easier (and still
relatively safe/good) solution is to just turn that password hash
into a UUID that's safe for passing around without worrying about
escaping.
This can be used by various bridges, etc., to import an SQLite
(or some other supported) database into Postgres.
@spantaleev
Copy link
Owner

Thank you for the great start on this! 👍

It's been a good inspiration and I've spent a lot of time progressing tonight based on it.

I believe to have made some significant progress.
I've pushed my commits to your branch and I've tried to keep them atomic and understandable.

As an overview of all the changes:

  • I'm trying to move away from bridge roles reaching into matrix_postgres_additional_databases. It'd be nicer if we can keep roles separate as much as possible with group_vars/matrix_servers wiring them together. See how I've reversed things around for the Appservice Discord bridge. This is to be done for the other bridges as well.

  • I'm trying to let people still use SQLite for their bridges (and even having that as the default in some cases), instead of forcing them into Postgres. See how it's done for the Appservice Discord bridge.

  • when people use the roles separately from the playbook, the goal is to have the bridge roles default to whatever they can do in such a configuration (which is: use SQLite)

  • when people use an external Postgres server, we also aim for defaulting to SQLite, because we're not ready to automatically create users/databases in something which is not matrix-postgres. Ideally, we'd be able to, but for now we make a lot of assumptions and hardcode stuff. I don't have such an external-Postgres setup on hand right now, nor the motivation to deal with these edge cases myself. Still, we don't break things for these people - we default to whatever worked (SQLite). They can also prepare their own Postgres users/databases and repoint bridges there, so whoever is inclined enough can do it manually. Or they can submit a PR, of course ;)

  • I've added |to_uuid to the logic for password generation to avoid escaping issues (and even potential "password too long" issues). I've hit some escaping issues when importing data with pgloader, but we may have the same problem in other places. Using UUIDs for passwords is probably a safe enough default

  • I've migrated the tasks away from using Postgres Ansible modules (postgresql_user and postgresql_db), because they run on the host and can't reach the matrix-postgres container. Also, they require additional libraries (psycopg2) to be installed on the server, which may be a pain to do, especially for all the bunch of distros that we support.

  • aim for having an *_engine: postgres default for bridges (see the Appservice Discord bridge), but still have the bridge role tasks detect an old database and error out before overwriting the configuration and pointing the bridge to Postgres

  • as part of this erroring out, we provide 2 resolution paths (letting people stay on SQLite; or telling them how to migrate to Postgres)

  • to do the (SQLite -> Postgres) migration we have a new command --tags=import-generic-sqlite-db living in the matrix-postgres role. It's to be used to import any database (from any bridge, etc.), into Postgres. Whenever it completes, it renames the database file (file.db -> file.db.backup). Re-running the playbook after that should avoid the aforementioned "erroring out" and should set up a bridge to use Postgres

As mentioned in the summary above, most of these things are only implemented for the Appservice Discord bridge. I'm yet to spend time on doing the same for the other bridges. Well, if someone else wants to help with that, that'd be great. Now that the complicated part is done, it should be relatively easy to fix these things up.

Talking about the Appservice Discord bridge, I've already used the playbook's instructions to migrate my data from SQLite to Postgres. It seems like it worked, but I'm yet to see any Discord activity and confirm whether it all works.

I'll likely spend time migrating the Facebook bridge tomorrow. At least that's a bridge I can test easily.

We were running into conflicts, because having initialized
the roles (users) and databases, trying to import leads to
errors (role XXX already exists, etc.).

We were previously ignoring the Synapse database (`homeserver`)
when upgrading/importing, because that one gets created by default
whenever the container starts.

For our additional databases, it's a similar situation now.
It's not created by default as soon as Postgres starts with an empty
database, but rather we create it as part of running the playbook.

So we either need to skip those role/database creation statements
while upgrading/importing, or to avoid creating the additional database
and rely on the import for that. I've gone for the former, because
it's already similar to what we were doing and it's simpler
(it lets `setup_postgres.yml` be the same in all scenarios).
@spantaleev
Copy link
Owner

I've added (SQLite + Postgres) support to the remaining services.

In dd797ba, I've also fixed --tags=upgrade-postgres and --tags=import-postgres to not choke with all these new databases that we're creating.

I think that what remains to be done before we can merge this, is:

  • fix up the mx-puppet-* bridges, so that they support automatic migration like all the other services
  • find someone to test Slack/IRC bridge importing. If we can't, we can always default what we can't test to sqlite for now
  • find someone to test the other bridges (those that support automatic migration already)

It might also be nice to get pgloader running on ARM, but we can also leave this for another time.

In cases where pgloader is not enough and we need to do some additional
migration work after it, we can now use
`additional_psql_statements_list` and
`additional_psql_statements_db_name`.

This is to be used when migrating `matrix-registration`'s data at the
very least.
Now that 0.7.2 is out, the Docker image supports Postgres
and we can do the (SQLite -> Postgres) migration.

I've also found out that we needed to fix up the `tokens.ex_date` column
data type a bit to prevent matrix-registration from raising exceptions
when comparing `datetime.now()` with `ex_date` coming from the database.

Example:

> File "/usr/local/lib/python3.8/site-packages/matrix_registration/tokens.py", line 58, in valid
> expired = self.ex_date < datetime.now()
> TypeError: can't compare offset-naive and offset-aware datetimes
@jdreichmann jdreichmann marked this pull request as ready for review December 16, 2020 05:43
@spantaleev
Copy link
Owner

Lately, I've:

  • added automatic (SQLite -> Postgres) migration support to all mx-puppet-* bridges too (using pgloader)
  • added automatic (nedb -> Postgres) migration support to appservice-irc and appservice-slack (using scripts that are part of their Docker images)
  • added self-building support for the pgloader component. This allows us to do this "migrate everything to Postgres" thing on ARM at the same time we release it for amd64

I've tested as much as I can, especially given that I don't use most of these bridges. It's somewhat unfortunate that we didn't get much outside testing for this, so I guess we'll have to test it upon "release". I'll put up a warning about it in the changelog.

@spantaleev spantaleev changed the title WIP: postgres: create databases for all services postgres: create databases for all services Dec 23, 2020
@spantaleev spantaleev merged commit f19b298 into spantaleev:master Dec 23, 2020
spantaleev added a commit that referenced this pull request Dec 23, 2020
@chagai95
Copy link
Contributor

I tested this again today and I messed up a few things so it was not a standard test - it worked for a moment fine, then I added the relaybot for whatsapp by changing the template and changed the template of telegram as well to allow all to use it, got kicked out everywhere and the bridges are now regenerating rooms but it might be my fault for using the wrong template at the beginning which made them use sqlite because I used my old templates that had sqlite databases in them. Also facing a problem still with the WhatsApp bridge and haven't tested the slack bridge yet...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants