Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Migrate localpart-only columns to full MXIDs #15396

Open
squahtx opened this issue Apr 5, 2023 · 3 comments
Open

Migrate localpart-only columns to full MXIDs #15396

squahtx opened this issue Apr 5, 2023 · 3 comments
Assignees
Labels
A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. Z-Future-Maintenance Things that can't yet be done, but will need cleaning up in a couple of months/releases

Comments

@squahtx
Copy link
Contributor

squahtx commented Apr 5, 2023

profiles.user_id and user_filters.user_id (are there more?) currently store localparts only, which is inconsistent with the rest of the schema. We'd like to move to storing full MXIDs.

The plan

The migration will happen over two schema versions.

Current schema version +0
Can use databases up to and including schema version +1.

Schema version +1
Add a new column, called full_user_id, which is nullable and unique.
Add background update to populate it.
Modify reads to query both the localpart and full user IDs (localparts can't include @s or :s and so can't be confused with a full MXID)

Can use databases up to and including schema version +2.

TODO: we need to modify reads to user_id to check for both the full MXID and localpart, to work with schema version +2. Reads and writes should not touch full_user_id, otherwise Synapse won't be compatible with schema version +2, where there is no longer a full_user_id column.

Schema version +2
Force the background job to complete
TODO: figure out how. We may have to duplicate the background job logic in the schema migration.
NB: just because the background update has finished does not mean that full_user_id is fully populated. The server might have been rolled back to a previous Synapse version after it completed.
Make full_user_id non-nullable (see @\reivilibre's comment below about avoiding table locks)
Drop user_id and rename full_user_id to user_id in a schema migration.
Unqueue the background job, just in case it hasn't run yet/is still in progress.
Update reads to only query for the full user ID.
For this step, we want to audit usages of both {profiles,user_filters}.user_id and {profiles,user_filters}.full_user_id since there are still some reads that only use the old user_id.

Update SCHEMA_COMPAT_VERSION to schema version +1.
ie. code expecting schema version +0 or later cannot use the database, but code expecting schema version +1 or later can.

@squahtx squahtx self-assigned this Apr 5, 2023
@squahtx squahtx added T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db labels Apr 5, 2023
@reivilibre
Copy link
Contributor

reivilibre commented Apr 6, 2023

For Postgres, I'd advise learning about NOT VALID constraints.

I think you +1 the schema version and:

  • ADD COLUMN full_user_id
  • ADD CHECK (full_user_id IS NOT NULL) NOT VALID ← crucial step, which we recently missed whilst doing a NOT NULL migration :(
  • update the insert/update code to set full_user_id, since insertions that don't satisfy NOT NULL will now fail

Start a background job which does this:

  • do background job to update all rows
  • VALIDATE CONSTRAINT — this enables the constraint and does a table scan, but does not need a lock whilst doing the table scan (this is why you want the NOT VALID constraint, because adding a valid constraint directly does require a locked table scan)

In v +2 (or some other later point, where we don't care about incremental updating anymore):

  • forcibly VALIDATE CONSTRAINT in the foreground, if needed then update rows
  • optional: ALTER COLUMN full_user_id NOT NULL (should be free because you have a check constraint, c.f. the docs) and then DROP the CHECK

You could leave it as a check constraint, but it feels more consistent to turn it into a plain NOT NULL constraint; I don't know if there's any functional difference otherwise.

https://www.postgresql.org/docs/15/sql-altertable.html

I hope that makes sense?

(edit: the plan described in this comment is not complete. We have a more up-to-date one but I haven't written it here...)

@squahtx
Copy link
Contributor Author

squahtx commented Apr 21, 2023

There's a branch with my initial attempt at part 1 of the migration at https://github.com/matrix-org/synapse/commits/squah/expand_localpart_columns_1

This is superseded by @\H-Shay's work.

@H-Shay
Copy link
Contributor

H-Shay commented May 1, 2023

For tracking purposes a rough outline of the next steps are:

  • add not valid check to full_user_id column, bump schema compat version, add a background update to migrate missing rows in full_user_id from user_localpart
  • after background job is complete, shift to read from user_localpart to full_user_id, bump schema compat version
  • stop reading from user_localpart, bump schema compat version
  • drop user_localpart and bump schema compat version

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. Z-Future-Maintenance Things that can't yet be done, but will need cleaning up in a couple of months/releases
Projects
None yet
Development

No branches or pull requests

3 participants