Collation mapping #21

Closed
jeff-davis opened this issue Mar 1, 2022 · 2 comments

Comments

@jeff-davis

Ability to map collations on the source to a different collation on the destination.

Not sure of exact requirements here, needs investigation/discussion.

@dimitri
Owner

dimitri commented Mar 3, 2022

I think we have three problems with collation and only one to solve:

  1. when re-using physical index storage, we might end up with a corrupted index if the OS-provided collation implementation has changed underneath Postgres; note that this is a known problem when upgrading libc on Linux, for instance
  2. a collation of the same name might exist in both the source and target Postgres instances, but with a different implementation; then the ordering will be different, but we avoid any corruption
  3. we might have a collation on the source Postgres database that does not exist on the target database, either because the user used CREATE COLLATION or for other reasons
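
To make the difference between corruption and a mere ordering change concrete, here is a minimal Python sketch. It is only an analogy, not Postgres code: a sorted list stands in for an index, case-insensitive ordering stands in for a hypothetical "old" collation, and plain code-point ordering stands in for a hypothetical "new" one.

```python
from bisect import bisect_left

words = ["cherry", "apple", "Banana"]

# Hypothetical "old" collation: case-insensitive ordering.
index_built_with_old_rules = sorted(words, key=str.lower)
# Hypothetical "new" collation: plain code-point ordering (Python's default).
index_built_with_new_rules = sorted(words)


def lookup(sorted_entries, value):
    """Binary search that always compares using the "new" rules."""
    pos = bisect_left(sorted_entries, value)
    return pos < len(sorted_entries) and sorted_entries[pos] == value


# Case 1: entries stored in the old order, searched with the new rules.
stale_hit = lookup(index_built_with_old_rules, "Banana")
# Rebuilt index: entries re-sorted with the new rules, so lookups stay consistent.
fresh_hit = lookup(index_built_with_new_rules, "Banana")

print(index_built_with_old_rules, stale_hit)
print(index_built_with_new_rules, fresh_hit)
```

In this sketch the first lookup misses an entry that is present, which is the analogue of a corrupted index, while the rebuilt list only differs in its visible order.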

The first case is not to be covered by pgcopydb, which does not rely on physical copy of the data.

Cases 2 and 3 are then almost the same in our context.
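
As a rough sketch of how the two remaining cases could be told apart, the Python below compares collation rows from both sides. It assumes the rows were already fetched from the `pg_collation` catalog (`collname`, `collprovider`, `collversion`); the `Collation` type, the `classify` helper, and the sample values are hypothetical and not part of pgcopydb.

```python
from typing import List, NamedTuple, Optional, Tuple


class Collation(NamedTuple):
    name: str                # pg_collation.collname
    provider: str            # pg_collation.collprovider: 'c' = libc, 'i' = ICU
    version: Optional[str]   # pg_collation.collversion, may be NULL


def classify(source: List[Collation],
             target: List[Collation]) -> Tuple[List[str], List[str]]:
    """Return (missing on target, same name but different provider/version)."""
    target_by_name = {coll.name: coll for coll in target}
    missing, changed = [], []
    for coll in source:
        other = target_by_name.get(coll.name)
        if other is None:
            missing.append(coll.name)      # case 3
        elif (other.provider, other.version) != (coll.provider, coll.version):
            changed.append(coll.name)      # case 2
    return missing, changed


# Made-up sample rows, for illustration only.
source_rows = [
    Collation("en_US", "c", "2.28"),
    Collation("de-x-icu", "i", "153.14"),
    Collation("my_custom", "i", "153.14"),
]
target_rows = [
    Collation("en_US", "c", "2.35"),
    Collation("de-x-icu", "i", "153.14"),
]

missing, changed = classify(source_rows, target_rows)
print("missing on target:", missing)
print("different implementation:", changed)
```

A differing version string is only a hint that the ordering rules may have changed; it does not prove that any particular comparison gives a different result.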

In case 3, we could use CREATE COLLATION to make a collation with the same name and properties as on the source database instance available on the target instance. The target might still have a different implementation of the ordering rules, which means user-visible changes. I do not believe there is a way around that problem; we might need to accept it.

Other than using CREATE COLLATION with the same properties as on the source database instance, which pg_dump and pg_restore might already do in the case of user-defined collations, the remaining cases are collations that ship with Postgres core on the source and not on the target. I need to explore that situation: in what contexts would that be possible? Are collations created at initdb time, or are they part of the hard-coded catalogs?

Finally, I need to add that SQL queries can also include a COLLATE clause, which means the application code could depend on collations in a way that isn't visible at all from the database itself.

@dimitri
Owner

dimitri commented Apr 3, 2023

Closing this issue, as we have implemented --skip-collations in #160, which allows users to take care of collation mappings themselves if needed. See also pgcopydb list collations to find out what needs to be mapped.

@dimitri dimitri closed this as completed Apr 3, 2023