Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sequence in public schema cannot be restored creating table in non-public schema #390

Closed
andyatkinson opened this issue Jul 13, 2023 · 6 comments · Fixed by #397
Closed

Comments

@andyatkinson
Copy link

Hello. This might not really be a bug, but I don't know how to proceed.

For our use case we want to make a copied table with a reduced set of rows, then use that table as the source table with pgcopydb. The original table is in the public schema, so a copy of the table is put into myschema but it has the same definition.

Sort of like this:

create schema if not exists myschema;
CREATE TABLE myschema.accounts (LIKE accounts INCLUDING ALL);
INSERT INTO myschema.accounts SELECT * FROM accounts where id = 123;

Now the new copied table myschema.accounts primary key id column still has a sequence public.accounts_id_seq in the public schema.

We'd like to copy the myschema.accounts table including the public scoped sequence. It finds the right number of sequences in the initial Fetching stage, but then fails in the pg_restore stage.

Filter file:

[include-only-table]
myschema.accounts

Failures finding the public schema sequence:

01:39:29 16 ERROR  pg_restore: error: could not execute query: ERROR:  relation "public.accounts_id_seq" does not exist
01:39:29 16 ERROR  LINE 2:     id integer DEFAULT nextval('public.accounts_id_seq'::reg...
01:39:29 16 ERROR                                         ^
01:39:29 16 ERROR  Command was: CREATE TABLE myschema.accounts (
01:39:29 16 ERROR      id integer DEFAULT nextval('public.accounts_id_seq'::regclass) NOT NULL,
  • Ideally we don't have to change the ownership of the sequence since it's still working for the original table. Should we copy the sequence first and configure it the same as a pre-step? That seems like a pain.
  • I tried putting public.accounts_id_sequence with and without into the .ini filter file, but that was just a guess and that doesn't seem to change the behavior.

Any advice on how to copy a schema-qualified table and get a sequence for the table from another schema with pgcopydb?

Thanks.

@andyatkinson
Copy link
Author

andyatkinson commented Jul 13, 2023

Update: I did try these steps to tediously clone the sequence first into the target schema, and with these steps pgcopydb works and copies the table over and with the sequence https://stackoverflow.com/a/18620623

From that point we could rename the table and sequence back if desired.

  • I guess we're using a serial column on (we're on PG 13) and I'm guessing with an identity column and doing CREATE TABLE ... INCLUDING IDENTITY this would be smoother and is the recommended approach?

@dimitri
Copy link
Owner

dimitri commented Jul 17, 2023

Yeah the table-sequence dependency is recorded in the Postgres catalogs and followed by pgcopydb. In your first attempt, there is no dependency between the sequence and the new table that is recorded.

I wonder if you tried an approach based on Row-Level Security, where you would assign the filter to a role and then use that role to do the pgcopydb copy. It might be the most “transparent” approach, but I can't remember its complexity compared to what you end up doing here. In terms of data, though, it works without an extra copy on the source, which is a non-trivial benefit.

@dimitri dimitri closed this as completed Jul 17, 2023
@andyatkinson
Copy link
Author

andyatkinson commented Jul 17, 2023

Re-reading what I wrote, I think I didn't clarify the pgcopydb aspect of this directly enough.

Because of scoping only to "myschema" then pgcopydb (correctly?) doesn't copy the sequence fully over and my guess was because only the sequence is in a different schema (public) vs. the table.

[include-only-table]
myschema.accounts

Yeah the table-sequence dependency is recorded in the Postgres catalogs and followed by pgcopydb.
I wasn't fully clear what's meant here. My experience was that the sequence is not copied when it's in a different schema. It was confusing because it identified the sequence, but then fails on pg_restore per the logs. So I wasn't sure what the expected behavior was for pgcopydb specifically when using the filtering configuration option that constrains to a schema-qualified table.

I don't know if that violates the design or not but I'd love to say "copy this schema.table even if it has a linked sequence in another schema" as configuration.

Regarding RLS I did appreciate your suggestion on that and did start going down that path. I made a policy and a role and a function to read that role to fetch the visible rows, as a WIP. We still might go that route but I read that RLS adds overhead to every read of those tables because the policy needs to be checked. I was worried about operational problems due to my inexperience with RLS. However, duplicating the rows is tedious so it's not like a big win either. Initially I was more comfortable with avoiding impacting reads to the source table by the application.

Maybe it makes sense to apply the policy only temporarily as needed and then remove it.

In fact, an RLS configuration that was applied and removed for the duration of the pgcopydb operation would be an amazing feature! 😁 😁

@dimitri
Copy link
Owner

dimitri commented Jul 17, 2023

Re-reading what I wrote, I think I didn't clarify the pgcopydb aspect of this directly enough.

Because of scoping only to "myschema" then pgcopydb (correctly?) doesn't copy the sequence fully over and my guess was because only the sequence is in a different schema (public) vs. the table.

I believe that when you do the following then the new table has no dependencies at all with the sequence. Here is a copy of a test case I did quicky which proves me wrong and proves you right, with a dependency query as found in pgcopydb sources to track tables and sequences:

# begin;
BEGIN
# create schema app;
CREATE SCHEMA
# create schema copy;
CREATE SCHEMA
# create table app.foo(id bigserial, f1 text);
CREATE TABLE
# create table copy.foo(like app.foo including all);
CREATE TABLE
# select s.oid::regclass, rn.nspname, r.relname, at.attname 
   from pg_class s
    join pg_depend d on d.refobjid = s.oid and d.classid = 'pg_attrdef'::regclass and d.refclassid = 'pg_class'::regclass 
    join pg_attrdef a on a.oid = d.objid
    join pg_attribute at on at.attrelid = a.adrelid and at.attnum = a.adnum
    join pg_class r on r.oid = at.attrelid
    join pg_namespace rn on rn.oid = r.relnamespace 
where s.relkind = 'S' and rn.nspname in ('app', 'copy');
┌────────────────┬─────────┬─────────┬─────────┐
│      oid       │ nspname │ relname │ attname │
├────────────────┼─────────┼─────────┼─────────┤
│ app.foo_id_seq │ app     │ foo     │ id      │
│ app.foo_id_seq │ copy    │ foo     │ id      │
└────────────────┴─────────┴─────────┴─────────┘
(2 rows)
# rollback;
ROLLBACK
[include-only-table]
myschema.accounts

So yeah I now believe our SQL query is filtering on the schema of the sequences wrongly. It should filter on the schema of the table that the sequences are attached to.

I don't know if that violates the design or not but I'd love to say "copy this schema.table even if it has a linked sequence in another schema" as configuration.

I believe it should just work as per your gut feeling.

Regarding RLS I did appreciate your suggestion on that and did start going down that path. I made a policy and a role and a function to read that role to fetch the visible rows, as a WIP. We still might go that route but I read that RLS adds overhead to every read of those tables because the policy needs to be checked. I was worried about operational problems due to my inexperience with RLS. However, duplicating the rows is tedious so it's not like a big win either. Initially I was more comfortable with avoiding impacting reads to the source table by the application.

Well you need to choose between a transparent WHERE clause being added (after some RLS computations to pick which to apply) or data set duplication. If you're only going to use the RLS setup for the pgcopydb context, I would pick that problem over the other one.

Maybe it makes sense to apply the policy only temporarily as needed and then remove it.

If I understand correctly RLS, this only applies anyway when you're using a role for which RLS rules have been set.

In fact, an RLS configuration that was applied and removed for the duration of the pgcopydb operation would be an amazing feature! 😁 😁

Some of the amazing things like that are needed are still out-of-scope of the tool. The beauty of RLS is that pgcopydb doesn't need to know anything about it. You can loop over the same command 12 times with 12 different roles and have it just work!

I will see how to fix the initial problem, re-opening the issue now.

@dimitri dimitri reopened this Jul 17, 2023
@andyatkinson
Copy link
Author

Thanks for all your hard work on this. This is a great tool and I'm starting to evangelize it within the company to more team members to help us with database copying.

I'll keep you in the loop as well although maybe outside of your issue tracker on whether we get an RLS policy going and can use it for this purpose. You're correct that we could drop it on the target with a more privileged user, change user, select the rows, change back, remove it, and that could be scripted from a shell script.

@dimitri
Copy link
Owner

dimitri commented Jul 20, 2023

This was more work than anticipated, and quite confusing for a while, but I think I got there. The sequence catalog queries were somewhat wrong and I fixed them, and then the filtering needed to be adjusted to consider the pg_dump statements for SEQUENCE, SEQUENCE OWNED BY, and DEFAULT separately. Before the fix we would always process the first two the same way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants