The docs state that the excluding table names matching command with comma separated table names should be used to exclude tables, but the matching keyword doesn't work at all for MS SQL (also mentioned in #1167). An error is produced indicating that the matching keyword is not recognized.
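For reference, the shape I understand from the docs is roughly the following (the table names here are placeholders of my own, not taken from the docs):
excluding table names matching 'some_table', 'some_other_table'
This is the form that gets rejected on an MS SQL source.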
The excluding table names like command does work, but only for a single table-schema pair. When multiple excluding table names like commands are provided, the script still runs successfully, but no data is parsed at all.
So, the following example works:
excluding table names like 'some_table' in schema 'dbo'
The following example doesn't:
excluding table names like 'some_table' in schema 'dbo'
excluding table names like 'some_table' in schema 'dbo2'
Worth mentioning is that the including only table names like command is repeatable, so both of the following examples work correctly:
including only table names like 'some_table' in schema 'dbo'
as well as
including only table names like 'some_table' in schema 'dbo'
including only table names like 'some_table' in schema 'dbo2'
I would expect there to be a way to exclude multiple table-schema pairs for MS SQL, but as far as I know (and have tried), this doesn't seem to be possible.
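Since the including only form is repeatable, a possible stopgap would be to invert the filter and enumerate the tables to keep rather than the ones to drop, along these lines (table names are placeholders; with many tables this gets tedious, and I have only verified the simple including cases shown above):
including only table names like 'table_to_keep' in schema 'dbo'
including only table names like 'another_table_to_keep' in schema 'schema1'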
pgloader --version
pgloader version "3.6.3047c9a"
compiled with SBCL 2.0.1.debian
did you test a fresh compile from the source tree?
did you search for other similar issues?
how can I reproduce the bug?
I don't have a remotely hosted MS SQL server to make this reproducible, sorry. I might add the create/insert setup queries later.
LOAD database
FROM mssql://exampleuser:examplepass@localhost:1433/database_oct_2021
INTO postgresql://postgres:password@localhost:5432/database_oct_2021
WITH data only
excluding table names like 'some_table' in schema 'dbo'
excluding table names like 'some_table' in schema 'schema1'
excluding table names like 'some_table' in schema 'schema2'
alter schema 'dbo' rename to 'public'
SET work_mem to '16MB', maintenance_work_mem to '512MB'
BEFORE LOAD DO
$$ SET TIME ZONE UTC; $$;
pgloader output you obtain
pgloader version 3.6.3047c9a
compiled with SBCL 2.0.1.debian
sb-impl::*default-external-format* :UTF-8
tmpdir: #P"/tmp/pgloader/"
2021-10-15T08:36:14.012000+02:00 NOTICE Starting pgloader, log system is ready.
2021-10-15T08:36:14.026000+02:00 INFO Starting monitor
2021-10-15T08:36:14.030000+02:00 LOG pgloader version "3.6.3047c9a"
2021-10-15T08:36:14.080000+02:00 INFO Parsed command:
LOAD database
FROM mssql://exampleuser:examplepass@localhost:1433/database_oct_2021
INTO postgresql://postgres:password@localhost:5432/database_oct_2021
WITH data only
excluding table names like 'some_table' in schema 'dbo'
excluding table names like 'some_table' in schema 'schema1'
excluding table names like 'some_table' in schema 'schema2'
alter schema 'dbo' rename to 'public'
SET work_mem to '16MB', maintenance_work_mem to '512MB'
BEFORE LOAD DO
$$ SET TIME ZONE UTC; $$;
2021-10-15T08:36:14.229000+02:00 DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@localhost:5432/database_oct_2021 {1007B6D703}>
2021-10-15T08:36:14.230000+02:00 DEBUG SET client_encoding TO 'utf8'
2021-10-15T08:36:14.231000+02:00 DEBUG SET work_mem TO '16MB'
2021-10-15T08:36:14.231000+02:00 DEBUG SET maintenance_work_mem TO '512MB'
2021-10-15T08:36:14.232000+02:00 DEBUG SET application_name TO 'pgloader'
2021-10-15T08:36:14.612000+02:00 NOTICE Executing SQL block for before load
2021-10-15T08:36:14.702000+02:00 DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@localhost:5432/database_oct_2021 {1007B6D703}>
2021-10-15T08:36:14.704000+02:00 DEBUG SET client_encoding TO 'utf8'
2021-10-15T08:36:14.704000+02:00 DEBUG SET work_mem TO '16MB'
2021-10-15T08:36:14.705000+02:00 DEBUG SET maintenance_work_mem TO '512MB'
2021-10-15T08:36:14.705000+02:00 DEBUG SET application_name TO 'pgloader'
2021-10-15T08:36:14.710000+02:00 DEBUG BEGIN
2021-10-15T08:36:14.711000+02:00 SQL SET TIME ZONE UTC;
2021-10-15T08:36:14.720000+02:00 LOG Migrating from #<MSSQL-CONNECTION mssql://exampleuser@localhost:1433/database_oct_2021 {1007B6BA03}>
2021-10-15T08:36:14.721000+02:00 LOG Migrating into #<PGSQL-CONNECTION pgsql://postgres@localhost:5432/database_oct_2021 {1007B6D703}>
Max connections reached, increase value of TDS_MAX_CONN
2021-10-15T08:36:15.007000+02:00 SQL MSSQL: sending query: -- params: dbname
-- table-type-name
-- including
-- filter-list-to-where-clause including
-- excluding
-- filter-list-to-where-clause excluding
select c.TABLE_SCHEMA,
c.TABLE_NAME,
c.COLUMN_NAME,
c.DATA_TYPE,
CASE
WHEN c.COLUMN_DEFAULT LIKE '((%' AND c.COLUMN_DEFAULT LIKE '%))' THEN
CASE
WHEN SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4) = 'newid()' THEN 'GENERATE_UUID'
WHEN SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4) LIKE 'convert(%varchar%,getdate(),%)' THEN 'today'
WHEN SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4) = 'getdate()' THEN 'CURRENT_TIMESTAMP'
WHEN SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4) LIKE '''%''' THEN SUBSTRING(c.COLUMN_DEFAULT,4,len(c.COLUMN_DEFAULT)-6)
ELSE SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4)
END
WHEN c.COLUMN_DEFAULT LIKE '(%' AND c.COLUMN_DEFAULT LIKE '%)' THEN
CASE
WHEN SUBSTRING(c.COLUMN_DEFAULT,2,len(c.COLUMN_DEFAULT)-2) = 'newid()' THEN 'GENERATE_UUID'
WHEN SUBSTRING(c.COLUMN_DEFAULT,2,len(c.COLUMN_DEFAULT)-2) LIKE 'convert(%varchar%,getdate(),%)' THEN 'today'
WHEN SUBSTRING(c.COLUMN_DEFAULT,2,len(c.COLUMN_DEFAULT)-2) = 'getdate()' THEN 'CURRENT_TIMESTAMP'
WHEN SUBSTRING(c.COLUMN_DEFAULT,2,len(c.COLUMN_DEFAULT)-2) LIKE '''%''' THEN SUBSTRING(c.COLUMN_DEFAULT,3,len(c.COLUMN_DEFAULT)-4)
ELSE SUBSTRING(c.COLUMN_DEFAULT,2,len(c.COLUMN_DEFAULT)-2)
END
ELSE c.COLUMN_DEFAULT
END,
c.IS_NULLABLE,
COLUMNPROPERTY(object_id(c.TABLE_NAME), c.COLUMN_NAME, 'IsIdentity'),
c.CHARACTER_MAXIMUM_LENGTH,
c.NUMERIC_PRECISION,
c.NUMERIC_PRECISION_RADIX,
c.NUMERIC_SCALE,
c.DATETIME_PRECISION,
c.CHARACTER_SET_NAME,
c.COLLATION_NAME
from INFORMATION_SCHEMA.COLUMNS c
join INFORMATION_SCHEMA.TABLES t
on c.TABLE_SCHEMA = t.TABLE_SCHEMA
and c.TABLE_NAME = t.TABLE_NAME
where c.TABLE_CATALOG = 'database_oct_2021'
and t.TABLE_TYPE = 'BASE TABLE'
and ((c.table_schema = 'dbo' and c.table_name NOT LIKE 'some_table')
and (c.table_schema = 'schema1' and c.table_name NOT LIKE 'some_table')
and (c.table_schema = 'schema2' and c.table_name NOT LIKE 'some_table'))
order by c.table_schema, c.table_name, c.ordinal_position;
2021-10-15T08:36:15.029000+02:00 SQL MSSQL: sending query: -- params: dbname
-- including
-- filter-list-to-where-clause including
-- excluding
-- filter-list-to-where-clause excluding
SELECT
REPLACE(KCU1.CONSTRAINT_NAME, '.', '_') AS 'CONSTRAINT_NAME'
, KCU1.TABLE_SCHEMA AS 'TABLE_SCHEMA'
, KCU1.TABLE_NAME AS 'TABLE_NAME'
, KCU1.COLUMN_NAME AS 'COLUMN_NAME'
, KCU2.TABLE_SCHEMA AS 'UNIQUE_TABLE_SCHEMA'
, KCU2.TABLE_NAME AS 'UNIQUE_TABLE_NAME'
, KCU2.COLUMN_NAME AS 'UNIQUE_COLUMN_NAME'
, RC.UPDATE_RULE AS 'UPDATE_RULE'
, RC.DELETE_RULE AS 'DELETE_RULE'
FROM INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS RC
JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU1
ON KCU1.CONSTRAINT_CATALOG = RC.CONSTRAINT_CATALOG
AND KCU1.CONSTRAINT_SCHEMA = RC.CONSTRAINT_SCHEMA
AND KCU1.CONSTRAINT_NAME = RC.CONSTRAINT_NAME
JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU2
ON KCU2.CONSTRAINT_CATALOG = RC.UNIQUE_CONSTRAINT_CATALOG
AND KCU2.CONSTRAINT_SCHEMA = RC.UNIQUE_CONSTRAINT_SCHEMA
AND KCU2.CONSTRAINT_NAME = RC.UNIQUE_CONSTRAINT_NAME
WHERE KCU1.ORDINAL_POSITION = KCU2.ORDINAL_POSITION
AND KCU1.TABLE_CATALOG = 'database_oct_2021'
AND KCU1.CONSTRAINT_CATALOG = 'database_oct_2021'
AND KCU1.CONSTRAINT_SCHEMA NOT IN ('dto', 'sys')
AND KCU1.TABLE_SCHEMA NOT IN ('dto', 'sys')
AND KCU2.TABLE_SCHEMA NOT IN ('dto', 'sys')
and ((kcu1.table_schema = 'dbo' and kcu1.table_name NOT LIKE 'some_table') and (kcu1.table_schema = 'schema1' and kcu1.table_name NOT LIKE 'some_table') and (kcu1.table_schema = 'schema2' and kcu1.table_name NOT LIKE 'some_table'))
ORDER BY KCU1.CONSTRAINT_NAME, KCU1.ORDINAL_POSITION;
2021-10-15T08:36:15.070000+02:00 INFO Processing source catalogs
2021-10-15T08:36:15.094000+02:00 NOTICE Prepare PostgreSQL database.
2021-10-15T08:36:15.191000+02:00 DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@localhost:5432/database_oct_2021 {1007B6D703}>
2021-10-15T08:36:15.192000+02:00 DEBUG SET client_encoding TO 'utf8'
2021-10-15T08:36:15.193000+02:00 DEBUG SET work_mem TO '16MB'
2021-10-15T08:36:15.193000+02:00 DEBUG SET maintenance_work_mem TO '512MB'
2021-10-15T08:36:15.193000+02:00 DEBUG SET application_name TO 'pgloader'
2021-10-15T08:36:15.194000+02:00 DEBUG BEGIN
2021-10-15T08:36:15.232000+02:00 SQL --
-- get user defined SQL types
--
select nt.nspname,
extname,
typname,
case when enum.enumtypid is not null
then array_agg(enum.enumlabel order by enumsortorder)
end as enumvalues
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
left join pg_attribute a on c.oid = a.attrelid and a.attnum > 0
join pg_type t on t.oid = a.atttypid
left join pg_namespace nt on nt.oid = t.typnamespace
left join pg_depend d on d.classid = 'pg_type'::regclass
and d.refclassid = 'pg_extension'::regclass
and d.objid = t.oid
left join pg_extension e on refobjid = e.oid
left join pg_enum enum on enum.enumtypid = t.oid
where nt.nspname !~ '^pg_' and nt.nspname <> 'information_schema'
and n.nspname !~ '^pg_' and n.nspname <> 'information_schema'
and c.relkind in ('r', 'f', 'p')
and
( t.typrelid = 0
or
(select c.relkind = 'c'
from pg_class c
where c.oid = t.typrelid)
)
and not exists
(
select 1
from pg_type el
where el.oid = t.typelem
and el.typarray = t.oid
)
group by nt.nspname, extname, typname, enumtypid
order by nt.nspname, extname, typname, enumtypid;
2021-10-15T08:36:15.288000+02:00 SQL -- params: table-type-name
-- including
-- filter-list-to-where-clause for including
-- excluding
-- filter-list-to-where-clause for excluding
with seqattr as
(
select adrelid,
adnum,
pg_get_expr(d.adbin, d.adrelid) as adsrc,
case when pg_get_expr(d.adbin, d.adrelid) ~ 'nextval'
then substring(pg_get_expr(d.adbin, d.adrelid)
from '''([^'']+)'''
)
else null
end as seqname
from pg_attrdef d
)
select nspname, relname, c.oid, attname,
t.oid::regtype as type,
case when atttypmod > 0
then substring(format_type(t.oid, atttypmod) from '\d+(?:,\d+)?')
else null
end as typmod,
attnotnull,
case when atthasdef
then pg_get_expr(def.adbin, def.adrelid)
end as default ,
case when s.seqname is not null then 'auto_increment' end as extra
from pg_class c
join pg_namespace n on n.oid = c.relnamespace
left join pg_attribute a on c.oid = a.attrelid
join pg_type t on t.oid = a.atttypid and attnum > 0
left join pg_attrdef def on a.attrelid = def.adrelid
and a.attnum = def.adnum
and a.atthasdef
left join seqattr s on def.adrelid = s.adrelid
and def.adnum = s.adnum
where nspname !~ '^pg_' and n.nspname <> 'information_schema'
and relkind in ('r', 'f', 'p')
order by nspname, relname, attnum;
2021-10-15T08:36:15.322000+02:00 SQL -- params: including
-- filter-list-to-where-clause for including
-- excluding
-- filter-list-to-where-clause for excluding
select n.nspname,
i.relname,
i.oid,
rn.nspname,
r.relname,
indisprimary,
indisunique,
(select string_agg(attname, ',')
from pg_attribute
where attrelid = r.oid
and array[attnum::integer] <@ indkey::integer[]
) as cols,
pg_get_indexdef(indexrelid),
c.conname,
pg_get_constraintdef(c.oid)
from pg_index x
join pg_class i ON i.oid = x.indexrelid
join pg_class r ON r.oid = x.indrelid
join pg_namespace n ON n.oid = i.relnamespace
join pg_namespace rn ON rn.oid = r.relnamespace
left join pg_depend d on d.classid = 'pg_class'::regclass
and d.objid = i.oid
and d.refclassid = 'pg_constraint'::regclass
and d.deptype = 'i'
left join pg_constraint c ON c.oid = d.refobjid
where n.nspname !~ '^pg_' and n.nspname <> 'information_schema'
order by n.nspname, r.relname
2021-10-15T08:36:15.360000+02:00 SQL -- params: including (table)
-- filter-list-to-where-clause for including
-- excluding (table)
-- filter-list-to-where-clause for excluding
-- including (ftable)
-- filter-list-to-where-clause for including
-- excluding (ftable)
-- filter-list-to-where-clause for excluding
select n.nspname, c.relname, nf.nspname, cf.relname as frelname,
r.oid,
d.refobjid as pkeyoid,
conname,
pg_catalog.pg_get_constraintdef(r.oid, true) as condef,
(select string_agg(attname, ',')
from pg_attribute
where attrelid = r.conrelid
and array[attnum::integer] <@ conkey::integer[]
) as conkey,
(select string_agg(attname, ',')
from pg_attribute
where attrelid = r.confrelid
and array[attnum::integer] <@ confkey::integer[]
) as confkey,
confupdtype, confdeltype, confmatchtype,
condeferrable, condeferred
from pg_catalog.pg_constraint r
JOIN pg_class c on r.conrelid = c.oid
JOIN pg_namespace n on c.relnamespace = n.oid
JOIN pg_class cf on r.confrelid = cf.oid
JOIN pg_namespace nf on cf.relnamespace = nf.oid
JOIN pg_depend d on d.classid = 'pg_constraint'::regclass
and d.refclassid = 'pg_class'::regclass
and d.objid = r.oid
and d.refobjsubid = 0
where r.contype = 'f'
AND c.relkind in ('r', 'f', 'p')
AND cf.relkind in ('r', 'f', 'p')
AND n.nspname !~ '^pg_' and n.nspname <> 'information_schema'
AND nf.nspname !~ '^pg_' and nf.nspname <> 'information_schema'
2021-10-15T08:36:15.387000+02:00 SQL -- params pkey-oid-list
-- fkey-oild-list
with pkeys(oid) as (
values(17438),(17436),(17449),(17426),(17419),(17457),(17468),(17518),(17520),(17507),(17515),(17479),(17492),(17490),(17383),(17381),(17402),(17517),(17409),(17391),(17535),(17537),(17549),(17547),(17321),(17323),(17333),(17335),(17191),(17193),(17310),(17308),(17344),(17342),(17367),(17365)
),
knownfkeys(oid) as (
values(17610),(17615),(17620),(17630),(17625),(17635)
),
pkdeps as (
select pkeys.oid, pg_depend.objid
from pg_depend
join pkeys on pg_depend.refobjid = pkeys.oid
where classid = 'pg_catalog.pg_constraint'::regclass
and refclassid = 'pg_catalog.pg_class'::regclass
)
select n.nspname, c.relname, nf.nspname, cf.relname as frelname,
r.oid as conoid, conname,
pg_catalog.pg_get_constraintdef(r.oid, true) as condef,
pkdeps.oid as index_oid
from pg_catalog.pg_constraint r
JOIN pkdeps on r.oid = pkdeps.objid
JOIN pg_class c on r.conrelid = c.oid
JOIN pg_namespace n on c.relnamespace = n.oid
JOIN pg_class cf on r.confrelid = cf.oid
JOIN pg_namespace nf on cf.relnamespace = nf.oid
where NOT EXISTS (select 1 from knownfkeys where oid = r.oid)
2021-10-15T08:36:15.391000+02:00 DEBUG fetch-pgsql-catalog: 21 tables, 36 indexes, 6+6 fkeys
2021-10-15T08:36:15.391000+02:00 INFO Done with COPYing data, waiting for indexes
2021-10-15T08:36:15.405000+02:00 NOTICE Completing PostgreSQL database.
2021-10-15T08:36:15.421000+02:00 NOTICE Reset sequences
2021-10-15T08:36:15.507000+02:00 DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@localhost:5432/database_oct_2021 {100C450043}>
2021-10-15T08:36:15.508000+02:00 DEBUG SET client_encoding TO 'utf8'
2021-10-15T08:36:15.509000+02:00 DEBUG SET work_mem TO '16MB'
2021-10-15T08:36:15.510000+02:00 DEBUG SET maintenance_work_mem TO '512MB'
2021-10-15T08:36:15.510000+02:00 DEBUG SET application_name TO 'pgloader'
2021-10-15T08:36:15.510000+02:00 DEBUG SET client_encoding TO 'utf8'
2021-10-15T08:36:15.511000+02:00 DEBUG SET work_mem TO '16MB'
2021-10-15T08:36:15.511000+02:00 DEBUG SET maintenance_work_mem TO '512MB'
2021-10-15T08:36:15.512000+02:00 DEBUG SET application_name TO 'pgloader'
2021-10-15T08:36:15.652000+02:00 DEBUG CONNECTED TO #<PGLOADER.PGSQL:PGSQL-CONNECTION pgsql://postgres@localhost:5432/database_oct_2021 {100CB25503}>
2021-10-15T08:36:15.653000+02:00 DEBUG SET client_encoding TO 'utf8'
2021-10-15T08:36:15.654000+02:00 DEBUG SET work_mem TO '16MB'
2021-10-15T08:36:15.654000+02:00 DEBUG SET maintenance_work_mem TO '512MB'
2021-10-15T08:36:15.654000+02:00 DEBUG SET application_name TO 'pgloader'
2021-10-15T08:36:15.656000+02:00 DEBUG BEGIN
2021-10-15T08:36:15.667000+02:00 LOG report summary reset
             table name     errors       read   imported      bytes      total time       read      write
-----------------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
            before load          0          1          1                      0.102s
        fetch meta data          0          0          0                      0.343s
      Drop Foreign Keys          0          0          0                      0.000s
-----------------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
-----------------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
COPY Threads Completion          0          4          4                      0.000s
        Reset Sequences          0         13         13                      0.151s
    Create Foreign Keys          0          0          0                      0.000s
       Install Comments          0          0          0                      0.000s
-----------------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
      Total import time          ✓          0          0                      0.151s
2021-10-15T08:36:15.731000+02:00 INFO Stopping monitor
data that is being loaded, if relevant
No data, that is the issue. I would expect data to be parsed from a total of 18 tables in my case; 3 tables, each from a different schema, should be excluded.
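My reading of the first generated MS SQL query above is that the three exclusion groups are joined with and, yet each group pins c.table_schema to a different value, so the combined predicate can never be true and the catalog query returns zero rows. A sketch of the shape I would expect instead, assuming the groups should be combined with or the way the repeatable including only case behaves (same placeholder names as above; not necessarily the exact fix):
and (   (c.table_schema = 'dbo'     and c.table_name NOT LIKE 'some_table')
     or (c.table_schema = 'schema1' and c.table_name NOT LIKE 'some_table')
     or (c.table_schema = 'schema2' and c.table_name NOT LIKE 'some_table'))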
How the data is different from what you expected, if relevant
troley added a commit to semestry/pgloader that referenced this issue on Oct 15, 2021:
Allows a pgloader load file to parse data correctly when multiple
`excluding table names like 'TABLE' in schema 'SCHEMA'`
filter rules have been provided.
Fixes dimitri#1328