Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix resume on interruped index creation #783

Merged
merged 1 commit into from
May 16, 2024

Conversation

arajkumar
Copy link
Contributor

If we interrupt the process while creating an index, the summary catalog will show the index command as CREATE INDEX ... When resuming, pgcopydb alters the command to CREATE INDEX IF NOT EXISTS .. to avoid errors if index already created on target. However, summary_lookup_index() will change it back to CREATE INDEX .. because it replaces the command from the summary catalog table.

Steps to reproduce:

  1. Create a table on the source with few indexes,
psql -d $SOURCE -f - << EOF

CREATE TABLE IF NOT EXISTS "metrics"(
    id BIGINT,
    "time" timestamp with time zone NOT NULL,
    name TEXT NOT NULL,
    value NUMERIC NOT NULL
);

CREATE INDEX i1 ON metrics("time", name);
CREATE INDEX i2 ON metrics("time", name);
CREATE INDEX i3 ON metrics("time", name);
CREATE INDEX i4 ON metrics("time", name);
CREATE INDEX i5 ON metrics("time", name);

EOF
  1. Insert some data into the table
psql -d $SOURCE -f - << EOF
INSERT INTO metrics (id, time, name, value)
SELECT
    (random() * 999999 + 1)::int AS id,
    NOW() + ((random() * 998 + 2)::int || ' days')::interval AS time,
    'metric_' || (random() * 999999 + 1)::int AS name,
    (random() * 100)::int AS value
FROM generate_series(1, 1000000);
EOF
  1. Perform the migration,
pgcopydb clone --no-acl --no-owner --notice --not-consistent --skip-extensions --restart --index-jobs 1
  1. Interrupt when you see index creation message,
10:07:16.331 1667324 NOTICE COPY public.metrics
10:07:17.110 1667336 NOTICE CREATE UNIQUE INDEX metrics_pkey ON public.metrics USING btree (id, "time");
10:07:17.136 1667335 NOTICE CREATE INDEX i1 ON public.metrics USING btree ("time", name);
10:07:17.138 1667334 NOTICE CREATE INDEX i2 ON public.metrics USING btree ("time", name);
10:07:17.141 1667337 NOTICE CREATE INDEX i3 ON public.metrics USING btree ("time", name);
^C10:07:17.776 1667337 ERROR  Postgres query was interrupted: CREATE INDEX i3 ON public.metrics USING btree ("time", name);
10:07:17.776 1667336 ERROR  Postgres query was interrupted: CREATE UNIQUE INDEX metrics_pkey ON public.metrics USING btree (id, "time");
10:07:17.777 1667335 ERROR  Postgres query was interrupted: CREATE INDEX i1 ON public.metrics USING btree ("time", name);
10:07:17.777 1667334 ERROR  Postgres query was interrupted: CREATE INDEX i2 ON public.metrics USING btree ("time", name);
10:07:17.778 1667337 ERROR  Failed to create index with oid 115759, see above for details
10:07:17.778 1667335 ERROR  Failed to create index with oid 115757, see above for details
10:07:17.778 1667336 ERROR  Failed to create index with oid 115755, see above for details
10:07:17.778 1667337 ERROR  CREATE INDEX worker has been interrupted
10:07:17.778 1667334 ERROR  Failed to create index with oid 115758, see above for details
10:07:17.778 1667335 ERROR  CREATE INDEX worker has been interrupted
10:07:17.778 1667336 ERROR  CREATE INDEX worker has been interrupted
10:07:17.781 1667334 ERROR  CREATE INDEX worker has been interrupted
10:07:17.876 1667323 ERROR  Sub-process 1667334 exited with code 12
10:07:17.876 1667323 ERROR  Sub-process 1667335 exited with code 12
10:07:17.876 1667323 ERROR  Sub-process 1667336 exited with code 12
10:07:17.876 1667323 ERROR  Sub-process 1667337 exited with code 12
10:07:17.876 1667323 ERROR  Some INDEX worker process(es) have exited with error, see above for details
  1. Attempt resume,
pgcopydb clone --no-acl --no-owner --notice --not-consistent --skip-extensions --resume --index-jobs 1

Resume will fail with following log,

0:07:45.427 1667443 ERROR  [TARGET 434730] SQL query: CREATE UNIQUE INDEX metrics_pkey ON public.metrics USING btree (id, "time");
10:07:45.427 1667443 ERROR  [TARGET 434730] SQL params:
10:07:45.427 1667443 ERROR  Failed to create index with oid 115755, see above for details
10:07:45.434 1667444 NOTICE CREATE INDEX i1 ON public.metrics USING btree ("time", name);
10:07:45.436 1667444 ERROR  [TARGET 434731] ERROR:  relation "i1" already exists
10:07:45.436 1667444 ERROR  [TARGET 434731] ERROR:  relation "i1" already exists
10:07:45.436 1667444 ERROR  [TARGET 434731] SQL query: CREATE INDEX i1 ON public.metrics USING btree ("time", name);
10:07:45.436 1667444 ERROR  [TARGET 434731] SQL params:
10:07:45.436 1667444 ERROR  Failed to create index with oid 115757, see above for details
10:07:45.437 1667446 NOTICE CREATE INDEX i2 ON public.metrics USING btree ("time", name);
10:07:45.437 1667445 NOTICE Found stale pid 1667337 removing it to process index public.i3
10:07:45.438 1667446 ERROR  [TARGET 434733] ERROR:  relation "i2" already exists
10:07:45.438 1667446 ERROR  [TARGET 434733] ERROR:  relation "i2" already exists
10:07:45.438 1667446 ERROR  [TARGET 434733] SQL query: CREATE INDEX i2 ON public.metrics USING btree ("time", name);
10:07:45.438 1667446 ERROR  [TARGET 434733] SQL params:
10:07:45.439 1667446 ERROR  Failed to create index with oid 115758, see above for details
10:07:45.444 1667445 NOTICE CREATE INDEX i3 ON public.metrics USING btree ("time", name);
10:07:45.447 1667445 ERROR  [TARGET 434732] ERROR:  relation "i3" already exists
10:07:45.447 1667445 ERROR  [TARGET 434732] ERROR:  relation "i3" already exists
10:07:45.447 1667445 ERROR  [TARGET 434732] SQL query: CREATE INDEX i3 ON public.metrics USING btree ("time", name);
10:07:45.447 1667445 ERROR  [TARGET 434732] SQL params:
10:07:45.447 1667445 ERROR  Failed to create index with oid 115759, see above for details
10:07:45.449 1667443 NOTICE CREATE INDEX IF NOT EXISTS i4 ON public.metrics USING btree ("time", name);
10:07:45.481 1667444 ERROR  CREATE INDEX worker 1667444 encountered 1 errors, see above for details
10:07:45.483 1667445 ERROR  CREATE INDEX worker 1667445 encountered 1 errors, see above for details
10:07:45.486 1667446 ERROR  CREATE INDEX worker 1667446 encountered 1 errors, see above for details

Solution: Change the index definition just before executing it against the target.

With this fix, you will see the following message and clone would complete,

0:08:40.441 1667844 INFO   Skipping large objects: none found.
10:08:40.449 1667844 INFO   STEP 9: reset sequences values
10:08:40.489 1667861 NOTICE Now starting setval process 1667861 [1667844]
10:08:40.489 1667861 INFO   Skipping sequences, already done on a previous run
10:08:40.524 1667849 NOTICE Skipping table public.metrics, already done on a previous run
10:08:40.524 1667849 INFO   Skipping table-data public.metrics (115750), already done on a previous run
10:08:40.538 1667857 NOTICE Found stale pid 1667507 removing it to process index public.metrics_pkey
10:08:40.552 1667857 NOTICE CREATE UNIQUE INDEX IF NOT EXISTS metrics_pkey ON public.metrics USING btree (id, "time");
10:08:40.552 1667860 NOTICE Found stale pid 1667506 removing it to process index public.i1
10:08:40.554 1667857 WARN   NOTICE:  relation "metrics_pkey" already exists, skipping
10:08:40.555 1667859 NOTICE Found stale pid 1667505 removing it to process index public.i2
10:08:40.560 1667858 NOTICE Found stale pid 1667508 removing it to process index public.i3
10:08:40.566 1667860 NOTICE CREATE INDEX IF NOT EXISTS i1 ON public.metrics USING btree ("time", name);
10:08:40.568 1667860 WARN   NOTICE:  relation "i1" already exists, skipping
10:08:40.577 1667859 NOTICE CREATE INDEX IF NOT EXISTS i2 ON public.metrics USING btree ("time", name);
10:08:40.580 1667859 WARN   NOTICE:  relation "i2" already exists, skipping
10:08:40.585 1667858 NOTICE CREATE INDEX IF NOT EXISTS i3 ON public.metrics USING btree ("time", name);
10:08:40.587 1667858 WARN   NOTICE:  relation "i3" already exists, skipping
10:08:40.625 1667860 NOTICE ALTER TABLE public.metrics ADD CONSTRAINT metrics_pkey PRIMARY KEY USING INDEX metrics_pkey
10:08:40.754 1667844 INFO   STEP 10: restore the post-data section to the target database
10:08:40.756 1667844 NOTICE  /usr/bin/pg_restore -f /tmp/pgcopydb/schema/post-out.list -l /tmp/pgcopydb/schema/post.dump
10:08:40.807 1667844 NOTICE Skipping already processed dumpId 6544: CONSTRAINT 115756 public metrics metrics_pkey tsdbadmin
10:08:40.809 1667844 NOTICE Skipping already processed dumpId 6539: INDEX 115757 public i1 tsdbadmin
10:08:40.810 1667844 NOTICE Skipping already processed dumpId 6540: INDEX 115758 public i2 tsdbadmin
10:08:40.810 1667844 NOTICE Skipping already processed dumpId 6541: INDEX 115759 public i3 tsdbadmin
10:08:40.810 1667844 NOTICE Skipping already processed dumpId 6542: INDEX 115760 public i4 tsdbadmin
10:08:40.811 1667844 NOTICE Write filtered pg_restore list file at "/tmp/pgcopydb/schema/post-filtered.list"
10:08:40.812 1667844 INFO    /usr/bin/pg_restore --dbname 'postgres://tsdbadmin@pnig0y2kp5.j4ou7fmli5.tsdb.cloud.timescale.com:37936/tsdb?sslmode=require&keepalives=1&keepalives_idle=10&keepalives_interval=10&keepalives_count=60' --jobs 4 --no-owner --no-acl --use-list /tmp/pgcopydb/schema/post-filtered.list /tmp/pgcopydb/schema/post.dump
10:08:41.058 1667844 INFO   All step are now done, 900ms elapsed
10:08:41.059 1667844 INFO   Printing summary for 1 tables and 5 indexes
10:08:41.059 1667844 NOTICE Storing migration summary in JSON file "/tmp/pgcopydb/summary.json"

   OID | Schema |    Name | Parts | copy duration | transmitted bytes | indexes | create index duration
-------+--------+---------+-------+---------------+-------------------+---------+----------------------
115750 | public | metrics |     1 |         772ms |             25 MB |       5 |                 765ms


                                               Step   Connection    Duration    Transfer   Concurrency
 --------------------------------------------------   ----------  ----------  ----------  ------------
   Catalog Queries (table ordering, filtering, etc)       source      41d22h                         1
                                        Dump Schema       source       977ms                         1
                                     Prepare Schema       target       250ms                         1
      COPY, INDEX, CONSTRAINTS, VACUUM (wall clock)         both      41d22h                        12
                                  COPY (cumulative)         both         0ms                         4
                          CREATE INDEX (cumulative)       target        52ms                         4
                           CONSTRAINTS (cumulative)       target         5ms                         4
                                VACUUM (cumulative)       target         0ms                         4
                                    Reset Sequences         both        42ms                         1
                         Large Objects (cumulative)       (null)         0ms                         0
                                    Finalize Schema         both       298ms
                4
 --------------------------------------------------   ----------  ----------  ----------  ------------
                          Total Wall Clock Duration         both       900ms                        20

If we interrupt the process while creating an index, the summary catalog
will show the index command as `CREATE INDEX ..`. When resuming,
pgcopydb alters the command to `CREATE INDEX IF NOT EXISTS ..` to avoid errors if index already created on target. However, summary_lookup_index() will change it back to `CREATE INDEX ..` because it replaces the command from the summary catalog table.

Solution: Change the index definition just before executing it against the target.

Signed-off-by: Arunprasad Rajkumar <ar.arunprasad@gmail.com>
@dimitri dimitri merged commit c74047a into dimitri:main May 16, 2024
18 checks passed
@dimitri
Copy link
Owner

dimitri commented May 16, 2024

Thanks @arajkumar !

@dimitri dimitri added the bug Something isn't working label May 16, 2024
@dimitri dimitri added this to the v0.16 milestone May 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants