sql: (temporary) Hang when dropping unique index created after ALTER PRIMARY KEY #45150

Closed
andy-kimball opened this issue Feb 18, 2020 · 8 comments

@andy-kimball (Contributor)

Run the following SQL script:

```
SET experimental_enable_primary_key_changes = true;
CREATE TABLE tdup (x INT PRIMARY KEY, y INT NOT NULL, z INT NOT NULL);
INSERT INTO tdup VALUES (1, 1, 1);
ALTER TABLE tdup ALTER PRIMARY KEY USING COLUMNS (y, z);
DROP INDEX tdup_x_key CASCADE;
```

EXPECTED: This should switch the primary key to columns (y, z) and drop the unique index synthesized by the ALTER PRIMARY KEY code.

ACTUAL: The DROP INDEX statement hangs.

NOTE: This happens with and without the #45093 fix.
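
For anyone reproducing this, the index that ALTER PRIMARY KEY synthesizes (named `tdup_x_key` in the script above) can be inspected before the DROP INDEX. A minimal sketch using standard CockroachDB introspection statements:

```
-- Show the secondary index synthesized to preserve uniqueness of the
-- old primary key column x after the primary key change.
SHOW INDEXES FROM tdup;
-- Or view the full table definition, including the synthesized index.
SHOW CREATE TABLE tdup;
```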

@andy-kimball (Contributor, Author)

Hm, it looks like after > 1 minute, it did complete. Is that expected?

@rohany (Contributor) commented Feb 18, 2020

Yes, this is expected. The current job framework is unable to spawn child jobs from a given job. To get around this, the primary key change process leaves the index-drop jobs for the async schema changer to pick up and run. Until the async schema changer picks up those jobs, mutations queued behind them have to wait. The interval for these schema changers is 1 minute plus or minus some jitter, so this seems to be working as expected.

Lucy and I have been thinking about how to get around this blemish though.
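
For readers following along, the queued cleanup work can be observed from another session while the DROP INDEX statement appears hung. A minimal sketch; SHOW JOBS is standard CockroachDB syntax, and the status filter values are the usual ones for queued and running jobs:

```
-- Run from a separate session while DROP INDEX is blocked.
SELECT job_id, job_type, description, status
FROM [SHOW JOBS]
WHERE job_type = 'SCHEMA CHANGE'
  AND status IN ('pending', 'running');
```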

@jordanlewis changed the title from "sql: Hang when dropping unique index created after ALTER PRIMARY KEY" to "sql: (temporary) Hang when dropping unique index created after ALTER PRIMARY KEY" on Mar 3, 2020
@jordanlewis (Member)

When @otan's NOTICE feature gets in, let's add a NOTICE saying that the schema change will take a minute to kick off, as a mitigation if we can't fix the issue completely.

@thoszhang (Contributor)

I think if we implement the same pattern as in #46929 (use CreateStartableJobWithTxn and then start the job after we publish the table descriptor), we should at least be able to have the index-drop job start immediately. That would decrease the amount of time we're hanging.

@thoszhang (Contributor)

#47624 implements what I described above.

craig bot pushed a commit that referenced this issue Apr 22, 2020
47624: sql: start index drop job for primary key changes immediately r=lucy-zhang a=lucy-zhang

Previously, the job to drop old indexes after a primary key change was
created at the end of the schema change and left for the job registry to
adopt at some later time. This PR switches the job creation over to
`CreateStartableJobWithTxn`, which allows for starting the job
immediately after the transaction which creates the job is committed.

This eliminates one source of waiting time before other schema changes,
which are queued behind the mutation to drop the indexes, can run. Note,
however, that successive schema changes don't start running immediately
after the mutations ahead of them in the queue are all cleared; if
they've ever hit a retry error due to not being first in line, then they
will have to wait to be re-adopted by the job registry. This is why the
existing primary key change tests still need to have the adopt interval
set to a low value in order to finish quickly.

Addresses #45150.

Release note (performance improvement): The cleanup job which runs after
a primary key change to remove old indexes, which blocks other schema
changes from running, now starts immediately after the primary key swap
is complete. This reduces the amount of waiting time before subsequent
schema changes can run.

47748: cli/interactive_tests: deflake test_demo_partitioning r=rohany a=knz

Fixes #47747

The test is meant to tolerate a failure in the licensing server at the
beginning.

This is handled correctly; however, the check does not work if the `spawn`
statement itself fails.

This patch corrects the flake by avoiding `spawn` and using a Bourne
shell to spawn the `cockroach` processes instead.

Release note: None

Co-authored-by: Lucy Zhang <lucy-zhang@users.noreply.github.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
@thoszhang (Contributor)

#47624 improves the situation, but I think to really consider this problem solved, we should

  1. wait to return results to the client until dropping the indexes is completed, to make the user experience more predictable (a client-side workaround is sketched after this list); and
  2. reduce the amount of redundant waiting we do for schema change jobs (sql: schema changes can be very slow #47790 covers this).
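
Until (1) is implemented, a client that wants predictable behavior can block on the cleanup jobs explicitly. A hedged sketch, assuming SHOW JOBS WHEN COMPLETE (existing CockroachDB syntax that waits for the listed jobs to reach a terminal state) is available in the version in use:

```
-- Block until all currently pending/running schema change jobs,
-- including the index-drop cleanup job, have finished.
SHOW JOBS WHEN COMPLETE
  SELECT job_id FROM [SHOW JOBS]
  WHERE job_type = 'SCHEMA CHANGE' AND status IN ('pending', 'running');
```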

craig bot pushed a commit that referenced this issue May 9, 2020
48045: sql: enable indexing and ordering on arrays of orderable and indexable types r=rohany a=rohany

Fixes #17154.
Fixes #35707.

This PR enables arrays to be ordered and indexed by
introducing an ordered key encoding for arrays.
Once this exists, the rest of the SQL infrastructure
is ready to handle indexing and ordering on arrays.

To encode an array of elements `ARRAY[a, b]`,
we create the following encoding.

Let `AM` = a marker byte for arrays and let `AT` be a terminator byte.

`enc(ARRAY[a, b]) = [AM, enc(a), enc(b), AT]`

The key is that the terminator is less than the element marker.
This allows for the "prefix matching" style comparison that
arrays support.

Release note (sql change): This PR adds support for indexing
and ordering of arrays of indexable and orderable inner types.
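
As an illustration of the prefix-style comparison this encoding enables (a hypothetical example, not taken from the PR; it assumes an INT[] column can now be indexed and ordered on, which is what the change allows):

```
CREATE TABLE arrs (id INT PRIMARY KEY, a INT[]);
CREATE INDEX ON arrs (a);  -- arrays are now indexable
INSERT INTO arrs VALUES (1, ARRAY[1]), (2, ARRAY[1, 2]), (3, ARRAY[2]);
-- Returns ARRAY[1], ARRAY[1,2], ARRAY[2]: the terminator written after
-- enc(1) sorts before any element encoding, so a prefix array orders
-- immediately before its extensions.
SELECT a FROM arrs ORDER BY a;
```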


48608: schemachange: speed up slow schema changes r=spaskob a=spaskob

Touches #45150.
Fixes #47607.
Touches #47790.

Release note (performance improvement):
Before this change, a simple schema change could take 30s or more.
The reason was that if the schema change was not first in line in the
table mutation queue, it would return a retriable error and the jobs
framework would re-adopt and run it later. The problem is that the job
adoption loop runs only every 30s.

To repro run this for some time:
```
cockroach sql --insecure --watch 1s -e 'drop table if exists users cascade; create table users (id uuid not null, name varchar(255) not null, email varchar(255) not null, password varchar(255) not null, remember_token varchar(100) null, created_at timestamp(0) without time zone null, updated_at timestamp(0) without time zone null, deleted_at timestamp(0) without time zone null); alter table users add primary key (id); alter table users add constraint users_email_unique unique (email);'
```

Instead of returning on retriable errors, we now retry with exponential
backoff in the schema change code. This pattern of handling retriable
errors in client job code is encouraged over relying on the registry,
because the latter leads to slowness and to more complicated test
fixtures that hack at the internals of the job registry.

Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
Co-authored-by: Spas Bojanov <pachob@gmail.com>
@spaskob (Contributor) commented May 9, 2020

I just merged #48608, which may further improve performance for this case.

@ajwerner (Contributor)

This all happens quite quickly now. I'm closing this.
