sql: (temporary) Hang when dropping unique index created after ALTER PRIMARY KEY #45150
Comments
Hm, it looks like after > 1 minute, it did complete. Is that expected?
Yes, this is expected. The current job framework is unable to spawn child jobs from a given job. To get around this, the primary key change process lets the async schema changer pick up and run the drop index jobs. Until the async schema changer picks up these jobs, mutations queued up after them have to wait. The interval for these schema changers is 1 minute ± some jitter, so this seems like it's working as expected. Lucy and I have been thinking about how to get around this blemish, though.
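The adoption timing described above can be sketched as follows. This is a minimal illustration, not CockroachDB code; the exact base interval and jitter fraction here are assumed values chosen only to show the shape of the delay:

```python
import random

def next_adoption_delay(base_interval: float = 60.0, jitter_frac: float = 0.15) -> float:
    # base_interval and jitter_frac are illustrative; the real adoption
    # interval is roughly one minute plus or minus some jitter, so a
    # mutation queued just after a scan can wait up to a full cycle.
    return base_interval * (1.0 + random.uniform(-jitter_frac, jitter_frac))
```

With these assumed values, every queued drop-index job waits somewhere between 51 and 69 seconds for the next adoption pass, which matches the ">1 minute" observation.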
When @otan's NOTICE feature gets in, let's add a NOTICE that says the schema change will take a minute to kick off, as a mitigation if we can't fix the issue completely.
I think if we implement the same pattern as in #46929, to use
#47624 implements what I described above. |
47624: sql: start index drop job for primary key changes immediately r=lucy-zhang a=lucy-zhang

Previously, the job to drop old indexes after a primary key change was created at the end of the schema change and left for the job registry to adopt at some later time. This PR switches the job creation over to `CreateStartableJobWithTxn`, which allows for starting the job immediately after the transaction which creates the job is committed. This eliminates one source of waiting time before other schema changes, which are queued behind the mutation to drop the indexes, can run.

Note, however, that successive schema changes don't start running immediately after the mutations ahead of them in the queue are all cleared; if they've ever hit a retry error due to not being first in line, they have to wait to be re-adopted by the job registry. This is why the existing primary key change tests still need the adopt interval set to a low value in order to finish quickly.

Addresses #45150.

Release note (performance improvement): The cleanup job which runs after a primary key change to remove old indexes, and which blocks other schema changes from running, now starts immediately after the primary key swap is complete. This reduces the amount of waiting time before subsequent schema changes can run.

47748: cli/interactive_tests: deflake test_demo_partitioning r=rohany a=knz

Fixes #47747. The test is meant to tolerate a failure in the licensing server at the beginning. This is done correctly; however, the check does not work if the `spawn` statement itself fails. This patch corrects the flake by avoiding `spawn` and using a Bourne shell to spawn the `cockroach` processes instead.

Release note: None

Co-authored-by: Lucy Zhang <lucy-zhang@users.noreply.github.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
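The create-then-start pattern described in the merge message above can be sketched with a toy model. Everything here is illustrative: `ToyRegistry` and its methods are hypothetical stand-ins, not CockroachDB's actual `CreateStartableJobWithTxn` API, and only show why handing the caller a start handle removes the adoption-loop wait:

```python
class ToyRegistry:
    """Toy model of a job registry (illustrative; not CockroachDB's API)."""

    def __init__(self):
        self.pending = []   # jobs waiting for the periodic adoption loop
        self.running = []

    def create_job_with_txn(self, job):
        # Old pattern: the job sits in pending until the adoption loop,
        # which runs on an interval, happens to pick it up.
        self.pending.append(job)

    def create_startable_job_with_txn(self, job):
        # Pattern analogous to CreateStartableJobWithTxn: the job is
        # still recorded, but the caller gets back a handle it can
        # invoke the moment its transaction commits.
        self.pending.append(job)

        def start():
            self.pending.remove(job)
            self.running.append(job)

        return start

reg = ToyRegistry()
start = reg.create_startable_job_with_txn("drop-old-indexes")
# ...the creating transaction commits here...
start()  # job runs immediately instead of waiting for the adoption loop
```

The design point is that job creation stays transactional, while starting the job becomes the caller's decision rather than the registry's.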
#47624 improves the situation, but I think to really consider this problem solved, we should
48045: sql: enable indexing and ordering on arrays of orderable and indexable types r=rohany a=rohany

Fixes #17154. Fixes #35707.

This PR enables arrays to be ordered and indexed by introducing an ordered key encoding for arrays. Once this exists, the rest of the SQL infrastructure is ready to handle indexing and ordering on arrays.

To encode an array of elements `ARRAY[a, b]`, we create the following encoding. Let `AM` be a marker byte for arrays and let `AT` be a terminator byte:

`enc(ARRAY[a, b]) = [AM, enc(a), enc(b), AT]`

The key is that the terminator is less than the element marker. This allows for the "prefix matching" style comparison that arrays support.

Release note (sql change): This PR adds support for indexing and ordering of arrays of indexable and orderable inner types.

48608: schemachange: speed up slow schema changes r=spaskob a=spaskob

Touches #45150. Fixes #47607. Touches #47790.

Release note (performance improvement): Before this change, a simple schema change could take 30s+. The reason was that if the schema change was not first in line in the table mutation queue, it would return a retriable error and the jobs framework would re-adopt and run it later. The problem is that the job adoption loop runs every 30s. To reproduce, run this for some time:

```
cockroach sql --insecure --watch 1s -e 'drop table if exists users cascade; create table users (id uuid not null, name varchar(255) not null, email varchar(255) not null, password varchar(255) not null, remember_token varchar(100) null, created_at timestamp(0) without time zone null, updated_at timestamp(0) without time zone null, deleted_at timestamp(0) without time zone null); alter table users add primary key (id); alter table users add constraint users_email_unique unique (email);'
```

Instead of returning on retriable errors, we now retry with exponential backoff in the schema-change code. This pattern of dealing with retriable errors in client job code is encouraged over relying on the registry, because the latter leads to slowness and to more complicated test fixtures that hack into the internals of the job registry.

Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
Co-authored-by: Spas Bojanov <pachob@gmail.com>
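The ordered key encoding from the 48045 merge message above can be sketched in a few lines. The byte values below are assumptions chosen for illustration, not CockroachDB's actual constants; the only property the scheme relies on is that the terminator byte sorts below every element-marker byte:

```python
# Illustrative byte values -- CockroachDB's real encoding uses different
# constants. The only property relied on is ARRAY_TERM < every marker.
ARRAY_MARKER = 0x02
ARRAY_TERM = 0x01
INT_MARKER = 0x10

def encode_int(n: int) -> bytes:
    # Toy order-preserving encoding for small non-negative ints (0-255).
    return bytes([INT_MARKER, n])

def encode_array(elems: list) -> bytes:
    # enc(ARRAY[a, b]) = [AM, enc(a), enc(b), AT]
    out = bytearray([ARRAY_MARKER])
    for e in elems:
        out += encode_int(e)
    out.append(ARRAY_TERM)
    return bytes(out)

# Because ARRAY_TERM < INT_MARKER, a shorter array sorts before any
# array it is a prefix of ("prefix matching" comparison):
assert encode_array([1]) < encode_array([1, 2])
assert encode_array([1, 2]) < encode_array([2])
```

Byte-string comparison is lexicographic, so once the encoding is order-preserving per element, the whole array key sorts the way SQL array comparison requires.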
I just merged #48608, which may further improve the performance for this case.
This all happens quite quickly now. I'm closing this.
Run the following SQL script:
EXPECTED: This should switch the primary key to columns (y, z) and drop the unique index synthesized by the ALTER PRIMARY KEY code.
ACTUAL: The DROP INDEX statement hangs.
NOTE: This happens with and without the #45093 fix.