-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SHOW CREATE: have an option to dump all tables' create statements #53488
Comments
Why is dump being deprecated? This is news to the SE team and it's a widely used feature in the field for migrations. |
What do you mean by migrations? And what part of dump is used? I don't want us to get caught up in being tied to a solution especially if we can make a better one. This issue is an attempt to give an alternative to the "see all me schemas" use cases such as auditing schemas. If you can point out a use case that this doesn't cover in lieu of dump we're all ears. |
One idea: could such a command be composed by doing something in combination with |
One of the use cases that we use this for as well is to have a point in time checkpoint to create a dev database. We currently do a |
The other alternative is to re-run the migrations since the beginning of time, which we do save in source control, but since cockroach takes a little which to run migrations this can take several minutes, while just loading the schema dump is only a few seconds. |
I second the request. I'm also used to dump the database structure in every Git commit. This allows to easily review any migrations that were introduced. Especially for early projects where migrations are plentiful. On a related note, the 'cockroach dump' command allows to dump to stdout. Even though that makes no sense for large online enterprise databases, it is really perfect for a (small) dev environment, especially when running in docker. I don't think the new BACKUP functionality provides for that. |
Cockroach Cloud currently uses buffalo's pop ORM, which currently relies on
|
The solution here will probably involve extracting out the Lines 518 to 521 in 92d9495
|
@vy-ton investigate what the SQL statement should be |
@RichardJCai will pick this up |
@CyborgMaster @Freeaqingme @kernfeld-cockroach |
The tables are the most important, but yes, a command that outputs the sql to fully recreate all the objects in the db schema would be ideal! |
For the buffalo use case, I think it's intended to just print out the schema of every table for the user's benefit; it doesn't need to print objects. I think the format should be pretty flexible, since it looks like it's more about helping the user understand the state of the schema, rather than producing an output that will be fed into some other tool. |
Proposed
|
Turns out
|
@RichardJCai A few questions
If yes to both questions above, I think |
Which ever way we go, I think it would be nice for the output to have the Foreign keys inside the create table statements. That does require the dump tool to topologically sort the create statements by foreign key dependencies to ensure they are created in the right order, but it has the advantage of making the output easier to visually inspect to understand the schema along with being faster to boostrap an empty database as the alter statements to add the foreign keys are a lot slower than creating them along with the table. |
That is a pretty cool idea! One weird edge case to this is that CockroachDB allows cycles of foreign key constraints, so I think it's not always possible to topologically sort tables by foreign key dependencies. We'd need some kind of way to handle this case. |
60154: sql: add crdb_internal.show_create_all_tables builtin r=rafiss a=RichardJCai sql: add crdb_internal.show_create_all_tables builtin Making this a PR first before continuing with #53488, we can alias this builtin with SHOW CREATE ALL TABLES. Example output: ``` query T SELECT crdb_internal.show_create_all_tables('d') ---- CREATE TABLE public.parent ( x INT8 NULL, y INT8 NULL, z INT8 NULL, UNIQUE INDEX parent_x_y_z_key (x ASC, y ASC, z ASC), UNIQUE INDEX parent_x_key (x ASC), FAMILY f1 (x, y, z, rowid) ); CREATE TABLE public.full_test ( x INT8 NULL, y INT8 NULL, z INT8 NULL, UNIQUE INDEX full_test_x_key (x ASC), FAMILY f1 (x, y, z, rowid) ); CREATE SEQUENCE public.s MINVALUE 1 MAXVALUE 9223372036854775807 INCREMENT 1 START 1; CREATE VIEW public.vx ("?column?") AS SELECT 1; ALTER TABLE public.full_test ADD CONSTRAINT fk_x_ref_parent FOREIGN KEY (x, y, z) REFERENCES public.parent(x, y, z) MATCH FULL ON DELETE CASCADE ON UPDATE CASCADE; ALTER TABLE public.full_test ADD CONSTRAINT test_fk FOREIGN KEY (x) REFERENCES public.parent(x) ON DELETE CASCADE; -- Validate foreign key constraints. These can fail if there was unvalidated data during the SHOW CREATE ALL TABLES ALTER TABLE public.full_test VALIDATE CONSTRAINT fk_x_ref_parent; ALTER TABLE public.full_test VALIDATE CONSTRAINT test_fk; ``` Release note (sql change): crdb_internal.show_create_all_tables is a new builtin that takes in a database name (string) and returns a flat log of all the CREATE TABLE statements in the database followed by alter statements to add constraints. The output can be used to recreate a database. This builtin was added to replace old dump logic. 60469: opt: prefer sorting fewer columns r=RaduBerinde a=RaduBerinde Currently, if we have to sort results and project a new column, there is no cost difference between the two orders and we happen to prefer the sort on top. It is preferable to sort before adding new columns to avoid storing the extra value in memory or on disk. This change improves the sort costing by adding a cost proportional to the total number of values. Fixes #32952. Co-authored-by: richardjcai <caioftherichard@gmail.com> Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
The current implementation has a bit too many shortcomings. We'll need to rework that. |
I'm going to comment here on the implementation of #60154, which is the root of the problem. Once that problem has been addressed, it's going to be easy to rework #60539 accordingly. So here's the shortcoming: the proposed built-in function returns just a single scalar value and composes its result with string concatenation.
The fundamental design mistake here is that the built-in should not have been a scalar. At the very least it should have been a set-returning function, with one row per CREATE or ALTER statement. So the behavior of this function should be to stream out one SQL statement at a time, without keeping more than 1 output statement in RAM at a time. Meanwhile, I appreciate that this is not trivial because we need to order the tables in dependency order. The ordering of the table is currently done by performing 1) a query to get all descriptors 2) another, separate subsequent query to extract the dependency links 3) an in-memory sort. This is too much work and again causes the in-RAM state to balloon unconstrained. The proper approach is to perform this sort using SQL, using a join between the descriptor table and itself, with a predicate (probably using a JSON operator) on the backward dependency. We have the (To ensure the query is fully streamable, it's pretty important to not generate all the create statements upfront, i.e. you can't rely on the existing Once you have a Also to get the streaming behavior right you'll also be depending on the new streaming interface for the internal executor that @yuzefovich has produced. That dependency, in turn, makes this work non back-portable. |
Noted, will work on fix asap. |
I think in the ideal case what this would do is find all of the non-cyclical dependencies and include the foreign keys in the create statements, and then in the case of cycles arbitrarily break the cycle with an after the fact alter statement. I recognize that of course this is a non-trival amount of work. |
The new syntax 21.1 will also keep the |
Also just want to add that |
|
There's |
With the deprecation of dump, on of the popular use cases was the ability to see all schemas for a database in one place to review schema. In lieu of that, we want to create an alternative solution for that use case.
One suggestion was a
SHOW CREATE
type command which would print the create statements for all database, etc. to address that particular use case.Jira issue: CRDB-13989
The text was updated successfully, but these errors were encountered: