SHOW CREATE: have an option to dump all tables' create statements #53488

mwang1026 · 2020-08-26T18:27:47Z

With the deprecation of dump, on of the popular use cases was the ability to see all schemas for a database in one place to review schema. In lieu of that, we want to create an alternative solution for that use case.

One suggestion was a SHOW CREATE type command which would print the create statements for all database, etc. to address that particular use case.

Jira issue: CRDB-13989

The text was updated successfully, but these errors were encountered:

keith-mcclellan · 2020-10-12T16:47:57Z

Why is dump being deprecated? This is news to the SE team and it's a widely used feature in the field for migrations.

mwang1026 · 2020-10-12T16:54:59Z

What do you mean by migrations? And what part of dump is used?

I don't want us to get caught up in being tied to a solution especially if we can make a better one. This issue is an attempt to give an alternative to the "see all me schemas" use cases such as auditing schemas.

If you can point out a use case that this doesn't cover in lieu of dump we're all ears.

rafiss · 2020-11-16T20:39:42Z

One idea: could such a command be composed by doing something in combination with SHOW TABLES? Even if so, we could still make an alias for that.

CyborgMaster · 2020-11-16T21:57:28Z

One of the use cases that we use this for as well is to have a point in time checkpoint to create a dev database. We currently do a dump --dump-mode=schema on every commit, and then where a developer needs to spin up a new local db (due to a new machine or just blowing away his local db because he borked it), we just load that dump. It would be great if the format used by any new schema dump tool was loadable on an empty db to recreate the schema.

CyborgMaster · 2020-11-16T21:59:11Z

The other alternative is to re-run the migrations since the beginning of time, which we do save in source control, but since cockroach takes a little which to run migrations this can take several minutes, while just loading the schema dump is only a few seconds.

Freeaqingme · 2020-11-28T23:57:04Z

I second the request. I'm also used to dump the database structure in every Git commit. This allows to easily review any migrations that were introduced. Especially for early projects where migrations are plentiful.

On a related note, the 'cockroach dump' command allows to dump to stdout. Even though that makes no sense for large online enterprise databases, it is really perfect for a (small) dev environment, especially when running in docker. I don't think the new BACKUP functionality provides for that.

kernfeld-cockroach · 2020-12-10T18:19:03Z

Cockroach Cloud currently uses buffalo's pop ORM, which currently relies on cockroach dump. In order for Cockroach Cloud to upgrade to CRDB v21.x, we'd need to arrange an update to pop so that it no longer uses cockroach dump. The particular error that I saw was when running migrations, although it's possible that cockroach dump is also called in other code paths.

❯ soda migrate -e test-base
v5.0.11
[POP] 2020/12/10 11:58:58 info - Migrations already up to date, nothing to apply
[POP] 2020/12/10 11:58:58 info - 0.0311 seconds
ERROR: unknown command "dump" for "cockroach"
Did you mean this?
	demo
Failed running "cockroach"
[POP] 2020/12/10 11:58:58 warn - Migrator: unable to dump schema: exit status 1

rafiss · 2020-12-10T19:32:39Z

The solution here will probably involve extracting out the getDumpMetadata function that's in dump right now and moving it somewhere else.

cockroach/pkg/cli/dump.go

Lines 518 to 521 in 92d9495

    
           // getDumpMetadata retrieves the table information for the specified table(s). 
        
           // It also retrieves the cluster timestamp at which the metadata was 
        
           // retrieved. 
        
           func getDumpMetadata(

vy-ton · 2021-01-05T19:45:47Z

@vy-ton investigate what the SQL statement should be

rafiss · 2021-01-12T20:03:17Z

@RichardJCai will pick this up

vy-ton · 2021-01-14T02:00:29Z

@CyborgMaster @Freeaqingme @kernfeld-cockroach cockroach dump --dump-mode=schema only outputs tables. To recreate a database structure, would you also want a command that outputs all database objects (sequences, tables, types, user-defined schemas, and views)?

CyborgMaster · 2021-01-14T04:03:54Z

The tables are the most important, but yes, a command that outputs the sql to fully recreate all the objects in the db schema would be ideal!

kernfeld-cockroach · 2021-01-14T14:12:17Z

For the buffalo use case, I think it's intended to just print out the schema of every table for the user's benefit; it doesn't need to print objects. I think the format should be pretty flexible, since it looks like it's more about helping the user understand the state of the schema, rather than producing an output that will be fed into some other tool.

vy-ton · 2021-01-19T19:26:23Z

Proposed SHOW command

SHOW CREATE TABLES
SHOW CREATE TABLES <table> <table>

RichardJCai · 2021-01-21T17:11:59Z

Proposed SHOW command

SHOW CREATE TABLES
SHOW CREATE TABLES <table> <table>

Turns out SHOW CREATE TABLES is ambiguous grammar (TABLES is a valid table name so it would conflict with the existing SHOW CREATE grammar.

SHOW CREATE ALL TABLES would for example would be a valid option.

vy-ton · 2021-01-21T18:18:40Z

@RichardJCai A few questions

For the use case to replace cockroach dump <database> <table> <table...> --dump-mode=schema, could we make SHOW CREATE output the flat structure when multiple tables are supplied? For example, SHOW CREATE orders packages
In the future, would SHOW CREATE ALL be valid grammar. This would support outputting all database objects not just tables.

If yes to both questions above, I think SHOW CREATE ALL TABLES could work.

CyborgMaster · 2021-01-29T20:22:38Z

Which ever way we go, I think it would be nice for the output to have the Foreign keys inside the create table statements. That does require the dump tool to topologically sort the create statements by foreign key dependencies to ensure they are created in the right order, but it has the advantage of making the output easier to visually inspect to understand the schema along with being faster to boostrap an empty database as the alter statements to add the foreign keys are a lot slower than creating them along with the table.

kernfeld-cockroach · 2021-01-29T20:52:33Z

topologically sort the create statements by foreign key dependencies

That is a pretty cool idea! One weird edge case to this is that CockroachDB allows cycles of foreign key constraints, so I think it's not always possible to topologically sort tables by foreign key dependencies. We'd need some kind of way to handle this case.

60154: sql: add crdb_internal.show_create_all_tables builtin r=rafiss a=RichardJCai sql: add crdb_internal.show_create_all_tables builtin Making this a PR first before continuing with #53488, we can alias this builtin with SHOW CREATE ALL TABLES. Example output: ``` query T SELECT crdb_internal.show_create_all_tables('d') ---- CREATE TABLE public.parent ( x INT8 NULL, y INT8 NULL, z INT8 NULL, UNIQUE INDEX parent_x_y_z_key (x ASC, y ASC, z ASC), UNIQUE INDEX parent_x_key (x ASC), FAMILY f1 (x, y, z, rowid) ); CREATE TABLE public.full_test ( x INT8 NULL, y INT8 NULL, z INT8 NULL, UNIQUE INDEX full_test_x_key (x ASC), FAMILY f1 (x, y, z, rowid) ); CREATE SEQUENCE public.s MINVALUE 1 MAXVALUE 9223372036854775807 INCREMENT 1 START 1; CREATE VIEW public.vx ("?column?") AS SELECT 1; ALTER TABLE public.full_test ADD CONSTRAINT fk_x_ref_parent FOREIGN KEY (x, y, z) REFERENCES public.parent(x, y, z) MATCH FULL ON DELETE CASCADE ON UPDATE CASCADE; ALTER TABLE public.full_test ADD CONSTRAINT test_fk FOREIGN KEY (x) REFERENCES public.parent(x) ON DELETE CASCADE; -- Validate foreign key constraints. These can fail if there was unvalidated data during the SHOW CREATE ALL TABLES ALTER TABLE public.full_test VALIDATE CONSTRAINT fk_x_ref_parent; ALTER TABLE public.full_test VALIDATE CONSTRAINT test_fk; ``` Release note (sql change): crdb_internal.show_create_all_tables is a new builtin that takes in a database name (string) and returns a flat log of all the CREATE TABLE statements in the database followed by alter statements to add constraints. The output can be used to recreate a database. This builtin was added to replace old dump logic. 60469: opt: prefer sorting fewer columns r=RaduBerinde a=RaduBerinde Currently, if we have to sort results and project a new column, there is no cost difference between the two orders and we happen to prefer the sort on top. It is preferable to sort before adding new columns to avoid storing the extra value in memory or on disk. This change improves the sort costing by adding a cost proportional to the total number of values. Fixes #32952. Co-authored-by: richardjcai <caioftherichard@gmail.com> Co-authored-by: Radu Berinde <radu@cockroachlabs.com>

knz · 2021-02-17T11:59:00Z

The current implementation has a bit too many shortcomings. We'll need to rework that.

knz · 2021-02-17T12:11:41Z

I'm going to comment here on the implementation of #60154, which is the root of the problem. Once that problem has been addressed, it's going to be easy to rework #60539 accordingly.

So here's the shortcoming: the proposed built-in function returns just a single scalar value and composes its result with string concatenation.

there is no memory accounting
if there are many tables, the result will be too large and crash certain drivers (the pg protocol was not designed for payloads larger than a few hundred kilobytes)
the result is accumulated in RAM throughout the operation, which in turn will cause the query to balloon in proportion to the number of tables

The fundamental design mistake here is that the built-in should not have been a scalar.

At the very least it should have been a set-returning function, with one row per CREATE or ALTER statement. So the behavior of this function should be to stream out one SQL statement at a time, without keeping more than 1 output statement in RAM at a time.

Meanwhile, I appreciate that this is not trivial because we need to order the tables in dependency order.

The ordering of the table is currently done by performing 1) a query to get all descriptors 2) another, separate subsequent query to extract the dependency links 3) an in-memory sort.

This is too much work and again causes the in-RAM state to balloon unconstrained. The proper approach is to perform this sort using SQL, using a join between the descriptor table and itself, with a predicate (probably using a JSON operator) on the backward dependency. We have the pb_to_json built-ins for this purpose nowadays. The reason why this works wrt memory usage is that a SQL-level sort properly uses disk spilling.

(To ensure the query is fully streamable, it's pretty important to not generate all the create statements upfront, i.e. you can't rely on the existing crdb_internal.create_statements vtable. Instead you'd need to define two scalar built-ins, one show_create_statement(descid) and show_alter_statements(descid) that produce the create and alter statement separately for just 1 table. Then you'd apply these built-in functions on the ordered result of the self-join.)

Once you have a crdb_internal.show_create_all_tables() SRF, you can define SHOW CREATE ALL TABLES as SELECT create_statement FROM crdb_internal.show_create_all_tables()

Also to get the streaming behavior right you'll also be depending on the new streaming interface for the internal executor that @yuzefovich has produced. That dependency, in turn, makes this work non back-portable.

RichardJCai · 2021-02-17T15:47:09Z

I'm going to comment here on the implementation of #60154, which is the root of the problem. Once that problem has been addressed, it's going to be easy to rework #60539 accordingly.

So here's the shortcoming: the proposed built-in function returns just a single scalar value and composes its result with string concatenation.

there is no memory accounting

if there are many tables, the result will be too large and crash certain drivers (the pg protocol was not designed for payloads larger than a few hundred kilobytes)

the result is accumulated in RAM throughout the operation, which in turn will cause the query to balloon in proportion to the number of tables

The fundamental design mistake here is that the built-in should not have been a scalar.

At the very least it should have been a set-returning function, with one row per CREATE or ALTER statement. So the behavior of this function should be to stream out one SQL statement at a time, without keeping more than 1 output statement in RAM at a time.

Meanwhile, I appreciate that this is not trivial because we need to order the tables in dependency order.

The ordering of the table is currently done by performing 1) a query to get all descriptors 2) another, separate subsequent query to extract the dependency links 3) an in-memory sort.

This is too much work and again causes the in-RAM state to balloon unconstrained. The proper approach is to perform this sort using SQL, using a join between the descriptor table and itself, with a predicate (probably using a JSON operator) on the backward dependency. We have the pb_to_json built-ins for this purpose nowadays. The reason why this works wrt memory usage is that a SQL-level sort properly uses disk spilling.

(To ensure the query is fully streamable, it's pretty important to not generate all the create statements upfront, i.e. you can't rely on the existing crdb_internal.create_statements vtable. Instead you'd need to define two scalar built-ins, one show_create_statement(descid) and show_alter_statements(descid) that produce the create and alter statement separately for just 1 table. Then you'd apply these built-in functions on the ordered result of the self-join.)

Once you have a crdb_internal.show_create_all_tables() SRF, you can define SHOW CREATE ALL TABLES as SELECT create_statement FROM crdb_internal.show_create_all_tables()

Also to get the streaming behavior right you'll also be depending on the new streaming interface for the internal executor that @yuzefovich has produced. That dependency, in turn, makes this work non back-portable.

Noted, will work on fix asap.

CyborgMaster · 2021-02-19T23:29:30Z

That is a pretty cool idea! One weird edge case to this is that CockroachDB allows cycles of foreign key constraints, so I think it's not always possible to topologically sort tables by foreign key dependencies. We'd need some kind of way to handle this case.

I think in the ideal case what this would do is find all of the non-cyclical dependencies and include the foreign keys in the create statements, and then in the case of cycles arbitrarily break the cycle with an after the fact alter statement. I recognize that of course this is a non-trival amount of work.

rafiss · 2021-03-09T23:07:49Z

The new syntax SHOW CREATE ALL TABLES will be available in v21.1. It will output one row for each statement to set up the schema of a database.

21.1 will also keep the cockroach dump CLI command with support ONLY for --dump-mode=schema. This gives tools that rely on cockroach dump --dump-mode=schema until the 21.2 release (scheduled for Fall 2021) to switch away from using dump.

RichardJCai · 2021-03-10T00:03:58Z

Also just want to add that SHOW CREATE ALL TABLES only shows tables/views/sequences. Please leave any notes / open a new issue if there is a need to show types and schemas as well.

bikbah · 2022-03-19T14:30:36Z

SHOW CREATE ALL TABLES gives DDL for creating tables, but misses user defined types (enums).
is there any option to get DDL for creating all user defined types too?

edigaryev · 2023-12-24T10:42:33Z

is there any option to get DDL for creating all user defined types too?

There's SHOW CREATE ALL TYPES.

mwang1026 added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-disaster-recovery labels Aug 26, 2020

rafiss mentioned this issue Nov 13, 2020

clone test databases without cockroach dump cockroachdb/django-cockroachdb#189

Closed

adityamaru mentioned this issue Nov 16, 2020

cli: deprecate cockroach dump #54040

Closed

kenliu added the T-disaster-recovery label Dec 5, 2020

rafiss assigned RichardJCai Jan 12, 2021

RichardJCai mentioned this issue Jan 25, 2021

sql: add SHOW CREATE TABLES / SHOW CREATE ALL TABLES commands #59412

Closed

This was referenced Feb 8, 2021

sql: add crdb_internal.show_create_all_tables builtin #60015

Closed

sql: add crdb_internal.show_create_all_tables builtin #60154

Merged

RichardJCai mentioned this issue Feb 12, 2021

sql: add SHOW CREATE ALL TABLES command #60539

Merged

craig bot closed this as completed in aa8f949 Feb 16, 2021

knz reopened this Feb 17, 2021

rafiss mentioned this issue Mar 9, 2021

sql/cli: make cockroach dump --dump-mode=schema delegate to SHOW CREATE ALL TABLES #61756

Closed

rafiss closed this as completed Mar 9, 2021

rafiss mentioned this issue Mar 10, 2021

dump command dramatically slower than insert #51116

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SHOW CREATE: have an option to dump all tables' create statements #53488

SHOW CREATE: have an option to dump all tables' create statements #53488

mwang1026 commented Aug 26, 2020 •

edited by cockroach-jira-scripts

Loading

keith-mcclellan commented Oct 12, 2020

mwang1026 commented Oct 12, 2020

rafiss commented Nov 16, 2020

CyborgMaster commented Nov 16, 2020 •

edited

Loading

CyborgMaster commented Nov 16, 2020

Freeaqingme commented Nov 28, 2020 •

edited

Loading

kernfeld-cockroach commented Dec 10, 2020

rafiss commented Dec 10, 2020

vy-ton commented Jan 5, 2021

rafiss commented Jan 12, 2021

vy-ton commented Jan 14, 2021

CyborgMaster commented Jan 14, 2021

kernfeld-cockroach commented Jan 14, 2021

vy-ton commented Jan 19, 2021

RichardJCai commented Jan 21, 2021

vy-ton commented Jan 21, 2021

CyborgMaster commented Jan 29, 2021

kernfeld-cockroach commented Jan 29, 2021

knz commented Feb 17, 2021

knz commented Feb 17, 2021

RichardJCai commented Feb 17, 2021

CyborgMaster commented Feb 19, 2021

rafiss commented Mar 9, 2021

RichardJCai commented Mar 10, 2021

bikbah commented Mar 19, 2022

edigaryev commented Dec 24, 2023

SHOW CREATE: have an option to dump all tables' create statements #53488

SHOW CREATE: have an option to dump all tables' create statements #53488

Comments

mwang1026 commented Aug 26, 2020 • edited by cockroach-jira-scripts Loading

keith-mcclellan commented Oct 12, 2020

mwang1026 commented Oct 12, 2020

rafiss commented Nov 16, 2020

CyborgMaster commented Nov 16, 2020 • edited Loading

CyborgMaster commented Nov 16, 2020

Freeaqingme commented Nov 28, 2020 • edited Loading

kernfeld-cockroach commented Dec 10, 2020

rafiss commented Dec 10, 2020

vy-ton commented Jan 5, 2021

rafiss commented Jan 12, 2021

vy-ton commented Jan 14, 2021

CyborgMaster commented Jan 14, 2021

kernfeld-cockroach commented Jan 14, 2021

vy-ton commented Jan 19, 2021

RichardJCai commented Jan 21, 2021

vy-ton commented Jan 21, 2021

CyborgMaster commented Jan 29, 2021

kernfeld-cockroach commented Jan 29, 2021

knz commented Feb 17, 2021

knz commented Feb 17, 2021

RichardJCai commented Feb 17, 2021

CyborgMaster commented Feb 19, 2021

rafiss commented Mar 9, 2021

RichardJCai commented Mar 10, 2021

bikbah commented Mar 19, 2022

edigaryev commented Dec 24, 2023

mwang1026 commented Aug 26, 2020 •

edited by cockroach-jira-scripts

Loading

CyborgMaster commented Nov 16, 2020 •

edited

Loading

Freeaqingme commented Nov 28, 2020 •

edited

Loading