sql: Implement SHOW CREATE for managed clusters #27801

jkosh44 · 2024-06-21T16:43:02Z

This commit implements SHOW CREATE for managed clusters. It also gives
us a blueprint for how to convert catalog objects back to SQL
statements if we don't explicitly save the SQL statement. The process
looks roughly like:

Convert the catalog object to a plan.
Convert the plan to a statement.
Pretty print the statement to a string.

This process is designed to exactly mirror the process of converting
SQL to a catalog object.

Works towards resolving #15435

Motivation

This PR adds a known-desirable feature.

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
This PR includes the following user-facing behavior changes:
- Add SHOW CREATE for managed clusters.

This commit implements SHOW CREATE for managed clusters. It also gives us a blueprint for how to convert catalog objects back to SQL statements if we don't explicitly save the SQL statement. The process looks roughly like: 1. Convert the catalog object to a plan. 2. Convert the plan to a statement. 3. Pretty print the statement to a string. This process is designed to exactly mirror the process of converting SQL to a catalog object. Works towards resolving MaterializeInc#15435

test/sqllogictest/cluster.slt

shepherdlybot · 2024-06-21T22:07:51Z

Mitigations

Completing required mitigations increases Resilience Coverage.

(Required) Code Review 🔍 Detected
(Required) Feature Flag
(Required) Integration Test 🔍 Detected
(Required) Observability
(Required) QA Review 🔍 Detected
(Required) Run Nightly Tests
Unit Test

Risk Summary:

The pull request presents a high risk with a score of 80, indicating a significant chance of introducing a bug. This assessment is driven by predictors like the sum of bug reports of files affected and the change in executable lines of code. Historically, pull requests with these characteristics have been 109% more likely to cause a bug compared to the repository's baseline. Additionally, there are two files modified in this pull request that have seen a recent increase in bug fixes. While the repository's observed bug trend remains steady, the predicted trend for bugs is on the rise.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:
What's This?

File	Percentile
../memory/objects.rs	92
../src/parser.rs	95

benesch

Sweet! So glad to finally have this knocked out. LGTM modulo testing.

One tricky part will be to ensure that both directions of transformations (i.e. Statement -> Plan -> Object and Object -> Plan -> Statement) remain in sync going forward.

Worth getting @MaterializeInc/testing to weigh in here. The best answer in a situation like this is randomized automated testing. I wonder if we could rig up testdrive or SLT to automatically sniff out CREATE CLUSTER commands and run SHOW CREATE CLUSTER against them immediately after their creation and verify that the results match, or something like that. Would protect against new options getting added in the future that don't unplan correctly.

src/catalog/src/memory/objects.rs

test/sqllogictest/cluster.slt

benesch · 2024-06-23T03:00:08Z

LGTM from a SQL council perspective too.

morsapaes

Thanks for writing the docs, @jkosh44! +1 from me, as a SQL Council member.

nrainer-materialize · 2024-06-24T06:34:31Z

Worth getting @MaterializeInc/testing to weigh in here. The best answer in a situation like this is randomized automated testing. I wonder if we could rig up testdrive or SLT to automatically sniff out CREATE CLUSTER commands and run SHOW CREATE CLUSTER against them immediately after their creation and verify that the results match, or something like that. Would protect against new options getting added in the future that don't unplan correctly.

@def-, what do you think?

def- · 2024-06-24T07:55:43Z

I'll work on it.

maddyblue · 2024-06-24T21:38:39Z

We should teach CREATE/ALTER CLUSTER sequencing to do a round trip through SHOW always and panic (dev) or return an error to the user and abort the catalog txn (prod) if it fails. Then we don't have to modify testing at all and we have certainty that we've never committed catalog state that we can't print correctly.

benesch · 2024-06-24T21:43:38Z

Seems reasonable. Easy to do for CREATE CLUSTER because the statement should roundtrip more or less exactly (gotta be careful with option order). Hard to do for ALTER CLUSTER, I think, because you don't have a reference for what the statement should look like.

maddyblue · 2024-06-24T21:58:33Z

The method doesn't need to compare the printed strings. It parses the printed string and asserts that the internal data structures are the same.

benesch · 2024-06-24T22:07:49Z

The method doesn't need to compare the printed strings. It parses the printed string and asserts that the internal data structures are the same.

Ah, I see, you'd actually run the printed string through the parser and planner (and sequencer??) and make sure that you get the same cluster object out. Makes sense.

def- · 2024-06-25T10:52:49Z

We should teach CREATE/ALTER CLUSTER sequencing to do a round trip through SHOW always and panic (dev) or return an error to the user and abort the catalog txn (prod) if it fails. Then we don't have to modify testing at all and we have certainty that we've never committed catalog state that we can't print correctly.

Could we generalize that for every query we serialize and verify that it roundtrips successfully? Would save us a lot of effort on manually thinking of test cases or hoping that we have all the relevant grammar for randomized testing. And most importantly this shows the user a nice error immediately when running the query instead of having problems the next time envd happens to restart.

jkosh44 · 2024-06-25T14:44:49Z

The method doesn't need to compare the printed strings. It parses the printed string and asserts that the internal data structures are the same.

Ah, I see, you'd actually run the printed string through the parser and planner (and sequencer??) and make sure that you get the same cluster object out. Makes sense.

We can't do exactly this, because we only have the actual Cluster object after the catalog transaction completes, at which point it's too late to error. We could do plan -> stmt -> string -> stmt -> plan in either planning or sequencing. Or we could do cluster -> plan -> stmt -> string -> stmt -> plan -> cluster inside of the catalog transaction.

…w-create-cluster

jkosh44 · 2024-06-25T15:45:08Z

The method doesn't need to compare the printed strings. It parses the printed string and asserts that the internal data structures are the same.

Ah, I see, you'd actually run the printed string through the parser and planner (and sequencer??) and make sure that you get the same cluster object out. Makes sense.

We can't do exactly this, because we only have the actual Cluster object after the catalog transaction completes, at which point it's too late to error. We could do plan -> stmt -> string -> stmt -> plan in either planning or sequencing. Or we could do cluster -> plan -> stmt -> string -> stmt -> plan -> cluster inside of the catalog transaction.

I did this in planning because it was much more straight forward.

jkosh44 · 2024-06-25T16:08:37Z

src/sql/src/plan/statement/ddl.rs

+    // Roundtrip through unplan and make sure that we end up with the same plan.
+    if let CreateClusterVariant::Managed(_) = &plan.variant {
+        let stmt = unplan_create_cluster(scx, plan.clone())
+            .map_err(|e| PlanError::Replan(e.to_string()))?;
+        let create_sql = stmt.to_ast_string_stable();
+        let stmt = parse::parse(&create_sql)
+            .map_err(|e| PlanError::Replan(e.to_string()))?
+            .into_element()
+            .ast;
+        let (stmt, _resolved_ids) =
+            names::resolve(scx.catalog, stmt).map_err(|e| PlanError::Replan(e.to_string()))?;
+        let stmt = match stmt {
+            Statement::CreateCluster(stmt) => stmt,
+            stmt => {
+                return Err(PlanError::Replan(format!(
+                "replan does not match: plan={plan:?}, create_sql={create_sql:?}, stmt={stmt:?}"
+            )))
+            }
+        };
+        let replan =
+            plan_create_cluster_inner(scx, stmt).map_err(|e| PlanError::Replan(e.to_string()))?;
+        if plan != replan {
+            return Err(PlanError::Replan(format!(
+                "replan does not match: plan={plan:?}, replan={replan:?}"
+            )));
+        }
+    }


Once we start to have more unplans, this could be generalized to look something like this:

fn plan(stmt: Statement<Aug>) -> Result<Plan, PlanError> { let plan = plan_inner(stmt)?; if let Some(stmt) = unplan(plan.clone()) { let create_sql = stmt.to_ast_string_stable(); let stmt = parse::parse(&create_sql) .map_err(|e| PlanError::Replan(e.to_string()))? .into_element() .ast; let (stmt, _resolved_ids) = names::resolve(scx.catalog, stmt).map_err(|e| PlanError::Replan(e.to_string()))?; let replan = plan_inner(stmt).map_err(|e| PlanError::Replan(e.to_string()))?; if plan != replan { return Err(PlanError::Replan(format!( "replan does not match: plan={plan:?}, replan={replan:?}" ))); } } Ok(plan) } fn plan_inner(stmt: Statement<Aud>) -> Result<Plan, PlanError> { ... } fn unplan(plan: Plan) -> Result<Option<Statement<Aug>>, PlanError> { match plan { Plan::CreateClusterPlan(plan) => Ok(Some(unplan_create_cluster(plan)?)), ... // These plans cannot be unplanned. _ => Ok(None), } }

benesch · 2024-06-25T19:17:04Z

I did this in planning because it was much more straight forward.

Nice! I took a quick look and it looks solid, but deferring (re-)approval to someone with more bandwidth.

jkosh44 · 2024-06-25T22:01:10Z

The method doesn't need to compare the printed strings. It parses the printed string and asserts that the internal data structures are the same.

Ah, I see, you'd actually run the printed string through the parser and planner (and sequencer??) and make sure that you get the same cluster object out. Makes sense.

We can't do exactly this, because we only have the actual Cluster object after the catalog transaction completes, at which point it's too late to error. We could do plan -> stmt -> string -> stmt -> plan in either planning or sequencing. Or we could do cluster -> plan -> stmt -> string -> stmt -> plan -> cluster inside of the catalog transaction.

I did this in planning because it was much more straight forward.

This uncovered an issue. You cannot specify the DISK option in SQL without a feature flag. However, we determine whether to use a disk based on multiple other variables. So the SQL doesn't rountrip because the DISK option is rejected by the planner when the feature flag is off. We can't just omit DISK in the SQL because then it might derive a different value than we did the first time.

maddyblue · 2024-06-25T22:48:11Z

Bootstrap planning uses the "turn on all the flags mode". Should we use that here too?

maddyblue · 2024-06-26T00:07:15Z

src/sql/src/normalize.rs

+                            vec![value]
+                        );
+                        for value in values {
+                            if let Some(value) = value.try_into_value(catalog) {


I don't quite understand what's going on here but: why is it ok to ignore values that fail try_into_value and not error or something?

I don't blame you, this macro was not easy for me to figure out and update. I'll try and explain it here and in the process maybe I'll come up with a good comment to add that also explains it.

Before my PR this macro allowed you to convert a list of options into some struct that had a field for each option. So for example when processing CREATE CLUSTER options, you would start with a Vec<ClusterOption<Aug>>.

/// An option in a `CREATE CLUSTER` statement. pub struct ClusterOption<T: AstInfo> { pub name: ClusterOptionName, pub value: Option<WithOptionValue<T>>, }

You would convert that into a struct that looks like the following (the struct is generated by the macro so there's no where in the code to look for this exact struct):

struct ClusterOptionExtracted { availability_zones: Option<Vec<String>>, disk: Option<bool>, managed: Option<bool>, ... }

The Vec only contains options that were explicitly specified. So a field in ClusterOptionExtracted would be Some(_) if the option was specified and None if it was not.

Now this commit adds the reverse process, it takes a ClusterOptionExtracted and converts it back into a Vec<ClusterOption<Aug>>. So if the field is Some(_) then we need to generate a ClusterOption<Aug> and add it to the Vec. However, if the field is None, then there was no option specified and we have nothing to add back to the list. That's why we ignore the None values here, they are not errors they just don't exist as specified options.

maddyblue · 2024-06-26T00:17:46Z

src/sql/src/plan/statement/ddl.rs

+        };
+        let replan =
+            plan_create_cluster_inner(scx, stmt).map_err(|e| PlanError::Replan(e.to_string()))?;
+        if plan != replan {


This should also be in ALTER CLUSTER.

What comparison should we do for ALTER CLUSTER? We don't have a CreateClusterPlan to compare to.

For that I think we'd have to move the check into sequencing and do the full roundtrip of Cluster -> CreateClusterPlan -> CreateClusterStatement -> String -> CreateClusterStatement -> CreateClusterPlan -> Cluster.

Hmm yeah. ALTER has an updated catalog so it could produce a sql string from that to plan. But we need to compare it to the Cluster object in the catalog, which is the sequenced plan, not the plan? I'm ok merging as is, but I do very much think we should complete the thought because ALTER seems like a likely place where we make a small mistake and either corrupt or have silent mutation to some object.

It sounds like this commit doesn't add any new hazards to ALTER CLUSTER, but this feature could be used as a tool to protect against existing or future hazards in ALTER CLUSTER? If I'm understanding that correctly, then I'm inclined to merge as is and let someone else add that to ALTER CLUSTER as a follow up.

jkosh44 · 2024-06-26T13:19:57Z

Bootstrap planning uses the "turn on all the flags mode". Should we use that here too?

I don't think that would quite work. This would cause the re-planning to succeed, but generate invalid SQL. For example we'd end up with something like the following:

CREATE CLUSTER c SIZE '250cc';

SHOW CREATE CLUSTER c;
----
"CREATE CLUSTER c (SIZE = '250cc', DISK = 'true', ...)";

It is true that cluster c uses a disk, but if someone tried to type that in, it would error because you can't specify the DISK option. We can't just omit the DISK option either because then we may return inaccurate SQL. For example, consider the following:

ALTER SYSTEM SET disk_cluster_replicas_default = TRUE;

CREATE CLUSTER c SIZE 'small';

ALTER SYSTEM SET disk_cluster_replicas_default = FALSE;

SHOW CREATE CLUSTER c;
----
"CREATE CLUSTER c (SIZE = 'small')";

Here, we've omitted the DISK option, which implies that there is no DISK, which is not true.

I did come up with an idea last night. We update the feature flag requirement so that it only fires when the user specifies the DISK option AND the value doesn't match what we were going to pick without the user option. It doesn't make a lot of sense to reject DISK = TRUE if we were going to use a disk anyway.

jkosh44 · 2024-06-26T14:41:57Z

test/sqllogictest/cluster.slt

I should add a test that unmanaged clusters return an error.

maddyblue · 2024-06-26T19:05:07Z

src/sql/src/plan/statement/ddl.rs

+        };
+        let replan =
+            plan_create_cluster_inner(scx, stmt).map_err(|e| PlanError::Replan(e.to_string()))?;
+        if plan != replan {


Hmm yeah. ALTER has an updated catalog so it could produce a sql string from that to plan. But we need to compare it to the Cluster object in the catalog, which is the sequenced plan, not the plan? I'm ok merging as is, but I do very much think we should complete the thought because ALTER seems like a likely place where we make a small mistake and either corrupt or have silent mutation to some object.

jkosh44 force-pushed the show-create-cluster branch 4 times, most recently from 7b8e2f8 to 7ed6d80 Compare June 21, 2024 20:52

jkosh44 changed the title ~~[WIP] SHOW CREATE CLUSTER~~ sql: Implement SHOW CREATE for managed clusters Jun 21, 2024

jkosh44 force-pushed the show-create-cluster branch from 7ed6d80 to f578904 Compare June 21, 2024 21:08

jkosh44 force-pushed the show-create-cluster branch from f578904 to deeb932 Compare June 21, 2024 21:35

jkosh44 commented Jun 21, 2024

View reviewed changes

test/sqllogictest/cluster.slt Show resolved Hide resolved

jkosh44 requested a review from ggevay June 21, 2024 21:36

jkosh44 marked this pull request as ready for review June 21, 2024 22:06

jkosh44 requested review from a team and morsapaes as code owners June 21, 2024 22:06

jkosh44 requested a review from maddyblue June 21, 2024 22:06

benesch mentioned this pull request Jun 23, 2024

Add SHOW CREATE CLUSTER and SHOW CREATE CLUSTER REPLICA #15435

Open

benesch approved these changes Jun 23, 2024

View reviewed changes

src/catalog/src/memory/objects.rs Outdated Show resolved Hide resolved

src/catalog/src/memory/objects.rs Outdated Show resolved Hide resolved

test/sqllogictest/cluster.slt Show resolved Hide resolved

morsapaes approved these changes Jun 23, 2024

View reviewed changes

jkosh44 added 2 commits June 24, 2024 09:36

Update method name

ed53975

Add alter test

f886e36

Merge branch 'main' of github.com:MaterializeInc/materialize into sho…

a7b38a3

…w-create-cluster

Roundtrip through unplan

6b80a2f

jkosh44 commented Jun 25, 2024

View reviewed changes

maddyblue approved these changes Jun 26, 2024

View reviewed changes

jkosh44 commented Jun 26, 2024

View reviewed changes

test/sqllogictest/cluster.slt

Copy link

Contributor Author

jkosh44 Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should add a test that unmanaged clusters return an error.

jkosh44 added 3 commits June 26, 2024 11:46

Fixup

56bcae1

Fixup

2595967

Add test

d8d0727

maddyblue approved these changes Jun 26, 2024

View reviewed changes

jkosh44 merged commit a458244 into MaterializeInc:main Jun 26, 2024
192 of 196 checks passed

jkosh44 deleted the show-create-cluster branch June 26, 2024 22:03

materialize-bot mentioned this pull request Jun 27, 2024

release: v0.106.0 required reviews #27930

Closed

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: Implement SHOW CREATE for managed clusters #27801

sql: Implement SHOW CREATE for managed clusters #27801

jkosh44 commented Jun 21, 2024 •

edited

Loading

shepherdlybot bot commented Jun 21, 2024 •

edited

Loading

benesch left a comment

benesch commented Jun 23, 2024

morsapaes left a comment

nrainer-materialize commented Jun 24, 2024

def- commented Jun 24, 2024

maddyblue commented Jun 24, 2024

benesch commented Jun 24, 2024

maddyblue commented Jun 24, 2024

benesch commented Jun 24, 2024

def- commented Jun 25, 2024

jkosh44 commented Jun 25, 2024 •

edited

Loading

jkosh44 commented Jun 25, 2024

jkosh44 Jun 25, 2024

benesch commented Jun 25, 2024 •

edited

Loading

jkosh44 commented Jun 25, 2024

maddyblue commented Jun 25, 2024

maddyblue Jun 26, 2024

jkosh44 Jun 26, 2024

maddyblue Jun 26, 2024

jkosh44 Jun 26, 2024

jkosh44 Jun 26, 2024

maddyblue Jun 26, 2024

jkosh44 Jun 26, 2024

jkosh44 commented Jun 26, 2024

jkosh44 Jun 26, 2024

maddyblue Jun 26, 2024

sql: Implement SHOW CREATE for managed clusters #27801

sql: Implement SHOW CREATE for managed clusters #27801

Conversation

jkosh44 commented Jun 21, 2024 • edited Loading

Motivation

Checklist

shepherdlybot bot commented Jun 21, 2024 • edited Loading

Mitigations

benesch left a comment

Choose a reason for hiding this comment

benesch commented Jun 23, 2024

morsapaes left a comment

Choose a reason for hiding this comment

nrainer-materialize commented Jun 24, 2024

def- commented Jun 24, 2024

maddyblue commented Jun 24, 2024

benesch commented Jun 24, 2024

maddyblue commented Jun 24, 2024

benesch commented Jun 24, 2024

def- commented Jun 25, 2024

jkosh44 commented Jun 25, 2024 • edited Loading

jkosh44 commented Jun 25, 2024

Choose a reason for hiding this comment

benesch commented Jun 25, 2024 • edited Loading

jkosh44 commented Jun 25, 2024

maddyblue commented Jun 25, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jkosh44 commented Jun 26, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jkosh44 commented Jun 21, 2024 •

edited

Loading

shepherdlybot bot commented Jun 21, 2024 •

edited

Loading

jkosh44 commented Jun 25, 2024 •

edited

Loading

benesch commented Jun 25, 2024 •

edited

Loading