Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: Implement SHOW CREATE for managed clusters #27801

Merged
merged 8 commits into from
Jun 26, 2024

Conversation

jkosh44
Copy link
Contributor

@jkosh44 jkosh44 commented Jun 21, 2024

This commit implements SHOW CREATE for managed clusters. It also gives
us a blueprint for how to convert catalog objects back to SQL
statements if we don't explicitly save the SQL statement. The process
looks roughly like:

  1. Convert the catalog object to a plan.
  2. Convert the plan to a statement.
  3. Pretty print the statement to a string.

This process is designed to exactly mirror the process of converting
SQL to a catalog object.

Works towards resolving #15435

Motivation

This PR adds a known-desirable feature.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • This PR includes the following user-facing behavior changes:
    • Add SHOW CREATE for managed clusters.

@jkosh44 jkosh44 force-pushed the show-create-cluster branch 4 times, most recently from 7b8e2f8 to 7ed6d80 Compare June 21, 2024 20:52
@jkosh44 jkosh44 changed the title [WIP] SHOW CREATE CLUSTER sql: Implement SHOW CREATE for managed clusters Jun 21, 2024
This commit implements SHOW CREATE for managed clusters. It also gives
us a blueprint for how to convert catalog objects back to SQL
statements if we don't explicitly save the SQL statement. The process
looks roughly like:

1. Convert the catalog object to a plan.
2. Convert the plan to a statement.
3. Pretty print the statement to a string.

This process is designed to exactly mirror the process of converting
SQL to a catalog object.

Works towards resolving MaterializeInc#15435
@jkosh44 jkosh44 requested a review from ggevay June 21, 2024 21:36
@jkosh44 jkosh44 marked this pull request as ready for review June 21, 2024 22:06
@jkosh44 jkosh44 requested review from a team and morsapaes as code owners June 21, 2024 22:06
@jkosh44 jkosh44 requested a review from maddyblue June 21, 2024 22:06
Copy link

shepherdlybot bot commented Jun 21, 2024

Risk Score:80 / 100 Bug Hotspots:2 Resilience Coverage:50%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • (Required) Code Review 🔍 Detected
  • (Required) Feature Flag
  • (Required) Integration Test 🔍 Detected
  • (Required) Observability
  • (Required) QA Review 🔍 Detected
  • (Required) Run Nightly Tests
  • Unit Test
Risk Summary:

The pull request presents a high risk with a score of 80, indicating a significant chance of introducing a bug. This assessment is driven by predictors like the sum of bug reports of files affected and the change in executable lines of code. Historically, pull requests with these characteristics have been 109% more likely to cause a bug compared to the repository's baseline. Additionally, there are two files modified in this pull request that have seen a recent increase in bug fixes. While the repository's observed bug trend remains steady, the predicted trend for bugs is on the rise.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:
What's This?

File Percentile
../memory/objects.rs 92
../src/parser.rs 95

Copy link
Member

@benesch benesch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet! So glad to finally have this knocked out. LGTM modulo testing.

One tricky part will be to ensure that both directions of transformations (i.e. Statement -> Plan -> Object and Object -> Plan -> Statement) remain in sync going forward.

Worth getting @MaterializeInc/testing to weigh in here. The best answer in a situation like this is randomized automated testing. I wonder if we could rig up testdrive or SLT to automatically sniff out CREATE CLUSTER commands and run SHOW CREATE CLUSTER against them immediately after their creation and verify that the results match, or something like that. Would protect against new options getting added in the future that don't unplan correctly.

src/catalog/src/memory/objects.rs Outdated Show resolved Hide resolved
src/catalog/src/memory/objects.rs Outdated Show resolved Hide resolved
test/sqllogictest/cluster.slt Show resolved Hide resolved
@benesch
Copy link
Member

benesch commented Jun 23, 2024

LGTM from a SQL council perspective too.

Copy link
Contributor

@morsapaes morsapaes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing the docs, @jkosh44! +1 from me, as a SQL Council member.

@nrainer-materialize
Copy link
Contributor

Worth getting @MaterializeInc/testing to weigh in here. The best answer in a situation like this is randomized automated testing. I wonder if we could rig up testdrive or SLT to automatically sniff out CREATE CLUSTER commands and run SHOW CREATE CLUSTER against them immediately after their creation and verify that the results match, or something like that. Would protect against new options getting added in the future that don't unplan correctly.

@def-, what do you think?

@def-
Copy link
Contributor

def- commented Jun 24, 2024

I'll work on it.

@maddyblue
Copy link
Contributor

We should teach CREATE/ALTER CLUSTER sequencing to do a round trip through SHOW always and panic (dev) or return an error to the user and abort the catalog txn (prod) if it fails. Then we don't have to modify testing at all and we have certainty that we've never committed catalog state that we can't print correctly.

@benesch
Copy link
Member

benesch commented Jun 24, 2024

Seems reasonable. Easy to do for CREATE CLUSTER because the statement should roundtrip more or less exactly (gotta be careful with option order). Hard to do for ALTER CLUSTER, I think, because you don't have a reference for what the statement should look like.

@maddyblue
Copy link
Contributor

The method doesn't need to compare the printed strings. It parses the printed string and asserts that the internal data structures are the same.

@benesch
Copy link
Member

benesch commented Jun 24, 2024

The method doesn't need to compare the printed strings. It parses the printed string and asserts that the internal data structures are the same.

Ah, I see, you'd actually run the printed string through the parser and planner (and sequencer??) and make sure that you get the same cluster object out. Makes sense.

@def-
Copy link
Contributor

def- commented Jun 25, 2024

We should teach CREATE/ALTER CLUSTER sequencing to do a round trip through SHOW always and panic (dev) or return an error to the user and abort the catalog txn (prod) if it fails. Then we don't have to modify testing at all and we have certainty that we've never committed catalog state that we can't print correctly.

Could we generalize that for every query we serialize and verify that it roundtrips successfully? Would save us a lot of effort on manually thinking of test cases or hoping that we have all the relevant grammar for randomized testing. And most importantly this shows the user a nice error immediately when running the query instead of having problems the next time envd happens to restart.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Jun 25, 2024

The method doesn't need to compare the printed strings. It parses the printed string and asserts that the internal data structures are the same.

Ah, I see, you'd actually run the printed string through the parser and planner (and sequencer??) and make sure that you get the same cluster object out. Makes sense.

We can't do exactly this, because we only have the actual Cluster object after the catalog transaction completes, at which point it's too late to error. We could do plan -> stmt -> string -> stmt -> plan in either planning or sequencing. Or we could do cluster -> plan -> stmt -> string -> stmt -> plan -> cluster inside of the catalog transaction.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Jun 25, 2024

The method doesn't need to compare the printed strings. It parses the printed string and asserts that the internal data structures are the same.

Ah, I see, you'd actually run the printed string through the parser and planner (and sequencer??) and make sure that you get the same cluster object out. Makes sense.

We can't do exactly this, because we only have the actual Cluster object after the catalog transaction completes, at which point it's too late to error. We could do plan -> stmt -> string -> stmt -> plan in either planning or sequencing. Or we could do cluster -> plan -> stmt -> string -> stmt -> plan -> cluster inside of the catalog transaction.

I did this in planning because it was much more straight forward.

Comment on lines +3555 to +3581
// Roundtrip through unplan and make sure that we end up with the same plan.
if let CreateClusterVariant::Managed(_) = &plan.variant {
let stmt = unplan_create_cluster(scx, plan.clone())
.map_err(|e| PlanError::Replan(e.to_string()))?;
let create_sql = stmt.to_ast_string_stable();
let stmt = parse::parse(&create_sql)
.map_err(|e| PlanError::Replan(e.to_string()))?
.into_element()
.ast;
let (stmt, _resolved_ids) =
names::resolve(scx.catalog, stmt).map_err(|e| PlanError::Replan(e.to_string()))?;
let stmt = match stmt {
Statement::CreateCluster(stmt) => stmt,
stmt => {
return Err(PlanError::Replan(format!(
"replan does not match: plan={plan:?}, create_sql={create_sql:?}, stmt={stmt:?}"
)))
}
};
let replan =
plan_create_cluster_inner(scx, stmt).map_err(|e| PlanError::Replan(e.to_string()))?;
if plan != replan {
return Err(PlanError::Replan(format!(
"replan does not match: plan={plan:?}, replan={replan:?}"
)));
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we start to have more unplans, this could be generalized to look something like this:

fn plan(stmt: Statement<Aug>) -> Result<Plan, PlanError> {
    let plan = plan_inner(stmt)?;
    
    if let Some(stmt) = unplan(plan.clone()) {
        let create_sql = stmt.to_ast_string_stable();
        let stmt = parse::parse(&create_sql)
            .map_err(|e| PlanError::Replan(e.to_string()))?
            .into_element()
            .ast;
       let (stmt, _resolved_ids) =
            names::resolve(scx.catalog, stmt).map_err(|e| PlanError::Replan(e.to_string()))?;
       let replan = plan_inner(stmt).map_err(|e| PlanError::Replan(e.to_string()))?;
       if plan != replan {
            return Err(PlanError::Replan(format!(
                "replan does not match: plan={plan:?}, replan={replan:?}"
            )));
        }
    }

    Ok(plan)
}

fn plan_inner(stmt: Statement<Aud>) -> Result<Plan, PlanError> {
    ...
}

fn unplan(plan: Plan) -> Result<Option<Statement<Aug>>, PlanError> {
    match plan {
        Plan::CreateClusterPlan(plan) => Ok(Some(unplan_create_cluster(plan)?)),
        ...
        // These plans cannot be unplanned.
        _ => Ok(None),
    }
}

@benesch
Copy link
Member

benesch commented Jun 25, 2024

I did this in planning because it was much more straight forward.

Nice! I took a quick look and it looks solid, but deferring (re-)approval to someone with more bandwidth.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Jun 25, 2024

The method doesn't need to compare the printed strings. It parses the printed string and asserts that the internal data structures are the same.

Ah, I see, you'd actually run the printed string through the parser and planner (and sequencer??) and make sure that you get the same cluster object out. Makes sense.

We can't do exactly this, because we only have the actual Cluster object after the catalog transaction completes, at which point it's too late to error. We could do plan -> stmt -> string -> stmt -> plan in either planning or sequencing. Or we could do cluster -> plan -> stmt -> string -> stmt -> plan -> cluster inside of the catalog transaction.

I did this in planning because it was much more straight forward.

This uncovered an issue. You cannot specify the DISK option in SQL without a feature flag. However, we determine whether to use a disk based on multiple other variables. So the SQL doesn't rountrip because the DISK option is rejected by the planner when the feature flag is off. We can't just omit DISK in the SQL because then it might derive a different value than we did the first time.

@maddyblue
Copy link
Contributor

Bootstrap planning uses the "turn on all the flags mode". Should we use that here too?

vec![value]
);
for value in values {
if let Some(value) = value.try_into_value(catalog) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand what's going on here but: why is it ok to ignore values that fail try_into_value and not error or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't blame you, this macro was not easy for me to figure out and update. I'll try and explain it here and in the process maybe I'll come up with a good comment to add that also explains it.

Before my PR this macro allowed you to convert a list of options into some struct that had a field for each option. So for example when processing CREATE CLUSTER options, you would start with a Vec<ClusterOption<Aug>>.

/// An option in a `CREATE CLUSTER` statement.
pub struct ClusterOption<T: AstInfo> {
    pub name: ClusterOptionName,
    pub value: Option<WithOptionValue<T>>,
}

You would convert that into a struct that looks like the following (the struct is generated by the macro so there's no where in the code to look for this exact struct):

struct ClusterOptionExtracted {
    availability_zones: Option<Vec<String>>,
    disk: Option<bool>,
    managed: Option<bool>,
    ...
}

The Vec only contains options that were explicitly specified. So a field in ClusterOptionExtracted would be Some(_) if the option was specified and None if it was not.

Now this commit adds the reverse process, it takes a ClusterOptionExtracted and converts it back into a Vec<ClusterOption<Aug>>. So if the field is Some(_) then we need to generate a ClusterOption<Aug> and add it to the Vec. However, if the field is None, then there was no option specified and we have nothing to add back to the list. That's why we ignore the None values here, they are not errors they just don't exist as specified options.

};
let replan =
plan_create_cluster_inner(scx, stmt).map_err(|e| PlanError::Replan(e.to_string()))?;
if plan != replan {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also be in ALTER CLUSTER.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What comparison should we do for ALTER CLUSTER? We don't have a CreateClusterPlan to compare to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For that I think we'd have to move the check into sequencing and do the full roundtrip of Cluster -> CreateClusterPlan -> CreateClusterStatement -> String -> CreateClusterStatement -> CreateClusterPlan -> Cluster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah. ALTER has an updated catalog so it could produce a sql string from that to plan. But we need to compare it to the Cluster object in the catalog, which is the sequenced plan, not the plan? I'm ok merging as is, but I do very much think we should complete the thought because ALTER seems like a likely place where we make a small mistake and either corrupt or have silent mutation to some object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like this commit doesn't add any new hazards to ALTER CLUSTER, but this feature could be used as a tool to protect against existing or future hazards in ALTER CLUSTER? If I'm understanding that correctly, then I'm inclined to merge as is and let someone else add that to ALTER CLUSTER as a follow up.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Jun 26, 2024

Bootstrap planning uses the "turn on all the flags mode". Should we use that here too?

I don't think that would quite work. This would cause the re-planning to succeed, but generate invalid SQL. For example we'd end up with something like the following:

CREATE CLUSTER c SIZE '250cc';

SHOW CREATE CLUSTER c;
----
"CREATE CLUSTER c (SIZE = '250cc', DISK = 'true', ...)";

It is true that cluster c uses a disk, but if someone tried to type that in, it would error because you can't specify the DISK option. We can't just omit the DISK option either because then we may return inaccurate SQL. For example, consider the following:

ALTER SYSTEM SET disk_cluster_replicas_default = TRUE;

CREATE CLUSTER c SIZE 'small';

ALTER SYSTEM SET disk_cluster_replicas_default = FALSE;

SHOW CREATE CLUSTER c;
----
"CREATE CLUSTER c (SIZE = 'small')";

Here, we've omitted the DISK option, which implies that there is no DISK, which is not true.

I did come up with an idea last night. We update the feature flag requirement so that it only fires when the user specifies the DISK option AND the value doesn't match what we were going to pick without the user option. It doesn't make a lot of sense to reject DISK = TRUE if we were going to use a disk anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should add a test that unmanaged clusters return an error.

};
let replan =
plan_create_cluster_inner(scx, stmt).map_err(|e| PlanError::Replan(e.to_string()))?;
if plan != replan {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah. ALTER has an updated catalog so it could produce a sql string from that to plan. But we need to compare it to the Cluster object in the catalog, which is the sequenced plan, not the plan? I'm ok merging as is, but I do very much think we should complete the thought because ALTER seems like a likely place where we make a small mistake and either corrupt or have silent mutation to some object.

@jkosh44 jkosh44 merged commit a458244 into MaterializeInc:main Jun 26, 2024
192 of 196 checks passed
@jkosh44 jkosh44 deleted the show-create-cluster branch June 26, 2024 22:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants