use sequence_staged for alter cluster ddl #27797

jubrad · 2024-06-21T15:02:18Z

Motivation

This refactor should cause no functional change, but it prepares us to implement graceful reconfiguration using sequence_staged.

Graceful reconfiguration will require sleeps or long-running checks inside the alter cluster DDL. We want these waits to be
cancelable and moved off the main coordinator thread. sequence_staged is a good fit for handling this.
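To make the "cancelable wait off the main thread" idea concrete, here is a minimal sketch using only the Rust standard library. It does not use sequence_staged or any coordinator API; `WaitOutcome`, `CancelableWait`, and `spawn_wait` are hypothetical names invented for illustration, and a plain OS thread stands in for whatever task mechanism the real code uses.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Outcome of a background wait (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub enum WaitOutcome {
    /// The readiness check passed.
    Ready,
    /// The caller canceled the wait before the check passed.
    Canceled,
    /// The check never passed within the allowed number of polls.
    TimedOut,
}

/// Handle that lets the caller cancel or join the background wait.
pub struct CancelableWait {
    cancel_tx: Sender<()>,
    handle: JoinHandle<WaitOutcome>,
}

impl CancelableWait {
    /// Ask the worker to stop, then collect its outcome.
    pub fn cancel(self) -> WaitOutcome {
        // Sending fails only if the worker already finished; that is fine.
        let _ = self.cancel_tx.send(());
        self.handle.join().expect("wait thread panicked")
    }

    /// Block until the worker finishes on its own.
    pub fn join(self) -> WaitOutcome {
        self.handle.join().expect("wait thread panicked")
    }
}

/// Poll `check` on a separate thread, sleeping `poll_interval` between polls.
/// The sleep is a `recv_timeout` on the cancel channel, so a cancel request
/// wakes the worker immediately instead of after the full interval.
pub fn spawn_wait<F>(mut check: F, poll_interval: Duration, max_polls: usize) -> CancelableWait
where
    F: FnMut() -> bool + Send + 'static,
{
    let (cancel_tx, cancel_rx) = mpsc::channel::<()>();
    let handle = thread::spawn(move || {
        for _ in 0..max_polls {
            if check() {
                return WaitOutcome::Ready;
            }
            match cancel_rx.recv_timeout(poll_interval) {
                // An explicit cancel, or the handle being dropped, stops the wait.
                Ok(()) | Err(RecvTimeoutError::Disconnected) => return WaitOutcome::Canceled,
                // The interval elapsed without a cancel: poll again.
                Err(RecvTimeoutError::Timeout) => {}
            }
        }
        WaitOutcome::TimedOut
    });
    CancelableWait { cancel_tx, handle }
}
```

The caller's thread never sleeps. It only holds a handle it can cancel or join, which is the property the motivation above asks for.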

Tips for reviewer

Checklist

This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
This PR includes the following user-facing behavior changes:

ParkMyCar · 2024-06-24T15:04:04Z

src/adapter/src/coord/sequencer/inner/cluster.rs

+        let validity = PlanValidity {
+            transient_revision: self.catalog().transient_revision(),
+            dependency_ids: BTreeSet::new(),
+            cluster_id: None,


We should add the cluster_id from the plan here. That way, if the cluster is concurrently altered in a different way, we bail.
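As a rough illustration of why the `cluster_id` matters, the sketch below models a validity check against a toy catalog. `ToyCatalog`, `ToyPlanValidity`, and `ValidityError` are hypothetical types, not the real `PlanValidity` from the adapter crate. The sketch also assumes that the only invalidating change is the cluster disappearing; what the real check rejects may differ.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for the catalog: a revision counter plus the live cluster ids.
pub struct ToyCatalog {
    pub transient_revision: u64,
    pub clusters: BTreeSet<u64>,
}

/// Simplified validity record captured when the plan is first sequenced.
pub struct ToyPlanValidity {
    pub transient_revision: u64,
    pub cluster_id: Option<u64>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ValidityError {
    /// The cluster the plan targets no longer exists.
    ClusterMissing(u64),
}

impl ToyPlanValidity {
    /// Re-validate the plan between stages.
    pub fn check(&mut self, catalog: &ToyCatalog) -> Result<(), ValidityError> {
        // Fast path: the catalog has not changed since the last check.
        if self.transient_revision == catalog.transient_revision {
            return Ok(());
        }
        // Slow path: only a recorded cluster id can be verified.
        if let Some(id) = self.cluster_id {
            if !catalog.clusters.contains(&id) {
                return Err(ValidityError::ClusterMissing(id));
            }
        }
        // Still valid; remember the revision so the next check is cheap.
        self.transient_revision = catalog.transient_revision;
        Ok(())
    }
}
```

With `cluster_id: None`, the slow path has nothing to verify, so a concurrent drop of the target cluster goes unnoticed. Recording `Some(cluster_id)` lets the check fail early instead.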

ParkMyCar · 2024-06-24T15:07:15Z

src/adapter/src/coord/sequencer/inner/cluster.rs

+                self.sequence_alter_cluster_managed_to_managed(
+                    Some(session),
+                    cluster_id,
+                    config,
+                    new_config,
+                    ReplicaCreateDropReason::Manual,
+                )
+                .await?;
+            }
+            (Unmanaged, Managed(new_config)) => {
+                self.sequence_alter_cluster_unmanaged_to_managed(
+                    session, cluster_id, new_config, options,
+                )
+                .await?;
+            }
+            (Managed(_), Unmanaged) => {
+                self.sequence_alter_cluster_managed_to_unmanaged(session, cluster_id)
+                    .await?;
+            }


Right now these all still happen on the main coordinator thread. We can now easily move them off-thread by spawning them in a tokio task and returning a handle to that task. Is that desirable?

jubrad · 2024-06-24

Right now I think no?

I believe we only want to do that for alter_managed_to_managed when performing a "graceful" reconfig, although we could potentially kick all of these off on a thread and return StageResult::Handle or HandleRetire.

ParkMyCar · 2024-06-24

If these do any long-running async work, like reaching out to Kubernetes and waiting for it to create resources, then it might be nice to move them off the coord thread. Definitely not blocking for this PR, though, since we're maintaining the status quo here.

maddyblue · 2024-06-24

That requires refactoring the orchestrator/compute controller, which few people own or understand. Definitely the right call to merge this, which puts us in a position to incrementally improve this later on.
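The trade-off discussed in this thread can be sketched as a stage that either finishes inline or hands back a handle to work running elsewhere. This is a standard-library-only approximation: `ToyStageResult`, `AlterKind`, and `run_alter_stage` are hypothetical names, a `std::thread::JoinHandle` stands in for a tokio task handle, and the real `StageResult` type is not reproduced here.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical, simplified result of running one stage.
pub enum ToyStageResult<T> {
    /// The stage finished inline on the calling thread.
    Immediate(T),
    /// The stage is running on another thread; join the handle for its value.
    Handle(JoinHandle<T>),
}

impl<T> ToyStageResult<T> {
    /// True if the work was moved off the calling thread.
    pub fn is_off_thread(&self) -> bool {
        matches!(self, ToyStageResult::Handle(_))
    }

    /// Resolve the result, blocking only in the off-thread case.
    pub fn resolve(self) -> T {
        match self {
            ToyStageResult::Immediate(value) => value,
            ToyStageResult::Handle(handle) => handle.join().expect("stage thread panicked"),
        }
    }
}

/// The alter paths, loosely mirroring the match arms in the diff.
#[derive(Clone, Copy)]
pub enum AlterKind {
    ManagedToManaged { graceful: bool },
    UnmanagedToManaged,
    ManagedToUnmanaged,
}

/// Only the graceful managed-to-managed path is spawned; everything else
/// stays inline, matching the "status quo" position taken in the thread.
pub fn run_alter_stage(kind: AlterKind) -> ToyStageResult<&'static str> {
    match kind {
        AlterKind::ManagedToManaged { graceful: true } => {
            // Long-running waits would live inside this closure.
            ToyStageResult::Handle(thread::spawn(|| "reconfigured gracefully"))
        }
        AlterKind::ManagedToManaged { graceful: false } => {
            ToyStageResult::Immediate("reconfigured")
        }
        AlterKind::UnmanagedToManaged => ToyStageResult::Immediate("converted to managed"),
        AlterKind::ManagedToUnmanaged => ToyStageResult::Immediate("converted to unmanaged"),
    }
}
```

In this sketch `resolve` blocks the caller, which a real coordinator would presumably avoid by being notified when the task completes. The point is only that the return type lets each path choose inline or off-thread execution independently.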

ParkMyCar

Sweet!

jubrad requested a review from a team as a code owner June 21, 2024 15:02

jubrad requested a review from ParkMyCar June 21, 2024 15:02

jubrad marked this pull request as draft June 21, 2024 15:02

jubrad mentioned this pull request Jun 21, 2024

[Epic] Graceful reconfiguration of managed clusters (wait until hydrated) #20010

Open

jubrad force-pushed the cluster-sequence-staged branch 4 times, most recently from 13d0e8c to c39227a Compare June 21, 2024 17:50

jubrad marked this pull request as ready for review June 21, 2024 19:57

ParkMyCar reviewed Jun 24, 2024

use sequence_staged for alter cluster ddl

cf19f10

jubrad force-pushed the cluster-sequence-staged branch from c39227a to cf19f10 Compare June 24, 2024 16:38

ParkMyCar approved these changes Jun 24, 2024

jubrad merged commit 0c65d7e into MaterializeInc:main Jun 24, 2024
76 checks passed

materialize-bot mentioned this pull request Jun 27, 2024

release: v0.106.0 required reviews #27930

Closed

7 tasks
