-
Notifications
You must be signed in to change notification settings - Fork 466
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
use sequence_staged for alter cluster ddl #27797
Conversation
13d0e8c
to
c39227a
Compare
let validity = PlanValidity { | ||
transient_revision: self.catalog().transient_revision(), | ||
dependency_ids: BTreeSet::new(), | ||
cluster_id: None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add the cluster_id
from plan
here. This way if the cluster is concurrently altered in a different way we should bail
self.sequence_alter_cluster_managed_to_managed( | ||
Some(session), | ||
cluster_id, | ||
config, | ||
new_config, | ||
ReplicaCreateDropReason::Manual, | ||
) | ||
.await?; | ||
} | ||
(Unmanaged, Managed(new_config)) => { | ||
self.sequence_alter_cluster_unmanaged_to_managed( | ||
session, cluster_id, new_config, options, | ||
) | ||
.await?; | ||
} | ||
(Managed(_), Unmanaged) => { | ||
self.sequence_alter_cluster_managed_to_unmanaged(session, cluster_id) | ||
.await?; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right now these all still happen on the main coordinator thread. We can easily move them off-thread now by spawning them in a tokio task and returning a handle to that task, is that desirable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right now I think no?
I believe we only want to do that for alter_managed_to_managed if performing a "graceful" reconfig, although we could potentially kick all of these off on a thread and return with StageResult::Handle or HandleRetire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these do any long running async work like reach out to kubernetes and waiting for it to create resources then it might be nice to move them off the coord thread, but def not blocking this PR since we're maintaining the status quo here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That requires refactoring the orchestrator/compute controller which few people own or understand. Definitely the right call to merge this, which puts us into position to incrementally improve this later on.
c39227a
to
cf19f10
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet!
Motivation
This refactor should have no functional change, but will prepare us to implement graceful reconfigure using
sequence_staged
Graceful reconfiguration will require sleeps or long running checks inside of the
alter cluster
ddl. We want these waits to becancelable and moved off the main coord thread. This is perfect for
sequence_staged
to handle.Tips for reviewer
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.