
Update upgrade reconciler to handle subclusters in sandbox #772

Merged: 7 commits merged into vnext from roypaulin/upgrade-sandbox on Apr 19, 2024

Conversation

roypaulin (Collaborator) commented Apr 16, 2024:

This change will modify the upgrade reconcilers, both offline and online, to function in either the main cluster or a sandbox. Our immediate plan is to use the offline upgrade reconciler within the sandbox controller, although this will be done in a follow-on task.
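
To make the intent concrete, here is a minimal, hypothetical sketch of the call pattern this enables. The interface and driver below are illustrative stand-ins; only the empty-string vapi.MainCluster convention and the sandbox-as-parameter shape come from the diffs in this thread.

```go
package example

import "context"

// MainCluster mirrors vapi.MainCluster: the empty string names the main
// cluster; any other value names a sandbox.
const MainCluster = ""

// upgradeReconciler is a stand-in for the offline/online upgrade
// reconcilers, which after this change take the cluster scope as input.
type upgradeReconciler interface {
	Reconcile(ctx context.Context, sandbox string) error
}

// reconcileUpgrades sketches the intended call sites: the vdb controller
// drives the main cluster today, and the planned sandbox controller will
// drive each sandbox with the same (offline) reconciler.
func reconcileUpgrades(ctx context.Context, r upgradeReconciler, sandboxes []string) error {
	if err := r.Reconcile(ctx, MainCluster); err != nil {
		return err
	}
	for _, sb := range sandboxes {
		if err := r.Reconcile(ctx, sb); err != nil {
			return err
		}
	}
	return nil
}
```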

// we can skip the filtering if none of the sts we are
// looking for belongs to a subcluster in the vdb
if flags&FindInVdb != 0 || flags&FindExisting != 0 {
	sts = m.filterStatefulsetsBySandbox(sts, sandbox)
}

Collaborator:

Can we add the filtering in m.buildObjList? If we search for the vertica.com/sandbox label, it should be able to find the sts we are looking for. Have we decided whether the pods will have a sandbox label too? We may be able to filter pods in the same manner.
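
For illustration, a sketch of what label-based filtering could look like. The helper is hypothetical; it assumes the vertica.com/sandbox label (vmeta.SandboxNameLabel in the thread) is simply absent on main-cluster objects.

```go
package example

import appsv1 "k8s.io/api/apps/v1"

// sandboxNameLabel stands in for vmeta.SandboxNameLabel, assumed here to
// be the vertica.com/sandbox label mentioned above.
const sandboxNameLabel = "vertica.com/sandbox"

// filterBySandbox keeps only the StatefulSets whose sandbox label matches
// the requested sandbox. Main-cluster objects carry no sandbox label, so
// the map lookup yields "" and matches vapi.MainCluster (the empty string).
func filterBySandbox(list *appsv1.StatefulSetList, sandbox string) *appsv1.StatefulSetList {
	out := &appsv1.StatefulSetList{}
	for i := range list.Items {
		if list.Items[i].Labels[sandboxNameLabel] == sandbox {
			out.Items = append(out.Items, list.Items[i])
		}
	}
	return out
}
```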

Collaborator (Author):

We can add that label to both for filtering, but shouldn't we only trust the sandbox status in vdb? That's what tells us which sts/pod belongs to a sandbox. I guess we could also use a label, but in that case we need to make sure we correctly set/unset it.

Collaborator:

It should be set up correctly. We have full control over it, so it would be a programming error if it were not set.

Collaborator (Author):

OK then, let's set the label on both the sts and the pods.

isAllowedForUpgradePolicyFunc func(vdb *vapi.VerticaDB) bool) *UpgradeManager {
	return &UpgradeManager{
		VRec:            vdbrecon,
		Vdb:             vdb,
		Log:             log,
		Finder:          iter.MakeSubclusterFinder(vdbrecon.Client, vdb),
		StatusCondition: statusCondition,
		SandboxName:     sandbox,

Collaborator:

I don't think setting the sandbox for the entire upgrade manager makes sense. Replicated upgrade will span both the main cluster and a sandbox, so a single manager-wide sandbox doesn't really fit. I think each function in UpgradeManager that needs the sandbox should accept it as input.
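
A trimmed sketch of that suggestion, using stand-in types: only the IsUpgradeNeeded(ctx, sandbox) shape appears later in this thread; the rest is illustrative.

```go
package example

import "context"

// UpgradeManager is a trimmed stand-in: note there is deliberately no
// SandboxName field, so one manager can serve any scope.
type UpgradeManager struct {
	// VRec, Vdb, Log, Finder, StatusCondition elided
}

// IsUpgradeNeeded takes the sandbox per call, so a replicated upgrade can
// probe the main cluster ("") and a sandbox through the same manager.
func (m *UpgradeManager) IsUpgradeNeeded(ctx context.Context, sandbox string) (bool, error) {
	// find the statefulsets scoped to this sandbox and compare their
	// running image against the desired image (elided)
	return false, nil
}
```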

@@ -233,7 +233,7 @@ func (o *ObjReconciler) checkForDeletedSubcluster(ctx context.Context) (ctrl.Res
 	finder := iter.MakeSubclusterFinder(o.VRec.Client, o.Vdb)

 	// Find any statefulsets that need to be deleted
-	stss, err := finder.FindStatefulSets(ctx, iter.FindNotInVdb)
+	stss, err := finder.FindStatefulSets(ctx, iter.FindNotInVdb, vapi.MainCluster)

Collaborator:

Rather than hard-coding the main cluster, could we use the sandbox found in the podfacts?

@@ -228,7 +230,7 @@ func (o *OfflineUpgradeReconciler) checkNMADeploymentChange(ctx context.Context)
 		return ctrl.Result{}, nil
 	}

-	stss, err := o.Finder.FindStatefulSets(ctx, iter.FindExisting)
+	stss, err := o.Finder.FindStatefulSets(ctx, iter.FindExisting, o.SandboxName)

Collaborator:

General comment: could we use the sandbox from the podfacts if it's available in the reconciler?

roypaulin requested a review from spilchen on Apr 16, 2024 at 21:22
roypaulin self-assigned this on Apr 16, 2024
labels := makeLabelsForObject(vdb, sc, true)
sandbox := vdb.GetSubclusterSandboxName(sc.Name)
if sandbox != vapi.MainCluster {
	labels[vmeta.SandboxNameLabel] = sandbox
}

Collaborator:

We should update the comment for vmeta.SandboxNameLabel (in pkg/meta/labels.go) to mention that we now add it to pods too.

@@ -1129,6 +1129,17 @@ func (p *PodFacts) findExpectedNodeNames() []string {
 	return expectedNodeNames
 }

+// getSandboxName returns the name of the sandbox the pods belong to,
+// or an empty string for the main cluster
+func (p *PodFacts) getSandboxName() string {

Collaborator:

As prep for moving this struct to its own package, we probably want to export it.

Suggested change:
-func (p *PodFacts) getSandboxName() string {
+func (p *PodFacts) GetSandboxName() string {

pkg/controllers/vdb/upgrade.go (thread resolved)
@@ -80,7 +80,7 @@ func (m *SubclusterFinder) FindStatefulSets(ctx context.Context, flags FindFlags
 // FindServices returns service objects that are in use for subclusters
 func (m *SubclusterFinder) FindServices(ctx context.Context, flags FindFlags) (*corev1.ServiceList, error) {
 	svcs := &corev1.ServiceList{}
-	if err := m.buildObjList(ctx, svcs, flags); err != nil {
+	if err := m.buildObjList(ctx, svcs, flags, vapi.MainCluster); err != nil {

Collaborator:

This is correct only because we know how FindServices is called. To be consistent with the other exported APIs, we should take the sandbox as an input param to FindServices. That will keep this package general purpose.

@@ -125,7 +120,7 @@ func (m *SubclusterFinder) FindSubclusters(ctx context.Context, flags FindFlags,
// a sandboxed subcluster can only be one of those in the vdb so we skip this part if
// we are looking for sandboxed subclusters
if isMainCluster && (flags&FindNotInVdb != 0 || flags&FindExisting != 0) {
missingSts, err := m.FindStatefulSets(ctx, flags & ^FindInVdb)
missingSts, err := m.FindStatefulSets(ctx, flags & ^FindInVdb, vapi.MainCluster)

Collaborator:

Does it still make sense to pass in the main cluster here? Is there any harm in removing isMainCluster, getting in here for a sandbox, and passing the sandbox param? It would keep this logic simple, and there would be no need for the comment at lines 120-121.

Collaborator (Author):

This was to avoid calling the client to fetch sts when we don't need to, but I see what you mean. Now that we have moved the filter into buildObjList, let's just pass the sandbox name.

Comment on lines 201 to 204:

	if sandbox == vapi.MainCluster {
		return l[vmeta.SandboxNameLabel] != vapi.MainCluster
	}
	return l[vmeta.SandboxNameLabel] != sandbox

Collaborator:

Couldn't you just simplify it this way? When sandbox is vapi.MainCluster (the empty string), both branches reduce to the same comparison, so the single return below is equivalent.

Suggested change:
-if sandbox == vapi.MainCluster {
-	return l[vmeta.SandboxNameLabel] != vapi.MainCluster
-}
-return l[vmeta.SandboxNameLabel] != sandbox
+return l[vmeta.SandboxNameLabel] != sandbox

Comment on lines 71 to 72
sandbox := r.PFacts.getSandboxName()
if ok, err := r.Manager.IsUpgradeNeeded(ctx, sandbox); !ok || err != nil {

Collaborator:

When you merge up, r.PFacts is a map now (sandbox name to podfacts). This only needs to look at the main cluster, so you can probably just pass in vapi.MainCluster here.

Suggested change:
-sandbox := r.PFacts.getSandboxName()
-if ok, err := r.Manager.IsUpgradeNeeded(ctx, sandbox); !ok || err != nil {
+if ok, err := r.Manager.IsUpgradeNeeded(ctx, vapi.MainCluster); !ok || err != nil {
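
A minimal sketch of the map shape described above; the type and helper names are hypothetical, and only the "sandbox name to podfacts" mapping with the empty-string main-cluster key comes from the comment.

```go
package example

// PodFacts is a trimmed stand-in for the operator's PodFacts struct.
type PodFacts struct {
	SandboxName string // "" means the main cluster
}

// PodFactsMap mirrors "r.PFacts is a map now (sandbox name to podfacts)".
type PodFactsMap map[string]*PodFacts

// mainClusterFacts fetches the main cluster's facts via the "" key,
// which is what passing vapi.MainCluster amounts to.
func mainClusterFacts(m PodFactsMap) *PodFacts {
	return m[""]
}
```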

roypaulin requested a review from spilchen on Apr 17, 2024 at 13:28

spilchen (Collaborator) left a comment:

Everything looks good. Thanks for doing the revisions. I noticed in the last run that leg 9 failed, related to replicated upgrade. I didn't look at it in much detail, so I'm not sure whether it's a flaky test. I'm approving on the assumption that it's either flaky or requires a small fix.

roypaulin (Collaborator, Author) commented Apr 17, 2024:

> Everything looks good. Thanks for doing the revisions. I noticed in the last run that leg 9 failed, related to replicated upgrade. I didn't look at it in much detail, so I'm not sure whether it's a flaky test. I'm approving on the assumption that it's either flaky or requires a small fix.

I think it is a flaky test; I got that yesterday and reran the test. At wait-for-secondaries, only 2 of 3 nodes were up. If it persists, we'll need to take a closer look.

@@ -88,6 +88,12 @@ func makeLabelsForObject(vdb *vapi.VerticaDB, sc *vapi.Subcluster, forPod bool)

 // MakeLabelsForPodObject constructs the labels that are common for all pods
 func MakeLabelsForPodObject(vdb *vapi.VerticaDB, sc *vapi.Subcluster) map[string]string {
 	return makeLabelsForObject(vdb, sc, true)
 }

roypaulin (Collaborator, Author) commented Apr 18, 2024:

I think the test was failing because when replicaB is sandboxed, the sts podTemplateSpec is updated (the sandbox name label is added), which causes a pod replacement under the RollingUpdate strategy. That's why *-pri1-sb-2 is restarted (and will never come back up, since the restart reconciler is not called at this point), and we get an upNodeCount mismatch (expected 3, actual 2). I don't think we really want to rolling-update those pods, do we?
So if we want to add the sandbox label to the pods, maybe we should do so after the sandbox reconciler. What do you think?

Collaborator:

Yeah, makes sense. Since the pods have to be restarted by the sandbox controller, it seems like it should apply the label then.
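
For background on the failure mode discussed above: with the default RollingUpdate strategy, any edit to a StatefulSet's pod template (including a new label) triggers pod replacement. A minimal sketch of the two strategies, illustrative only and not the operator's actual code:

```go
package example

import (
	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newSts shows the knob that controls the behavior discussed above.
func newSts(onDelete bool) *appsv1.StatefulSet {
	strategy := appsv1.StatefulSetUpdateStrategy{
		// Default: the controller replaces pods whenever the template
		// changes, which is what restarted *-pri1-sb-2 after the label
		// was added.
		Type: appsv1.RollingUpdateStatefulSetStrategyType,
	}
	if onDelete {
		// OnDelete: template changes are recorded, but pods are only
		// recreated when something deletes them, letting a controller
		// (e.g. the sandbox reconciler) pick the restart moment.
		strategy.Type = appsv1.OnDeleteStatefulSetStrategyType
	}
	return &appsv1.StatefulSet{
		ObjectMeta: metav1.ObjectMeta{Name: "example"},
		Spec:       appsv1.StatefulSetSpec{UpdateStrategy: strategy},
	}
}
```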

roypaulin merged commit 3773a48 into vnext on Apr 19, 2024
31 checks passed
roypaulin deleted the roypaulin/upgrade-sandbox branch on April 19, 2024 at 10:09
cchen-vertica pushed commits referencing this pull request on Jul 17 and Jul 24, 2024, each with the same message:

This change will modify the upgrade reconcilers, both offline and online, to function in either the main cluster or a sandbox. Our immediate plan is to use the offline upgrade reconciler within the sandbox controller, although this will be done in a follow-on task.