Update upgrade reconciler to handle subclusters in sandbox #772
Conversation
pkg/iter/sc_finder.go
Outdated
// we can skip the filtering if all of the sts we are
// looking for are not for a subcluster in vdb
if flags&FindInVdb != 0 || flags&FindExisting != 0 {
	sts = m.filterStatefulsetsBySandbox(sts, sandbox)
Can we add the filtering in m.buildObjList? If we search for the vertica.com/sandbox label, it should be able to find the sts we are looking for. Have we decided whether the pods will have a sandbox label too? We may be able to filter pods in the same manner.
We can add that label to both for filtering, but shouldn't we only trust the sandbox status in vdb? That's what tells us which sts/pod belongs to a sandbox. I guess we can also use a label, but in that case we need to make sure we correctly set/unset it.
It should be set up correctly. We have full control over it, so it would be a programming error if it were not set.
Ok then let's set the label in both sts and pods.
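The label-based filtering agreed on above can be sketched in plain Go. This is only an illustrative stand-in, not the operator's actual code: the constant names and filterBySandbox helper are assumptions, with objects reduced to their label maps; main-cluster objects carry no sandbox label, so a missing key naturally matches the empty main-cluster name.

```go
package main

import "fmt"

// sandboxNameLabel mirrors the idea of vmeta.SandboxNameLabel; the exact
// value is an assumption for this sketch.
const sandboxNameLabel = "vertica.com/sandbox"

// mainCluster stands in for vapi.MainCluster (the empty sandbox name).
const mainCluster = ""

// filterBySandbox keeps only the objects (represented here by their label
// maps) that belong to the given sandbox. A main-cluster object has no
// sandbox label, so the zero-value lookup ("") matches mainCluster.
func filterBySandbox(objs []map[string]string, sandbox string) []map[string]string {
	var out []map[string]string
	for _, labels := range objs {
		if labels[sandboxNameLabel] == sandbox {
			out = append(out, labels)
		}
	}
	return out
}

func main() {
	objs := []map[string]string{
		{"app": "vertica"},                            // main cluster: no sandbox label
		{"app": "vertica", sandboxNameLabel: "sand1"}, // sandboxed sts
	}
	fmt.Println(len(filterBySandbox(objs, mainCluster))) // main-cluster objects
	fmt.Println(len(filterBySandbox(objs, "sand1")))     // sandboxed objects
}
```

In the real finder this filtering would happen via a label selector on the client list call rather than in memory, but the matching rule is the same.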
pkg/controllers/vdb/upgrade.go
Outdated
isAllowedForUpgradePolicyFunc func(vdb *vapi.VerticaDB) bool) *UpgradeManager {
	return &UpgradeManager{
		VRec:            vdbrecon,
		Vdb:             vdb,
		Log:             log,
		Finder:          iter.MakeSubclusterFinder(vdbrecon.Client, vdb),
		StatusCondition: statusCondition,
		SandboxName:     sandbox,
I don't think setting the sandbox for the entire upgrade manager makes sense. For replicated upgrade, it will span both the main cluster and a sandbox. I think each function in UpgradeManager that needs the sandbox should accept it as input.
@@ -233,7 +233,7 @@ func (o *ObjReconciler) checkForDeletedSubcluster(ctx context.Context) (ctrl.Res
 	finder := iter.MakeSubclusterFinder(o.VRec.Client, o.Vdb)

 	// Find any statefulsets that need to be deleted
-	stss, err := finder.FindStatefulSets(ctx, iter.FindNotInVdb)
+	stss, err := finder.FindStatefulSets(ctx, iter.FindNotInVdb, vapi.MainCluster)
Rather than hard-coding the main cluster, could we use the sandbox found in the podfacts?
@@ -228,7 +230,7 @@ func (o *OfflineUpgradeReconciler) checkNMADeploymentChange(ctx context.Context)
 		return ctrl.Result{}, nil
 	}

-	stss, err := o.Finder.FindStatefulSets(ctx, iter.FindExisting)
+	stss, err := o.Finder.FindStatefulSets(ctx, iter.FindExisting, o.SandboxName)
General comment: could we use the sandbox from the podfacts if it's available in the reconciler?
labels := makeLabelsForObject(vdb, sc, true)
sandbox := vdb.GetSubclusterSandboxName(sc.Name)
if sandbox != vapi.MainCluster {
	labels[vmeta.SandboxNameLabel] = sandbox
We should update the comment for vmeta.SandboxNameLabel (in pkg/meta/labels.go) to mention that we add it for pods now.
pkg/controllers/vdb/podfacts.go
Outdated
@@ -1129,6 +1129,17 @@ func (p *PodFacts) findExpectedNodeNames() []string {
 	return expectedNodeNames
 }

+// getSandboxName returns the name of the sandbox, or empty string
+// for main cluster, the pods belong to
+func (p *PodFacts) getSandboxName() string {
As prep for moving this struct to its own package, we probably want to export it.
-func (p *PodFacts) getSandboxName() string {
+func (p *PodFacts) GetSandboxName() string {
pkg/iter/sc_finder.go
Outdated
@@ -80,7 +80,7 @@ func (m *SubclusterFinder) FindStatefulSets(ctx context.Context, flags FindFlags
 // FindServices returns service objects that are in use for subclusters
 func (m *SubclusterFinder) FindServices(ctx context.Context, flags FindFlags) (*corev1.ServiceList, error) {
 	svcs := &corev1.ServiceList{}
-	if err := m.buildObjList(ctx, svcs, flags); err != nil {
+	if err := m.buildObjList(ctx, svcs, flags, vapi.MainCluster); err != nil {
This is correct, but only because we know how FindServices is called. To be consistent with the other exported APIs, we should take the sandbox as an input parameter to FindServices. That will keep this package general purpose.
pkg/iter/sc_finder.go
Outdated
@@ -125,7 +120,7 @@ func (m *SubclusterFinder) FindSubclusters(ctx context.Context, flags FindFlags,
 // a sandboxed subcluster can only be one of those in the vdb so we skip this part if
 // we are looking for sandboxed subclusters
 if isMainCluster && (flags&FindNotInVdb != 0 || flags&FindExisting != 0) {
-	missingSts, err := m.FindStatefulSets(ctx, flags & ^FindInVdb)
+	missingSts, err := m.FindStatefulSets(ctx, flags & ^FindInVdb, vapi.MainCluster)
Does it still make sense to pass in the main cluster here? Any harm in removing the isMainCluster check, getting in here for a sandbox, and passing the sandbox parameter instead? It would keep this logic simple. No need for the comment at lines 120-121.
This was to avoid calling the client to fetch sts when we don't need to, but I see what you mean. Now that we have moved the filter into buildObjList, let's just pass the sandbox name.
pkg/iter/sc_finder.go
Outdated
if sandbox == vapi.MainCluster {
	return l[vmeta.SandboxNameLabel] != vapi.MainCluster
}
return l[vmeta.SandboxNameLabel] != sandbox
Couldn't you just simplify it this way?
-if sandbox == vapi.MainCluster {
-	return l[vmeta.SandboxNameLabel] != vapi.MainCluster
-}
-return l[vmeta.SandboxNameLabel] != sandbox
+return l[vmeta.SandboxNameLabel] != sandbox
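The suggested one-liner is equivalent to the two-branch form: when sandbox equals vapi.MainCluster, the first branch computes exactly the same comparison as the fallthrough. A standalone sketch (constant names here are illustrative, not the operator's code) compares the two forms over representative inputs:

```go
package main

import "fmt"

const sandboxNameLabel = "vertica.com/sandbox"
const mainCluster = "" // stand-in for vapi.MainCluster

// skipOriginal is the two-branch form from the diff above.
func skipOriginal(l map[string]string, sandbox string) bool {
	if sandbox == mainCluster {
		return l[sandboxNameLabel] != mainCluster
	}
	return l[sandboxNameLabel] != sandbox
}

// skipSimplified is the suggested one-liner.
func skipSimplified(l map[string]string, sandbox string) bool {
	return l[sandboxNameLabel] != sandbox
}

func main() {
	// Compare the two forms across unlabeled and labeled objects
	// against the main cluster and two sandbox names.
	labels := []map[string]string{
		{},                          // no sandbox label (main-cluster object)
		{sandboxNameLabel: "sand1"}, // sandboxed object
		{sandboxNameLabel: "sand2"},
	}
	sandboxes := []string{mainCluster, "sand1", "sand2"}
	same := true
	for _, l := range labels {
		for _, sb := range sandboxes {
			if skipOriginal(l, sb) != skipSimplified(l, sb) {
				same = false
			}
		}
	}
	fmt.Println(same)
}
```

The equivalence is purely algebraic (substituting sandbox for mainCluster in the first branch yields the fallthrough), so it holds regardless of what value vapi.MainCluster actually is.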
sandbox := r.PFacts.getSandboxName()
if ok, err := r.Manager.IsUpgradeNeeded(ctx, sandbox); !ok || err != nil {
When you merge up, r.PFacts is a map now (sandbox name to podfacts). This only needs to look at the main cluster, so you can probably just pass in vapi.MainCluster here.
-sandbox := r.PFacts.getSandboxName()
-if ok, err := r.Manager.IsUpgradeNeeded(ctx, sandbox); !ok || err != nil {
+if ok, err := r.Manager.IsUpgradeNeeded(ctx, vapi.MainCluster); !ok || err != nil {
Everything looks good. Thanks for doing the revisions. I noticed in the last run that leg 9 failed, related to replicated upgrade. I didn't look at it in much detail, so I'm not sure whether it's a flaky test. I'm approving on the assumption that it's a flaky test or requires only a small change to fix.
I think it is a flaky test; I got that failure yesterday and reran the test.
@@ -88,6 +88,12 @@ func makeLabelsForObject(vdb *vapi.VerticaDB, sc *vapi.Subcluster, forPod bool)

+// MakeLabelsForPodObject constructs the labels that are common for all pods
+func MakeLabelsForPodObject(vdb *vapi.VerticaDB, sc *vapi.Subcluster) map[string]string {
+	return makeLabelsForObject(vdb, sc, true)
I think the test was failing because when replicaB is sandboxed, the sts podTemplateSpec is updated (the sandbox name label is added), which causes a pod replacement under the rollingUpdate strategy. That's why *-pri1-sb-2 is restarted (and will never come back up, as the restart reconciler is not called at this point) and we get an upNodeCount mismatch: expect 3, actual 2. I don't think we really want to rolling-update those pods, do we?
So if we want to add the sandbox label to the pods, maybe we should do so after the sandbox reconciler. What do you think?
Yeah, makes sense. Since the pods have to be restarted by the sandbox controller, it seems like it should apply the label then.
This change will modify the upgrade reconcilers, both offline and online, to function in either the main cluster or a sandbox. Our immediate plan is to use the offline upgrade reconciler within the sandbox controller, although this will be done in a follow-on task.