Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race with ActivePlan between Instance and PlanExecution controllers #722

Merged
merged 21 commits into from
Aug 28, 2019

Conversation

alenkacz
Copy link
Contributor

@alenkacz alenkacz commented Aug 15, 2019

What type of PR is this?
/kind bug

What this PR does / why we need it:
Based on the test I wrote in #718 it became apparent where lies the race that is one of the causes of multiple deploy plan executions.

  • instance is created, create event triggers reconciliation
  • planexecution is created
  • OV is created and triggers reconciliation
  • another planexecution is created because ActivePlan is still empty
  • PlanExecutionController processes added planexecutions and create link between one of them and instance leaving two deploy planexecutions in progress, only one linked to instance, the other just hanging around forever

This PR proposes to move adding the ActivePlan relation to InstanceController. To make that possible I needed to add some safeguards to the PlanExecutionController as well.

Which issue(s) this PR fixes:

This is related to #628

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


@alenkacz
Copy link
Contributor Author

/retest

@alenkacz alenkacz changed the title Add reference to PE inside instance controller Fix race with ActivePlan between Instance and PlanExecution controllers Aug 15, 2019

//todo: REMOVE this... we do not want to create a plan outside of reconcile
// most likely... any use of this function is wrong... all plans should be created in reconcile
func createPlanOld(mgr manager.Manager, planName string, instance *kudov1alpha1.Instance) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was real pain maintaining two, while the only difference was how they accepted the dependencies which was pretty easy to unite

if err := controllerutil.SetControllerReference(instance, &planExecution, mgr.GetScheme()); err != nil {
log.Printf("InstanceController: Error setting ControllerReference")
return err
func addActivePlanReference(c client.Client, r record.EventRecorder, planExecution *kudov1alpha1.PlanExecution, instance *kudov1alpha1.Instance) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main change

@@ -228,6 +227,11 @@ func (r *ReconcilePlanExecution) Reconcile(request reconcile.Request) (reconcile
return reconcile.Result{}, err
}

if instance.Status.ActivePlan.Name != planExecution.Name || instance.Status.ActivePlan.Namespace != planExecution.Namespace {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is needed because otherwise the plan runs until the end but then cannot update instance because that one was modified in between

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general I feel like this safeguard is correct. If we're reconciling PE that is not current active plan, we probably should not execute anything

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't this be executed right after the PE is created, but before the instance controller calls addActivePlanReference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm good point, I guess it could. To overcome even that case I guess adding reconcile trigger after linking activeplan could be a solution? But I still think this safeguard is correct, because we should really execute only active plan, not every plan that changed for example...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

Copy link
Contributor Author

@alenkacz alenkacz Aug 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh wait, yeah, that should be fine, I think the PE controller should also watch for instance update event and trigger reconciliation as it does for deployments etc. Would you agree? Looking at the implementation right now I am not 100% that is happening and that it was intentional but if we agree on that I can push in that direction

EDIT: Oh yeah, right now it watches only for instances that were created as part of that plan not all of them... I would probably watch and reconcile for all updates. I mean controller-runtime will deal with de-duplication and the reconcile should be idempotent anyway

gvk, _ := apiutil.GVKForObject(instance, scheme)
ref := corev1.ObjectReference{
Kind: gvk.Kind,
Name: instance.Name,
Namespace: instance.Namespace,
}
planExecution := kudov1alpha1.PlanExecution{
TypeMeta: metav1.TypeMeta{
Kind: "PlanExecution",
APIVersion: "kudo.dev/v1alpha1",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this dotted in a lot of places, so I looked for some "standardization" point that already exists in our codebase generated by kubebuilder, maybe we start moving to something more like this? Or at least turn it into a constant we can reference elsewhere. Maybe this is the constant?

SchemeGroupVersion = schema.GroupVersion{Group: "kudo.dev", Version: "v1alpha1"}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's try to unify this in another PR... I'll prepare one. Btw. how do you convert the groupversion to the string?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, let's unify in a separate PR

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 we'll have to look that up. hopefully it's something we can just set as a var and not have as a function call everywhere.

@@ -228,6 +227,11 @@ func (r *ReconcilePlanExecution) Reconcile(request reconcile.Request) (reconcile
return reconcile.Result{}, err
}

if instance.Status.ActivePlan.Name != planExecution.Name || instance.Status.ActivePlan.Namespace != planExecution.Namespace {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

@kudo-ci kudo-ci added size/L and removed size/M labels Aug 27, 2019
@@ -305,57 +305,42 @@ func instanceEventFilter(mgr manager.Manager) predicate.Funcs {
}
}

func createPlan(r *ReconcileInstance, planName string, instance *kudov1alpha1.Instance) error {
func createPlan(c client.Client, r record.EventRecorder, scheme *runtime.Scheme, planName string, instance *kudov1alpha1.Instance) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should be called createPlanAndUpdateInstance or even createPlanAndUpdateInstanceActivePlanReerence :P

@@ -228,6 +252,11 @@ func (r *ReconcilePlanExecution) Reconcile(request reconcile.Request) (reconcile
return reconcile.Result{}, err
}

if instance.Status.ActivePlan.Name != planExecution.Name || instance.Status.ActivePlan.Namespace != planExecution.Namespace {
// this can happen for newly created PlanExecution where ActivePlan was not yet set to point to this instance
return reconcile.Result{}, fmt.Errorf("instance %s does not have ActivePlan pointing to PlanExecution %s, %s. Instead %s, %s. Retrying", instance.Name, planExecution.Name, planExecution.Namespace, instance.Status.ActivePlan.Name, instance.Status.ActivePlan.Namespace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're already watching instance updates, so there is no need to re-trigger this reconciliation (it might create a loop)

err = c.Watch(
&source.Kind{Type: &kudov1alpha1.Instance{}},
&handler.EnqueueRequestsFromMapFunc{
ToRequests: mapFn,
ToRequests: handler.ToRequestsFunc(
Copy link
Contributor

@zen-dog zen-dog Aug 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's meta...

@kudo-ci
Copy link

kudo-ci commented Aug 27, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alenkacz, gerred

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@alenkacz alenkacz merged commit 47e7da7 into master Aug 28, 2019
@kudo-ci kudo-ci deleted the av/pe-reference-in-instance-controller branch August 28, 2019 10:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants