
Server-side Apply problem #1311

Closed
allenhaozi opened this issue Dec 27, 2020 · 17 comments

Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. wg/api-expression Categorizes an issue or PR as relevant to WG API Expression.

Comments

@allenhaozi commented Dec 27, 2020

In our internal scenario, we developed a PaaS platform based on OAM.
Our business teams simply declare the Workload and Traits they use.
There are two types of Workload: ServerWorkload (online) and TaskWorkload (offline).
Traits include ManualScalerTrait, AutoScalerTrait, LoadBalanceTrait, etc.

The ServerWorkload renders a Deployment and creates it.
When the ManualScalerTrait observes that this Deployment has been created, it changes the replicas of this Deployment.

In practice, however, we found that once our Operator was restarted, the Pods already running under the Deployment would be restarted too.

We found the following events:

Events:
  Type    Reason             Age                  From                   Message
  ----    ------             ----                 ----                   -------
  Normal  ScalingReplicaSet  13s (x12 over 5h5m)  deployment-controller  Scaled down replica set simple-web-33-5d759bd4cf to 0
  Normal  ScalingReplicaSet  12s (x15 over 5h7m)  deployment-controller  Scaled up replica set simple-web-33-5d759bd4cf to 3

Because the replicas of the Deployment rendered by ServerWorkload is 0 but the value set by ManualScalerTrait is 3, the Deployment scales from 3 to 0 and then back to 3 again, which causes a restart.

Based on this implementation mechanism, it is difficult for ServerWorkload to patch the Deployment, because it renders a complete Deployment template.

If the API server could provide a parameter like client.SkipConflictFields, that would be a perfect solution:

ao := []client.PatchOption{client.FieldOwner(c1.GetUID()), client.SkipConflictFields}
err := r.Patch(ctx, deploy, client.Apply, ao...)

controller-runtime version: 0.6.2

@coderanger (Contributor)

What are those GetUID functions?

@coderanger (Contributor)

Oh the second isn’t a server side apply. Check the patch diff you are sending by logging it or similar. Will probably show something unexpected.
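A minimal debugging sketch along these lines, assuming controller-runtime's client package and a logr.Logger named log (both names are just placeholders here):

// Log the exact JSON body that client.Apply would send for this object
// before issuing the patch.
if data, err := client.Apply.Data(deploy); err == nil {
	log.Info("server-side apply body", "patch", string(data))
}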

@allenhaozi (Author)

Oh the second isn’t a server side apply. Check the patch diff you are sending by logging it or similar. Will probably show something unexpected.

Thank you for your response.
Is there an example of multiple controllers managing one resource using server-side apply?

@coderanger (Contributor)

I do it in my rabbitmq-operator, https://github.com/coderanger/rabbitmq-operator/blob/main/controllers/rabbituser.go though the code is probably going to be hard to follow. Also it's been a huge source of bugs and I really want to get rid of it :D

@allenhaozi (Author)

When I restart the controller pod, I get the following message:

ao := []client.PatchOption{client.FieldOwner(c1.GetUID())}
err := r.Patch(ctx, deploy, client.Apply, ao...)

utils.Pretty(err)
([]interface {}) (len=1 cap=1) {
 (*errors.withStack)(0xc00070efa0)(cannot patch object:apply: Apply failed with 5 conflicts: conflicts with "c060a82f-e150-47a2-a1a3-83c863113143" using apps/v1:
- .spec.replicas
- .spec.template.spec.containers[name="simple-web-0"].resources.requests.memory
- .spec.template.spec.containers[name="simple-web-0"].resources.requests.memory
- .spec.template.spec.containers[name="simple-web-0"].resources.requests.memory
- .spec.template.spec.containers[name="simple-web-0"].resources.requests.memory)
}

When I change ao to []client.PatchOption{client.ForceOwnership, client.FieldOwner(c1.GetUID())},

it leads to the Deployment's Pods restarting, because the C1 controller changes the replicas of the Deployment,

although managedFields shows that replicas is managed by C2.

@coderanger (Contributor)

Field manager names should not be UUIDs, they should be names, usually the name of the controller (or subcontroller element).
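For illustration, a sketch of what named field owners could look like for the two controllers in this thread (the manager names below are made up):

// Stable, human-readable field manager names instead of UIDs.
workloadOpts := []client.PatchOption{client.FieldOwner("serverworkload-controller")}
scalerOpts := []client.PatchOption{client.FieldOwner("manualscaler-controller")}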

@kwiesmueller (Member)

The conflict probably means that your apply contains those fields.

When using SSA one should only send the fields they care about. In this case your operator should only specify (I think) replicas, and not other fields.
Otherwise it will try to own them and potentially overwrite them if they are different.

If two controllers are intended to manage the same resource, they should never specify a patch that would make their ownership collide.

SSA is not meant for sending the full object if you don't care about the full object in your workflow.

I don't know how you're building your object here, are you applying the full object?

/wg api-expression
/cc @apelisse @jpbetz
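To illustrate "only send the fields you care about", here is a hedged sketch of a scaler-side apply that carries nothing but spec.replicas, sent as a raw apply patch (object name, namespace, and field manager are placeholders):

// An apply body that only claims spec.replicas.
patch := []byte(`{
  "apiVersion": "apps/v1",
  "kind": "Deployment",
  "metadata": {"name": "simple-web-33", "namespace": "default"},
  "spec": {"replicas": 3}
}`)

deploy := &appsv1.Deployment{}
deploy.Name = "simple-web-33"
deploy.Namespace = "default"

err := r.Patch(ctx, deploy, client.RawPatch(types.ApplyPatchType, patch),
	client.FieldOwner("manualscaler-controller"))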

@k8s-ci-robot k8s-ci-robot added the wg/api-expression Categorizes an issue or PR as relevant to WG API Expression. label Dec 28, 2020
@allenhaozi (Author)


In my case, I have two controllers.
The first controller is responsible for creating the Deployment with Spec.Replicas set to 0.
The second controller is responsible for scaling the Deployment, setting Spec.Replicas = 3.

When I restart the controllers, I get the following events:

Events:
  Type    Reason             Age                  From                   Message
  ----    ------             ----                 ----                   -------
  Normal  ScalingReplicaSet  13s (x12 over 5h5m)  deployment-controller  Scaled down replica set simple-web-33-5d759bd4cf to 0
  Normal  ScalingReplicaSet  12s (x15 over 5h7m)  deployment-controller  Scaled up replica set simple-web-33-5d759bd4cf to 3

@kwiesmueller (Member)

That should be the representation of a restart for all pods. You scale down and then back up.
Your first controller code above does a Patch and not a create, so it will also update the resource when it's already created.
You might want to check if it's not there already.
Then the second controller can "steal" the field from the create-controller.

Make sure both controllers have different fieldManagers or they might bounce around ownership (one fieldManager has to send the fields it cares about on every apply or they will get dropped).

And make sure that the second controller only sends the replicas field.
I don't know off the top of my head how this is intended to be used with the controller-runtime client (maybe @DirectXMan12 can help?), but in general with SSA you can supply an incomplete JSON that only specifies the fields you care about.
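For what it's worth, a sketch of how that could look through the controller-runtime client, using an unstructured object so that only the replicas field is serialized (object coordinates and field manager name are illustrative, error handling omitted):

// The scaler applies only spec.replicas under its own field manager.
u := &unstructured.Unstructured{}
u.SetAPIVersion("apps/v1")
u.SetKind("Deployment")
u.SetName("simple-web-33")
u.SetNamespace("default")
_ = unstructured.SetNestedField(u.Object, int64(3), "spec", "replicas")

err := r.Patch(ctx, u, client.Apply, client.FieldOwner("manualscaler-controller"))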

@allenhaozi (Author) commented Dec 28, 2020

Is there a way to remove or fill Deployment fields, based on managedFields, between the desired Deployment and the current Deployment?

I have two kinds of controllers:

assume controller 1 is named Workload
and controller 2 is named Scaler.

This Apply function is used by Workload to create or update the Deployment:

func (a *Applicator) Apply(ctx context.Context, parent, current runtime.Object, ao ...client.PatchOption) error {
	// Record the parent as an owner of the rendered object.
	ownerRef := utils.GenOwnerReferenceByObject(parent)
	crmeta.AddOwnerReference(current, ownerRef)

	if deploy, ok := current.(*appsv1.Deployment); ok {
		desired := deploy.DeepCopy()
		err := a.Get(ctx, types.NamespacedName{Name: current.GetName(), Namespace: current.GetNamespace()}, deploy)
		if apierrors.IsNotFound(err) {
			// Not created yet: fall through to server-side apply, owning all fields.
			ao = append(ao, client.ForceOwnership)
		} else {
			// Already exists: merge-patch the rendered object over the live one.
			patchOpts := []client.PatchOption{
				client.FieldOwner(utils.GenFieldOwnerByObject(parent))}
			a.prePatch(parent, desired, deploy)
			return errors.Wrap(a.Patch(ctx, desired, client.MergeFrom(deploy), patchOpts...), "cannot patch object:merge")
		}
	}

	// Server-side Apply
	return errors.Wrap(a.Patch(ctx, current, client.Apply, ao...), "cannot patch object:apply")
}

// Can fields be removed or filled in here based on managedFields?
// For example, use the live Deployment's replicas to replace the desired one's?
func (a *Applicator) prePatch(parent, desired, current runtime.Object) {

}
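A minimal sketch of what prePatch could do, under the assumption that the Workload controller should never fight over replicas: copy the live value from the cluster object into the rendered one before patching (this is just one option; dropping the field from the apply entirely is another):

// Keep whatever replicas value is currently live so the merge patch does not
// scale the Deployment back to the rendered value.
func (a *Applicator) prePatch(parent, desired, current runtime.Object) {
	desiredDeploy, ok1 := desired.(*appsv1.Deployment)
	currentDeploy, ok2 := current.(*appsv1.Deployment)
	if !ok1 || !ok2 {
		return
	}
	if currentDeploy.Spec.Replicas != nil {
		replicas := *currentDeploy.Spec.Replicas
		desiredDeploy.Spec.Replicas = &replicas
	}
}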

This Patch function is used by the Scaler to scale the Deployment:

// Patch method used by the Scaler.
func (a *Applicator) ScalePatch(ctx context.Context, scaler, desired, current runtime.Object) error {

	ownerRef := utils.GenOwnerReferenceByTrait(scaler)
	crmeta.AddOwnerReference(desired, ownerRef)

	if err := a.Patch(ctx, desired, client.MergeFrom(current), client.FieldOwner(utils.GenFieldOwnerByObject(scaler))); err != nil {
		return err
	}

	return nil
}

@kwiesmueller (Member)

I think I don't understand your setup yet.
You said you want one controller to create Deployments and one to set the replicas.
The snippet above looks like you're doing both in one place?

Can you clarify the goal you want to reach?
Right now you're doing either an Apply or a merge patch depending on the object's existence.

@allenhaozi (Author) commented Dec 29, 2020

I know what my problem is.

The Deployment that my WorkloadController controls has the following ownerReferences:

ownerReferences:
  - apiVersion: core.4paradigm.com/v1alpha2
    blockOwnerDeletion: true
    controller: true
    kind: WorkloadController
    name: simple-web-29
    uid: d26e5886-938e-4748-9927-4eed3712feaa
  - apiVersion: core.4paradigm.com/v1alpha2
    blockOwnerDeletion: true
    controller: false
    kind: ScalerController
    name: simple-web-29
    uid: 1c156536-63af-4ce6-aea2-533836b6eef5
  uid: 7ac2aacd-c66b-4918-9731-806d93689d76

I should use a.Patch(ctx, current, client.Apply, ao...) all the time.

But I have to pre-process the Deployment that the WorkloadController renders when the Deployment already exists,
because it contains the replicas field, which has been updated by the ScalerController.

At the beginning I thought Kubernetes SSA would skip the replicas field whose managedFields entry belongs to the ScalerController;
in fact, I got an error when I didn't use client.ForceOwnership.

I need to find a way to deal with fields that have been modified by the ScalerController.

thanks @coderanger @kwiesmueller

If SSA could provide client.SkipConflictFields, that would be wonderful:

ao := []client.PatchOption{client.FieldOwner(c1.GetUID()), client.SkipConflictFields}
err := r.Patch(ctx, deploy, client.Apply, ao...)

@coderanger (Contributor)

That would be a feature request for upstream, but it seems unlikely. As mentioned, you shouldn't be sending a replicas field at all. A common mistake with Apply patches is to use a normal typed object; you almost always want to be using an unstructured. Take a look at the code in https://github.com/coderanger/controller-utils/blob/main/components/template.go (or just use it directly).
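To make the "use an unstructured" suggestion concrete, a hedged sketch of how the Workload controller could prune the fields it should not own (here, spec.replicas) before the server-side apply; the renderedDeploy variable and the field manager name are assumptions:

// Convert the rendered Deployment to unstructured, drop the fields this
// controller should not claim, then apply it server-side.
obj, err := runtime.DefaultUnstructuredConverter.ToUnstructured(renderedDeploy)
if err != nil {
	return err
}
u := &unstructured.Unstructured{Object: obj}
u.SetGroupVersionKind(appsv1.SchemeGroupVersion.WithKind("Deployment"))
unstructured.RemoveNestedField(u.Object, "spec", "replicas")
unstructured.RemoveNestedField(u.Object, "status")
unstructured.RemoveNestedField(u.Object, "metadata", "creationTimestamp")

return r.Patch(ctx, u, client.Apply, client.FieldOwner("serverworkload-controller"))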

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 21, 2021
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 21, 2021
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

@k8s-ci-robot (Contributor)

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-contributor-experience at kubernetes/community.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
