Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Granular management policies #456

Merged
merged 1 commit into from
Jul 14, 2023

Conversation

lsviben
Copy link
Contributor

@lsviben lsviben commented Jun 20, 2023

Description of your changes

Implements new granular management policies based on the accepted design doc.

In essence, it replaces the old spec.managementPolicy that was introduced with ObserveOnly with an new array of granular management policies spec.managementPolicies

In addition as there are more options now, the various reconciler steps are now conditioned based on weather the management policies feature is in use and the combination of actions.

NOTE: As it is an alpha feature, we are just replacing the old managementPolicy field with the new one.

Fixes #453

I have:

  • Read and followed Crossplane's contribution process.
  • Run make reviewable test to ensure this PR is ready for review.

How has this code been tested

Deployed with updated upjet (PR TBD) and provider-aws in a local kind cluster.

  1. Used the Crossplane repo tools to create a kind cluster and install Crossplane on it:
./cluster/local/kind.sh up
./cluster/local/kind.sh helm-install
  1. Check out the aws-provider repo , add a replace to the go.mod which points to the branch of the upjet changes PR and this PR changes.
replace github.com/crossplane/crossplane-runtime => github.com/lsviben/crossplane-runtime v0.0.0-20230712210651-dae13a90cf2f

replace github.com/upbound/upjet => github.com/lsviben/upjet v0.0.0-20230712211824-291bef216c84

replace github.com/crossplane/crossplane-tools => github.com/lsviben/crossplane-tools v0.0.0-20230712201434-f65dadcb91b1

require (
...

run

go mod tidy
go mod vendor

Because of the controller-runtime version change, make a small change in the code at internal/controller/providerconfig/config.go:31

                Named(name).
                WithOptions(o.ForControllerRuntime()).
                For(&v1beta1.ProviderConfig{}).
-               Watches(&source.Kind{Type: &v1beta1.ProviderConfigUsage{}}, &resource.EnqueueRequestForProviderConfig{}).
+               Watches(&v1beta1.ProviderConfigUsage{}, &resource.EnqueueRequestForProviderConfig{}).
                Complete(providerconfig.NewReconciler(mgr, of,
                        providerconfig.WithLogger(o.Logger.WithValues("controller", name)),
                        providerconfig.WithRecorder(event.NewAPIRecorder(mgr.GetEventRecorderFor(name)))))
  1. Generate the CRDs
make generate
  1. Apply the CRDs
kubectl apply -f package/crds/
  1. Apply the provider config
apiVersion: aws.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: default
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: aws-creds
      key: creds
  1. Create the AWS credentials secret

  2. Run the provider with --enable-management-policies.

    You can update the run target in the Makefile as below

    diff --git a/Makefile b/Makefile
    index d529a0d6..84411669 100644
    --- a/Makefile
    +++ b/Makefile
    @@ -111,7 +111,7 @@ submodules:
     run: go.build
            @$(INFO) Running Crossplane locally out-of-cluster . . .
            @# To see other arguments that can be provided, run the command with --help instead
    -       UPBOUND_CONTEXT="local" $(GO_OUT_DIR)/provider --debug
    +       UPBOUND_CONTEXT="local" $(GO_OUT_DIR)/provider --debug --enable-management-policies
    
     # NOTE(hasheddan): we ensure up is installed prior to running platform-specific
     # build steps in parallel to avoid encountering an installation race condition.

    and run with:

    make run
  3. Apply some of the test cases below.

NOTE
I used the monolith Upbound AWS provider for simplicity, if your machine is having issues with the ~900 CRDs then consider running the smaller provider families for testing.

No management policy set - defaults to ["*"]

apiVersion: ec2.aws.upbound.io/v1beta1
kind: VPC
metadata:
  name: training-vpc-test
spec:
  forProvider:
    region: us-west-1
    cidrBlock: 172.16.0.0/16
    tags:
      Name: TrainingVpc

Pause

Create a standard VPC (check above) and then edit the spec.managementPolicies to []

apiVersion: ec2.aws.upbound.io/v1beta1
kind: VPC
metadata:
  name: training-vpc-test
spec:
  managementPolicies: []
  forProvider:
    region: us-west-1
    cidrBlock: 172.16.0.0/16
    tags:
      Name: TrainingVpc

The resource should be paused, you can test by trying to make it reconcile, by for example removing an annotation.

Observe Only test

Create an external resource and make sure you use its name as the MR name. For AWS, we also need to provide some indentifier data (forProvider.region)

apiVersion: rds.aws.upbound.io/v1beta1
kind: Instance
metadata:
  name: an-existing-dbinstance
spec:
  managementPolicies: ["Observe"]
  forProvider:
    region: us-west-1

Orphan through managementPolicies

As the spec.managementPolicies does not have the Delete action, after creating the resource below, waiting for it to be Ready
and then deleting it, the external resource will be orphaned.

apiVersion: ec2.aws.upbound.io/v1beta1
kind: VPC
metadata:
  name: training-vpc-test
spec:
  deletionPolicy: Delete
  managementPolicies: ["Observe", "Create", "Update", "LateInitialize"]
  forProvider:
    region: us-west-1
    cidrBlock: 172.16.0.0/16
    tags:
      Name: TrainingVpc

Orphan through deletionPolicy

Same process as above, but here we check that the spec.deletionPolicy: Orphan is the one determining the external resource deletion as the spec.managementPolicies is a default value.

apiVersion: ec2.aws.upbound.io/v1beta1
kind: VPC
metadata:
  name: training-vpc-test
spec:
  deletionPolicy: Orphan
  managementPolicies: ["*"]
  forProvider:
    region: us-west-1
    cidrBlock: 172.16.0.0/16
    tags:
      Name: TrainingVpc

Delete through managementPolicies, deletionPolicy is orphan

Same setup as above, but in this case, both spec.deletionPolicy and spec.managementPolicies are non-default so spec.managementPolicies trumps the spec.deletionPolicy and the external resource gets deleted.

apiVersion: ec2.aws.upbound.io/v1beta1
kind: VPC
metadata:
  name: training-vpc-test
spec:
  deletionPolicy: Orphan
  managementPolicies: ["Observe", "Create", "Update", "Delete"]
  forProvider:
    region: us-west-1
    cidrBlock: 172.16.0.0/16
    tags:
      Name: TrainingVpc

No late initialization

Tested using a use-case from the design doc:

Prerequisite:

apiVersion: autoscaling.aws.upbound.io/v1beta1
kind: LaunchConfiguration
metadata:
  annotations:
    upjet.upbound.io/manual-intervention: "This resource refers to an AMI ID."
    meta.upbound.io/example-id: autoscaling/v1beta1/launchconfiguration
  labels:
    testing.upbound.io/example-name: bob
  name: bob
spec:
  forProvider:
    imageId: ami-0f6ded6fd335abde3
    instanceType: t2.micro
    region: us-west-1

The Autoscaling group needs a Launch Configuration or a Launch Template to function, in this case I opted for the Launch Configuration. When its ready, create an AutoscalingGroup

apiVersion: autoscaling.aws.upbound.io/v1beta1
kind: AutoscalingGroup
metadata:
  annotations:
    meta.upbound.io/example-id: ecs/v1beta1/capacityprovider
    upjet.upbound.io/manual-intervention: This resource depends on LaunchConfiguration with manual intervention.
  labels:
    testing.upbound.io/example-name: example
  name: example
spec:
  managementPolicies: ["Observe", "Create", "Update", "Delete"]  
  forProvider:
    availabilityZones:
      - us-west-1b
    launchConfiguration: bob
    maxSize: 1
    minSize: 1
    region: us-west-1
    tag:
      - key: AmazonECSManaged
        propagateAtLaunch: true
        value: "true"

The resources are taken from the provider-aws examples.

When the resource is created and ready, we can see that the spec.forProvider.desiredCapacity is not being late-initialized. We can check its value in status.atProvider.desiredCapacity. This allows the desiredSize to be managed by an external controller (autoscaler).

@lsviben lsviben force-pushed the granular_management_policies branch from d196d36 to ab78f63 Compare June 20, 2023 09:30
@lsviben lsviben requested a review from turkenh June 20, 2023 09:31
apis/common/v1/policies.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/reconciler.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/reconciler.go Outdated Show resolved Hide resolved
pkg/meta/meta.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/reconciler.go Outdated Show resolved Hide resolved
@lsviben
Copy link
Contributor Author

lsviben commented Jun 29, 2023

Discussed with @turkenh to add an allowlist of supported combinations of ManagementPolicies and check that in the reconciler.

What we though about is initially supporting:

Default:

managememntPolicy: ["*"]

All - alternative to default

managememntPolicy: ["Observe", "Create", "Update", "LateInitialize", "Delete"]

Observe Only

managememntPolicy: ["Observe"] 

Pause

managememntPolicy: [""]

No late init

managememntPolicy: ["Observe", "Create", "Update", "Delete"]

No late init, orphan

managememntPolicy: ["Observe", "Create", "Update"]

Orphan

managememntPolicy: ["Observe", "Create", "Update", "LateInitialize"]

These are some combinations that we thought make sense now. Possibly there are other combinations that we could add to the initial list.

One issue with this is that if we want to add a combination, we need to do a release of crossplane-runtime, and then the providers need to be updated and released.

But still, it gives a clear picture of what kind of managed policy behaviors we are supporting, and can say that we tested and know they work.

@lsviben lsviben marked this pull request as ready for review June 29, 2023 21:17
@lsviben lsviben requested review from a team as code owners June 29, 2023 21:17
@lsviben lsviben requested a review from ytsarev June 29, 2023 21:17
@lsviben
Copy link
Contributor Author

lsviben commented Jun 30, 2023

I have tested migrating from the previous managmenetPolicy: FullControl/OrphanOnDelete/Observe to the new managementPolicy: ["*"].

For some reason (also based on some early testing) I expected that after applying the new CRD with the updated OpenAPI spec, that the old policy will get replaced by the default new managementPolicy (spec.managmementPolicy: [*]). Well I was wrong :) .

What happens is that the old managementPolicy doesnt get removed and starting the providers fails because of the unexpected value in the field.

If I dont find an easy and elegant solution for that I am considering simply renaming the new spec.managementPolicy to spec.managmentPolicies. This will remove the old field and default the new one when the new CRD is applied

cc @turkenh

@chlunde
Copy link

chlunde commented Jun 30, 2023

Nice! I am really looking forward to this! 👍 A couple of late comments, I missed the proposal.

If limiting policy combinations, consider supporting immutable resources ["Create", "Delete"]

Discussed with @turkenh to add an allowlist of supported combinations of ManagementPolicies and check that in the reconciler.

What we though about is initially supporting:
...

Immutable resources (like Kubernetes Job) would work better if we have support for the following combinations:

managementPolicy: ["Create", "Delete"]
managementPolicy: ["Create", "Observe", "Delete"]

I don't see any technical reason why "LateInitialize" could not be part of it, which makes four meaningful, valid policies for immutable resources alone.

(Have you written what combinations you worry about supporting and what issues you expect? With the combinations mentioned above allowed, I wonder what is not there 😄 )


Keeping LateInitialize

Are there any good use cases for LateInitialize when Observe exists?

As a user, I think LateInitialize might have two paths forward:

  • Disable and deprecate it, as it is a major source of issues. Just keep it around for legacy use cases for a while.
  • A mode where LateInitialize keeps updating the field, by using managed fields and checking that the provider is the manager of that specific field.
    Am I missing some use cases where it is helpful?

initProvider name

It might make sense to consider the name "initProvider" together with deletion options, to see if there's a way to make it intuitive that these are API attributes specific to create and delete.

I know of three kinds of additional deletion behaviors:

I wonder if the design should consider this to ensure the API is orthogonal and consistent in naming.


Use of array - usability in compositions?

I wonder how well this will work with compositions, if you want to have one or more feature flags in a definition to allow the user to control if a resource should be orphaned, for example.

For example, map transforms do not work as they output single strings. This might require a new transform like "add to set", or a map transform returning a list. Have you tested a use case like this end-to-end?


I hope the tone of this comment does not look negative, as I really like this proposal.

@lsviben
Copy link
Contributor Author

lsviben commented Jul 3, 2023

Thanks for the feedback @chlunde , its really appreciated! 👍

Nice! I am really looking forward to this! 👍 A couple of late comments, I missed the proposal.

If limiting policy combinations, consider supporting immutable resources ["Create", "Delete"]

Discussed with @turkenh to add an allowlist of supported combinations of ManagementPolicies and check that in the reconciler.
What we though about is initially supporting:
...

Immutable resources (like Kubernetes Job) would work better if we have support for the following combinations:

managementPolicy: ["Create", "Delete"]
managementPolicy: ["Create", "Observe", "Delete"]

I don't see any technical reason why "LateInitialize" could not be part of it, which makes four meaningful, valid policies for immutable resources alone.

(Have you written what combinations you worry about supporting and what issues you expect? With the combinations mentioned above allowed, I wonder what is not there smile )

Thanks for the suggestion. The idea is to support valid use cases, and to have confidence that it works properly.
We could add the "immutable" use cases, seams reasonable to me.

One combination I am not sure if its neccessary:

managementPolicy: ["Create", "Delete"]

as I dont see a reason why not to add the Observe. Having combinations with LateInitialize and Delete make sense.

We dont have to add all valid combinations right away, until we see a use case for it. As this is an alpha feature, IMO its better to start small, and iterate through discovering use cases we want to solve. There could be valid combinations as well which are omitting "Create" for importing cases, but while its not needed, I am hesitant to add it right away and increase the load of what we are supporting.

Of course there are also combinations which dont make sense like ["LateInitialize"].

Keeping LateInitialize

Are there any good use cases for LateInitialize when Observe exists?

As a user, I think LateInitialize might have two paths forward:

  • Disable and deprecate it, as it is a major source of issues. Just keep it around for legacy use cases for a while.
  • A mode where LateInitialize keeps updating the field, by using managed fields and checking that the provider is the manager of that specific field.
    Am I missing some use cases where it is helpful?

TBH, I also see that going forward it may be that Late initialization gets deprecated. When I started to work on this issue, I also thought about deprecating/removing it alltogather, but as its a feature that has been around for a while, people are already using late initialized fields in patches and such. As we now have status.atProvider I think late initialization is losing its usefulness and we could rethink if its needed. I may be also missing some other use cases where it is helpful.

initProvider name

It might make sense to consider the name "initProvider" together with deletion options, to see if there's a way to make it intuitive that these are API attributes specific to create and delete.

I know of three kinds of additional deletion behaviors:

I wonder if the design should consider this to ensure the API is orthogonal and consistent in naming.

Ehh, naming is hard. For me, initProvider makes sense as it has a connotation to forProvider which its complementing and the init in the name relates to me that is just used at the start of the resources lifecycle ( create). I don't have a better idea right now, createProvider or something like that does'nt sound so good to me.

Do you have some suggestion for the naming?

Use of array - usability in compositions?

I wonder how well this will work with compositions, if you want to have one or more feature flags in a definition to allow the user to control if a resource should be orphaned, for example.

For example, map transforms do not work as they output single strings. This might require a new transform like "add to set", or a map transform returning a list. Have you tested a use case like this end-to-end?

I see your point, it would be useful to have a map transform with a list for this case, not only for the management policies, but in general for arrays. I haven't tested this in a composition yet btw.

@lsviben
Copy link
Contributor Author

lsviben commented Jul 10, 2023

I have tested migrating from the previous managmenetPolicy: FullControl/OrphanOnDelete/Observe to the new managementPolicy: ["*"].

For some reason (also based on some early testing) I expected that after applying the new CRD with the updated OpenAPI spec, that the old policy will get replaced by the default new managementPolicy (spec.managmementPolicy: [*]). Well I was wrong :) .

What happens is that the old managementPolicy doesnt get removed and starting the providers fails because of the unexpected value in the field.

If I dont find an easy and elegant solution for that I am considering simply renaming the new spec.managementPolicy to spec.managmentPolicies. This will remove the old field and default the new one when the new CRD is applied

cc @turkenh

Changed the field name from spec.managmentPolicy to spec.managementPolicies. When the newly generated CRD is applied, the following will happen:

Old resource:

spec:
  managmentPolicy: FullControl

apply new CRD:

spec:
  managmentPolicies: ["*"]

If we keep the name the same, what would happen is that the value will stay the same, even though the CRD spec changed:

Applying a resource with the old management policies CRD:

...
spec:
  managementPolicy: FullControl

applying the new CRD withouth changing the mane:

...
spec:
  managementPolicy: FullControl

It stays the same even though the CRD spec changed. I tried triggering an update or something to see if it gets changed, but the edit doesnt even pass because of the validation failing (it now expects an array instead of a string). Of course new resources are created correctly.

Decided that its better to go with the name change as it achieves the effect we are looking for.

Copy link
Member

@negz negz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lsviben Apologies, super quick/shallow review because I didn't get around to this until the end of my day and I have to run. I have another reminder to look again tomorrow.

pkg/reconciler/managed/reconciler.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/reconciler.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/reconciler.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/reconciler.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/reconciler.go Outdated Show resolved Hide resolved
return false
}
return ContainsOnlyManagementAction(policy, xpv1.ManagementActionObserve)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of these functions take a policy and whether management policies are enabled as arguments. Would it make sense to introduce a ManagementPolicyChecker type? Might result in more succinct calls, and less refactoring when managementPolicyEnabled goes away (i.e. when its GA).

I'm thinking something like:

type ManagementPolicyChecker interface {
	ShouldUpdate() bool
	ShouldLateInit() bool
	IsObserveOnly() bool  // Could this be phrased in some kind of ShouldFoo way?
}

You could then have one implementation that had both enabled and policy, e.g.:

// This is a very dumb name.
type PolicyBasedManagementPolicyChecker struct {
	policy xpv1.ManagementPolicy
	enabled bool
}

Or one that embeds just xpv1.ManagementPolicy and a different implementation that is used whenever management policies aren't actually enabled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this.

The interface could be extended with IsNonDefault and IsSupported as well.

Copy link
Member

@negz negz Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lsviben Do you think you'll add IsNonDefault and IsSupported to this as well? I like @turkenh's suggestion - it would be nice for this type to be a one-stop-shop for all things management-policy-checking.

If we wanted to get reaaaally fancy it would be pretty clean to have a type that essentially loaded up all the context we need to make branching decisions (e.g management policies, deletion policy, paused annotation, etc etc). Something like:

policy, err := NewManagementPolicyResolver(managed, managementPoliciesEnabled)
if err != nil {
    // NewManagementPolicyResolver returns an error if you're doing anything "wrong",
    //i.e. if you're using an unsupported management policy, or if you're trying to use
    // management policies but the feature flag isn't enabled.
}

type ManagementPolicyResolver interface {
    ShouldPause() bool
    ShouldOnlyObserve() bool

    ShouldCreate() bool
    ShouldDelete() bool  // Equivalent to !shouldOrphan
    ShouldLateInit() bool
    ShouldUpdate() bool
}

All the ShouldFoo functions don't feel quite as elegant as the nice Action and OnlyAction methods you have that take a policy as arguments, but I think I still prefer them because they let us push all of the context needed to make "should I do this thing" checks (not only management policies) into the policy resolver.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, i can add something like that, also like the Resolver name. Also like how it hides the checks for supported and the sanity check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added something similar:

// ManagementPoliciesChecker is used to perform checks on management policies
// to determine specific actions are allowed, or if they are the only allowed
// action.
type ManagementPoliciesChecker interface {
	// IsSupported returns true if the management policy is supported.
	IsSupported() bool
	// IsDisabledNonDefault returns true if the management policy is set to a non-default
	// value, and the management policy feature is disabled. Used as a sanity
	// check.
	IsDisabledNonDefault() bool
	// IsPaused returns true if the resource is paused based
	// on the management policy.
	IsPaused() bool

	// ShouldOnlyObserve returns true if only the Observe action is allowed.
	ShouldOnlyObserve() bool
	// ShouldCreate returns true if the Create action is allowed.
	ShouldCreate() bool
	// ShouldLateInitialize returns true if the LateInitialize action is
	// allowed.
	ShouldLateInitialize() bool
	// ShouldUpdate returns true if the Update action is allowed.
	ShouldUpdate() bool
	// ShouldDelete returns true if the Delete action is allowed.
	ShouldDelete() bool
}

Added the IsPaused to it, but in the end this IsPaused just checks if its paused based on the management policies, and I left the meta.IsPaused to check paused condition based on annotations. Felt wrong to mix annotations in the ManagementPolicies checking tool, so i left them separate.

Checking IsDisabledNonDefault and IsSupported through calls instead of an error in NewManagement... due to:

  • if we want to have IsPaused there, we need to initialize it before it, so to avoid changing the order of checks from IsPaused->IsDisabledNonDefault->IsSupported to IsDisabledNonDefault->IsSupported->IsPaused
  • simpler for testing

Just noticed that I didnt change naming, ill check that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Felt wrong to mix annotations in the ManagementPolicies checking tool, so i left them separate.

FWIW despite using the name "management policy checker" in my examples I was thinking more of an "all the policies" checker as a single place to calculate the policy based on all the policy inputs, which right now I think are:

  • Paused annotation
  • Deletion policy
  • Management policies

Though to be fair the first two will go away eventually at which point this would purely be checking management policies. I see your point WRT ordering of the paused/disabled/supported calls though. Don't feel super strongly on this one.

Another thought: Could we merge IsDisabledNonDefault and IsSupported into a Validate() error method that returns an error indicating what's wrong with the policy config (i.e. its set despite the flag being enabled, or its set to an invalid value)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged the IsDisabledNonDefault and IsSupported to Validate() error, nice suggestion 👍

For adding the pause annotation, I left it as is for now. If you/we feel that we should merge these, we can do it in some next PR, I dont have a strong feeling about it, for me what it is now is ok.

apis/common/v1/policies.go Outdated Show resolved Hide resolved
// not updated when the managed resource is updated, the external resource
// is not deleted when the managed resource is deleted and the
// spec.forProvider is not late initialized.
sets.New[xpv1.ManagementAction](xpv1.ManagementActionObserve, xpv1.ManagementActionCreate),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add Observe + Update ? crossplane-contrib/provider-kubernetes#115

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for updating these comments

I'd prefer to avoid this giant set (no pun intended) of package scoped state. Perhaps we could make it a member of ManagementPolicyActionChecker and move its declaration into the NewManagementPolicyActionChecker?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved the giant set into an func returning default supported policies and made it injectable for the ManagedPoliciesResolver, so that its not package-scoped anymore and we can still easily change it for tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we don't want to add Observe + Update for now but enable a way for individual providers to extend the supported list? When I check the code, this doesn't seem to be the case though.

Copy link
Contributor Author

@lsviben lsviben Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I want to add Observe + Update but didnt yet have a chance to add and test it first. Maybe it would be better to add them in a next PR, so that this can be merged?

I havent thought about allowing providers to set their own supported policies, now they are injected just in the resovler which actually just helps with tests and not having the supported list package scoped. But I like the idea, we can inject the list into the Reconciler, so that the providers can change it easier if needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@turkenh now the supported policies can be injected into the Reconciler, so if some provider devs would like to test something new or have their own set, its possible 🎉

apis/common/v1/resource.go Outdated Show resolved Hide resolved
return false
}
return ContainsOnlyManagementAction(policy, xpv1.ManagementActionObserve)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this.

The interface could be extended with IsNonDefault and IsSupported as well.

pkg/meta/meta.go Outdated Show resolved Hide resolved
// not updated when the managed resource is updated, the external resource
// is not deleted when the managed resource is deleted and the
// spec.forProvider is not late initialized.
sets.New[xpv1.ManagementAction](xpv1.ManagementActionObserve, xpv1.ManagementActionCreate),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks for updating these comments

I'd prefer to avoid this giant set (no pun intended) of package scoped state. Perhaps we could make it a member of ManagementPolicyActionChecker and move its declaration into the NewManagementPolicyActionChecker?

pkg/reconciler/managed/reconciler.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/reconciler.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/reconciler.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/reconciler.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/reconciler.go Outdated Show resolved Hide resolved
Copy link
Member

@negz negz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but I'd like @turkenh to approve too before we merge.

I left a few comments for possible improvements, but I think we're into optimizations at this point so feel free to take them or leave them.

pkg/reconciler/managed/reconciler.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/reconciler.go Outdated Show resolved Hide resolved
return false
}
return ContainsOnlyManagementAction(policy, xpv1.ManagementActionObserve)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Felt wrong to mix annotations in the ManagementPolicies checking tool, so i left them separate.

FWIW despite using the name "management policy checker" in my examples I was thinking more of an "all the policies" checker as a single place to calculate the policy based on all the policy inputs, which right now I think are:

  • Paused annotation
  • Deletion policy
  • Management policies

Though to be fair the first two will go away eventually at which point this would purely be checking management policies. I see your point WRT ordering of the paused/disabled/supported calls though. Don't feel super strongly on this one.

Another thought: Could we merge IsDisabledNonDefault and IsSupported into a Validate() error method that returns an error indicating what's wrong with the policy config (i.e. its set despite the flag being enabled, or its set to an invalid value)?

Copy link
Member

@turkenh turkenh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, great work @lsviben 💪

pkg/reconciler/managed/reconciler.go Outdated Show resolved Hide resolved
// not updated when the managed resource is updated, the external resource
// is not deleted when the managed resource is deleted and the
// spec.forProvider is not late initialized.
sets.New[xpv1.ManagementAction](xpv1.ManagementActionObserve, xpv1.ManagementActionCreate),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we don't want to add Observe + Update for now but enable a way for individual providers to extend the supported list? When I check the code, this doesn't seem to be the case though.

pkg/reconciler/managed/reconciler.go Outdated Show resolved Hide resolved
pkg/reconciler/managed/reconciler.go Outdated Show resolved Hide resolved
Signed-off-by: lsviben <sviben.lovro@gmail.com>
@rbrunan
Copy link

rbrunan commented Jul 14, 2023

Hey @lsviben, thank you for your work.
I've been testing this PR in our environment and unfortunately, it is not working as expected or I'm doing something wrong.
The test that we did is related to this example in the design doc.
The example seems wrong to me since the .spec.scalingConfig content is an array, not a map. When I apply that in a composition I receive the following error:

composed resource "eks-nodepool": cannot apply the patch at index 12: spec.forProvider.scalingConfig is not an array

Changing that to:

        spec:
          managementPolicy: ["Create", "Update", "Delete", "Observe"]
          initProvider:
            scalingConfig:
              - desiredSize: 1
          forProvider:
            scalingConfig:
              - minSize: 1 # <spec.parameters.scalingConfig.minSize>
                maxSize: 20 # <spec.parameters.scalingConfig.maxSize>

Throws another error:

cannot compose resources: cannot apply composed resource: cannot create object: NodeGroup.eks.aws.upbound.io "aws-dev-eu-west-1-test-cas-eks-np-mgmt-a-7jdtj" is invalid: spec.forProvider.scalingConfig.desiredSize: Required value

Finally, If I remove all entries for .spec.forProvider.scalingConfig and I use only the .spec.initProvider.scalingConfig with fixed values, the MR is generated but without the scaling config block:

cannot run refresh: refresh failed: Insufficient scaling_config blocks: At least 1 "scaling_config" blocks are required.
So my understanding is that the initiProvider is not yet implemented in that PR.

Am I doing something wrong or the initProvider is not passing the parameters to the MR?

@rbrunan
Copy link

rbrunan commented Jul 14, 2023

When the resource is created and ready, we can see that the spec.forProvider.desiredSize is not being late-initialized. We can check its value in status.atProvider.desiredSize. This allows the desiredSize to be managed by an external controller (autoscaler).

Note that the parameter in the Autoscaling group is called desiredCapacity the desiredSize one is for the NodeGroup resource.

@lsviben
Copy link
Contributor Author

lsviben commented Jul 14, 2023

Hey, @rbrunan, thanks for the feedback.

This PR does not introduce the initProvider functionality, just the basic Granular Management Polices, you can check the examples in the description of the this PR.

I am currently working on introducing initProvider to upjet, and upjet based providers (upbound provider-aws/gcp/azure).

As for the example in the design doc, sorry for that, I handwrote them and probably missed the map/array thing.

So for now, only examples which do not use initProviders should work, hope that makes it clearer.

In practice that means that if we want to ignore fields which are required we will need to wait for initProviders, optional fields whoose problem was only that they were being late-initialized into spec.forProvider should be handleable with this PR.

@rbrunan
Copy link

rbrunan commented Jul 14, 2023

Hey, @rbrunan, thanks for the feedback.

This PR does not introduce the initProvider functionality, just the basic Granular Management Polices, you can check the examples in the description of the this PR.

I am currently working on introducing initProvider to upjet, and upjet based providers (upbound provider-aws/gcp/azure).

As for the example in the design doc, sorry for that, I handwrote them and probably missed the map/array thing. I see also that an old versuion of managementPolicy: PartialControl is there instead of managementPolicy:["Observe", "Create", "Update", "Delete"]

So for now, only examples which do not use initProviders should work, hope that makes it clearer

Thank you, now everything makes sense.

@turkenh turkenh merged commit bb0d34b into crossplane:master Jul 14, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Granular management policies
5 participants