Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add static resource controller #185

Closed

Conversation

sjenning
Copy link
Contributor

Begins implementing openshift/enhancements#260

This controller deploys the static and templated resources required for the AWS pod identity webhook.

The image pull spec is templated into the Deployment using the envvar injected into the operator by the CVO and the service-ca is templated into the MutatingWebhookConfiguration.

The controller reconciles the assets if they are changed and updates the MutatingWebhookConfiguration and does a forced rollout of the Deployment if the service-ca rotates.

/cc @dgoodwin @joelddiaz @derekwaynecarr @marun @deads2k

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sjenning
To complete the pull request process, please assign abutcher
You can assign the PR to them by writing /assign @abutcher in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sjenning
Copy link
Contributor Author

Still have some RBAC work to do.
/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 23, 2020
@sjenning sjenning force-pushed the add-pod-identity-assets-controller branch from 5bd7c8a to 6626553 Compare April 23, 2020 18:52
@sjenning
Copy link
Contributor Author

/retest

@marun
Copy link

marun commented Apr 24, 2020

@sjenning Is the service-ca related to the service-ca-operator or something else?

pkg/operator/staticresource/controller.go Outdated Show resolved Hide resolved
pkg/operator/staticresource/controller.go Outdated Show resolved Hide resolved
pkg/operator/staticresource/controller.go Outdated Show resolved Hide resolved
pkg/operator/staticresource/controller.go Outdated Show resolved Hide resolved
pkg/operator/staticresource/controller.go Outdated Show resolved Hide resolved
pkg/operator/staticresource/controller.go Outdated Show resolved Hide resolved
pkg/operator/staticresource/controller.go Outdated Show resolved Hide resolved
manifests/03-deployment.yaml Show resolved Hide resolved
bindata/v4.1.0/aws-pod-identity-webhook/deployment.yaml Outdated Show resolved Hide resolved
bindata/v4.1.0/aws-pod-identity-webhook/deployment.yaml Outdated Show resolved Hide resolved
pkg/operator/staticresource/controller.go Outdated Show resolved Hide resolved
pkg/operator/staticresource/controller.go Outdated Show resolved Hide resolved
pkg/operator/staticresource/controller.go Outdated Show resolved Hide resolved
pkg/operator/staticresource/controller.go Outdated Show resolved Hide resolved
@sjenning
Copy link
Contributor Author

@marun yes, the service-ca-signer is the CA that signs Services that have the service.beta.openshift.io/serving-cert-secret-name annotation. It creates a cert/key and injects a secret so that TLS application can be accessed by name on the service network.

@sjenning
Copy link
Contributor Author

/hold cancel

no point since master is locked

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 27, 2020
@sjenning
Copy link
Contributor Author

sjenning commented Apr 27, 2020

Updates:

  • use imagePullPolicy: Always
  • use service.beta.openshift.io/inject-cabundle to inject service-ca
  • create a namespace-local non-manager informer cache so the controller can do a very narrow watch and we can use Role rather than ClusterRole
  • awsPodIdentityController now retries the reconcilation in a loop every 10s before starting the namespace-local informer cache
  • replaced log.WithFields(log.Fields{"controller": controllerName}) with r.logger
  • removed Generation tracking for MutatingWebhookConfiguration
  • requeue with 10s delay on failed reconcilation
  • rename package from staticresource to awspodidentity
  • restored the control-plane: controller-manager and controller-tools.k8s.io: "1.0" labels on the deployment as removal caused upgrades to fail (Deployment matchLabels are immutable)
  • add tolerations and nodeSelector to pod-identity-webhook Deployment so it can schedule on masters

@sjenning sjenning force-pushed the add-pod-identity-assets-controller branch from aca1181 to 7616916 Compare April 27, 2020 17:27
}

// ApplyMutatingWebhookConfiguration merges objectmeta, does not worry about anything else
func ApplyMutatingWebhookConfiguration(client admissionregistrationclientv1beta1.MutatingWebhookConfigurationsGetter, recorder events.Recorder, required *admissionregistrationv1beta1.MutatingWebhookConfiguration) (*admissionregistrationv1beta1.MutatingWebhookConfiguration, bool, error) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As written, modification of the webhook config spec (e.g the url) by other actors will not prompt action by the operator. Is this intentional?

Copy link
Contributor Author

@sjenning sjenning Apr 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is true. It currently just ensures that the resource exists basically.

Since an outside controller injects the service-ca, we really can't use Generation as Deployments do to detect changes as external changes are expected.

I would like to check more here. I'll add a TODO.

Copy link

@marun marun Apr 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please review the documentation for Apply{DaemonSet,Deployment}:

https://github.com/openshift/library-go/blob/master/pkg/operator/resource/resourceapply/apps.go#L21

I think it makes sense to set the input spec hash as an annotation to enable detection of changes across operator releases.

Regarding the use of generation to detect changes made by other actors, you're right that it can't be used in the same manner as other apply methods due to the CA field being managed by service ca. Since this behavior depends on the service ca injection annotation, it should be possible to detect when to retain the CA field of the existing resource to avoid racing. At worst this implies a second operator-initiated update when the CA is changed by service ca (to pick up the generation). I think that's a small price to pay to ensure that the operator can maintain the intended spec.

cc: @deads2k

@sjenning
Copy link
Contributor Author

trying to explain why the AWS_POD_IDENTITY_WEBHOOK_IMAGE env var doesn't seem to be set sometimes

38m         Normal    ServiceAccountCreated                 deployment/cloud-credential-operator              Created ServiceAccount/pod-identity-webhook -n openshift-cloud-credential-operator because it was missing
38m         Normal    ClusterRoleCreated                    deployment/cloud-credential-operator              Created ClusterRole.rbac.authorization.k8s.io/pod-identity-webhook because it was missing
38m         Normal    RoleCreated                           deployment/cloud-credential-operator              Created Role.rbac.authorization.k8s.io/pod-identity-webhook -n openshift-cloud-credential-operator because it was missing
38m         Normal    ClusterRoleBindingCreated             deployment/cloud-credential-operator              Created ClusterRoleBinding.rbac.authorization.k8s.io/pod-identity-webhook because it was missing
38m         Normal    RoleBindingCreated                    deployment/cloud-credential-operator              Created RoleBinding.rbac.authorization.k8s.io/pod-identity-webhook -n openshift-cloud-credential-operator because it was missing
38m         Normal    ServiceCreated                        deployment/cloud-credential-operator              Created Service/pod-identity-webhook -n openshift-cloud-credential-operator because it was missing
35m         Warning   DeploymentCreateFailed                deployment/cloud-credential-operator              Failed to create Deployment.apps/pod-identity-webhook -n openshift-cloud-credential-operator: Deployment.apps "pod-identity-webhook" is invalid: spec.template.spec.containers[0].image: Required value
30m         Warning   DeploymentCreateFailed                deployment/cloud-credential-operator              Failed to create Deployment.apps/pod-identity-webhook -n openshift-cloud-credential-operator: Deployment.apps "pod-identity-webhook" is invalid: spec.template.spec.containers[0].image: Required value
26m         Warning   DeploymentCreateFailed                deployment/cloud-credential-operator              Failed to create Deployment.apps/pod-identity-webhook -n openshift-cloud-credential-operator: Deployment.apps "pod-identity-webhook" is invalid: spec.template.spec.containers[0].image: Required value
25m         Normal    DeploymentCreated                     deployment/cloud-credential-operator              Created Deployment.apps/pod-identity-webhook -n openshift-cloud-credential-operator because it was missing
25m         Normal    MutatingWebhookConfigurationCreated   deployment/cloud-credential-operator              Created MutatingWebhookConfiguration.admissionregistration.k8s.io/pod-identity-webhook because it was missing

@sjenning sjenning force-pushed the add-pod-identity-assets-controller branch from 7616916 to 8055a27 Compare April 27, 2020 18:38
@sjenning
Copy link
Contributor Author

Weird, installed a new cluster and now I'm not seeing it. Added a proper fatal error to capture it though.

@sjenning sjenning force-pushed the add-pod-identity-assets-controller branch from 8055a27 to d097e71 Compare April 27, 2020 21:29
@sjenning
Copy link
Contributor Author

flake https://bugzilla.redhat.com/show_bug.cgi?id=1817588

/test e2e-aws-upgrade

@sjenning
Copy link
Contributor Author

ok, all tests are green. I've deployed a cluster with the release image from this PR and everything looks good. I'm going to lock this in and move on to adding the serviceAccountRef to the CRD.

err := r.ReconcileResources()
if err != nil {
r.logger.Errorf("reconciliation failed, retrying in %s", retryInterval.String())
return reconcile.Result{RequeueAfter: retryInterval}, err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a static retry schedule instead of using the built-in exponential back-off?

requestedDeployment := resourceread.ReadDeploymentV1OrDie(v410_00_assets.MustAsset("v4.1.0/aws-pod-identity-webhook/deployment.yaml"))
requestedDeployment.Spec.Template.Spec.Containers[0].Image = r.imagePullSpec
resultDeployment, modified, err := resourceapply.ApplyDeployment(r.clientset.AppsV1(), r.eventRecorder, requestedDeployment, r.deploymentGeneration, false)
r.deploymentGeneration = resultDeployment.Generation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we read and store the Generation field before checking whether an error ocorred?

@sjenning
Copy link
Contributor Author

sjenning commented May 4, 2020

changes carried on in #187

And linking back to unaddressed review #187 (comment)

@sjenning sjenning closed this May 4, 2020
@sjenning
Copy link
Contributor Author

reopening as the new head of current development and to get a CI release image with these changes

@sjenning sjenning reopened this May 19, 2020
@sjenning
Copy link
Contributor Author

/test e2e-aws

@sjenning
Copy link
Contributor Author

these CI release images get reclaimed quickly!

/test e2e-aws

@openshift-ci-robot
Copy link
Contributor

@sjenning: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws d097e71 link /test e2e-aws

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@sjenning sjenning closed this Jun 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants