
feat(csv): install owned APIServices #492

Merged

Conversation

@njhale (Member) commented Oct 1, 2018

Description

Extends the CSV control loop to install owned APIServices. This includes generating the associated serving certificates and injecting them as volume-mounted Secrets into matching Deployments, creating/updating a Service that points to the pods of those Deployments, and creating/updating APIServices with the CA bundle of the signing CA.
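For reference, here is a rough sketch (not this PR's actual code) of the two resources created/updated per owned APIService, using client-go and kube-aggregator types; the helper signature, Service name, and priority values are illustrative.

```go
package olmsketch

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
	apiregv1 "k8s.io/kube-aggregator/pkg/apis/apiregistration/v1"
)

// buildAPIServiceResources sketches the Service and APIService generated for a
// single owned APIServiceDescription. caBundle is the signing CA's certificate.
func buildAPIServiceResources(group, version, namespace, deploymentName string, podLabels map[string]string, caBundle []byte) (*corev1.Service, *apiregv1.APIService) {
	// Service pointing at the pods of the matching Deployment.
	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: deploymentName + "-service", Namespace: namespace},
		Spec: corev1.ServiceSpec{
			Selector: podLabels,
			Ports:    []corev1.ServicePort{{Port: 443, TargetPort: intstr.FromInt(443)}},
		},
	}

	// APIService registered with the aggregator; trust in the extension
	// api-server comes from the injected CA bundle.
	apiService := &apiregv1.APIService{
		ObjectMeta: metav1.ObjectMeta{Name: version + "." + group}, // APIService naming convention
		Spec: apiregv1.APIServiceSpec{
			Group:                group,
			Version:              version,
			CABundle:             caBundle,
			Service:              &apiregv1.ServiceReference{Namespace: namespace, Name: svc.GetName()},
			GroupPriorityMinimum: 2000, // illustrative priorities
			VersionPriority:      15,
		},
	}
	return svc, apiService
}
```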

Note:

This is a first pass at creating all of the resources required to establish trust with the OLM-managed extension api-server. Managing certificate expiration will be the next step. Tests for this still need to be written.

Related to ALM-657

@openshift-ci-robot openshift-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Oct 1, 2018
@njhale njhale changed the title feat(csv): install owned APIServices WIP: feat(csv): install owned APIServices Oct 1, 2018
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 1, 2018
@njhale njhale changed the title WIP: feat(csv): install owned APIServices [WIP] feat(csv): install owned APIServices Oct 1, 2018
@njhale njhale force-pushed the manage-apiservices branch 2 times, most recently from f8e7608 to 2110f0b on October 2, 2018 16:04
@openshift-ci-robot openshift-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 2, 2018
@njhale njhale force-pushed the manage-apiservices branch 4 times, most recently from 3e8d3c1 to a563272 on October 3, 2018 22:31
@ecordell (Member) left a comment

Looking great overall!

I noticed this in the service-catalog deploy artifacts:

        # apiserver gets the ability to read authentication. This allows it to
        # read the specific configmap that has the requestheader-* entries to
        # enable api aggregation
        - apiVersion:
          kind: RoleBinding
          metadata:
            name: "servicecatalog.k8s.io:apiserver-authentication-reader"
            namespace: kube-system
          roleRef:
            apiGroup: rbac.authorization.k8s.io
            kind: Role
            name: extension-apiserver-authentication-reader
          subjects:
          - apiGroup: ""
            kind: ServiceAccount
            name: "service-catalog-apiserver"
            namespace: "default"

I think we may want to generate these as well. This one is tricky since it's a binding to a Role in kube-system (which means you can't write any equivalent clusterPermission or permission to get it).

We may want to go ahead and generate the binding to auth-delegator as well. The e2e tests would miss both of these things since they don't delete the package apiserver's rbac roles.
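For context, here is a sketch (not part of this PR) of the two bindings described above, built with rbac/v1 types, should OLM generate them for the extension apiserver's service account; the binding names and parameters are illustrative.

```go
package olmsketch

import (
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func authBindings(saName, saNamespace string) (*rbacv1.RoleBinding, *rbacv1.ClusterRoleBinding) {
	subject := rbacv1.Subject{Kind: "ServiceAccount", Name: saName, Namespace: saNamespace}

	// Lets the extension apiserver read the requestheader-* configmap in kube-system.
	authReader := &rbacv1.RoleBinding{
		ObjectMeta: metav1.ObjectMeta{Name: saName + "-auth-reader", Namespace: "kube-system"},
		RoleRef: rbacv1.RoleRef{
			APIGroup: "rbac.authorization.k8s.io",
			Kind:     "Role",
			Name:     "extension-apiserver-authentication-reader",
		},
		Subjects: []rbacv1.Subject{subject},
	}

	// Lets the extension apiserver delegate authn/authz checks to the main apiserver.
	authDelegator := &rbacv1.ClusterRoleBinding{
		ObjectMeta: metav1.ObjectMeta{Name: saName + "-auth-delegator"},
		RoleRef: rbacv1.RoleRef{
			APIGroup: "rbac.authorization.k8s.io",
			Kind:     "ClusterRole",
			Name:     "system:auth-delegator",
		},
		Subjects: []rbacv1.Subject{subject},
	}
	return authReader, authDelegator
}
```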

@@ -189,15 +189,15 @@ spec:
items:
Member

This should include the "deploymentName" field as well?

Member Author

yes - thanks for catching that.

Version string `json:"version"`
Kind string `json:"kind"`
DeploymentName string `json:"deploymentName,omitempty"`
Member

I wonder if this should accept a container as well? deploymentName: packages-apiserver/pkg-container

Member Author

Hmm, what would the value add be, further verification of the APIService Deployment?

@ecordell (Member) commented Oct 4, 2018

Just hit an interesting issue: if the port used to expose the apiserver isn't 443, we get stuck in installing (but no info about why).

We should probably either have a way to specify deployment/container:port for an apiserver, or detect when the port we expect (443) isn't exposed on the pod that's designated as an apiserver.

@ecordell (Member) commented Oct 4, 2018

Also, thinking about the volume issue: what if we just had a standard volume name (like apiserver-certs)? If there's no volumeMount for that volume, we add the "standard" one; otherwise we use the one that's there. That lets you take an existing apiserver (that's looking in a particular location for certs) and package it without adding extra config.
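A minimal sketch of that fallback behavior, assuming a standard volume named apiserver-certs; the default mount path is purely illustrative.

```go
package olmsketch

import (
	corev1 "k8s.io/api/core/v1"
)

const certVolumeName = "apiserver-certs"

// ensureCertMount keeps an existing mount of the standard cert volume if the
// packaged apiserver already declares one, and otherwise adds a default mount.
func ensureCertMount(container *corev1.Container) {
	for _, m := range container.VolumeMounts {
		if m.Name == certVolumeName {
			return // the packaged apiserver already says where it expects certs
		}
	}
	container.VolumeMounts = append(container.VolumeMounts, corev1.VolumeMount{
		Name:      certVolumeName,
		MountPath: "/apiserver.local.config/certificates", // illustrative default path
		ReadOnly:  true,
	})
}
```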

@njhale njhale force-pushed the manage-apiservices branch 4 times, most recently from 8cb0678 to 471008e on October 4, 2018 20:32
@njhale (Member Author) commented Oct 4, 2018

> Just hit an interesting issue: if the port used to expose the apiserver isn't 443, we get stuck in installing (but no info about why).
>
> We should probably either have a way to specify deployment/container:port for an apiserver, or detect when the port we expect (443) isn't exposed on the pod that's designated as an apiserver.

@ecordell

So PodSpec.Containers[].Ports[] is optional, and omitting a container port doesn't preclude it from being exposed on the network. I'm thinking we just add an optional apiServiceDescriptions[].containerPort field that overrides the default targetPort of 443 on the generated Service.

We could also add a liveness probe for the given (or default) port that enforces that a socket can be opened against it.
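A sketch of that default/override logic for the generated Service's port, assuming a containerPort field on the description; the function name is illustrative.

```go
package olmsketch

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// servicePortFor builds the generated Service's port, defaulting the target
// to 443 unless the (hypothetical) apiServiceDescriptions[].containerPort
// field was set.
func servicePortFor(containerPort int32) corev1.ServicePort {
	target := int32(443)
	if containerPort != 0 {
		target = containerPort
	}
	return corev1.ServicePort{
		Port:       443,                          // port the APIService's ServiceReference hits
		TargetPort: intstr.FromInt(int(target)), // port the apiserver container actually listens on
	}
}
```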

@njhale njhale force-pushed the manage-apiservices branch 3 times, most recently from ec0b9de to 0136a21 on October 5, 2018 11:45
@njhale njhale changed the title [WIP] feat(csv): install owned APIServices feat(csv): install owned APIServices Oct 5, 2018
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 5, 2018
@njhale njhale force-pushed the manage-apiservices branch from 0136a21 to 0b1e97c on October 5, 2018 12:55
@njhale (Member Author) commented Oct 5, 2018

/assign @alecmerdler

@njhale njhale closed this Oct 5, 2018
@njhale njhale force-pushed the manage-apiservices branch 2 times, most recently from 82d2486 to 2400b79 on October 5, 2018 21:27
// installcheck determined we can't progress (e.g. deployment failed to come up in time)
if install.IsErrorUnrecoverable(strategyErr) {
csv.SetPhase(v1alpha1.CSVPhaseFailed, v1alpha1.CSVReasonInstallCheckFailed, fmt.Sprintf("install failed: %s", strategyErr))
return strategyErr
}

// if there's an error checking install that shouldn't fail the strategy, requeue with message
if apiServiceErr != nil {
csv.SetPhase(v1alpha1.CSVPhaseInstalling, requeueConditionReason, fmt.Sprintf("APIServices installing: %s", apiServiceErr))
a.requeueCSV(csv.GetName(), csv.GetNamespace())
@njhale (Member Author) Oct 5, 2018

I'm not sure that this is the best way to ensure APIService GVK discovery is immediately detected. Watching APIServices only tells us when their status changes to Available, not when the expected GVKs are discoverable (which can happen much later). I expect this to swamp the CSV workqueue.

@ecordell Maybe we could requeue an APIService when it's detected that the owner CSV's expected GVKs are not discoverable in the syncAPIServices handler. Then, once all expected GVKs are discoverable, the syncAPIServices handler can requeue the owner CSV. This approach should at least shift the pressure from the CSV workqueue to the APIService workqueue. This logic could also be placed in a time limited goroutine, which could potentially keep workqueue pressure down all together.

Member

This is good for now. It would be nice to watch discovery as an optimization, but this should work fine. And you don't need to worry too much about swamping the queue; requeueCSV obeys the rate limiting on the backing queue.

I say we just add an issue to our backlog for the optimization 😄
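For background on that rate limiting, here is a minimal client-go workqueue example (illustrative, not OLM's actual queue wiring) showing how repeated rate-limited adds for the same key are backed off.

```go
package olmsketch

import "k8s.io/client-go/util/workqueue"

func requeueExample() {
	// Same kind of rate-limited queue that backs requeueCSV.
	queue := workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "csv")
	defer queue.ShutDown()

	key := "operators/my-csv" // illustrative namespace/name key

	// Repeated rate-limited adds for the same key are delayed with
	// exponential backoff, so frequent requeues don't swamp the queue.
	queue.AddRateLimited(key)

	// Once the CSV syncs cleanly, Forget resets the key's backoff.
	queue.Forget(key)
}
```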

if podTemplateName == depSpec.Template.GetName() {
return nil, fmt.Errorf("a name collision occurred when generating name for PodTemplate")
}
depSpec.Template.SetName(podTemplateName)
Member Author

This ensures that the extension api-server is actually rolled out in the event of a secret change. Triggering rollouts this way allows us to keep a static secret name.
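A sketch of that trigger, assuming the pod template name is derived from a hash of the serving-cert Secret's contents; the hashing scheme here is illustrative, not the PR's exact implementation.

```go
package olmsketch

import (
	"crypto/sha256"
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// setCertHashOnTemplate renames the pod template based on the current cert
// contents, so the Deployment's pod template changes (and pods roll) whenever
// the certs rotate, while the mounted Secret keeps a static name.
func setCertHashOnTemplate(depSpec *appsv1.DeploymentSpec, certSecret *corev1.Secret) {
	h := sha256.New()
	for _, key := range []string{"tls.crt", "tls.key"} {
		h.Write(certSecret.Data[key])
	}
	// Any change to the pod template produces a new ReplicaSet; tying the
	// template name to the cert hash ties rollouts to cert rotation.
	depSpec.Template.SetName(fmt.Sprintf("%s-%x", certSecret.GetName(), h.Sum(nil)[:8]))
}
```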


@@ -1211,6 +1221,7 @@
"k8s.io/kube-aggregator/pkg/apis/apiregistration/v1",
"k8s.io/kube-aggregator/pkg/client/clientset_generated/clientset",
"k8s.io/kube-aggregator/pkg/client/clientset_generated/clientset/fake",
"k8s.io/kube-aggregator/pkg/client/informers/externalversions",
Member

nice job not pulling in kubernetes/kubernetes 😄

version:
type: string
description: The version field of the APIService
kind:
type: string
description: The kind field of the APIService
deploymentName:
Member

I was considering suggesting a selector here instead, so that you don't have to point to an entire Deployment and it better matches how Services work, but I think it's reasonable to require a Deployment so that we can modify it to include the volume (using a selector to modify the pods would mean the Deployment controller would fight us for changes).


// GetRequiredAPIServiceDescriptions returns a deduplicated set of required APIServiceDescriptions
// with the intersection of required and owned removed
// Equivalent to the set subtraction required - owned
Member

This makes sense, but I think we should just prevent people from creating CSVs that have duplicate entries in owned and required. Fine for now, but let's keep it in mind for the next API version.
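A small sketch of the set subtraction that doc comment describes, keyed by group/version/kind; the description struct below is a stand-in for the CSV's APIServiceDescription type.

```go
package olmsketch

type apiServiceDescription struct {
	Group   string
	Version string
	Kind    string
}

// requiredMinusOwned deduplicates the required descriptions and removes any
// that also appear in owned, i.e. the set subtraction required - owned.
func requiredMinusOwned(required, owned []apiServiceDescription) []apiServiceDescription {
	ownedKeys := make(map[apiServiceDescription]struct{}, len(owned))
	for _, d := range owned {
		ownedKeys[d] = struct{}{}
	}

	seen := make(map[apiServiceDescription]struct{}, len(required))
	result := make([]apiServiceDescription, 0, len(required))
	for _, d := range required {
		if _, isOwned := ownedKeys[d]; isOwned {
			continue // drop the intersection of required and owned
		}
		if _, dup := seen[d]; dup {
			continue // deduplicate required entries
		}
		seen[d] = struct{}{}
		result = append(result, d)
	}
	return result
}
```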

}
if ownerutil.IsOwnedByKind(apiService, v1alpha1.ClusterServiceVersionKind) {
oref := ownerutil.GetOwnerByKind(apiService, v1alpha1.ClusterServiceVersionKind)
log.Infof("APIService %s change requeuing CSV %s", apiService.GetName(), oref.Name)
Member

Should this be logged at debug level?

@njhale njhale force-pushed the manage-apiservices branch from 2400b79 to 7931dd8 on October 8, 2018 12:53
@@ -158,18 +158,18 @@ spec:
serviceAccount: prometheus-operator-0-22-2
containers:
- name: prometheus-operator
-  image: quay.io/coreos/prometheus-operator@sha256:3daa69a8c6c2f1d35dcf1fe48a7cd8b230e55f5229a1ded438f687debade5bcf
+  image: quay.io/coreos/prometheus-operator:master
Member

Could you remove the changes to this file? This was just for testing out the OperatorGroup in that combined branch for the demo; it's not something we want to change just yet.

@njhale njhale force-pushed the manage-apiservices branch from c6a09d1 to 36286b4 on October 8, 2018 17:55
@njhale njhale force-pushed the manage-apiservices branch from 0a027ae to bfe9524 on October 8, 2018 20:12
@ecordell (Member) commented Oct 8, 2018

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 8, 2018
@njhale (Member Author) commented Oct 8, 2018

/approve

@openshift-ci-robot (Collaborator)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: njhale

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 8, 2018
@openshift-merge-robot openshift-merge-robot merged commit 9e4b493 into operator-framework:master Oct 8, 2018
ecordell pushed a commit to ecordell/operator-lifecycle-manager that referenced this pull request Mar 8, 2019
@njhale njhale added the kind/bug Categorizes issue or PR as related to a bug. label Mar 19, 2019
@njhale njhale deleted the manage-apiservices branch September 30, 2019 21:39