Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DEVEXP-403: Mount cluster trusted CA to image registry operator #340

Merged
merged 1 commit into from
Aug 20, 2019

Conversation

adambkaplan
Copy link
Contributor

Injects the cluster proxy CA into /etc/pki/ca-trust/source/ca-bundle.crt

@openshift-ci-robot openshift-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jul 30, 2019
manifests/04-ca-proxy.yaml Outdated Show resolved Hide resolved
Copy link
Contributor Author

@adambkaplan adambkaplan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kind: ConfigMap
metadata:
annotations:
config.openshift.io/inject-proxy-cabundle: "true"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed in https://github.com/openshift/cluster-image-registry-operator/blob/master/manifests/04-ca-config.yaml#L5-L6 that we are not marking the other CA ConfigMap as "create only" via the annotation release.openshift.io/create-only: "true". Is this right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems wrong to me, yet i don't see that CM getting stomped by the CVO. Perhaps the CVO does not stomp configmaps back to empty?

@deads2k @abhinavdahiya ? (tldr: we have a CM being created in the manifest for service CAs. It's not marked create-only, but it seems to be working ok, the service-ca gets injected and the value doesn't seem to be getting stomped back out).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(after looking at a cluster i think it is getting stomped and then re-injected. though i'm a little surprised that isn't causing rollouts of the registry operator so we should probably confirm the registry operator is actually doing the right thing when the service ca changes. In this case the cert doesn't actually change, but it's presumably flapping back and forth between not existing and existing, in the CM)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to dig deeper on this - there are three separate control loops that are modifying the serviceca ConfigMap:

  1. The CVO
  2. The ca-configmap-injector
  3. The image registry operator itself - https://github.com/openshift/cluster-image-registry-operator/blob/master/pkg/resource/serviceca.go#L52-L62

Will fix this in a separate PR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch on (3). seems like we should not need that generator at all?

for (1) we should just set the create-only annotation on the CM.

volumes:
- name: proxyca
configMap:
name: proxyca
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't block mounting the specific ca-bundle.crt path.

@@ -53,3 +53,10 @@ spec:
value: "cluster-image-registry-operator"
- name: IMAGE
value: docker.io/openshift/origin-docker-registry:latest
volumeMounts:
- name: proxyca
mountPath: /etc/pki/ca-trust/source/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty sure the bundle you're trying to overwrite is this one:
/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem

but i guess i'm not positive which one golang is actually reading since there is also:
/usr/share/pki/ca-trust-legacy/ca-bundle.legacy.default.crt

but it should be one of those two (there are various symlinks that point to those 2 files)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've called this out in the proxy design doc. Will the operator container need to run update-ca-trust extract before it begins?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would not expect so, no. pretty sure update-ca-trust extract is what produces those bundles.

@bparees
Copy link
Contributor

bparees commented Jul 30, 2019

the other bit of this should be some logic in the operator itself to watch the mounted file. If it changes, the operator should die (so it gets restarted).

(The file on disk will change if the configmap content changes).

@adambkaplan
Copy link
Contributor Author

the other bit of this should be some logic in the operator itself to watch the mounted file. If it changes, the operator should die (so it gets restarted).

@ricardomaraschini you and I will work on a follow-up PR to do this.

@bparees this PR will also stage the proxy CA that will be mounted into the registry itself - I assume we can use the same ConfigMap for both as long as the volume mounts are read-only.

@bparees
Copy link
Contributor

bparees commented Jul 31, 2019

@bparees this PR will also stage the proxy CA that will be mounted into the registry itself - I assume we can use the same ConfigMap for both as long as the volume mounts are read-only.

i'm not entirely sure what you mean, but the CM that's being annotated for proxy-ca injection needs to be used only for the proxy-ca, in the same way that CMs annotated for service-ca injection are used only for that. They can't have multiple writers.

@adambkaplan
Copy link
Contributor Author

i'm not entirely sure what you mean, but the CM that's being annotated for proxy-ca injection needs to be used only for the proxy-ca,

My intent is that we only have one CM for the proxy ca (rename this trusted-ca?). This ConfigMap will be mounted into the registry operator as well as the registry pod - hence read-only mounting.

I assume we don't need to mount this into the nodeca daemonset, and that another operator will be responsible for this.

@bparees
Copy link
Contributor

bparees commented Jul 31, 2019

My intent is that we only have one CM for the proxy ca (rename this trusted-ca?). This ConfigMap will be mounted into the registry operator as well as the registry pod - hence read-only mounting.

yes mounting it in two different pods should be fine.

I assume we don't need to mount this into the nodeca daemonset, and that another operator will be responsible for this.

would be good to confirm with @mrunalp how they are handling it(not on this thread please) but i believe the intent is for the MCO to get the trusted CAs onto the nodes and CRIO should just pick it up from there.

@adambkaplan adambkaplan changed the title Mount cluster proxy CA to image registry operator WIP - Mount cluster proxy CA to image registry operator Aug 1, 2019
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 1, 2019
@mrunalp
Copy link
Member

mrunalp commented Aug 1, 2019

would be good to confirm with @mrunalp how they are handling it(not on this thread please) but i believe the intent is for the MCO to get the trusted CAs onto the nodes and CRIO should just pick it up from there.
Yes, that's how it will work for CRI-O 👍

@adambkaplan
Copy link
Contributor Author

Huzzah! No trust = no AWS storage (though we separately trust the apiserver?):

 {
            "apiVersion": "config.openshift.io/v1",
            "kind": "ClusterOperator",
            "metadata": {
                "creationTimestamp": "2019-08-01T15:58:23Z",
                "generation": 1,
                "name": "image-registry",
                "resourceVersion": "12484",
                "selfLink": "/apis/config.openshift.io/v1/clusteroperators/image-registry",
                "uid": "30c468a5-b475-11e9-a066-0afad4972984"
            },
            "spec": {},
            "status": {
                "conditions": [
                    {
                        "lastTransitionTime": "2019-08-01T15:58:23Z",
                        "message": "The deployment does not exist",
                        "reason": "DeploymentNotFound",
                        "status": "False",
                        "type": "Available"
                    },
                    {
                        "lastTransitionTime": "2019-08-01T15:58:23Z",
                        "message": "Unable to apply resources: unable to sync storage configuration: RequestError: send request failed\ncaused by: Head https://image-registry-us-east-1-ciop5nrpdzv2f0b2b5wzf6-30d741eeb47511.s3.amazonaws.com/: x509: certificate signed by unknown authority",
                        "reason": "Error",
                        "status": "True",
                        "type": "Progressing"
                    },
                    {
                        "lastTransitionTime": "2019-08-01T15:58:23Z",
                        "status": "False",
                        "type": "Degraded"
                    }
                ],
                "extension": null
            }
        },

@danehans
Copy link

danehans commented Aug 2, 2019

Should s3.amazonaws.com be added to noProxy for aws installs?

@adambkaplan
Copy link
Contributor Author

Should s3.amazonaws.com be added to noProxy for aws installs?

If we use VPC endpoints for s3, then yes. Otherwise no - iirc traffic to s3 by default routes over the public internet.

Is our proxy controller smart enough to detect this?

fyi @openshift/openshift-team-network-edge

@adambkaplan adambkaplan changed the title WIP - Mount cluster proxy CA to image registry operator DEVEXP-402: Mount cluster trusted CA to image registry operator Aug 8, 2019
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 8, 2019
@adambkaplan
Copy link
Contributor Author

/hold

Blocked by openshift/cluster-network-operator#274

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 8, 2019
name: trusted-ca
optional: true
items:
- key: ca-bundle.crt
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bparees this uses the optional mount and overwrites the filename.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does "optional" apply to the keys as well as the configmap itself?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bparees I can confirm it does. If the key is not present, the pod is still scheduled (and in my testing had lots of unknown CA errors).

@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 9, 2019
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 9, 2019
* Injects the cluster proxy CA into /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
* Tell the CVO to only create the trusted-ca ConfigMap
@adambkaplan
Copy link
Contributor Author

/hold cancel

Once openshift/cluster-network-operator#274 this should work!

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 9, 2019
@adambkaplan adambkaplan changed the title DEVEXP-402: Mount cluster trusted CA to image registry operator DEVEXP-403: Mount cluster trusted CA to image registry operator Aug 9, 2019
@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 11, 2019
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 14, 2019
@danehans
Copy link

@adambkaplan openshift/cluster-network-operator#271 is also needed, but should merge tomorrow.

@danehans
Copy link

/retest

@danehans
Copy link

@danehans
Copy link

fail [k8s.io/kubernetes/test/e2e/framework/framework.go:338]: Aug 16 19:37:17.100: Couldn't delete ns: "e2e-test-build-webhooks-9brtm": namespace e2e-test-build-webhooks-9brtm was not deleted with limit: timed out waiting for the condition, namespace is empty but is not yet removed (&errors.errorString{s:"namespace e2e-test-build-webhooks-9brtm was not deleted with limit: timed out waiting for the condition, namespace is empty but is not yet removed"})

/test e2e-aws

@adambkaplan
Copy link
Contributor Author

/test e2e-aws

/test e2e-aws-image-registry

/test e2e-aws-operator

/test e2e-aws-upgrade

@danehans
Copy link

level=error msg="Error: Error creating S3 bucket: TooManyBuckets: You have attempted to create more buckets than allowed"
/test e2e-aws
/test e2e-aws-image-registry
/test e2e-aws-upgrade

@adambkaplan
Copy link
Contributor Author

/retest

@bparees
Copy link
Contributor

bparees commented Aug 19, 2019

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 19, 2019
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@ricardomaraschini
Copy link
Contributor

/lgtm

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adambkaplan, bparees, ricardomaraschini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [adambkaplan,bparees]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit 75e8e85 into openshift:master Aug 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants