Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTA-1294: enhancements/security/openshift-image-policy: Propose new enhancement #1633

Merged
merged 1 commit into from
Aug 6, 2024

Conversation

wking
Copy link
Member

@wking wking commented May 29, 2024

Increasing the security of OpenShift release images from the current "check GPG signatures before initiating an update" to "check Sigstore signatures on every Pod launch".

Like the (Cluster)ImagePolicy enhancement I'm building on, this enhancement mostly belongs to the Node component. But the node component doesn't seem to have its own subdirectory, so I'm dropping this into the enhancements/security directory for now.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 29, 2024
Copy link
Contributor

openshift-ci bot commented May 29, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@wking wking force-pushed the openshift-image-policy branch 2 times, most recently from 5837b9a to b0f05e6 Compare May 29, 2024 15:10
@wking wking changed the title enhancements/openshift-image-policy: Propose new enhancement enhancements/security/openshift-image-policy: Propose new enhancement May 29, 2024
@wking wking force-pushed the openshift-image-policy branch 2 times, most recently from 1010a07 to a388407 Compare May 29, 2024 17:04
@wking wking changed the title enhancements/security/openshift-image-policy: Propose new enhancement OTA-1294: enhancements/security/openshift-image-policy: Propose new enhancement May 29, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 29, 2024
@openshift-ci-robot
Copy link

openshift-ci-robot commented May 29, 2024

@wking: This pull request references OTA-1294 which is a valid jira issue.

In response to this:

Increasing the security of OpenShift release images from the current "check GPG signatures before initiating an update" to "check Sigstore signatures on every Pod launch".

Like the (Cluster)ImagePolicy enhancement I'm building on, this enhancement mostly belongs to the Node component. But the node component doesn't seem to have its own subdirectory, so I'm dropping this into the enhancements/security directory for now.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@wking wking force-pushed the openshift-image-policy branch 4 times, most recently from ab6b145 to bb35b37 Compare May 29, 2024 21:25
@wking wking marked this pull request as ready for review May 29, 2024 21:30
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 29, 2024
@openshift-ci openshift-ci bot requested review from jan--f and trozet May 29, 2024 21:30
@wking wking force-pushed the openshift-image-policy branch 2 times, most recently from 6ada6e7 to ab7a83e Compare May 30, 2024 00:23
enhancements/security/openshift-image-policy.md Outdated Show resolved Hide resolved
* Suggest changes to mirroring tools, although they will need to learn to mirror Sigstore signatures for use in disconnected/restricted-network environments.
* Suggest changes to the ImageStream controller, although it will need to learn to mirror Sigstore signatures for verification when running pods from ImageStream sources, e.g. [must-gather][].
* Suggest policies for `quay.io/openshift-release-dev/ocp-v4.0-art-dev`, although those would help tighten down security for other `openshift-*` namespaces.
* Support testing unsigned releases as HyperShift hosted clusters on a management cluster that requires signed releases.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How/when does a hosted control plane deployment verify the release payload signature today (during install / during update)? Do we have the same luxury as we do in standalone of a Pod needing to run with the new payload before it has any impact on the hosted cluster?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I'm aware, there is no signature verification today in the HyperShift stack. In managed HyperShift (currently just hosted ROSA), ClusterImageSets define allowed update targets, and customers select from those presumably pre-verified targets. The HyperShift rolls out whichever release pullspec without performing any signature verification of its own, although it does run some pre-checks like looking at Upgradeable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If someone can patch HostedCluster and roll out an untrusted payload, I think we must close this gap.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, but:

  • Do you want to close that gap via OpenPGP / GPG signatures? Because if you do, that's a completely separate effort from this enhancement's Sigstore proposal.
  • If you are comfortable closing that gap via management clusters having a ClusterImagePolicy per this proposal, the gap will close, although without some HyperShift-side investigation, the user experience of failing to launch Sigstore-troubled release-image pods is unclear. But closing that gap will take time, as this enhancement percolates through tech-preview to GA, and management clusters eventually update to pick up the GA version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

closing that gap via management clusters having a ClusterImagePolicy per this proposal

A corner case might be the management cluster enforcing a production-key-signed policy, while trying to roll out a beta-key-signed prerelease of the managed cluster. Can that happen at all?


## Proposal

This enhancement proposes a new `openshift` ClusterImagePolicy that requires Red Hat signatures for `quay.io/openshift-release-dev/ocp-release` images.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This enhancement proposes a new `openshift` ClusterImagePolicy that requires Red Hat signatures for `quay.io/openshift-release-dev/ocp-release` images.
This enhancement proposes a new `openshift-release` ClusterImagePolicy that requires Red Hat signatures for `quay.io/openshift-release-dev/ocp-release` images.

Trying not to step on a likely future name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping this instance would get expanded later to cover more of OpenShift. In the event that it does not (e.g. if we end up using different signing keys/policies for quay.io/openshift-release-dev/ocp-v4.0-art-dev, can we shard names then? Or can we use openshift-container-platform or something to match the product linked to the release we're verifying?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Covering more of openshift requires careful collaboration that I don't any value in - given that the ClusterPolicies are merged automatically. I actually hope this ClusterImagePolicy becomes just ImagePolicy in the future., so having a unique instance that we can manage and remove at will has benefits.

Copy link
Member Author

@wking wking May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually hope this ClusterImagePolicy becomes just ImagePolicy in the future...

If that happens, then the naming of this temporary policy is largely irrelevant.

... so having a unique instance that we can manage and remove at will has benefits.

The CVO is going to stomp anyone else making any kind of spec edits, so it's going to be something we can manage and remove at will regardless of what we name this ClusterImagePolicy.

enhancements/security/openshift-image-policy.md Outdated Show resolved Hide resolved
This enhancement will not:

* Figure out how to configure a ClusterImagePolicy for verifying a mirrored release image, although this will need to happen before we can use Sigstore signatures in disconnected/restricted-network environments.
* Suggest changes to mirroring tools, although they will need to learn to mirror Sigstore signatures for use in disconnected/restricted-network environments.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, I'd like to state how we think this should work, avoiding implementation detail, and have mirror team sign off.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More on mirroring in this thread.

There are a number of possible next-steps that I'm punting on for now, as I try to nibble off the smallest possible slice of progress.
This enhancement will not:

* Figure out how to configure a ClusterImagePolicy for verifying a mirrored release image, although this will need to happen before we can use Sigstore signatures in disconnected/restricted-network environments.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we state how we think this should work (transparently) and have mirror / container runtime teams review & sign off on the enhancement. I think a fair non-goal is to describe the technical implementation of how it will be implemented.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My grasp of Sigstore is light, so I'd guess it's "do whatever CRI-O does to find relevant signatures, and then copy them over in a way that's discoverable on the mirror". I don't know more details. Just look for *.sig sibling tags? Eventually moving over to the referrer API, as tracked in RUN-1880 and containers/image#1848? With Sigstore as "the generic way to sign images", isn't there some prior art on "the generic way to mirror these things" we can draft on, without me having to guess from a cold start?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ClusterImagePolicy part should be a no-op here: ClusterImagePolicy would only have a policy for quay.io/openshift-release-dev/ocp-release, and image references (in Pods and elsewhere) should refer to ``quay.io/openshift-release-dev/ocp-release`.

CRI-O (and other image users) should then interpret IDMS to redirect the actual registry accesses from quay.io/openshift-release-dev/ocp-release to the mirrors. If that works in current disconnected environments, no other IDMS / policy should have to be configured specifically for mirrors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(mirroring tools do need to be taught to mirror the signature images/artifacts, and the related tags, if any.)

Copy link
Member Author

@wking wking May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But while IDMS will redirect the by-digest quay.io/openshift-release-dev/ocp-release@sha256:d93db95ec59b2f51ff45fdecd54b4dd9144521e3d099fef151c189598200cf24 image request, we might need an ImageTagMirrorSet to get the quay.io/openshift-release-dev/ocp-release:sha256-d93db95ec59b2f51ff45fdecd54b4dd9144521e3d099fef151c189598200cf24.sig redirected. Until CRI-O pivots to the referrers API for signature lookup?

But regardless, out of scope initially, and we can test all of this once the current scope restriction is dropped from the API and MCO implementation.

Copy link
Contributor

@mtrmac mtrmac May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not as implemented in c/image (and called by CRI-O): the “abstract image” as a whole has a mirror identified using IDMS, and then that mirror is used to fetch the manifest, layers, and signatures — although the way signatures are stored is structurally similar to an independent image, the implementation does not do another mirror lookup.

(If something else is fetching signatures without using c/image, then, yes, that something else is responsible for interpreting IDMS/ITMS, and I selfishly think it should adjust to behave the same way as c/image — or to call it.)

For Red-Hat-internal CI and QE, the agent configuring the install-config needs to set `internalTestingImagePolicy: Disable`.
The new property can be set regardless of version or whether the `ImagePolicy` feature gate is enabled, because [the installer does not fail on unrecognized properties][install-config-unrecognized-properties].

### API Extensions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Status information for ClusterVersion / HostedCluster / NodePool when payload is not signed / Pod can't be deployed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CVO-launches a Job to pull in the new target's manifests, and that job can have problems already. OCPBUGS-31462 is a recent example. It's reported via the ReleaseAccepted condition. Likely has no analog for HostedCluster or NodePool, because I doubt they do the warm manifest handoff that the CVO needs to do to handle implicitly-enabled capabilities. But they will likely need something similar if they ever get around to supporting configurable cluster capabilities. And they should already have something similar, because "the pod I tried to launch came up healthy" is not something you want to rely on happening every time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a Existing OpenPGP user experience section covering ReleaseAccepted in 1b7f6db -> f2d5194.

@wking wking force-pushed the openshift-image-policy branch 5 times, most recently from e5a5b10 to ddbd388 Compare May 30, 2024 04:07
## Proposal

This enhancement proposes a new `openshift` ClusterImagePolicy that requires Red Hat signatures for `quay.io/openshift-release-dev/ocp-release` images.
The new ClusterImagePolicy will be managed by the ClusterVersion operator and applied to all clusters by default (initially tech-preview, gated by the `ImagePolicy` feature gate ([currently `TechPreviewNoUpgrade`][feature-gate]).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean we ship the policy per default if image policies are enabled? Or do we want to have an additional toggle switch?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's the proposal. If you think there's a case for a separate feature gate, can you give some pros and cons, for an Alternatives section? Then folks weighing pros and cons could decide to flip that option in as the core proposal and shift "use the ImagePolicy feature-gate" out into the Alternatives section if they want. Personally, my initial take is that is makes sense to at least start expecting one gate, and possibly to pivot to a separate gate if the Implementation order plan hits trouble during the step (3) testing step, before we've done anything to the payload except (1)'s lifting the current scope restrictions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing to clarify in this section is how we will continue handling gpg verification. In early discussions we had discussed having both side by side initially. We need to decide if we will keep it while this is in tech preview and then disable gpg entirely when we GA sigstore support.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently this enhancement doesn't suggest changes to the OpenPGP/GPG handling, because:

  • We'll need to keep it for the GA 4.17, which won't have the TechPreviewNoUpgrade Sigstore stuff.
  • It's pretty orthogonal to the Sigstore proposal, so the CVO continuing to do it's GPG thing while the TechPreviewNoUpgrade Sigstore stuff happens too is just belt-and-suspenders. And also helps demo how the transitional period will work if this thing goes GA.

Do you think it's worth calling that out explicitly in Markdown?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do - a viable plan to migrate, securely and transparently for the end user, over time is an important consideration to the overall proposal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed 4a5ca6b -> 7137261, adding an Overlap with OpenPGP signing section explaining how the two systems are orthogonal and working through some details of a transitional update to make that more clear.

I'm leaving this thread unresolved, because @saschagrunert 's initial ask about whether this deserves a separate feature gate than the existing ImagePolicy feature-gate is not yet clear. Do folks have any pros or cons to feed an Alternatives section about that? Or can we mark this thread resolved based on reasoning like:

No real pros/cons yet, but we if we bump into any later like "wow, testing shows this is going to take a long time to get working", we can come split off to a new feature-gate then.

@petr-muller
Copy link
Member

/cc

@openshift-ci openshift-ci bot requested a review from petr-muller May 30, 2024 13:51

#### Installer

As discussed in [the *Proposal* section](#proposal), the installer will grow a new, feature-gated `OPENSHIFT_INSTALL_EXPERIMENTAL_DISABLE_IMAGE_POLICY` environment variable driving ClusterVersion `overrides` extension.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why an env var rather than just configuring the appropriate feature gate in the install configuration via TechPreviewNoUpgrade and CustomNoUpgrade?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Environment variables are how the installer implements knobs used for testing which aren't customer-facing. See the Have the installer use an install-config property alternative section for more context, although I don't get at feature-gates directly there. But the idea with this knob is that "disable security" is expected to be internal forever, and isn't something we expect to eventually expose to customers.

wking added a commit to wking/cluster-update-keys that referenced this pull request Jul 20, 2024
Setting a policy to require Red Hat signatures on OpenShift release
images from the canonical quay.io/openshift-release-dev/ocp-release
repository, as described in [1].

[1]: openshift/enhancements#1633
wking added a commit to wking/cluster-update-keys that referenced this pull request Jul 20, 2024
Setting a TechPreviewNoUpgrade policy to require Red Hat signatures on
OpenShift release images from the canonical
quay.io/openshift-release-dev/ocp-release repository, as described in
[1].

[1]: openshift/enhancements#1633
@wking
Copy link
Member Author

wking commented Jul 20, 2024

I've pushed ea17075 -> c089c6f to pivot from the ImagePolicy feature gate to the SigstoreImageVerification feature gate, now that openshift/api#1964 has removed the former as a dup of the latter.

wking added a commit to wking/cluster-update-keys that referenced this pull request Jul 30, 2024
Setting a TechPreviewNoUpgrade policy to require Red Hat signatures on
OpenShift release images from the canonical
quay.io/openshift-release-dev/ocp-release repository, as described in
[1].

[1]: openshift/enhancements#1633
Increasing the security of OpenShift release images from the current
"check GPG signatures before initiating an update" to "check Sigstore
signatures on every Pod launch".

Like the (Cluster)ImagePolicy enhancement I'm building on, this
enhancement mostly belongs to the Node component.  But the node
component doesn't seem to have its own subdirectory, so I'm dropping
this into the enhancements/security directory for now.

I've tried to lay out context for most things in the enhancement text
itself, but the:

  --sort-by='{.lastTimestamp}{.metadata.creationTimestamp}'

event filtering seems peripheral enough to be worth punting to here in
the commit message.  The logic Kubernetes uses to populate the LAST
SEEN column is complicated [1].  While lastTimestamp seems to be what
these particular Pod events most commonly use, that property is
optional [2].  By prefering lastTimestamp, and falling back to
creationTimestamp if lastTimestamp is unset, I'll hopefully fairly
reliably deliver a descending LAST SEEN column.

[1]: https://github.com/kubernetes/kubernetes/blob/9c8c61aee4966d153fba0b9c365c7d03c602b4fc/staging/src/k8s.io/kubectl/pkg/cmd/events/event_printer.go#L66-L99
[2]: https://github.com/kubernetes/kubernetes/blame/9c8c61aee4966d153fba0b9c365c7d03c602b4fc/pkg/apis/core/types.go#L5607-L5609
@wking
Copy link
Member Author

wking commented Jul 30, 2024

I've pushed c089c6f -> ee2f985 to fix a rootoftrust -> rootOfTrust typo (API), sorry about that!

source: quay.io/openshift-release-dev/ocp-release
```

In any case, it's out of scope for a reason, but until it gets picked up, it might make things more difficult for `TechPreviewNoUpgrade` disconnected/restricted-network clusters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the main blocker here support in mirror tooling or we need any changes in c/image? @QiWang19 @saschagrunert @mtrmac

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, want to ensure that we capture this as a GA requirement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC this is entirely mirror tooling.

I have no idea; suggestions welcome :).

There will be a gap during install before the ClusterImagePolicy has been created and rolled out to nodes, during which unsigned `quay.io/openshift-release-dev/ocp-release` images might run.
But we do not perform any in-cluster signature checking during installs today; they are secured by a trusted installer using a baked-in release image digest.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would want this documented.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, could we actually fix that, by (vaguely guessing) applying the ClusterImagePolicy earlier, when the bootstrap node/etcd is still being used, and there’s no possibility of triggering an upgrade?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to fix, because we get the ClusterImagePolicy from the release image, so you need that first release image pull to go through before you know what the policy should be. You could pull the policy out of the release image and inject it into the installer binary at extraction time, like we currently do for other release-image metadata. But Sigstore signatures on release images are still too late to establish trust in the installer. And once you have established trust in the installer, the installer's injected-at-extraction-time by-digest release image pullspec establishes trust in the release image. So it's not clear to me what gap earlier Sigstore signature coverage would close.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m sorry, that’s my mistake; it says “there will be a gap during install”, I was reading that as “after install there is a gap where an upgrade could be triggered before a policy has finished rolling out”.

I agree that the digest references flowing from the installer through the release image to the various product images is sufficient; and I don’t think that particularly needs documentation (end-user-focused documentation, at least).

Copy link
Member

@mrunalp mrunalp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can follow up on HCP and mirroring for the next phase.

@mrunalp
Copy link
Member

mrunalp commented Aug 6, 2024

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 6, 2024
Copy link
Contributor

openshift-ci bot commented Aug 6, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mrunalp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 6, 2024
Copy link
Contributor

openshift-ci bot commented Aug 6, 2024

@wking: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-merge-bot openshift-merge-bot bot merged commit 9374093 into openshift:master Aug 6, 2024
2 checks passed
@wking wking deleted the openshift-image-policy branch August 6, 2024 23:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet