
[cosigned] Cosigned does not use the provided imagePullSecret for GCR when running in GCP #1878

Closed · vpnachev opened this issue May 13, 2022 · 12 comments · Fixed by #1889
Labels: bug (Something isn't working)

@vpnachev (Contributor) commented May 13, 2022

Description

I have private images in Google Container Registry (GCR) that are signed, and I want to verify them from k8s clusters running in Google Cloud Platform. The cluster and the images live in different GCP projects, and there are no permissions allowing the service account from one project to access GCR resources in the other. So, I have deployed cosigned in the k8s clusters (following the instructions here), and both defaulting and validation are failing with

UNAUTHORIZED: You don't have the needed permissions to perform this operation, and you may have invalid credentials. To authenticate your request, follow the steps in: https://cloud.google.com/container-registry/docs/advanced-authentication: spec.containers[0].image

If I disable the webhook, kubelet successfully pulls the image and starts the container, therefore I am sure the credentials provided in the image pull secret are correct.

I have followed exactly the same steps on kind and AWS clusters with exactly the same image and image pull secret - there, cosigned works fine and successfully resolves tags to digests, as well as retrieves the signature.

I found some pointers in different issues in this repo, and the most prominent one seems to be sigstore/sigstore#138 (comment). Additionally, since #804, it is exactly k8schain that is used. I don't have much experience with this module, but it looks like it discovers the GCP service account on the k8s node and prefers it over the provided image pull secret, which looks wrong to me - I would expect the image pull secrets to have higher priority.

Special notes:
My observations are limited to AWS|kind|GCP + GCR only. I have not tried different OCI registry providers, but I guess it might also fail on AWS+ECR and Azure+ACR.

/cc @mattmoor @imjasonh

vpnachev added the bug (Something isn't working) label on May 13, 2022
@mattmoor (Member)

I wonder whether something changed with the recent keychain work? 🤔

cc @imjasonh

@imjasonh (Member)

> I found some pointers in different issues in this repo, and the most prominent one seems to be sigstore/sigstore#138 (comment). Additionally, since #804, it is exactly k8schain that is used. I don't have much experience with this module, but it looks like it discovers the GCP service account on the k8s node and prefers it over the provided image pull secret, which looks wrong to me - I would expect the image pull secrets to have higher priority.

k8schain should prefer configured imagePullSecrets over implicit GCP SA auth. See https://github.com/google/go-containerregistry/blob/b7619f2a53b1/pkg/authn/k8schain/k8schain.go#L49 -- this list is consulted in order: first it uses the k8s client to find pull secrets; then, if none match, it uses DefaultKeychain (looking for ~/.docker/config.json if present); then it falls back to implicit GCP, AWS, and Azure auth.
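
To make the ordering concrete, here is a minimal runnable Go sketch of how a go-containerregistry multi-keychain resolves credentials in order (the image reference is a made-up placeholder, and the in-cluster pull-secret keychain is omitted here since it needs a Kubernetes client):

package main

import (
	"fmt"

	"github.com/google/go-containerregistry/pkg/authn"
	"github.com/google/go-containerregistry/pkg/name"
	"github.com/google/go-containerregistry/pkg/v1/google"
)

func main() {
	// Order matters: the first keychain that returns non-anonymous
	// credentials for the target registry wins.
	kc := authn.NewMultiKeychain(
		// In k8schain, the keychain built from imagePullSecrets comes
		// first (omitted here because it needs a Kubernetes client).
		authn.DefaultKeychain, // ~/.docker/config.json, if present
		google.Keychain,       // implicit GCP auth via the metadata server
	)

	// Hypothetical image reference, for illustration only.
	ref, err := name.ParseReference("eu.gcr.io/some-project/alpine:3.15.4")
	if err != nil {
		panic(err)
	}

	auth, err := kc.Resolve(ref.Context())
	if err != nil {
		panic(err)
	}
	fmt.Printf("resolved authenticator: %T\n", auth)
}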

@mattmoor (Member)

One subtle thing about the k8s keychain is that it will ignore secrets that don't literally have type: kubernetes.io/dockerconfigjson (or kubernetes.io/dockercfg), so if it's a generic secret it won't work. This caused me some frustration once upon a time.
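
For what it's worth, one way to guarantee the right type is kubectl create secret docker-registry, which always produces a kubernetes.io/dockerconfigjson secret - a sketch, with the namespace, registry, and token as placeholders:

# Produces a secret of type kubernetes.io/dockerconfigjson,
# which the k8s keychain will actually consult.
kubectl -n <namespace> create secret docker-registry ro-gcr \
  --docker-server eu.gcr.io \
  --docker-username oauth2accesstoken \
  --docker-password "<access-token>"
# Verify the type:
kubectl -n <namespace> get secret ro-gcr -o jsonpath='{.type}'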

@vpnachev (Contributor, Author) commented May 13, 2022

Thank you for the quick feedback!
I think I am doing it just like you suggest, though it still fails. Here are the exact steps I follow on a freshly created GKE cluster:

kubectl create namespace cosign-system
kubectl create secret generic mysecret -n cosign-system --from-file=cosign.pub=cosigned/cosign.pub
helm repo add sigstore https://sigstore.github.io/helm-charts
helm repo update
helm install cosigned -n cosign-system sigstore/cosigned --devel --set cosign.secretKeyRef.name=mysecret --set webhook.image.version=v1.8.0 --wait
kubectl create namespace test-private-images
kubectl label namespace test-private-images cosigned.sigstore.dev/include=true
cat <<EOF | kubectl -n test-private-images apply -f -
---
apiVersion: v1
data:
  .dockerconfigjson: <credentials | base64 >
kind: Secret
metadata:
  name: ro-gcr
type: kubernetes.io/dockerconfigjson
---
apiVersion: v1
kind: Pod
metadata:
  name: test-private-image
spec:
  containers:
  - name: test
    image: <private-image-name>@<private-image-digest>
  imagePullSecrets:
  - name: ro-gcr
EOF

And the pod creation fails with

Error from server (BadRequest): error when creating "STDIN": admission webhook "cosigned.sigstore.dev" denied the request: validation failed: GET https://eu.gcr.io/v2/<private-image-uri>/manifests/<private-image-digest>: DENIED: Permission denied for "<private-image-digest>" from request "/v2/<private-image-uri>/manifests/<private-image-digest>". : spec.containers[0].image
<private-image-name>@<private-image-digest>

I might be overlooking something, but if you find anything suspicious, please let me know.

To rule out any network or permission issues, I have executed a similar request with curl from a container deployed in the same cluster

kubectl run --image alpine:3.15.4 alpine -- sleep 3600
kubectl exec -it alpine -- sh
apk add curl
curl https://eu.gcr.io/v2/<private-image-uri>/manifests/<private-image-digest> -u oauth2accesstoken:<credentials>
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
   "config": {
      "mediaType": "application/vnd.docker.container.image.v1+json",
      "size": 2424,
      "digest": "sha256:abcd"
   },
   "layers": [
      {
         "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 2818413,
         "digest": "sha256:123"
      },
...
   ]

So, this is what made me think it must be something in the authentication used by cosigned.

@vaikas (Contributor) commented May 13, 2022

@vpnachev The secret must live in the same namespace as cosigned. We do not want to have rights to read all the secrets in all the namespaces. I realize this was never documented (I'm fixing that right now... Sorry!!!)

So, try creating that secret in the cosign-system namespace.

@vaikas (Contributor) commented May 13, 2022

Sorry, are you using the ClusterImagePolicy by any chance? Sorry for possibly confusing the issue :)

@vpnachev (Contributor, Author)

> The secret must live in the same namespace as cosigned

If this were true, then I should have seen failures with the kind/AWS k8s clusters, but they work perfectly fine. I should also have seen errors in the logs showing cosigned trying to read the non-existent ro-gcr secret from the cosign-system namespace, but there are no such errors.

In general, cosigned cannot and should not expect each and every image pull secret to be replicated from the application namespace to the cosign system namespace.

> We do not want to have rights to read all the secrets in all the namespaces.

Looking at the chart templates, it actually does have read permissions for all secrets in all namespaces, ref.

> Sorry, are you using the ClusterImagePolicy by any chance? Sorry for possibly confusing the issue :)

I am not deploying any ClusterImagePolicies, but I guess there is a default one in the chart that should be configured, ref.

Next week I will try to disable the service accounts on the GCP nodes, and will also try different OCI registries.

@hectorj2f (Contributor)

@vpnachev You could use that example to create your own ClusterImagePolicy, but it is not enabled by default. You could follow the documentation at https://docs.sigstore.dev/cosign/kubernetes/ to create a new one.

> Looking at the chart templates, it actually does have read permissions for all secrets in all namespaces, ref.

Yes, as you certainly found in the ref, we rely on the same logic the PodSpec uses when getting the credentials to pull images from a registry. In addition to that, you can also specify the credentials to be used in case the registry with the signatures lives in a different location.

Let us know if you are still facing problems.

@vpnachev (Contributor, Author) commented May 16, 2022

> In addition to that, you can also specify the credentials to be used in case the registry with the signatures lives in a different location.

@hectorj2f I am aware of this option, but I have not evaluated it because I think cosigned still needs access to the image registry to resolve the tag to a digest.

> Let us know if you are still facing problems.

Yes, I am still facing the problem. More details below.


> I will try to disable the service accounts on the GCP nodes

It turns out I cannot do this, so I cannot use that approach to prove that the service account available on the GCP nodes is the culprit.

However, I was able to

> also will try with different OCI registries

In a cluster running on GCP, I can successfully use a private image from AWS ECR without changing anything in how cosigned is installed in the cluster, and with no additional configuration applied. I am pretty much following the steps from #1878 (comment) again; I am just creating another namespace, now with an AWS-specific image pull secret and the alpine image replicated to my private registry (also signed with my cosign private key):

kubectl create namespace test-private-images-aws
kubectl label namespace test-private-images-aws cosigned.sigstore.dev/include=true
kubectl -n test-private-images-aws create secret docker-registry aws-ro --docker-server <account-id>.dkr.ecr.eu-central-1.amazonaws.com --docker-username AWS --docker-password=$(aws ecr get-login-password)
cat <<EOF | kubectl -n test-private-images-aws apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test-private-image
spec:
  containers:
  - name: test
    image: <aws-account>.dkr.ecr.eu-central-1.amazonaws.com/alpine:3.15.4
    command:
    - sh
    - -c
    - "sleep 3600"
  imagePullSecrets:
  - name: aws-ro
EOF
kubectl -n test-private-images-aws get pod test-private-image -o jsonpath='{@.spec.containers[0].image}'
<aws-account>.dkr.ecr.eu-central-1.amazonaws.com/alpine@sha256:a777c9c66ba177ccfea23f2a216ff6721e78a662cd17019488c417135299cd89

So, the image is successfully resolved from tag to digest and also validated against the signature in the same registry.
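
As an extra sanity check, the same signature can also be verified outside the webhook with the cosign CLI - a sketch reusing the public key and image reference from the steps above:

# Verify the ECR image against the same cosign public key
# used by the cosigned installation above.
cosign verify --key cosigned/cosign.pub \
  <aws-account>.dkr.ecr.eu-central-1.amazonaws.com/alpine:3.15.4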

Just for completeness, I will do the same again with GCR:

kubectl create namespace test-private-images-gcp
kubectl label namespace test-private-images-gcp cosigned.sigstore.dev/include=true
cat <<EOF | kubectl -n test-private-images-gcp apply -f -
---
apiVersion: v1
data:
  .dockerconfigjson: <credentials>
kind: Secret
metadata:
  name: gcr-ro
type: kubernetes.io/dockerconfigjson
---
apiVersion: v1
kind: Pod
metadata:
  name: test-private-image
spec:
  containers:
  - name: test
    image: eu.gcr.io/<gcp-project>/alpine:3.15.4
    command:
    - sh
    - -c
    - "sleep 3600"
  imagePullSecrets:
  - name: gcr-ro
EOF

And here I again get the error

Error from server (BadRequest): error when creating "STDIN": admission webhook "cosigned.sigstore.dev" denied the request: validation failed:
invalid value: eu.gcr.io/<gcp-project>/alpine:3.15.4 must be an image digest: spec.containers[0].image

If I disable the validation for the test-private-images-gcp namespace, kubelet successfully manages to pull the image:

kubectl label namespace test-private-images-gcp cosigned.sigstore.dev/include=false --overwrite
cat <<EOF | kubectl -n test-private-images-gcp apply -f -
---
apiVersion: v1
data:
  .dockerconfigjson: <credentials>
kind: Secret
metadata:
  name: gcr-ro
type: kubernetes.io/dockerconfigjson
---
apiVersion: v1
kind: Pod
metadata:
  name: test-private-image
spec:
  containers:
  - name: test
    image: eu.gcr.io/<gcp-project>/alpine:3.15.4
    command:
    - sh
    - -c
    - "sleep 3600"
  imagePullSecrets:
  - name: gcr-ro
EOF
kubectl -n test-private-images-gcp get pod
NAME                 READY   STATUS    RESTARTS   AGE
test-private-image   1/1     Running   0          119s
kubectl -n test-private-images-gcp get events --field-selector involvedObject.name=test-private-image | grep Pulled
2m17s       Normal   Pulled      pod/test-private-image   Successfully pulled image "eu.gcr.io/<gcp-project>/alpine:3.15.4" in 462.97357ms

I've also tested an AWS k8s cluster with GCR and ECR; the cosigned installation is exactly the same as the one for the GCP cluster.

kubectl create namespace test-private-images-aws
kubectl label namespace test-private-images-aws cosigned.sigstore.dev/include=true
kubectl -n test-private-images-aws create secret docker-registry aws-ro --docker-server <aws-account>.dkr.ecr.eu-central-1.amazonaws.com --docker-username AWS --docker-password=$(aws ecr get-login-password)
cat <<EOF | kubectl -n test-private-images-aws apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test-private-image
spec:
  containers:
  - name: test
    image: <aws-account>.dkr.ecr.eu-central-1.amazonaws.com/alpine:3.15.4
    command:
    - sh
    - -c
    - "sleep 3600"
  imagePullSecrets:
  - name: aws-ro
EOF
kubectl -n test-private-images-aws get pod test-private-image -o jsonpath='{@.spec.containers[0].image}'
<aws-account>.dkr.ecr.eu-central-1.amazonaws.com/alpine@sha256:a777c9c66ba177ccfea23f2a216ff6721e78a662cd17019488c417135299cd89

I was expecting the AWS cluster to fail with the ECR registry, but it succeeded. It also succeeded with the GCR registry:

kubectl create namespace test-private-images-gcp
kubectl label namespace test-private-images-gcp cosigned.sigstore.dev/include=true
cat <<EOF | kubectl -n test-private-images-gcp apply -f -
---
apiVersion: v1
data:
  .dockerconfigjson: <credentials>
kind: Secret
metadata:
  name: gcr-ro
type: kubernetes.io/dockerconfigjson
---
apiVersion: v1
kind: Pod
metadata:
  name: test-private-image
spec:
  containers:
  - name: test
    image: eu.gcr.io/<gcp-project>/alpine:3.15.4
    command:
    - sh
    - -c
    - "sleep 3600"
  imagePullSecrets:
  - name: gcr-ro
EOF
kubectl -n test-private-images-gcp get pod test-private-image -o jsonpath='{@.spec.containers[0].image}'
eu.gcr.io/<gcp-project>/alpine@sha256:a777c9c66ba177ccfea23f2a216ff6721e78a662cd17019488c417135299cd89

I will try to dig deeper into the authentication code, but any help would be appreciated. Also, if anyone else wants to follow my steps to reproduce this on GCP nodes with GCR, just make sure the service account associated with the nodes is not authorized to read the image.


Update: A private registry on ghcr.io also works fine on the GCP and AWS k8s clusters. So it looks like this issue is limited to GCP with GCR.

@vpnachev (Contributor, Author)

> I found some pointers in different issues in this repo, and the most prominent one seems to be sigstore/sigstore#138 (comment). Additionally, since #804, it is exactly k8schain that is used. I don't have much experience with this module, but it looks like it discovers the GCP service account on the k8s node and prefers it over the provided image pull secret, which looks wrong to me - I would expect the image pull secrets to have higher priority.

> k8schain should prefer configured imagePullSecrets over implicit GCP SA auth. See https://github.com/google/go-containerregistry/blob/b7619f2a53b1/pkg/authn/k8schain/k8schain.go#L49 -- this list is consulted in order: first it uses the k8s client to find pull secrets; then, if none match, it uses DefaultKeychain (looking for ~/.docker/config.json if present); then it falls back to implicit GCP, AWS, and Azure auth.

I think this is the issue that I am facing (thanks @imjasonh for the pointer). Simply put, cosigned is using a version of github.com/google/go-containerregistry that does not include google/go-containerregistry#1346.

https://github.com/sigstore/cosign/blob/v1.8.0/go.mod#L72 -> https://github.com/google/go-containerregistry/blob/f1fa40b162a1601a863364e8a2f63bbb9e4ff36e/pkg/authn/k8schain/k8schain.go#L48-L53 - and here the k8s keychain is the last in the list, so the GCP service account is used.

The solution should be relatively easy - just revendor a newer version of github.com/google/go-containerregistry/pkg/authn/k8schain. I will try to validate this tomorrow and, if it works, I will open a PR.
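
For reference, the revendor itself should just be a version bump in the cosign repo - a sketch, where the exact version to pin (whichever k8schain release or commit includes google/go-containerregistry#1346) is a placeholder:

# Bump the nested k8schain module to a version containing the keychain-order fix.
go get github.com/google/go-containerregistry/pkg/authn/k8schain@<version-with-fix>
go mod tidy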

@imjasonh (Member)

D'oh! That possibility didn't even occur to me. 🤦‍♂️

I agree it sounds like that should be the fix, thanks for thinking of it, and let me know if you send a PR. Or if you don't, I can. 👍

@vpnachev (Contributor, Author)

I managed to vendor a k8schain version where the issue is fixed, and I can now confirm that I can use private images from GCR in GCP k8s clusters 🎉
