
image not found when pulling from integrated registry - service account not allowed to pull? #17523

Closed
amather opened this issue Nov 30, 2017 · 37 comments


@amather

amather commented Nov 30, 2017

I can successfully deploy the integrated registry and push a custom image from my client. However, creating a pod that uses this image does not work. Error messages indicate that the image could not be found, but since I've tried referencing the image by different names and none worked, and I also see authentication-related error messages from docker, I assume this has to do with failed authentication of the service account against the internal registry.

Version
[kubmaster@kubmaster1-prod ~]$ oc version
oc v3.6.1+008f2d5
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://kubmaster1.mycompany.com:8443
kubernetes v1.6.1+5115d708d7
Steps To Reproduce
  1. After installing OpenShift with Ansible, but without the registry, add the registry to the existing cluster as described in Deploying a Registry on an Existing Cluster.
  2. Push a custom image.
    2a. I've exposed the registry with a route and public hostname so I could use it from my client.
    2b. I've created a user for remote access as described in Accessing the Registry.
    2c. I've pushed the image to the registry using the public hostname:
    $ docker push registry.mycompany.com/default/ipsec-router
    2d. I can see the image on the master logged in as system:admin:
[kubmaster@kubmaster1-prod ~]$ oc get is
NAME           DOCKER REPO                                             TAGS      UPDATED
ipsec-router   docker-registry.default.svc:5000/default/ipsec-router   latest    42 minutes ago
  3. Create a pod referencing the image:
apiVersion: v1
kind: Pod
metadata:
  generateName: testapp-
spec:
  # for testing; so we know where to grab the docker logs from
  nodeSelector:
    openshift-infra: apiserver
  containers:
  - name: nginx
    image: nginx:1.7.9
    ports:
    - containerPort: 80
  - name: ipsec-router
    image: default/ipsec-router

NOTE: I'm not sure about the naming conventions for the image attribute, but I've tried different variations like ipsec-router, docker-registry.default.svc:5000/default/ipsec-router, etc. All errors indicate that the image was not found, but I don't think that is the real issue; see below.

All actions happen in default project.

Current Result

The pod creation fails.

[kubmaster@kubmaster1-prod ~]$ oc get pods
NAME                      READY     STATUS             RESTARTS   AGE
docker-registry-2-53kgw   1/1       Running            1          1h
router-1-b8d16            1/1       Running            2          1h
testapp-qp101             1/2       ImagePullBackOff   0          15m

Pod events:

Events:
  FirstSeen	LastSeen	Count	From					SubObjectPath			Type		Reason		Message
  ---------	--------	-----	----					-------------			--------	------		-------
  15m		15m		1	default-scheduler							Normal		Scheduled	Successfully assigned testapp-qp101 to kubmaster1.mycompany.com
  15m		15m		1	kubelet, kubmaster1.mycompany.com	spec.containers{nginx}		Normal		Pulled		Container image "nginx:1.7.9" already present on machine
  15m		15m		1	kubelet, kubmaster1.mycompany.com	spec.containers{nginx}		Normal		Created		Created container
  15m		15m		1	kubelet, kubmaster1.mycompany.com	spec.containers{nginx}		Normal		Started		Started container
  15m		13m		4	kubelet, kubmaster1.mycompany.com	spec.containers{ipsec-router}	Normal		Pulling		pulling image "default/ipsec-router"
  15m		13m		4	kubelet, kubmaster1.mycompany.com	spec.containers{ipsec-router}	Warning		Failed		Failed to pull image "default/ipsec-router": rpc error: code = 2 desc = Error: image default/ipsec-router:latest not found
  15m		5m		41	kubelet, kubmaster1.mycompany.com	spec.containers{ipsec-router}	Normal		BackOff		Back-off pulling image "default/ipsec-router"
  15m		15s		70	kubelet, kubmaster1.mycompany.com					Warning		FailedSync	Error syncing pod
Expected Result

Pod should get created.

Additional Information
ERROR: [DClu1019 from diagnostic ClusterRegistry@openshift/origin/pkg/diagnostics/cluster/registry.go:343]
       Diagnostics created a test ImageStream and compared the registry IP
       it received to the registry IP available via the docker-registry service.

       docker-registry      : 172.30.210.175:5000
       ImageStream registry : docker-registry.default.svc:5000

       They do not match, which probably means that an administrator re-created
       the docker-registry service but the master has cached the old service
       IP address. Builds or deployments that use ImageStreams with the wrong
       docker-registry IP will fail under this condition.

       To resolve this issue, restarting the master (to clear the cache) should
       be sufficient. Existing ImageStreams may need to be re-created.

According to this mailing list entry, this could be a bug. In any case, I do not experience DNS-related issues. In fact, the service can be reached, and I see credentials for the IP address as well as for the service name (see below).

  • registry container logs show no relevant information (only health checks)

  • dockerd system logs on the system where the pod should be created indicate an authentication problem:

Nov 30 01:53:00 kubmaster1-prod dockerd-current: time="2017-11-30T01:53:00.158288891+01:00" level=error msg="Attempting next endpoint for pull after error: unauthorized: authentication required"
Nov 30 01:53:00 kubmaster1-prod dockerd-current: time="2017-11-30T01:53:00.588303570+01:00" level=error msg="Not continuing with pull after error: Error: image default/ipsec-router:latest not found"
Nov 30 01:53:15 kubmaster1-prod dockerd-current: time="2017-11-30T01:53:15.954754041+01:00" level=error msg="Handler for GET /v1.24/images/default/ipsec-router:latest/json returned error: No such image: default/ipsec-router:latest"
Nov 30 01:53:31 kubmaster1-prod dockerd-current: time="2017-11-30T01:53:31.950696553+01:00" level=error msg="Handler for GET /v1.24/images/default/ipsec-router:latest/json returned error: No such image: default/ipsec-router:latest"
Nov 30 01:53:43 kubmaster1-prod dockerd-current: time="2017-11-30T01:53:43.947386178+01:00" level=error msg="Handler for GET /v1.24/images/default/ipsec-router:latest/json returned error: No such image: default/ipsec-router:latest"

This sounds to me as if there is an authentication problem with the internal registry, and the not found message comes from the other registries that were tried.

  • not sure if the following should work, but it doesn't:
$ oc get secrets
$ oc describe secret default-dockercfg-zbb95
...
dockercfg:      {"172.30.210.175:5000":{"username":"serviceaccount","password":"xxx...","email":"serviceaccount@example.org","auth":"yyy..."},"docker-registry.default.svc:5000":{"username":"serviceaccount","password":"xxx...","email":"serviceaccount@example.org","auth":"yyy..."}}
...
$ oc login --token=xxx....
Logged into "https://kubmaster1.mycompany.com:8443" as "system:serviceaccount:default:default" using the token provided.
...
$ docker login -u $(oc whoami) -p $(oc whoami -t) docker-registry.default.svc.cluster.local:5000
Error response from daemon: Get https://docker-registry.default.svc.cluster.local:5000/v2/: unauthorized: authentication required

As said, I'm not sure whether that should actually work, but it would match the error message seen in the dockerd system log.
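
For completeness, one way to inspect which registry hostnames the service account's dockercfg secret actually carries credentials for (a debugging sketch; the secret name is taken from the output above):

$ oc get secret default-dockercfg-zbb95 -o jsonpath='{.data.\.dockercfg}' | base64 -d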

@wanghaoran1988
Member

       To resolve this issue, restarting the master (to clear the cache) should
       be sufficient. Existing ImageStreams may need to be re-created.

Going by the diagnostics output, could you please try restarting your master, and paste the registry log here?
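
If it helps, on an Origin 3.6 Ansible install the master restart is usually something like (the systemd unit name varies by install type; it is atomic-openshift-master on OCP installs):

$ sudo systemctl restart origin-master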

@amather
Author

amather commented Nov 30, 2017

I've already restarted the master, but did so again, and also restarted the registry pod and tried to recreate the pod in question. Here are the logs:

time="2017-11-30T11:45:28.386212208Z" level=info msg="start registry" distribution_version="v2.4.1+unknown" kubernetes_version=v1.6.1+5115d708d7 openshift_version=v3.6.1+008f2d5
time="2017-11-30T11:45:28.38849778Z" level=info msg="OpenShift middleware for storage driver initializing"
time="2017-11-30T11:45:28.388523992Z" level=info msg="redis not configured" go.version=go1.7.6 instance.id=11ec594e-3ef9-4d05-a06d-9c574425b080 openshift.logger=registry
time="2017-11-30T11:45:28.388592281Z" level=info msg="Starting upload purge in 33m0s" go.version=go1.7.6 instance.id=11ec594e-3ef9-4d05-a06d-9c574425b080 openshift.logger=registry
time="2017-11-30T11:45:28.40335816Z" level=info msg="using inmemory blob descriptor cache" go.version=go1.7.6 instance.id=11ec594e-3ef9-4d05-a06d-9c574425b080 openshift.logger=registry
time="2017-11-30T11:45:28.403376617Z" level=info msg="OpenShift registry middleware initializing"
time="2017-11-30T11:45:28.403394826Z" level=info msg="Using Origin Auth handler" go.version=go1.7.6 instance.id=11ec594e-3ef9-4d05-a06d-9c574425b080 openshift.logger=registry
time="2017-11-30T11:45:28.40340797Z" level=debug msg="configured \"openshift\" access controller" go.version=go1.7.6 instance.id=11ec594e-3ef9-4d05-a06d-9c574425b080 openshift.logger=registry
time="2017-11-30T11:45:28.403434169Z" level=debug msg="configured token endpoint at \"/openshift/token\"" go.version=go1.7.6 instance.id=11ec594e-3ef9-4d05-a06d-9c574425b080 openshift.logger=registry
time="2017-11-30T11:45:28.403945705Z" level=info msg="listening on :5000, tls" go.version=go1.7.6 instance.id=11ec594e-3ef9-4d05-a06d-9c574425b080 openshift.logger=registry
10.129.0.1 - - [30/Nov/2017:11:45:31 +0000] "GET /healthz HTTP/2.0" 200 0 "" "Go-http-client/2.0"
10.129.0.1 - - [30/Nov/2017:11:45:41 +0000] "GET /healthz HTTP/2.0" 200 0 "" "Go-http-client/2.0"
10.129.0.1 - - [30/Nov/2017:11:45:41 +0000] "GET /healthz HTTP/2.0" 200 0 "" "Go-http-client/2.0"
<somewhere here I did the pod create -f ...>
10.129.0.1 - - [30/Nov/2017:11:45:51 +0000] "GET /healthz HTTP/2.0" 200 0 "" "Go-http-client/2.0"
10.129.0.1 - - [30/Nov/2017:11:45:51 +0000] "GET /healthz HTTP/2.0" 200 0 "" "Go-http-client/2.0"
10.129.0.1 - - [30/Nov/2017:11:46:01 +0000] "GET /healthz HTTP/2.0" 200 0 "" "Go-http-client/2.0"
10.129.0.1 - - [30/Nov/2017:11:46:01 +0000] "GET /healthz HTTP/2.0" 200 0 "" "Go-http-client/2.0"
10.129.0.1 - - [30/Nov/2017:11:46:11 +0000] "GET /healthz HTTP/2.0" 200 0 "" "Go-http-client/2.0"
10.129.0.1 - - [30/Nov/2017:11:46:11 +0000] "GET /healthz HTTP/2.0" 200 0 "" "Go-http-client/2.0"
10.129.0.1 - - [30/Nov/2017:11:46:21 +0000] "GET /healthz HTTP/2.0" 200 0 "" "Go-http-client/2.0"
10.129.0.1 - - [30/Nov/2017:11:46:21 +0000] "GET /healthz HTTP/2.0" 200 0 "" "Go-http-client/2.0"
10.129.0.1 - - [30/Nov/2017:11:46:31 +0000] "GET /healthz HTTP/2.0" 200 0 "" "Go-http-client/2.0"
10.129.0.1 - - [30/Nov/2017:11:46:31 +0000] "GET /healthz HTTP/2.0" 200 0 "" "Go-http-client/2.0"
10.129.0.1 - - [30/Nov/2017:11:46:41 +0000] "GET /healthz HTTP/2.0" 200 0 "" "Go-http-client/2.0"

@pweil-
Contributor

pweil- commented Nov 30, 2017

If you are unable to log in to your registry directly from inside the cluster with your OpenShift token, something is very broken in your installation. I would first recommend removing the registry and redeploying, then verifying that you can log in with a token.

The command you listed is correct: docker login -u $(oc whoami) -p $(oc whoami -t) <registry_ip>:<port>.

@pweil- pweil- closed this as completed Nov 30, 2017
@amather
Author

amather commented Nov 30, 2017

@pweil- Why did you close this issue? Removing the registry and redeploying it was one of the first things I thought of. In fact, I have removed not only the registry but the whole cluster multiple times now, and I can reproduce this issue by installing the OS from scratch and ramping up the cluster by following the docs from Host Preparation up to Accessing the Registry.

@sverboven

@pweil-, @amather I just wanted to report that we are seeing identical problems when upgrading a fully automated installation from 3.6 to 3.7, the only difference being the version of OpenShift. It does seem like something changed between these versions that causes this registry problem.

@pweil-
Contributor

pweil- commented Dec 6, 2017

@sverboven thanks for the report. Did you also try logging in to the registry with a service account manually? Do you have registry logs from the times when the authorization fails? Can you also detail the environment a bit for us, version numbers at least? Thanks!

While we need to avoid using GH as a support channel, I think we need a little more information to determine whether this is a bug, since we have not seen this in our testing.

@pweil- pweil- reopened this Dec 6, 2017
@bparees
Contributor

bparees commented Dec 6, 2017

were RBAC roles reconciled during/after the upgrade process?

@valentinabojan

valentinabojan commented Dec 6, 2017

@pweil- @bparees I will provide the detailed scenario from @sverboven.

Version

master-host$ oc version
oc v3.7.0+7ed6862
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://master-ip:8443
openshift v3.7.0+7ed6862
kubernetes v1.7.6+a08f5eeb62

Steps To Reproduce (these steps worked perfectly fine with OpenShift 3.6 but do not give the expected result in OpenShift 3.7)

  1. Run the openshift-ansible playbook to install OpenShift Origin, with the internal docker-registry in place

  2. Configure the internal docker registry to push/pull images there
    2a. Create a regular user
    2b. Configure the regular user by giving it the system:registry, system:image-builder and admin roles
    2c. Log in as the regular user
    2d. Create a secret for the internal docker registry server and the regular user
    2e. Log in successfully to the internal docker registry using docker login -u $(oc whoami) -p $(oc whoami -t) docker-registry-default.router.default.svc.cluster.local (the following command works, too: docker login -u $(oc whoami) -p $(oc whoami -t) docker-registry.default.svc:5000)
    2f. Push the images to the internal registry
    2g. View the imagestreams on the master logged in as system:admin:

master-host$ oc get is
NAME         DOCKER REPO                                                  TAGS        UPDATED
image-name   docker-registry.default.svc:5000/custom-project/image-name   1.0         1 minute ago
  3. Deploy a pod on a worker node using the image from the internal docker registry

Current result
The pod fails to pull the image from the internal registry with

Failed to pull image "docker-registry-default.router.default.svc.cluster.local/custom-project/image-name:1.0": rpc error: code = 2 desc = Error: image custom-project/image-name:1.0 not found

Internal docker registry logs

time="2017-12-06T15:46:37.546838999Z" level=debug msg="authorizing request" go.version=go1.8.3 http.request.host=docker-registry-default.router.default.svc.cluster.local http.request.id=d029fcef-b1f7-43d0-8240-237316042311 http.request.method=GET http.request.remoteaddr="10.131.0.1:54559" http.request.uri="/v2/" http.request.useragent="docker/1.12.5 go/go1.7.4 kernel/3.10.0-327.el7.x86_64 os/linux arch/amd64 UpstreamClient(Go-http-client/1.1)" instance.id=bd0cbd99-a601-4188-bb43-e57dbc7de473 openshift.logger=registry 
time="2017-12-06T15:46:37.546925705Z" level=error msg="error authorizing context: authorization header required" go.version=go1.8.3 http.request.host=docker-registry-default.router.default.svc.cluster.local http.request.id=d029fcef-b1f7-43d0-8240-237316042311 http.request.method=GET http.request.remoteaddr="10.131.0.1:54559" http.request.uri="/v2/" http.request.useragent="docker/1.12.5 go/go1.7.4 kernel/3.10.0-327.el7.x86_64 os/linux arch/amd64 UpstreamClient(Go-http-client/1.1)" instance.id=bd0cbd99-a601-4188-bb43-e57dbc7de473 openshift.logger=registry 
10.131.0.1 - - [06/Dec/2017:15:46:37 +0000] "GET /v2/ HTTP/1.1" 401 87 "" "docker/1.12.5 go/go1.7.4 kernel/3.10.0-327.el7.x86_64 os/linux arch/amd64 UpstreamClient(Go-http-client/1.1)"
time="2017-12-06T15:46:37.560008303Z" level=debug msg="anonymous token request" go.version=go1.8.3 http.request.host=docker-registry-default.router.default.svc.cluster.local http.request.id=f08b42cc-a3bd-40b1-832e-34e9008bf5e0 http.request.method=GET http.request.remoteaddr="10.129.0.1:48915" http.request.uri="/openshift/token?scope=repository%3Acustom-project%2Fimage-name%3Apull" http.request.useragent="docker/1.12.5 go/go1.7.4 kernel/3.10.0-327.el7.x86_64 os/linux arch/amd64 UpstreamClient(Go-http-client/1.1)" instance.id=bd0cbd99-a601-4188-bb43-e57dbc7de473 openshift.logger=registry 
time="2017-12-06T15:46:37.560067783Z" level=info msg="response completed" go.version=go1.8.3 http.request.host=docker-registry-default.router.default.svc.cluster.local http.request.id=d87ed760-0317-41f6-a1e8-e07f839b9d71 http.request.method=GET http.request.remoteaddr="10.129.0.1:48915" http.request.uri="/openshift/token?scope=repository%3Acustom-project%2Fimage-name%3Apull" http.request.useragent="docker/1.12.5 go/go1.7.4 kernel/3.10.0-327.el7.x86_64 os/linux arch/amd64 UpstreamClient(Go-http-client/1.1)" http.response.contenttype="application/json" http.response.duration="105.069µs" http.response.status=200 http.response.written=49 instance.id=bd0cbd99-a601-4188-bb43-e57dbc7de473 openshift.logger=registry 
10.129.0.1 - - [06/Dec/2017:15:46:37 +0000] "GET /openshift/token?scope=repository%3Acustom-project%2Fimage-name%3Apull HTTP/1.1" 200 49 "" "docker/1.12.5 go/go1.7.4 kernel/3.10.0-327.el7.x86_64 os/linux arch/amd64 UpstreamClient(Go-http-client/1.1)"
time="2017-12-06T15:46:37.568298005Z" level=debug msg="authorizing request" go.version=go1.8.3 http.request.host=docker-registry-default.router.default.svc.cluster.local http.request.id=6680c002-e1a2-4fa2-b2cc-b25e9a73083a http.request.method=GET http.request.remoteaddr="10.131.0.1:54562" http.request.uri="/v2/custom-project/image-name/manifests/1.0.0" http.request.useragent="docker/1.12.5 go/go1.7.4 kernel/3.10.0-327.el7.x86_64 os/linux arch/amd64 UpstreamClient(Go-http-client/1.1)" instance.id=bd0cbd99-a601-4188-bb43-e57dbc7de473 openshift.logger=registry vars.name="custom-project/image-name" vars.reference=1.0.0 
time="2017-12-06T15:46:37.568489532Z" level=debug msg="Origin auth: checking for access to repository:custom-project/image-name:pull" go.version=go1.8.3 http.request.host=docker-registry-default.router.default.svc.cluster.local http.request.id=6680c002-e1a2-4fa2-b2cc-b25e9a73083a http.request.method=GET http.request.remoteaddr="10.131.0.1:54562" http.request.uri="/v2/custom-project/image-name/manifests/1.0.0" http.request.useragent="docker/1.12.5 go/go1.7.4 kernel/3.10.0-327.el7.x86_64 os/linux arch/amd64 UpstreamClient(Go-http-client/1.1)" instance.id=bd0cbd99-a601-4188-bb43-e57dbc7de473 openshift.auth.user=anonymous openshift.logger=registry vars.name="custom-project/image-name" vars.reference=1.0.0 
time="2017-12-06T15:46:37.570029873Z" level=error msg="OpenShift access denied: User \"system:anonymous\" cannot get imagestreams/layers.image.openshift.io in project \"custom-project\"" go.version=go1.8.3 http.request.host=docker-registry-default.router.default.svc.cluster.local http.request.id=6680c002-e1a2-4fa2-b2cc-b25e9a73083a http.request.method=GET http.request.remoteaddr="10.131.0.1:54562" http.request.uri="/v2/custom-project/image-name/manifests/1.0.0" http.request.useragent="docker/1.12.5 go/go1.7.4 kernel/3.10.0-327.el7.x86_64 os/linux arch/amd64 UpstreamClient(Go-http-client/1.1)" instance.id=bd0cbd99-a601-4188-bb43-e57dbc7de473 openshift.auth.user=anonymous openshift.logger=registry vars.name="custom-project/image-name" vars.reference=1.0.0 
time="2017-12-06T15:46:37.570101749Z" level=error msg="error authorizing context: access denied" go.version=go1.8.3 http.request.host=docker-registry-default.router.default.svc.cluster.local http.request.id=6680c002-e1a2-4fa2-b2cc-b25e9a73083a http.request.method=GET http.request.remoteaddr="10.131.0.1:54562" http.request.uri="/v2/custom-project/image-name/manifests/1.0.0" http.request.useragent="docker/1.12.5 go/go1.7.4 kernel/3.10.0-327.el7.x86_64 os/linux arch/amd64 UpstreamClient(Go-http-client/1.1)" instance.id=bd0cbd99-a601-4188-bb43-e57dbc7de473 openshift.logger=registry vars.name="custom-project/image-name" vars.reference=1.0.0
10.131.0.1 - - [06/Dec/2017:15:46:37 +0000] "GET /v2/custom-project/image-name/manifests/1.0.0 HTTP/1.1" 401 175 "" "docker/1.12.5 go/go1.7.4 kernel/3.10.0-327.el7.x86_64 os/linux arch/amd64 UpstreamClient(Go-http-client/1.1)"
10.131.0.1 - - [06/Dec/2017:15:46:37 +0000] "GET /v1/_ping HTTP/1.1" 404 19 "" "docker/1.12.5 go/go1.7.4 kernel/3.10.0-327.el7.x86_64 os/linux arch/amd64 UpstreamClient(Go-http-client/1.1)"
10.131.0.1 - - [06/Dec/2017:15:46:37 +0000] "GET /v1/repositories/custom-project/image-name/images HTTP/1.1" 404 19 "" "docker/1.12.5 go/go1.7.4 kernel/3.10.0-327.el7.x86_64 os/linux arch/amd64 UpstreamClient(Go-http-client/1.1)"

Important
We haven't changed anything during the upgrade procedure regarding security constraints, secrets, service accounts or role bindings. Everything remained as it was configured for working with OpenShift Origin 3.6.

@ghost

ghost commented Dec 7, 2017

@pweil- @bparees

From the internal docker registry logs, we can derive that the pods are being authorized as system:anonymous. This is not what we expected, since we defined imagePullSecrets for the deployment, containing a secret for a user that has the roles system:image-builder, system:image-puller and system:registry. Additionally, we added the role system:image-puller to the service account that is used for the deployment.
When we grant system:image-puller to system:anonymous, the pods can pull the images and start running.
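
For reference, that grant looks something like this (the project name is a placeholder):

$ oc policy add-role-to-user system:image-puller system:anonymous -n custom-project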

@bparees
Contributor

bparees commented Dec 7, 2017

@Jens-vd can you confirm via the pod.yaml what the service account is for the pod?

I think there are also some issues/changes in 3.7 w/ the format of the dockercfg secret to be used. The old .dockercfg format is no longer accepted and only the docker/config.json format is allowed. Can you show us your secret content (redacting the token value of course)? And/or show us how you're creating your secret.
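
For reference, a rough sketch of the two layouts (registry hostname and values are placeholders): the old kubernetes.io/dockercfg type stores a bare hostname-keyed map, while kubernetes.io/dockerconfigjson wraps the same map in an "auths" key:

# .dockercfg (type: kubernetes.io/dockercfg)
{"registry.example.com":{"username":"user","password":"<token>","email":"user@example.com","auth":"<base64 of user:token>"}}

# .dockerconfigjson (type: kubernetes.io/dockerconfigjson)
{"auths":{"registry.example.com":{"username":"user","password":"<token>","email":"user@example.com","auth":"<base64 of user:token>"}}}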

@valentinabojan

@bparees I will paste the command that we used to create the secret (for @Jens-vd)

oc login -u some-user -p some-password
oc create secret docker-registry docker-registry-secret \
    --namespace=custom-project \
    --docker-server=docker-registry-default.router.default.svc.cluster.local \
    --docker-username=some-user \
    --docker-password=$(oc whoami -t) \
    --docker-email=someuseremail@gmail.com

@bparees
Contributor

bparees commented Dec 8, 2017

@mfojtik @sjenning @pweil- I can never remember: does the secret need to be linked to the pod's service account in order for the pod to be able to use it when pulling the image?

What other debugging can we do to determine why the secret does not appear to be picked up when the pod attempts to pull the image? (Or is there a way to see which secret the pod uses when it attempts the pull?)

@ghost

ghost commented Dec 11, 2017

@bparees Here is more detailed information regarding our configurations.

Configuration of the service account:

$ oc describe sa <custom-service-account>
Name:		<custom-service-account>
Namespace:	<custom-project>
Labels:		<none>
Annotations:	kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"ServiceAccount","metadata":{"annotations":{"kubernetes.io/change-cause":"oc apply --filename=<path> --recur...
		kubernetes.io/change-cause=oc apply --filename=<path> --recursive=true --record=true

Image pull secrets:	<custom-project>-dockercfg-xmdlq
                   	docker-registry-secret

Mountable secrets: 	<custom-project>-dockercfg-xmdlq
                   	<custom-project>-token-bnrss

Tokens:            	<custom-project>-token-bnrss
                   	<custom-project>-token-pf4rw

Events:	<none>

Configuration of the pod which cannot pull an image from the registry:

$ oc export pod <custom-pod> -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/created-by: |
      {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicaSet","namespace":"<custom-project>","name":"<custom-pod>-3192653527","uid":"e2779ad7-de52-11e7-88e6-0a49faba971a","apiVersion":"extensions","resourceVersion":"11734"}}
    openshift.io/scc: anyuid
  creationTimestamp: null
  generateName: <custom-pod>-3192653527-
  labels:
    # ...
  ownerReferences:
  - apiVersion: extensions/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: <custom-pod>-3192653527
    uid: e2779ad7-de52-11e7-88e6-0a49faba971a
spec:
  affinity:
    # ...
  containers:
  - env:
      # ...
    envFrom:
      # ...
    image: docker-registry-default.router.default.svc.cluster.local/<custom-project>/<custom-project>-<custom-pod>:1.0.0
    imagePullPolicy: IfNotPresent
    name: <custom-pod>
    resources: {}
    securityContext:
      capabilities:
        drop:
        - MKNOD
      privileged: false
      seLinuxOptions:
        level: s0:c11,c0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: <custom-project>-token-bnrss
      readOnly: true
  dnsPolicy: ClusterFirst
  imagePullSecrets:
  - name: docker-registry-secret
  nodeName: <custom-node>
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    seLinuxOptions:
      level: s0:c11,c0
  serviceAccount: <custom-project>
  serviceAccountName: <custom-project>
  terminationGracePeriodSeconds: 30
  volumes:
  - name: <custom-project>-token-bnrss
    secret:
      defaultMode: 420
      secretName: <custom-project>-token-bnrss

Configuration of the docker registry secret:

oc export secret docker-registry-secret -o yaml
apiVersion: v1
data:
  .dockercfg: <base64-encoded-value>
kind: Secret
metadata:
  creationTimestamp: null
  name: docker-registry-secret
type: kubernetes.io/dockercfg

@bparees
Contributor

bparees commented Dec 11, 2017

That all looks reasonable to me. @mfojtik or @sjenning, how can we determine which secret the node/pod uses when attempting to pull the pod image?

@sjenning
Contributor

Honestly, I'd need to look into it more. I can recreate this situation though: an image is pushed to the internal registry from outside the cluster, and pods that reference that image can't be deployed because whatever is doing the pulling is not authorized (it resolves to system:anonymous) according to the internal registry logs.

I did notice that the error authorizing context: authorization header required error is caused by colons in the username for the docker login, the result of oc whoami returning a serviceaccount name. If I just use some placeholder (the username doesn't matter when using token auth against the registry), I could pull the image out of the registry from an external machine with the registry exposed via a Route.
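
For example, a login along these lines avoids the colons in the system:serviceaccount:... username (hostname as in the reports above; the username is an arbitrary placeholder):

$ docker login -u unused -p $(oc whoami -t) docker-registry-default.router.default.svc.cluster.local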

However, I don't seem to be able to pull the image from inside the cluster for deployment using the same SA so.. that is strange. There must be something I'm missing. @liggitt any help?

@liggitt
Contributor

liggitt commented Dec 12, 2017

I could pull the image out of the registry from an external machine with the registry exposed via a Route.

However, I don't seem to be able to pull the image from inside the cluster for deployment using the same SA so.. that is strange. There must be something I'm missing. @liggitt any help?

Presumably the URL used to reference the registry inside and outside the cluster is different? Are you manually creating the image pull secret for use inside the cluster?

@liggitt
Contributor

liggitt commented Dec 13, 2017

wondering if kubernetes/kubernetes#25435 is related... normalization of the URL for the passed credentials... can you try creating the credential for --docker-server=docker-registry-default.router.default.svc.cluster.local:443 as well?

@ghost

ghost commented Dec 13, 2017

The result is the same. I created a second secret (trying two different ways, taking @sjenning's comment into account):

oc create secret docker-registry docker-registry-secret-2 \
    --namespace=custom-project \
    --docker-server=docker-registry-default.router.default.svc.cluster.local:443 \
    --docker-username=some-user \
    --docker-password=$(oc whoami -t) \
    --docker-email=someuseremail@gmail.com
oc create secret docker-registry docker-registry-secret-2 \
    --namespace=custom-project \
    --docker-server=docker-registry-default.router.default.svc.cluster.local:443 \
    --docker-username=anyuser \
    --docker-password=$(oc whoami -t) \
    --docker-email=someuseremail@gmail.com

I updated the pod's configuration accordingly:

imagePullSecrets:
  - name: docker-registry-secret-2
  - name: docker-registry-secret

I also created the deployment using 2 different values for the image:

image: "docker-registry-default.router.default.svc.cluster.local:443/<custom-project>/<custom-project>-<custom-pod>:1.0.0"

and

image: "docker-registry-default.router.default.svc.cluster.local/<custom-project>/<custom-project>-<custom-pod>:1.0.0"

@wanghaoran1988
Member

oc export secret docker-registry-secret -o yaml
apiVersion: v1
data:
  .dockercfg: <base64-encoded-value>
kind: Secret
metadata:
  creationTimestamp: null
  name: docker-registry-secret
type: kubernetes.io/dockercfg

Can you try this:

  1. docker login using your user/token to the internal registry
  2. Create the secret like this:
    oc secrets new my-secret .dockerconfigjson=path/to/.docker/config.json
  3. Link the secret to the service account, for example as sketched below
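
A sketch of step 3, with the same syntax that appears later in this thread (the service account name is a placeholder):

$ oc secrets add serviceaccount/<sa-name> secrets/my-secret --for=pull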

I am still wondering what command you used to create the old docker config secret.

@ghost

ghost commented Dec 13, 2017

I managed to get the pod running by performing the steps you suggested. I first created the secret as we did before. Then I captured the base64 contents shown when exporting the secret and put the decoded content in a new file, dockercfg.json. I used this file to create a new secret, which I linked to the service account we are using for the pod. This was not sufficient to get the pod running, but when I added the new secret to the imagePullSecrets of the pod, it was able to pull the image.

Steps I performed, summarized:

  • Get dockercfg contents
$ oc export secret docker-registry-secret -o yaml
apiVersion: v1
data:
  .dockercfg: <base64-dockercfg>
kind: Secret
metadata:
  creationTimestamp: null
  name: dockercfg-secret
type: kubernetes.io/dockercfg
  • Decode <base64-dockercfg> and put the contents in file dockercfg.json
  • Create new secret based on dockercfg.json and link it to the service account used by the pod
$ oc secrets new my-secret .dockerconfigjson=dockercfg.json
secret/my-secret
$ oc secrets add serviceaccount/<custom-project> secrets/my-secret --for=pull
$ oc secrets add serviceaccount/builder secrets/my-secret
  • At this moment, the pod cannot pull the image yet
  • Add the new secret to the imagePullSecrets of the deployment to which the pod belongs
imagePullSecrets:
  - name: my-secret
  - name: docker-registry-secret
  • Now the pod can pull the image

@bparees
Contributor

bparees commented Dec 13, 2017

@juanvallejo @liggitt It seems like there's still an issue w/ how oc secret creates secrets when you use the oc create secret docker-registry helper, though I don't see any obvious difference between the secrets it creates and the secrets we put into the project by default (both use the .dockercfg key, not the .dockerconfigjson key).

@juanvallejo
Contributor

@bparees will take a look

@juanvallejo
Contributor

@bparees There was a PR from a few months ago that defaulted secrets created using the new-dockercfg helper to config.json format.

Based on what I've seen locally with the docker-registry helper, it creates a secret in the old .dockercfg format, which I suspect is the cause of this issue. I will open a PR updating the docker-registry helper to store auth data in the new config.json format under the .dockercfg field of a secret.

@bparees
Contributor

bparees commented Dec 13, 2017

@liggitt @juanvallejo I'm still a little confused, because if you look at the SA secrets we create, they are in the old format and seem to work fine:

$ oc get secret builder-dockercfg-ckhdr -o yaml
apiVersion: v1
data:
  .dockercfg: <redacted>
kind: Secret
metadata:
  annotations:
    kubernetes.io/service-account.name: builder
    kubernetes.io/service-account.uid: f81faf30-e022-11e7-9632-507b9d27b5d9
    openshift.io/token-secret.name: builder-token-b5xsl
    openshift.io/token-secret.value: <redacted>
  creationTimestamp: 2017-12-13T16:30:46Z
  name: builder-dockercfg-ckhdr
  namespace: myproject
  resourceVersion: "944"
  selfLink: /api/v1/namespaces/myproject/secrets/builder-dockercfg-ckhdr
  uid: f8b656b1-e022-11e7-9632-507b9d27b5d9
type: kubernetes.io/dockercfg

@liggitt
Contributor

liggitt commented Dec 13, 2017

the content of the .dockercfg: <redacted> bit is the interesting part for image pull secrets

@liggitt
Contributor

liggitt commented Dec 13, 2017

Will open a PR updating the docker-registry helper to store auth data in the new config.json format under the .dockercfg field of a secret

that doesn't sound proper... the name should match the format

@bparees
Contributor

bparees commented Dec 13, 2017

the content of the .dockercfg: bit is the interesting part for image pull secrets

@liggitt that's what I figured, and yet:
#17523 (comment)

@ghost

ghost commented Dec 13, 2017

@liggitt @bparees This is the contents of .dockercfg for our docker-registry-secret:

{  
  "auths":{  
    "docker-registry-default.router.default.svc.cluster.local":{  
      "username":"some-user",
      "password":"<password>",
      "email":"someuseremail@gmail.com",
      "auth":"<encoded>"
    }
  }
}

Where <encoded> translates to the following:

some-user:<password>

@bparees
Contributor

bparees commented Dec 13, 2017

This does not appear to be a registry issue (registry auth appears to be working fine when a valid secret is presented); this is an issue w/ how k8s creates/finds/uses secrets, so I'm fixing the issue ownership.

@sjenning
Contributor

@DirectXMan12 PTAL

@sjenning sjenning assigned DirectXMan12 and unassigned sjenning Jan 22, 2018
@soltysh
Member

soltysh commented Jan 23, 2018

It looks like your problem is related to the one described in https://bugzilla.redhat.com/show_bug.cgi?id=1531511; can you please verify?

@ghost

ghost commented Feb 9, 2018

We have changed the way we authorize pods for pulling images from the docker registry. Since the imagePullSecrets were not working as desired for us (and are more intended for pulling from external registries), we decided to investigate how we could authorize pods to the registry using their service account token.

This was not working when using the docker registry route (docker-registry-default.router.default.svc.cluster.local), but when using the docker registry service (docker-registry.default.svc:5000), pods authorized correctly using their service account token. This allowed us to remove the imagePullSecrets from the pod templates and the dockercfg secret we had created for this.
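
For illustration, with this approach the pod spec only needs the in-cluster service hostname in the image reference; no imagePullSecrets entry is required (project and image names are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
  - name: app
    image: docker-registry.default.svc:5000/<custom-project>/<image-name>:1.0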

@soltysh
Member

soltysh commented Apr 10, 2018

I think the root cause of this issue was fixed in #18062 for 3.7, so I'm closing this issue on that basis. If you're still experiencing issues and you're using the latest version of oc, which includes the aforementioned fix, please reopen.

@soltysh soltysh closed this as completed Apr 10, 2018
@zoobab

zoobab commented Jan 17, 2019

I still have this issue with an OpenShift 3.9 cluster: the docker client wrongly returns an error "Error: image myimage/myimage:latest not found" while it is not logged in. It should return an error like "Authentication required".

@bparees
Contributor

bparees commented Jan 17, 2019

"Error: image myimage/myimage:latest not found" while it is not logged in. It should return an error like "Authentication required".

that would reveal/confirm that the image exists to users who potentially should not know the image exists.

@pontuspalmenas

I still have this issue with an OpenShift 3.9 cluster: the docker client wrongly returns an error "Error: image myimage/myimage:latest not found" while it is not logged in. It should return an error like "Authentication required".

Awesome. I had this frustrating issue and didn't realize that I had to docker login as well, not just oc login, until I read this comment.

@jribmartins

Hello all,

I had the same problem (version 3.9) and resolved it by adding the role image-puller to my pod's service account, in this case the service account "default":
oc policy add-role-to-user system:image-puller system:serviceaccount:<project-id>:default -n <project-id>
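
To verify the grant, something like this should now list the service account (the resource name is taken from the registry's access-denied log earlier in this thread):

$ oc policy who-can get imagestreams/layers -n <project-id>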
