
unable to pull images from private registry inside vcluster #33

Closed
eranbibi opened this issue Aug 13, 2020 · 7 comments
Assignees
Labels
kind/bug Something isn't working

Comments

@eranbibi

eranbibi commented Aug 13, 2020

Hi

I am trying to deploy an application whose image is hosted in a private ACR registry, but the deployment fails because of an image pull issue.

I set up the Kubernetes secret as follows:
kubectl -n aqua create secret docker-registry registry-creds --docker-server=#####.azurecr.io --docker-username=##### --docker-password=##### --docker-email=#####

and the application's deployment is configured with
imagePullSecrets:
- name: registry-creds

and with a service account that uses the same image pull secret.
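For reference, a minimal sketch of how these pieces fit together, using the names that appear in this thread (aqua-sa, aqua-db) and a placeholder for the redacted registry host; this is illustrative only, not the exact manifests:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: aqua-sa                  # service account name mentioned later in this thread
  namespace: aqua
imagePullSecrets:
- name: registry-creds           # the docker-registry secret created above
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: aqua-db
  namespace: aqua
spec:
  replicas: 1
  selector:
    matchLabels:
      app: aqua-db
  template:
    metadata:
      labels:
        app: aqua-db
    spec:
      serviceAccountName: aqua-sa
      imagePullSecrets:
      - name: registry-creds     # same pull secret referenced directly on the pod spec
      containers:
      - name: database
        image: REGISTRY.azurecr.io/database:5.0.0   # REGISTRY stands in for the redacted ACR name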

The deployment failed with:
aqua-db-c988798b4-j8vsn 0/1 ImagePullBackOff 0 15s

Events from kubectl describe pod:

Type Reason Age From Message


Normal Scheduled 13m default-scheduler Successfully assigned aqua/aqua-db-c988798b4-j8vsn to gke-gke8010-default-pool-9eb8eb33-pnbd
Warning SyncError 13m pod-syncer Error updating pod: Operation cannot be fulfilled on pods "aqua-db-c988798b4-j8vsn": the object has been modified; please apply your changes to the latest version and try again
Normal BackOff 11m (x6 over 13m) kubelet, gke-gke8010-default-pool-9eb8eb33-pnbd Back-off pulling image "#####.azurecr.io/database:5.0.0"
Normal Pulling 11m (x4 over 13m) kubelet, gke-gke8010-default-pool-9eb8eb33-pnbd Pulling image "#####.azurecr.io/database:5.0.0"
Warning Failed 11m (x4 over 13m) kubelet, gke-gke8010-default-pool-9eb8eb33-pnbd Error: ErrImagePull
Warning Failed 11m (x4 over 13m) kubelet, gke-gke8010-default-pool-9eb8eb33-pnbd Failed to pull image "#####.azurecr.io/database:5.0.0": rpc error: code = Unknown desc = Error response from daemon: Get https://*****.azurecr.io/v2/database/manifests/5.0.0: unauthorized: authentication required, visit https://aka.ms/acr/authorization for more information.
Warning Failed 2m55s (x43 over 13m) kubelet, gke-gke8010-default-pool-9eb8eb33-pnbd Error: ImagePullBackOff

The above works well on any “regular” cluster.

@FabianKramm
Member

FabianKramm commented Aug 14, 2020

@eranbibi thanks for reporting this issue! That's actually a bug in the virtual cluster: the imagePullSecrets names are not translated into the real existing physical ones, which results in the error message because the secret couldn't be found in the host cluster. I'll fix this and make a new release.
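To illustrate what the fix needs to do (a sketch only; the exact translated name format is an assumption based on the secret name that shows up later in this thread):

# imagePullSecrets as specified on the pod in the virtual cluster (namespace aqua):
imagePullSecrets:
- name: registry-creds

# imagePullSecrets as they should appear on the synced pod in the host namespace,
# rewritten to the physical secret name (hypothetical, following the
# NAME-x-NAMESPACE-x-VCLUSTER pattern seen later in this thread):
imagePullSecrets:
- name: registry-creds-x-aqua-x-vc1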

@FabianKramm FabianKramm added the kind/bug Something isn't working label Aug 14, 2020
@FabianKramm FabianKramm self-assigned this Aug 14, 2020
@FabianKramm
Member

@eranbibi this issue should be fixed with loft version v0.3.7! The easiest way is to just create a new virtual cluster for the fix to take effect. If you want to upgrade an existing virtual cluster, you have to modify the CRD settings, which can be done in the UI through the Show Yaml button: add

chart:
  version: 0.0.1-beta.20

and press 'Update' as seen in this screenshot:

[Screenshot: Bildschirmfoto 2020-08-14 um 09 52 53, showing the chart version update via the Show Yaml editor]
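In context, the edited virtual cluster resource would look roughly like this; the apiVersion and surrounding fields are assumptions and may differ on your install, and only the chart.version addition is taken from the comment above:

apiVersion: storage.loft.sh/v1     # assumed; the thread only mentions virtualclusters.storage.loft.sh
kind: VirtualCluster
metadata:
  name: csp3-vc                    # vcluster name from the error message later in this thread
  namespace: dev-space1
spec:
  # added block pinning the fixed vcluster chart release:
  chart:
    version: 0.0.1-beta.20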

@eranbibi
Author

Hi @FabianKramm

First, thank you for the super quick response and new release.

I created a new loft environment with the latest loft v0.3.7 (deployed using your Helm chart).

I created a new space and a new vcluster, and when I deployed my app I faced the exact same Error: ErrImagePull issue.

I double-checked that I am indeed using v0.3.7, so it seems the fix is not there.

Then I tried to follow your other suggestion and added the

chart:
  version: 0.0.1-beta.20

to the vc yaml file.

But when I hit “Save” I got an error message:

Failed to save state in cluster real-cluster-gke1
Error: Operation cannot be fulfilled on virtualclusters.storage.loft.sh "csp3-vc": the object has been modified; please apply your changes to the latest version and try again (Conflict)

What do you suggest?

@FabianKramm
Member

@eranbibi Regarding the first issue, that is odd; I tested it on my install and it worked for me. Can you check, in the host cluster where the vcluster was created, how the pod yaml looks (kubectl get pods -n vcluster-NAME -o yaml) and whether the referenced secret under imagePullSecrets exists in that namespace? It would be good if you could post the yamls here.

Regarding the second issue: that usually occurs if there was an update to the CRD between your modification and pressing Update on the resource. We should probably change that to a patch instead of an update; however, this is usually solved by just refreshing the table and reapplying the update to the resource.

@FabianKramm FabianKramm reopened this Aug 15, 2020
@eranbibi
Author

eranbibi commented Aug 15, 2020

Hi @FabianKramm
I ran the following command in the host cluster context:

kubectl get pod aqua-db-7c878cfc5b-fs29c-x-aqua-x-vc1 -n dev-space1 -o yaml

See the attached vs1_db_pod.txt.

It contains:
imagePullSecrets:
- name: aqua-registry-x-aqua-x-vc1

And I confirmed the secret exists in the namespace using the following command:
kubectl get secrets -n dev-space1

Let me know what additional information I should provide; I could even give you access to my cluster if needed.

Edit: @FabianKramm I think this is related to a service account configuration that isn't propagated to the real cluster.

In the virtual cluster I am creating the deployment with an sa called "aqua-sa" (and the pull secret is set on this sa).

ubuntu@gke8040-3064:~$ kubectl get sa -n aqua
NAME SECRETS AGE
default 1 48m
aqua-sa 1 48m

On the host cluster I don't see it:

ubuntu@gke8040-3064:~$ kubectl get sa -n dev-space1
NAME SECRETS AGE
default 1 52m
vc-vc1 1 52m

Can you check my assumption?

Thanks,
Eran

@FabianKramm
Member

FabianKramm commented Aug 17, 2020

@eranbibi I investigated a little bit more, and it seems the secret type was sometimes not synced correctly, which caused your issue. I fixed it and this time it should work (loft v0.3.9); I tested it with multiple configurations.
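For anyone verifying the v0.3.9 fix, a hedged sketch of what the synced pull secret in the host namespace should look like; the translated name is hypothetical, but the type field is the standard one for docker-registry secrets and is what must be preserved for image pulls to work:

apiVersion: v1
kind: Secret
metadata:
  name: registry-creds-x-aqua-x-vc1   # hypothetical translated name in the host namespace
  namespace: dev-space1
type: kubernetes.io/dockerconfigjson  # the secret type that must survive syncing
data:
  .dockerconfigjson: <base64-encoded registry credentials, elided>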

Regarding the service accounts: we don't need to sync them, because only the service account from the virtual cluster is used, not the one from the host cluster, which is why they are not needed there. Only the service account's token secret is required, since it is mounted as a volume, and that secret is also synced to the host cluster. Pull secrets specified in the virtual service account are automatically applied to the pod configuration and are then correctly translated to the host secrets, so they will work.

@eranbibi
Author

Hi @FabianKramm

I was able to confirm that the issue was resolved. Thank you.
