Nodes created by Karpenter are unable to pull images from a private Azure Container Registry (ACR), resulting in a 401 Unauthorized error #411
Comments
@ATymus I recommend enabling the debug log level in Karpenter, redeploying, and sharing more Resource Specs and Logs.
I have debug mode enabled in Karpenter, but no errors are logged for this problem.
@danielhamelberg Looks like the issue is reproducible, both when granting the permission manually and when using `--attach-acr`. The permission is definitely there. The whole demo set-up for the issue is provided here. (Please execute the commands one by one; I did not include the command to grant "Azure Kubernetes Service RBAC Cluster Admin" to the logged-in user.)
ranNum=$(echo $RANDOM)
rG=aks-auto-${ranNum}
aks=aks-auto-${ranNum}
acr=acrauto${ranNum}
location=southeastasia
az extension add --name aks-preview
az group create -n ${rG} -l ${location} -o none
# Use "Standard_D8pds_v5" because it is the size in my subscription that can be created across 3 availability zones
az aks create -n ${aks} -g ${rG} --node-vm-size Standard_D8pds_v5 \
  --sku automatic --no-ssh-key
az acr create --resource-group ${rG} --name ${acr} --sku Basic
az acr login --name ${acr}
docker pull nginx
docker tag nginx ${acr}.azurecr.io/nginx
docker push ${acr}.azurecr.io/nginx
kubeletObjID=$(az aks show -n ${aks} -g ${rG} --query identityProfile.kubeletidentity.objectId -o tsv)
acrResID=$(az resource show -n ${acr} -g ${rG} \
--namespace Microsoft.ContainerRegistry --resource-type registries --query id -o tsv)
az role assignment create --assignee-object-id ${kubeletObjID} \
--assignee-principal-type ServicePrincipal --role "AcrPull" --scope ${acrResID}
# Grant your own user the "Azure Kubernetes Service RBAC Cluster Admin" role; the CLI command for that is omitted here.
az aks get-credentials -n ${aks} -g ${rG}
# Deploy test Pod
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: ${acr}.azurecr.io/nginx
    imagePullPolicy: IfNotPresent
EOF
# Wait 3 minutes for the new node to be provisioned, then check the result
sleep 180
kubectl describe po nginx
# Result: ImagePullErr
kubectl delete po nginx
# Try the `--attach-acr` method, which is the intended approach
az aks update -n ${aks} -g ${rG} --attach-acr ${acr}
# Deploy Pod again
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: ${acr}.azurecr.io/nginx
    imagePullPolicy: IfNotPresent
EOF
# Wait 3 minutes for the new node to be provisioned, then check the result
sleep 180
kubectl describe po nginx
# Still failed
Debug:
root [ / ]# crictl pull acrauto2462.azurecr.io/nginx
E0807 18:18:39.889144 21228 remote_image.go:180] "PullImage from image service failed" err="rpc error: code = Unknown desc = failed to pull and unpack image \"acrauto2462.azurecr.io/nginx:latest\": failed to resolve reference \"acrauto2462.azurecr.io/nginx:latest\": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://acrauto2462.azurecr.io/oauth2/token?scope=repository%3Anginx%3Apull&service=acrauto2462.azurecr.io: 401 Unauthorized" image="acrauto2462.azurecr.io/nginx"
FATA[0000] pulling image: failed to pull and unpack image "acrauto2462.azurecr.io/nginx:latest": failed to resolve reference "acrauto2462.azurecr.io/nginx:latest": failed to authorize: failed to fetch anonymous token: unexpected status from GET request to https://acrauto2462.azurecr.io/oauth2/token?scope=repository%3Anginx%3Apull&service=acrauto2462.azurecr.io: 401 Unauthorized
We can see the same 401 Unauthorized error when pulling with crictl on the node. At this point I realized something, so I avoided the nodes created by Karpenter and used the system node pool instead:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: nginx-test
spec:
  nodeSelector:
    kubernetes.azure.com/agentpool: nodepool1
  tolerations:
  - key: CriticalAddonsOnly
    operator: Exists
  containers:
  - name: nginx
    image: ${acr}.azurecr.io/nginx
    imagePullPolicy: IfNotPresent
EOF
kubectl logs nginx-test
exec /docker-entrypoint.sh: exec format error
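An `exec format error` at container start usually means the image was built for a different CPU architecture than the node it runs on. `Standard_D8pds_v5` is an Arm64 VM size, so an `nginx` image pulled and pushed from an amd64 workstation could fail this way. That is one plausible explanation, not something the thread confirms. A minimal sketch for comparing the two architectures, assuming `docker` and `kubectl` are available; the helper names `arch_mismatch`, `image_arch_of`, and `node_arch_of` are hypothetical:

```shell
# Hypothetical helper: succeeds (exit 0) when the image architecture
# differs from the node architecture.
arch_mismatch() {
  image_arch=$1
  node_arch=$2
  [ "$image_arch" != "$node_arch" ]
}

# Architecture of a locally available image, e.g. amd64 or arm64.
image_arch_of() {
  docker image inspect --format '{{.Architecture}}' "$1"
}

# Architecture reported by a node through its kubernetes.io/arch label.
node_arch_of() {
  kubectl get node "$1" -o jsonpath='{.metadata.labels.kubernetes\.io/arch}'
}
```

For example, `arch_mismatch "$(image_arch_of "${acr}.azurecr.io/nginx")" "$(node_arch_of <node-name>)" && echo "architecture mismatch"` would flag the problem before the image is pushed.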
We have just discovered a suspect. The issue also does not seem to be present on 1.28 (from my reproduction attempt, at least), which further backs that claim. Will post updates on the potential fix.
Looking at this code for the out-of-tree provider, we default to not including the out-of-tree provider settings if CredentialProviderURL is empty.
We don't have that logic conditionally enabled for 1.29, so authenticated image pulls will not work for that specific Kubernetes version. This can be easily fixed by A) switching 1.29 to use the out-of-tree credential provider (have Karpenter pass in the rest of the required OOT provider kubelet flags). I believe it's best we use option A.
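The out-of-tree credential provider is wired into the kubelet through the `--image-credential-provider-config` and `--image-credential-provider-bin-dir` flags. A rough way to check whether a given node was bootstrapped with both is to inspect the kubelet command line. This is only a sketch: the helper names `has_oot_credential_provider` and `kubelet_cmdline` are hypothetical, and it assumes a single kubelet process whose flags are visible in `/proc`.

```shell
# Hypothetical helper: succeeds only if both out-of-tree credential
# provider flags appear in the given kubelet command line.
has_oot_credential_provider() {
  cmdline=$1
  case "$cmdline" in
    *--image-credential-provider-config*) ;;
    *) return 1 ;;
  esac
  case "$cmdline" in
    *--image-credential-provider-bin-dir*) ;;
    *) return 1 ;;
  esac
}

# Run on the node itself (for example from a debug shell) to read the
# arguments of the running kubelet process.
kubelet_cmdline() {
  tr '\0' ' ' < "/proc/$(pgrep -o -x kubelet)/cmdline"
}
```

On a node, `has_oot_credential_provider "$(kubelet_cmdline)" || echo "credential provider flags missing"` would point to the behaviour described above.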
Merged the fix. It still needs to be released, so keeping this open for tracking.
@Bryce-Soghigian - do you know when this will be released? I am trying to triangulate whether I should wait or go back to standard node pools for now.
@vikas-rajvanshy It's in the current release that's rolling out. I believe you can track it via the AKS release tracker: https://releases.aks.azure.com/. The fix is a part of the
Version
Karpenter Version: v0.5.0
Kubernetes Version: v1.29.4
Expected Behavior
The expected behavior is that the nodes can access the private ACR using the configured managed identity.
Actual Behavior
Nodes created by Karpenter and regular Kubernetes nodes both have the same managed identity configured. This managed identity has been granted both AcrPull and AcrPush roles on the ACR. However, while pods on regular Kubernetes nodes can successfully pull images from the private ACR, pods on nodes created by Karpenter fail with the following error: 401 Unauthorized
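To confirm which kind of node a failing pod landed on, the node's labels can be inspected. A minimal sketch, assuming Karpenter-provisioned nodes carry the `karpenter.sh/nodepool` label and regular AKS nodes carry `kubernetes.azure.com/agentpool`; the helper name `pod_node_origin` is hypothetical:

```shell
# Hypothetical helper: prints the node a pod is scheduled on and whether
# that node belongs to a Karpenter node pool or a regular agent pool.
pod_node_origin() {
  pod=$1
  node=$(kubectl get pod "$pod" -o jsonpath='{.spec.nodeName}')
  if [ -z "$node" ]; then
    echo "$pod: not scheduled yet"
    return 1
  fi
  # Assumed label set by Karpenter on the nodes it provisions.
  nodepool=$(kubectl get node "$node" \
    -o jsonpath='{.metadata.labels.karpenter\.sh/nodepool}')
  if [ -n "$nodepool" ]; then
    echo "$node: karpenter nodepool $nodepool"
  else
    agentpool=$(kubectl get node "$node" \
      -o jsonpath='{.metadata.labels.kubernetes\.azure\.com/agentpool}')
    echo "$node: agent pool ${agentpool:-unknown}"
  fi
}
```

Running `pod_node_origin <pod-name>` for a pod that fails with 401 Unauthorized and for one that pulls successfully makes the difference between the two node types explicit.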
Steps to Reproduce the Problem
az aks update -n aks-dev -g rg-dev --attach-acr myregistry
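After attaching the registry, `az aks check-acr` can validate that the cluster is able to reach and authenticate against it. A minimal sketch reusing the names from the step above; the wrapper name `check_acr_access` is hypothetical. The check may not run on a Karpenter-provisioned node, so a passing result does not by itself rule out the problem described in this issue.

```shell
# Hypothetical wrapper around `az aks check-acr`.
# Arguments: cluster name, resource group, registry name (without domain).
check_acr_access() {
  cluster=$1
  group=$2
  registry=$3
  # check-acr takes the registry login server, e.g. myregistry.azurecr.io
  az aks check-acr --name "$cluster" --resource-group "$group" \
    --acr "${registry}.azurecr.io"
}
```

For the cluster in this report the call would be `check_acr_access aks-dev rg-dev myregistry`.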
Resource Specs and Logs
Community Note