
AWS ECR build permissions issue with Kaniko v1.19.0 & v1.19.1 (found using v1.19.0-debug & v1.19.1-debug) #2882

Closed
AndrewFarley opened this issue Nov 29, 2023 · 26 comments · Fixed by #2885 or #2908
Labels
area/authentication area/aws kind/bug Something isn't working priority/p0 Highest priority. Break user flow. We are actively looking at delivering it. registry/ecr regression/v1.18.0 regression

Comments

@AndrewFarley

AndrewFarley commented Nov 29, 2023

Actual behavior
I noticed that the 1.19 build was just promoted to the debug tag in GCR, and since then all our builds have been failing. We've had a build process working for years on the debug tag without any tighter version pinning than that, but it appears 1.19 breaks something. Here's the command and log...

$ /kaniko/executor --context /builds/replaced/for-privacy --dockerfile /builds/replaced/for-privacy/Dockerfile --build-arg ENV_NAME=devk8s --destination REPLACED-FOR-PRIVACY.dkr.ecr.us-west-2.amazonaws.com/replaced/for-privacy:latest --destination REPLACED-FOR-PRIVACY.dkr.ecr.us-west-2.amazonaws.com/replaced/for-privacy:dev-4020ff2f802c66db1485e30334347fe3d752f06a --cache --cache-repo REPLACED-FOR-PRIVACY.dkr.ecr.us-west-2.amazonaws.com/kaniko-cache
error checking push permissions -- make sure you entered the correct tag name, and that you are authenticated correctly, and try again: checking push permission for "REPLACED-FOR-PRIVACY.dkr.ecr.us-west-2.amazonaws.com/replaced/for-privacy:latest": POST https://REPLACED-FOR-PRIVACY.dkr.ecr.us-west-2.amazonaws.com/v2/replaced/for-privacy/blobs/uploads/: unexpected status code 401 Unauthorized: Not Authorized

Expected behavior
The exact same command, run on the exact same runner, instance, and environment, works perfectly with every debug release from the last two years, including 1.18. The setup: we use an AWS IAM instance role on an EC2 instance that runs a GitLab runner to perform our Kaniko builds. I don't want to use docker-in-docker for security reasons, so I use Kaniko to do Docker builds from within Docker; the GitLab runner is configured to run its executor as Docker images, hence the Kaniko image above. I fixed the issue by pinning to gcr.io/kaniko-project/executor:v1.18.0-debug instead of simply gcr.io/kaniko-project/executor:debug. To confirm the issue, I temporarily pinned the v1.19.0-debug image and the problem resurfaced. I don't know enough about Kaniko to gather more debug data; if needed, please recommend commands/args I can add to provide more logs about the underlying issue.
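For anyone hitting the same thing, a quick way to check what the floating debug tag currently resolves to versus a pinned release is sketched below; this is just the stock docker CLI, and the digests you see will depend on when you pull.

$ docker pull gcr.io/kaniko-project/executor:debug
$ docker pull gcr.io/kaniko-project/executor:v1.18.0-debug
$ docker image inspect --format '{{index .RepoDigests 0}}' gcr.io/kaniko-project/executor:debug
$ docker image inspect --format '{{index .RepoDigests 0}}' gcr.io/kaniko-project/executor:v1.18.0-debug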

Note: I searched through the issues and couldn't find a duplicate, sorry if there is one!

@rrossouw01

JFYI we started having issues pushing to ECR a few hours ago and using gcr.io/kaniko-project/executor:v1.18.0-debug works for us.

error:

+ /kaniko/executor --context /home/jenkins/agent/workspace/microservice/catalog/build --destination <account>.dkr.ecr.us-west-2.amazonaws.com/microservice-catalog:792d704
error checking push permissions -- make sure you entered the correct tag name, and that you are authenticated correctly, and try again: checking push permission for "<account>.dkr.ecr.us-west-2.amazonaws.com/microservice-catalog:792d704": POST https://<account>.dkr.ecr.us-west-2.amazonaws.com/v2/microservice-catalog/blobs/uploads/: unexpected status code 401 Unauthorized: Not Authorized

was using this image before: gcr.io/kaniko-project/executor:debug

@cgill27

cgill27 commented Nov 29, 2023

Added a thumbs-up, but wanted to also say we're seeing the exact same thing with the latest debug version; rolling back to 1.18 works fine.

@AndrewFarley
Author

Okay well dang guys, if it's THAT bad, to me that's a sign they need to pull that image immediately and re-tag the 1.18 release ASAP.

@aaron-prindle aaron-prindle added regression regression/v1.18.0 priority/p0 Highest priority. Break user flow. We are actively looking at delivering it. area/aws registry/ecr kind/bug Something isn't working area/authentication labels Nov 29, 2023
@aaron-prindle
Collaborator

aaron-prindle commented Nov 29, 2023

Thanks for flagging this @AndrewFarley. Are you able to post the debug logs for a failed run w/ kaniko using --verbosity debug (with any necessary information redacted)? Below is the list of change PRs from v1.18.0 -> v1.19.0; nothing here is immediately obvious to me as a cause of this regression (aws-sdk-go-v2 was updated, but that is upstream):

  • chore(deps): bump docker/build-push-action from 5.0.0 to 5.1.0 #2857
  • chore(deps): bump github.com/aws/aws-sdk-go-v2 from 1.22.1 to 1.22.2 #2846
  • chore(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.22.0 to 1.24.0 #2851
  • chore(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.25.5 to 1.25.8 #2875
  • chore(deps): bump github.com/aws/aws-sdk-go-v2/feature/s3/manager from 1.13.1 to 1.14.0 #2861
  • chore(deps): bump github.com/aws/aws-sdk-go-v2/feature/s3/manager from 1.14.0 to 1.14.3 #2874
  • chore(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.42.0 to 1.42.1 #2847
  • chore(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.43.0 to 1.44.0 #2872
  • chore(deps): bump github.com/containerd/containerd from 1.7.8 to 1.7.9 #2873
  • chore(deps): bump golang.org/x/net from 0.17.0 to 0.18.0 #2859
  • chore(deps): bump golang.org/x/oauth2 from 0.13.0 to 0.14.0 #2871
  • chore(deps): bump golang.org/x/sys from 0.13.0 to 0.14.0 #2848
  • chore(deps): bump google-github-actions/auth from 1.1.1 to 1.2.0 #2868
  • chore(deps): bump google.golang.org/api from 0.149.0 to 0.150.0 #2845
  • chore(deps): bump google.golang.org/api from 0.150.0 to 0.151.0 #2862
  • chore(deps): bump sigstore/cosign-installer from 3.1.2 to 3.2.0 #2849
  • fix: create intermediate directories in COPY with correct uid and gid #2795
  • feat: support https URLs for digest-file #2811
  • fix: makefile container-diff on darwin #2842
  • fix: print error to stderr instead of stdout before exiting #2823
  • fix: Remove references to deprecated io/ioutil pkg #2867
  • fix: resolve issue with copy_multistage_test.go and broken ioutil import #2879
  • fix: resolve warmer memory leak. #2763
  • fix: skip the /kaniko directory when copying root #2863
  • impl: add a retry with result function (impl: add a retry with result function #2837) #2853
  • impl: add a retry with result function #2837
  • refactor: rm bool param detectFilesystem in InitIgnoreList #2843

@aaron-prindle
Collaborator

aaron-prindle commented Nov 29, 2023

I have reverted the latest, debug, and slim tags to v1.18.0 given the regression identified here. I don't readily have access to a repro setup at the moment, so if anyone in this thread can help identify the root cause of the issue, that would be much appreciated.

@cgill27

cgill27 commented Nov 29, 2023

Here's a screenshot with --verbosity debug from my GitLab pipeline job, but I don't see much that's helpful:
[screenshot]

Here's a screenshot of the actual GitLab job:
[screenshot]

@cgill27

cgill27 commented Nov 29, 2023

I've also added the below env vars to the job and they made no difference in my case:

AWS_SDK_LOAD_CONFIG=true
AWS_EC2_METADATA_DISABLED=true (and tried false)

@AndrewFarley
Author

chore(deps): bump github.com/aws/aws-sdk-go-v2 from 1.22.1 to 1.22.2 https://github.com/GoogleContainerTools/kaniko/pull/2846
chore(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.22.0 to 1.24.0 https://github.com/GoogleContainerTools/kaniko/pull/2851
chore(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.25.5 to 1.25.8 https://github.com/GoogleContainerTools/kaniko/pull/2875
chore(deps): bump github.com/aws/aws-sdk-go-v2/feature/s3/manager from 1.13.1 to 1.14.0 https://github.com/GoogleContainerTools/kaniko/pull/2861
chore(deps): bump github.com/aws/aws-sdk-go-v2/feature/s3/manager from 1.14.0 to 1.14.3 https://github.com/GoogleContainerTools/kaniko/pull/2874
chore(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.42.0 to 1.42.1 https://github.com/GoogleContainerTools/kaniko/pull/2847
chore(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.43.0 to 1.44.0 https://github.com/GoogleContainerTools/kaniko/pull/2872

Based on the list above and the verbose debug output, this feels like it was probably caused by one of the AWS dependency updates, meaning the problem might be an upstream issue in one of the bumps above. If someone has the time, it might be good to rebuild the kaniko debug image with those commits reverted and see if it makes a difference (a rough sketch follows). Sorry, I don't have the time to do that myself. :(
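A rough sketch of that experiment, assuming the executor image can still be rebuilt from the repo's deploy/Dockerfile; the revert targets would be the merge commits of the AWS bump PRs listed above, left here as placeholders:

$ git clone https://github.com/GoogleContainerTools/kaniko && cd kaniko
$ git checkout v1.19.0
$ git revert --no-edit <aws-sdk-bump-merge-commit> ...   # one revert per suspect PR
$ docker build -t kaniko-executor-local -f deploy/Dockerfile .
# then re-run the failing build with kaniko-executor-local in place of gcr.io/kaniko-project/executor:debug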

@mattj150

Just noticed this on our builds too. GitLab CI runners running on an AWS EKS cluster. Reverting to 1.18 solved the issue. This setup has been working for us for well over a year.

$ echo "{\"credsStore\":\"ecr-login\"}" > /kaniko/.docker/config.json
$ /kaniko/executor --context "${CI_PROJECT_DIR}" --dockerfile "${CI_PROJECT_DIR}/${DOCKERFILE_PATH}" --no-push --cache=true --cache-repo ${CACHE_REPO} --cache-ttl ${CACHE_TTL} --build-arg PIP_CONFIG_PATH=/kaniko/pip.conf
error checking push permissions -- make sure you entered the correct tag name, and that you are authenticated correctly, and try again: checking push permission for "xxxx.dkr.ecr.us-east-1.amazonaws.com/xxxx/cache": POST https://xxxx.dkr.ecr.us-east-1.amazonaws.com/v2/xxxx/cache/blobs/uploads/: unexpected status code 401 Unauthorized: Not Authorized

(Account ID and repo redacted)
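For reference, a per-registry variant of the same credential-helper config, which scopes ecr-login to a single ECR host instead of making it the global credential store; the account and region below are placeholders:

$ cat > /kaniko/.docker/config.json <<'EOF'
{
  "credHelpers": {
    "<account>.dkr.ecr.us-east-1.amazonaws.com": "ecr-login"
  }
}
EOF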

@aaron-prindle aaron-prindle changed the title Build permissions issue with debug (1.19 build) AWS ECR build permissions issue with v1.19.0 (found using v1.19.0-debug) Nov 29, 2023
@aaron-prindle aaron-prindle changed the title AWS ECR build permissions issue with v1.19.0 (found using v1.19.0-debug) AWS ECR build permissions issue with Kaniko v1.19.0 (found using v1.19.0-debug) Nov 29, 2023
@aaron-prindle
Collaborator

aaron-prindle commented Nov 29, 2023

@dennis-helm-sp I believe this is only affecting AWS ECR users at the moment, based on this thread; if someone here encountered an issue w/ v1.19.0 using a different registry (not AWS ECR), please comment with that information. Kaniko's own unit tests use docker-credential-gcloud, which also lives in .docker/config, which makes me think this is specific to some AWS ECR dependencies. I don't believe this regression is related to #2863, as that should only affect kaniko's logic for Dockerfile COPY commands, and Push would not honor that specific otiai Skip config for /kaniko (eg: /kaniko/.docker/config) even with that change, IIUC (I haven't root-caused this though, so not 100% sure).

@railsharipov

railsharipov commented Nov 29, 2023

We have the same issue when using the latest debug version with AWS ECR; reverting to v1.18.0-debug fixes it.

@pdecat
Contributor

pdecat commented Nov 30, 2023

Maybe related to aws/aws-sdk-go-v2#2370

If it is, the solution is to upgrade all AWS SDK dependencies at once:
aws/aws-sdk-go-v2#2370 (comment)
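A sketch of what upgrading them all at once would look like in kaniko's module, using standard Go tooling; the exact set of modules that needed bumping was settled later in the fix PR:

$ go get github.com/aws/aws-sdk-go-v2@latest \
         github.com/aws/aws-sdk-go-v2/config@latest \
         github.com/aws/aws-sdk-go-v2/feature/s3/manager@latest \
         github.com/aws/aws-sdk-go-v2/service/s3@latest \
         github.com/awslabs/amazon-ecr-credential-helper/ecr-login@latest
$ go mod tidy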

@aaron-prindle
Collaborator

aaron-prindle commented Nov 30, 2023

@pdecat thank you for tagging the issue aws/aws-sdk-go-v2#2370 (comment), I believe this is the root cause here.

@aaron-prindle
Collaborator

This should now be fixed with the release of v1.19.1 - https://github.com/GoogleContainerTools/kaniko/releases/tag/v1.19.1

Closing

@cm3lindsay

> This should now be fixed with the release of v1.19.1 - https://github.com/GoogleContainerTools/kaniko/releases/tag/v1.19.1
>
> Closing

It seems that v1.19.1 still has the same issue.
For now we've downgraded from v1.19.1 to v1.18.0 to unblock our CI.

@alex-hempel

Seeing the same issue as @cm3lindsay, pinning 1.18.0 is our workaround as well.

@aaron-prindle aaron-prindle reopened this Dec 15, 2023
@aaron-prindle aaron-prindle changed the title AWS ECR build permissions issue with Kaniko v1.19.0 (found using v1.19.0-debug) AWS ECR build permissions issue with Kaniko v1.19.0 & v1.19.1 (found using v1.19.0-debug & v1.19.1-debug) Dec 15, 2023
@ChristopherKlinge

@aaron-prindle,
if you are able to provide a nightly build when a new version is available, I'd be willing to test it on one of our pipelines ahead of a full release.

@aaron-prindle
Collaborator

aaron-prindle commented Dec 15, 2023

Thank you @ChristopherKlinge. I will ping the thread here for test assistance when trying to fix this in another patch release. Additionally, kaniko has per-PR-merge image builds (the closest thing to nightlies) available at tags containing the git commit SHA - see the example below. I'll post the commit SHA to test with in this thread to validate the fix once it is submitted.

gcr.io/kaniko-project/executor:<git-commit-sha>
gcr.io/kaniko-project/executor:<git-commit-sha>-debug
gcr.io/kaniko-project/executor:<git-commit-sha>-slim

@pdecat
Contributor

pdecat commented Dec 15, 2023

I've managed to reproduce the issue with a local build of the current main branch of kaniko:

# docker run \
    -v "$HOME"/.aws:/root/.aws \
    -v $(pwd):/workspace \
    -e AWS_PROFILE=pdecat-test \
    gcr.io/kaniko-project/executor:latest \
    --dockerfile /workspace/Dockerfile \
    --destination "012345678901.dkr.ecr.eu-west-3.amazonaws.com/test-decat:latest" \
    --context dir:///workspace/ \
    --verbosity trace
DEBU[0000] Getting source context from dir:///workspace/
DEBU[0000] Build context located at /workspace/
DEBU[0000] Copying file /workspace/Dockerfile to /kaniko/Dockerfile
TRAC[0000] Adding /var/run to default ignore list
DEBU[0000] Retrieving credentials                        region=eu-west-3 registry=012345678901 serverURL=012345678901.dkr.ecr.eu-west-3.amazonaws.com service=ecr
DEBU[0000] Checking file cache                           registry=012345678901
DEBU[0000] Calling ECR.GetAuthorizationToken             registry=012345678901
error checking push permissions -- make sure you entered the correct tag name, and that you are authenticated correctly, and try again: checking push permission for "012345678901.dkr.ecr.eu-west-3.amazonaws.com/test-decat:latest": POST https://012345678901.dkr.ecr.eu-west-3.amazonaws.com/v2/test-decat/blobs/uploads/: unexpected status code 401 Unauthorized: Not Authorized

And also ensured v1.18.0 works in the same context:

# docker run \
    -v "$HOME"/.aws:/root/.aws \
    -v $(pwd):/workspace \
    -e AWS_PROFILE=pdecat-test \
    gcr.io/kaniko-project/executor:v1.18.0 \
    --dockerfile /workspace/Dockerfile \
    --destination "012345678901.dkr.ecr.eu-west-3.amazonaws.com/test-decat:latest" \
    --context dir:///workspace/ \
    --verbosity trace
INFO[0013] Using dockerignore file: /workspace/.dockerignore
INFO[0013] Retrieving image manifest alpine:3.19.0
INFO[0013] Retrieving image alpine:3.19.0 from registry index.docker.io
...
DEBU[0082] Retrieving credentials                        region=eu-west-3 registry=012345678901 serverURL=012345678901.dkr.ecr.eu-west-3.amazonaws.com service=ecr
DEBU[0082] Checking file cache                           registry=012345678901
DEBU[0082] Using cached token                            registry=012345678901
INFO[0082] Pushing image to 012345678901.dkr.ecr.eu-west-3.amazonaws.com/test-decat:latest
INFO[0086] Pushed 012345678901.dkr.ecr.eu-west-3.amazonaws.com/test-decat@sha256:f68b531728be67c32245cffc8f1df4acb97420765e7548c1bd6888076698c0b6

One noticeable difference is that the push permissions check happens earlier in the former.
Edit: turns out the check also happens early in the latter, it's just that nothing is traced if it passes.

@pdecat
Contributor

pdecat commented Dec 15, 2023

I believe the issue is with the github.com/awslabs/amazon-ecr-credential-helper dependency which was not updated.
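A sketch of how to see the mismatch from kaniko's module graph with standard Go tooling (run from a kaniko checkout; output omitted):

$ go list -m github.com/awslabs/amazon-ecr-credential-helper/ecr-login
$ go list -m all | grep 'github.com/aws/aws-sdk-go-v2'
# per aws/aws-sdk-go-v2#2370, SDK module releases after 2023/11/15 are incompatible with
# earlier ones, so an older ecr-login release sitting next to newer aws-sdk-go-v2 modules
# breaks ECR authentication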

@pdecat
Contributor

pdecat commented Dec 15, 2023

Can confirm that, will submit a PR ASAP.

pdecat added a commit to pdecat/kaniko that referenced this issue Dec 15, 2023
…esolve issues with AWS ECR authentication (resolves GoogleContainerTools#2882)

Signed-off-by: Patrick Decat <pdecat@gmail.com>
@pdecat
Contributor

pdecat commented Dec 15, 2023

Submitted #2908

pdecat added a commit to pdecat/kaniko that referenced this issue Dec 15, 2023
…esolve issues with AWS ECR authentication (resolves GoogleContainerTools#2882)

As mentioned in aws/aws-sdk-go-v2#2370, AWS
SDK for Go v2 releases after 2023/11/15 broke compatibility with all
previous releases.

Signed-off-by: Patrick Decat <pdecat@gmail.com>
pdecat added a commit to pdecat/kaniko that referenced this issue Dec 15, 2023
…esolve issues with AWS ECR authentication

As mentioned in aws/aws-sdk-go-v2#2370, AWS
SDK for Go v2 releases after 2023/11/15 broke compatibility with all
previous releases.

Resolves GoogleContainerTools#2882

Signed-off-by: Patrick Decat <pdecat@gmail.com>
aaron-prindle pushed a commit that referenced this issue Dec 15, 2023
…esolve issues with AWS ECR authentication (#2908)

As mentioned in aws/aws-sdk-go-v2#2370, AWS
SDK for Go v2 releases after 2023/11/15 broke compatibility with all
previous releases.

Resolves #2882

Signed-off-by: Patrick Decat <pdecat@gmail.com>
@aaron-prindle aaron-prindle reopened this Dec 15, 2023
@aaron-prindle
Collaborator

aaron-prindle commented Dec 15, 2023

Thank you @pdecat for the fix PR here which is now merged. Our CI/CD build with the fix can be pulled from the following locations:

NOTE: a946b82 is the git SHA for kaniko that has the fix PR included

gcr.io/kaniko-project/executor:a946b82f22240eb8e3f7e73aaf0e592a323fa466
gcr.io/kaniko-project/executor:a946b82f22240eb8e3f7e73aaf0e592a323fa466-debug
gcr.io/kaniko-project/executor:a946b82f22240eb8e3f7e73aaf0e592a323fa466-slim
gcr.io/kaniko-project/warmer:a946b82f22240eb8e3f7e73aaf0e592a323fa466

@ChristopherKlinge + other folks in the thread - can you attempt to use the CI/CD image(s) linked above and reply here validating/invalidating that kaniko w/ this fix PR resolves the previously seen AWS ECR auth regression? Thanks!
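For anyone testing, a sketch of re-running the earlier repro from this thread against the commit-tagged image; the profile, account, region, and repository values are the same placeholders used above:

$ docker run \
    -v "$HOME"/.aws:/root/.aws \
    -v $(pwd):/workspace \
    -e AWS_PROFILE=pdecat-test \
    gcr.io/kaniko-project/executor:a946b82f22240eb8e3f7e73aaf0e592a323fa466 \
    --dockerfile /workspace/Dockerfile \
    --destination "012345678901.dkr.ecr.eu-west-3.amazonaws.com/test-decat:latest" \
    --context dir:///workspace/ \
    --verbosity debug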

@krrose27

@aaron-prindle We've had an open dependency update to v1.19.0 and then to v1.19.1, which has been broken in our CI due to this issue. Seeing your request above, I moved it over to the 323fa466-debug image you posted, and our CI is now passing and pushing an image to ECR successfully.

@gunturaf

@aaron-prindle I can confirm that the `a946b82f22240eb8e3f7e73aaf0e592a323fa466-debug` image works on GitLab CI pushing to AWS ECR. [screenshot]

@aaron-prindle
Collaborator

v1.19.2 has the fix from #2908 that was verified here. Thanks to everyone who helped flag, contribute to, and validate this fix. Closing.
