Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

when replacing a folder by symlink : i get corrupt images when pulling : failed to mknod("somefile", S_IFCHR, 0): no such file or directory #2308

Closed
ptempier opened this issue Nov 5, 2022 · 22 comments · Fixed by #2590
Labels
area/filesystems For all bugs related to kaniko container filesystems (mounting issues etc) area/symlinks kind/bug Something isn't working priority/p1 Basic need feature compatibility with docker build. we should be working on this next. works-with-docker
Milestone

Comments

@ptempier
Copy link

ptempier commented Nov 5, 2022

Actual behavior

When replacing a folder by symlink i get corrupt images , docker pull will fail with :
Failed to register layer: Error processing tar file(exit status 1): failed to mknod("/etc/fail2ban/action.d", S_IFCHR, 0): no such file or directory

Expected behavior
Be able to pull the image

To Reproduce

The error doesn't seems t happen with every files/folder.
I tried disabling all caches to be sure

The build/pull works when using the dockerfile directly with docker build command
The build works with kaniko but appears corrupt as i can t pull it

Additional Information

  • Dockerfile

FROM quay.io/centos/centos:stream9
RUN dnf -y update && dnf -y install systemd
RUN /usr/bin/systemctl mask systemd-logind && /usr/bin/systemctl mask systemd-hostnamed && /usr/bin/systemctl mask dbus-
RUN ln -s /data/git_conf/conf/systemd/journald.conf.d /etc/systemd/journald.conf.d
RUN dnf install -y rsyslog && mv -f /etc/rsyslog.d /etc/rsyslog.d.orig && ln -s /data/git_conf/conf/rsyslog/rsyslog.d /etc/rsyslog.d
RUN systemctl enable rsyslog
RUN dnf -y install procps net-tools
STOPSIGNAL SIGRTMIN+3
CMD [ "/sbin/init" ]

This one build and pull fine even with the the broken symlink that will lead to a mounted volume set in docker compose

FROM previousdockerimage
RUN dnf -y update && dnf install -y epel-release && dnf -y install fail2ban && systemctl enable fail2ban && dnf clean all
RUN rm -rf /etc/fail2ban && ln -s /data/git_conf/conf/fail2ban /etc/fail2ban
STOPSIGNAL SIGRTMIN+3
EXPOSE 22
CMD [ "/sbin/init" ]

When pulling this one errors with
Error processing tar file(exit status 1): failed to mknod("/etc/fail2ban/action.d", S_IFCHR, 0): no such file or directory
The folder action.d isn't supposed to be there anymore.

  • Build Context

Nothing special, here is the gitlab ci/cd, but i get the same error running it manually

make_fail2ban:
stage: make_fail2ban
image:
name: gcr.io/kaniko-project/executor:v1.9.1-debug
entrypoint: [""]
script:
- /kaniko/executor -c /workspace --dockerfile https://mygitlab/mydockerfile --destination "${CI_REGISTRY_IMAGE}/fail2ban:${CI_COMMIT_TAG}"

  • Kaniko Image (fully qualified with digest)

gcr.io/kaniko-project/executor:v1.9.1-debug
sha256:ac169723b2076f9d5804f4bc05c98397e286da6fdcdd5a09fdc179f06ccb3be1

Triage Notes for the Maintainers

Description Yes/No
Please check if this a new feature you are proposing
Please check if the build works in docker but not in kaniko
Please check if this error is seen when you use --cache flag
Please check if your dockerfile is a multistage dockerfile
@TobiX
Copy link

TobiX commented Feb 27, 2023

Minimal reproducer:

Dockerfile

FROM busybox

RUN mkdir /a /b /c && echo a > /a/a
RUN rm -Rfv /a && ln -sf /b /a

Build with:

podman run --rm -ti --volume $PWD:/workspace gcr.io/kaniko-project/executor:v1.9.1-debug --tar-path /workspace/out.tar --no-push --destination dummy:latest

The resulting tar file cannot be loaded into podman/docker:

$ podman --debug load -i out.tar
INFO[0000] podman filtering at log level debug
DEBU[0000] Called load.PersistentPreRunE(podman --debug load -i out.tar)
DEBU[0000] Merged system config "/usr/share/containers/containers.conf"
DEBU[0000] Using conmon: "/usr/bin/conmon"
DEBU[0000] Initializing boltdb state at /home/tobias/.local/share/containers/storage/libpod/bolt_state.db
DEBU[0000] Using graph driver overlay
DEBU[0000] Using graph root /home/tobias/.local/share/containers/storage
DEBU[0000] Using run root /tmp/containers-user-1000/containers
DEBU[0000] Using static dir /home/tobias/.local/share/containers/storage/libpod
DEBU[0000] Using tmp dir /tmp/podman-run-1000/libpod/tmp
DEBU[0000] Using volume path /home/tobias/.local/share/containers/storage/volumes
DEBU[0000] Set libpod namespace to ""
DEBU[0000] Not configuring container store
DEBU[0000] Initializing event backend file
DEBU[0000] Configured OCI runtime runsc initialization failed: no valid executable found for OCI runtime runsc: invalid argument
DEBU[0000] Configured OCI runtime krun initialization failed: no valid executable found for OCI runtime krun: invalid argument
DEBU[0000] Configured OCI runtime runj initialization failed: no valid executable found for OCI runtime runj: invalid argument
DEBU[0000] Configured OCI runtime kata initialization failed: no valid executable found for OCI runtime kata: invalid argument
DEBU[0000] Using OCI runtime "/usr/bin/crun"
INFO[0000] Setting parallel job count to 7
INFO[0000] podman filtering at log level debug
DEBU[0000] Called load.PersistentPreRunE(podman --debug load -i out.tar)
DEBU[0000] Merged system config "/usr/share/containers/containers.conf"
DEBU[0000] Using conmon: "/usr/bin/conmon"
DEBU[0000] Initializing boltdb state at /home/tobias/.local/share/containers/storage/libpod/bolt_state.db
DEBU[0000] Overriding run root "/tmp/podman-run-1000/containers" with "/tmp/containers-user-1000/containers" from database
DEBU[0000] Using graph driver overlay
DEBU[0000] Using graph root /home/tobias/.local/share/containers/storage
DEBU[0000] Using run root /tmp/containers-user-1000/containers
DEBU[0000] Using static dir /home/tobias/.local/share/containers/storage/libpod
DEBU[0000] Using tmp dir /tmp/podman-run-1000/libpod/tmp
DEBU[0000] Using volume path /home/tobias/.local/share/containers/storage/volumes
DEBU[0000] Set libpod namespace to ""
DEBU[0000] [graphdriver] trying provided driver "overlay"
DEBU[0000] Cached value indicated that overlay is supported
DEBU[0000] Cached value indicated that overlay is supported
DEBU[0000] Cached value indicated that metacopy is not being used
DEBU[0000] Cached value indicated that native-diff is usable
DEBU[0000] backingFs=extfs, projectQuotaSupported=false, useNativeDiff=true, usingMetacopy=false
DEBU[0000] Initializing event backend file
DEBU[0000] Configured OCI runtime runsc initialization failed: no valid executable found for OCI runtime runsc: invalid argument
DEBU[0000] Configured OCI runtime krun initialization failed: no valid executable found for OCI runtime krun: invalid argument
DEBU[0000] Configured OCI runtime runj initialization failed: no valid executable found for OCI runtime runj: invalid argument
DEBU[0000] Configured OCI runtime kata initialization failed: no valid executable found for OCI runtime kata: invalid argument
DEBU[0000] Using OCI runtime "/usr/bin/crun"
INFO[0000] Setting parallel job count to 7
DEBU[0000] Loading image from "out.tar"
DEBU[0000] -> Attempting to load "out.tar" as an OCI directory
DEBU[0000] Normalized platform linux/amd64 to {amd64 linux  [] }
DEBU[0000] parsed reference into "[overlay@/home/tobias/.local/share/containers/storage+/tmp/containers-user-1000/containers]localhost/out.tar:latest"
DEBU[0000] Copying source image out.tar: to destination image [overlay@/home/tobias/.local/share/containers/storage+/tmp/containers-user-1000/containers]localhost/out.tar:latest
DEBU[0000] Error loading out.tar (oci): initializing source oci:out.tar:: open out.tar/index.json: not a directory
DEBU[0000] -> Attempting to load "out.tar" as an OCI archive
DEBU[0000] Normalized platform linux/amd64 to {amd64 linux  [] }
DEBU[0000] Error deleting temporary directory: <nil>
DEBU[0000] Error loading out.tar (oci-archive): loading index: open /var/tmp/oci784683435/index.json: no such file or directory
DEBU[0000] -> Attempting to load "out.tar" as a Docker archive
DEBU[0000] No compression detected
DEBU[0000] Normalized platform linux/amd64 to {amd64 linux  [] }
DEBU[0000] parsed reference into "[overlay@/home/tobias/.local/share/containers/storage+/tmp/containers-user-1000/containers]localhost/dummy:latest"
DEBU[0000] Copying source image out.tar:docker.io/library/dummy:latest to destination image [overlay@/home/tobias/.local/share/containers/storage+/tmp/containers-user-1000/containers]localhost/dummy:latest
DEBU[0000] Using blob info cache at /home/tobias/.local/share/containers/cache/blob-info-cache-v1.boltdb
DEBU[0000] Detected compression format gzip
DEBU[0000] Detected compression format gzip
DEBU[0000] Detected compression format gzip
DEBU[0000] IsRunningImageAllowed for image docker-archive:
DEBU[0000]  Using default policy section
DEBU[0000]  Requirement 0: allowed
DEBU[0000] Overall: allowed
Getting image source signatures
DEBU[0000] Manifest has MIME type application/vnd.docker.distribution.manifest.v2+json, ordered candidate list [application/vnd.docker.distribution.manifest.v2+json, application/vnd.docker.distribution.manifest.v1+prettyjws, application/vnd.oci.image.manifest.v1+json, application/vnd.docker.distribution.manifest.v1+json]
DEBU[0000] ... will first try using the original manifest unmodified
DEBU[0000] Checking if we can reuse blob sha256:b16b83b3d83f480e19392177a22e7b8525a8127c0d724da41a041c595cd06025: general substitution = true, compression for MIME type "application/vnd.docker.image.rootfs.diff.tar.gzip" = true
DEBU[0000] Checking if we can reuse blob sha256:b64792c17e4ad443d16b218afb3a8f5d03ca0f4ec49b11c1a7aebe17f6c3c1d2: general substitution = true, compression for MIME type "application/vnd.docker.image.rootfs.diff.tar.gzip" = true
DEBU[0000] Checking if we can reuse blob sha256:d5c80c0fdbc4e9e39c554da5acba3e6b069facc36da2715c704ccb10c3c9c069: general substitution = true, compression for MIME type "application/vnd.docker.image.rootfs.diff.tar.gzip" = true
DEBU[0000] Skipping blob sha256:b64792c17e4ad443d16b218afb3a8f5d03ca0f4ec49b11c1a7aebe17f6c3c1d2 (already present):
DEBU[0000] Skipping blob sha256:d5c80c0fdbc4e9e39c554da5acba3e6b069facc36da2715c704ccb10c3c9c069 (already present):
DEBU[0000] Detected compression format gzip
DEBU[0000] No compression detected
DEBU[0000] Using original blob without modification
Copying blob b16b83b3d83f done
Copying blob b64792c17e4a skipped: already exists
Copying blob d5c80c0fdbc4 skipped: already exists
DEBU[0000] Cached value indicated that idmapped mounts for overlay are not supported
DEBU[0000] Check for idmapped mounts support
DEBU[0000] Applying tar in /home/tobias/.local/share/containers/storage/overlay/f3a8ff84ddd4014d5adb37278167168c82798082c2b34a1da51e1dd8f29ccad3/diff
DEBU[0000] Error loading out.tar (docker-archive): writing blob: adding layer with blob "sha256:b16b83b3d83f480e19392177a22e7b8525a8127c0d724da41a041c595cd06025": processing tar file(no such file or directory): exit status 1
DEBU[0000] -> Attempting to load "out.tar" as a Docker dir
DEBU[0000] Normalized platform linux/amd64 to {amd64 linux  [] }
DEBU[0000] Error loading out.tar (dir): open out.tar/manifest.json: not a directory
Error: payload does not match any of the supported image formats:
 * oci: initializing source oci:out.tar:: open out.tar/index.json: not a directory
 * oci-archive: loading index: open /var/tmp/oci784683435/index.json: no such file or directory
 * docker-archive: writing blob: adding layer with blob "sha256:b16b83b3d83f480e19392177a22e7b8525a8127c0d724da41a041c595cd06025": processing tar file(no such file or directory): exit status 1
 * dir: open out.tar/manifest.json: not a directory

@ptempier
Copy link
Author

Haa thanks :)
After banging my head against the wall to find out what was going on, i was too tired to make a minimal reproducer.

@TobiX
Copy link

TobiX commented Feb 28, 2023

Bisecting yields

bc46c24 is the first bad commit

commit bc46c247073d861d39614a72132c27a74f461548
Author: Andreas Fleig <andreasfleig@gmail.com>
Date:   Tue May 31 22:42:32 2022 +0200

    Write parent directories to tar before whiteout files (#2113)

    * Write parent directories to tar before whiteout files

    Fixes #1149

    The OCI image spec does not specify this order but it's a good idea and Docker
    does the same.

    When manually comparing layers created by Docker and Kaniko there are still
    some differences (that container-diff does not show):

    * Kaniko adds / to layers
    * For `mkdir /test`, docker adds `/test` and an opaque whiteout file
      `/test/.wh..wh..opq`. Kaniko only adds `/test/` (and /).

    * snapshot_test: cleanup

    Fix typos and use listFilesInTar() where possible

 pkg/snapshot/snapshot.go      |  36 +++++++----
 pkg/snapshot/snapshot_test.go | 145 ++++++++++++++++++++++++++++--------------
 2 files changed, 118 insertions(+), 63 deletions(-)

as the culprit, which sounds related.

@TobiX
Copy link

TobiX commented Mar 27, 2023

@andreasf @gabyx Any ideas?

@gabyx
Copy link
Contributor

gabyx commented Mar 28, 2023

Hmm not sure about, I refactored the strange whiteout files once to be clearer, there was lots of old stuff laying around, but I think that commit came afterwards? My refactor was about to track correctly deleted and new files for each layer and act on these two lists later in the process when the tar for the layer is created. #2066
I dont have time this week, maybe I can have a look at it next week.

Info: the line RUN rm -Rfv /a && ln -sf /b /a should create a whiteout (thats a deletion of a file) file and at the same time in the next command create a normal symlink a -> /b and I suspect these two things do not play nice with each other. I suspect that maybe there is no creation of a new file a detected because of symlink following and the whiteout for a is not discarded, resulting in a whiteout and a writing of the symlink into the tar because the tar mechanism is kind of different (???), something along these lines

@dannygueta
Copy link

any update on the above ? this is becoming a real issue :(

@andreasf
Copy link
Contributor

Oopsie. I think I know what the problem is and I'll prepare a fix

andreasf added a commit to andreasf/kaniko that referenced this issue Jun 19, 2023
If a non-empty directory gets replaced with a link, the files in that
directory also get deleted. However, there should not be any whiteout
files for them in the layer. If there were, the resulting tar file would
not be extractable.

Fixes GoogleContainerTools#2308
@andreasf
Copy link
Contributor

@dannygueta @TobiX this branch fixes the minimal testcase: https://github.com/andreasf/kaniko/tree/fix_2308

Can you try it with some other Dockerfiles that triggered the issue?

If you cannot build kaniko images yourself, I pushed a version to docker.io/andreasfleig/kaniko-executor:debug.

@gabyx
Copy link
Contributor

gabyx commented Jun 19, 2023

Testing the above Docker file:
At bc46c24707~1:

�[36mINFO�[0m[0001] RUN mkdir /a /b /c && echo a > /a/a                       
�[36mINFO�[0m[0001] Initializing snapshotter ...                              
�[36mINFO�[0m[0001] Taking snapshot of full filesystem...                     
�[37mDEBU�[0m[0001] Current image filesystem: map[]                           
�[37mDEBU�[0m[0001] Adding to layer: [/bin/strings /bin/id /bin/ftpput /bin/lo
�[37mDEBU�[0m[0001] Deleting in layer: map[]                                  
�[36mINFO�[0m[0001] Cmd: /bin/sh                                              
�[36mINFO�[0m[0001] Args: [-c mkdir /a /b /c && echo a > /a/a]                
�[36mINFO�[0m[0001] Running: [/bin/sh -c mkdir /a /b /c && echo a > /a/a]     
�[36mINFO�[0m[0001] Taking snapshot of full filesystem...                     
�[37mDEBU�[0m[0001] Current image filesystem: map[/:d2c5427ffdb4fe186a7456ccfb
�[37mDEBU�[0m[0001] Adding to layer: [/ /c /a /a/a /b]                        
�[37mDEBU�[0m[0001] Deleting in layer: map[]                                  
�[36mINFO�[0m[0001] RUN rm -Rfv /a && ln -sf /b /a                            
�[36mINFO�[0m[0001] Cmd: /bin/sh                                              
�[36mINFO�[0m[0001] Args: [-c rm -Rfv /a && ln -sf /b /a]                     
�[36mINFO�[0m[0001] Running: [/bin/sh -c rm -Rfv /a && ln -sf /b /a]          
removed '/a/a'                                                                  
removed directory: '/a'                                                         
�[36mINFO�[0m[0001] Taking snapshot of full filesystem...                     
�[37mDEBU�[0m[0001] Current image filesystem: map[/:da7bc1a25453bea1264515af13
�[37mDEBU�[0m[0001] Adding to layer: [/ /a /b]                                
�[37mDEBU�[0m[0001] Deleting in layer: map[/a/a:{}]                           
�[37mDEBU�[0m[0001] Mapping stage idx 0 to digest sha256:96140bb5279053f61b690
�[37mDEBU�[0m[0001] Mapping digest sha256:96140bb5279053f61b6900ab4c7bb7b180a3
�[36mINFO�[0m[0002] Skipping push to container registry due to --no-push flag 

image
This seems correct and the docker file loads with docker load -i out.tar

at bc46c24707:

�[36mINFO�[0m[0001] RUN mkdir /a /b /c && echo a > /a/a                      
�[36mINFO�[0m[0001] Initializing snapshotter ...                             
�[36mINFO�[0m[0001] Taking snapshot of full filesystem...                    
�[37mDEBU�[0m[0001] Current image filesystem: map[]                          
�[37mDEBU�[0m[0001] Adding to layer: [/bin/sed /bin/kill /bin/dmesg /bin/stat
�[37mDEBU�[0m[0001] Deleting in layer: map[]                                 
�[36mINFO�[0m[0001] Cmd: /bin/sh                                             
�[36mINFO�[0m[0001] Args: [-c mkdir /a /b /c && echo a > /a/a]               
�[36mINFO�[0m[0001] Running: [/bin/sh -c mkdir /a /b /c && echo a > /a/a]    
�[36mINFO�[0m[0001] Taking snapshot of full filesystem...                    
�[37mDEBU�[0m[0001] Current image filesystem: map[/:57448699613d56efe35c1627a
�[37mDEBU�[0m[0001] Adding to layer: [/b / /c /a /a/a]                       
�[37mDEBU�[0m[0001] Deleting in layer: map[]                                 
�[36mINFO�[0m[0001] RUN rm -Rfv /a && ln -sf /b /a                           
�[36mINFO�[0m[0001] Cmd: /bin/sh                                             
�[36mINFO�[0m[0001] Args: [-c rm -Rfv /a && ln -sf /b /a]                    
�[36mINFO�[0m[0001] Running: [/bin/sh -c rm -Rfv /a && ln -sf /b /a]         
removed '/a/a'                                                                 
removed directory: '/a'                                                        
�[36mINFO�[0m[0001] Taking snapshot of full filesystem...                    
�[37mDEBU�[0m[0001] Current image filesystem: map[/:70247bf83a3da8ea4d48b2455
�[37mDEBU�[0m[0002] Adding to layer: [/b / /a]                               
�[37mDEBU�[0m[0002] Deleting in layer: map[/a/a:{}]                          
�[37mDEBU�[0m[0002] Mapping stage idx 0 to digest sha256:1cc290ca89c3ea8e9a55
�[37mDEBU�[0m[0002] Mapping digest sha256:1cc290ca89c3ea8e9a55e34ff7533daf160
�[36mINFO�[0m[0002] Skipping push to container registry due to --no-push flag

output seems identical and the error seems to be in writeToTar which changed. When I replace it with the
code in bc46c24~1, everythin works.

We have to look close to this function why it was coded/changed like it was.

Further investigation needed.

@gabyx
Copy link
Contributor

gabyx commented Jun 19, 2023

When I print the offending last layer with

mkdir layers && tar -xf out.tar && cd layers
tar -tvf last-layer.tar.gz

I get for bc46c24~1 :

---------- 0/0               0 1970-01-01 01:00 a/.wh.a
drwxr-xr-x 0/0               0 2023-06-19 22:35 /
lrwxrwxrwx 0/0               0 2023-06-19 22:35 a -> /b
drwxr-xr-x 0/0               0 2023-06-19 22:35 b/

and for bc46c24 (offending commit from git-bisect):

drwxr-xr-x 0/0               0 2023-06-19 22:43 /
lrwxrwxrwx 0/0               0 2023-06-19 22:43 a -> /b
---------- 0/0               0 1970-01-01 01:00 a/.wh.a
drwxr-xr-x 0/0               0 2023-06-19 22:43 b/

which seems to crash on load. Anybody knows the difference?
Ahh dammit: It creates a symlink, and then wants to delete a file in a/ which points to b which does not exist
I guess thats the bug.

I suppose there is a relation ship between deleting and creating symlinks.
Shouldn't the creation of symlinks always happen after deletes.

  • Either one collects all current symlinks, and adds them at last (whitouts before any symlinks)
  • This function call here needs to not add any symlinks
    if err := addParentDirectories(t, addedPaths, path); err != nil {
    , so stopping immediately if a symlink is discovered in the parent path. because adding this before the whiteout leads to the above problem.

I try to make a fix for reviewing. I am not 100% sure.
@andreasf : Seen that you already found the same bug. I review your PR

@andreasf
Copy link
Contributor

andreasf commented Jun 19, 2023

Building the Dockerfile above with kaniko 1.8.1, I am getting the following third layer:

$ docker run --rm -ti --volume $PWD:/workspace \
	gcr.io/kaniko-project/executor:v1.8.1-debug \
	--tarPath /workspace/out.tar --no-push --destination dummy:latest
...

$ tar -tf c50ab0ae6b4f68b322aa3d4a37a059e42d4d0a1b422e66dee14879e4695499ec.tar.gz 
a/.wh.a
/
a
b/

The layer contains a whiteout file for a/a but not its parent directory. tar on Mac OS extracts the file and creates the parent directory, but there is an error message when it reaches the symlink a:

$ tar xvf c50ab0ae6b4f68b322aa3d4a37a059e42d4d0a1b422e66dee14879e4695499ec.tar.gz 
x a/.wh.a
tar: Removing leading '/' from member names
x .
x a: Can't replace existing directory with non-directory
x b/
tar: Error exit delayed from previous errors.

With kaniko 1.9.1, the order of files in the tar file is different, because there was a problem running images in Cloud Foundry when a file was written before its parent directory.

$ docker run --rm -ti --volume $PWD:/workspace \
	gcr.io/kaniko-project/executor:v1.9.1-debug \
	--tar-path /workspace/out.tar --no-push --destination dummy:latest
...

$ tar tf 0b250393fc3b6d0368463e4674d59c20cc4068acd8c182d736af6e6075459e82.tar.gz 
/
a
a/.wh.a
b/

a of course is still a symlink, so the whiteout file a/.wh.a cannot be extracted.

$ tar xvf 0b250393fc3b6d0368463e4674d59c20cc4068acd8c182d736af6e6075459e82.tar.gz 
tar: Removing leading '/' from member names
x .
x a
x a/.wh.a: Cannot extract through symlink a/.wh.a
x b/
tar: Error exit delayed from previous errors.

$ docker load -i out.tar 
6f1816a637c7: Loading layer [==================================================>]     340B/340B
787bca5424f9: Loading layer [==================================================>]     326B/326B
failed to mknod("/a/a", S_IFCHR, 0): no such file or directory

Docker 24.0.2 only writes the symlink itself to the last layer. It doesn't add a whiteout file for a/a: (but fore some reason there is /etc)

$ tar tf f508695d1b04992ce690782587081acf8d379876779b984b07ba3e73c5d79564/layer.tar 
a
etc/

My proposed fix is to not write whiteout files if a directory is replaced with a symlink:

$ docker run --rm -ti --volume $PWD:/workspace \
	docker.io/andreasfleig/kaniko-executor:debug \
	--tar-path /workspace/out.tar --no-push --destination dummy:latest

$ tar tf 4f3d0c3322c3c15266c1dc26d1cb5453a7e137f5a476022a591c5373c47cfc9c.tar.gz 
/
a
b/

This is now relatively close to Docker's output. Kaniko also adds / and the link target, which should not be necessary.

I don't think reverting to the old code is a good alternative. It would lead to issues extracting images in Cloud Foundry, and as we saw further up tar also can't extract the layer properly.

@gabyx
Copy link
Contributor

gabyx commented Jun 19, 2023

@andreasf : Ok, I guess my PR still solves the issue. But you are probably right:
We can also argue when ever a symlink is encountered in the parent chain, we omit the whitout completely.
The link must get added to the adds since it is obviously got added newly in this snapshot. (?)

So is that correct?: What happens if the last layer contains

/
a/ (link -> /b)
b/

and no whitout for a/a on the filesystem. Is folder a completly deleted and replaced with the link when docker makes the overlay FS?

Also: I am wondering why there is no delete for dir /a in the above debug log : Deleting in layer: map[/a/a:{}]. This is weird???

@andreasf
Copy link
Contributor

I just realized that the issue is not specific to symlinks. When the directory a is replaced with a file a, then there also should not be a whiteout file for a/a in the layer. So before adding whiteout files, it should be checked if one of the parent paths is not a directory.

The link must get added to the adds since it is obviously got added newly in this snapshot. (?)

Yes, the link is new, so I assume it should be in "adds" (I did not try to fully understand the LayeredMap implementation)

Is folder a completly deleted and replaced with the link when docker makes the overlay FS?

Yes

@andreasf
Copy link
Contributor

andreasf commented Jun 19, 2023

Test case with file replacing directory:

FROM busybox

RUN mkdir /a /b /c && echo a > /a/a
RUN rm -Rfv /a && echo "foo" > /a
$ docker build -t kaniko-2308-file .
$ docker save kaniko-2308-file -o docker-built.tar
$ tar xf docker-built.tar

$ tar tf 79f19a47769a5ba199174aad45aeebf0714486c1c52bf8edddfb4b0441b45457/layer.tar 
a
etc/

$ docker run --rm kaniko-2308-file ls -l
total 52
-rw-r--r--    1 root     root             4 Jun 19 22:08 a
drwxr-xr-x    2 root     root          4096 Jun 18 10:48 b
...

andreasf added a commit to andreasf/kaniko that referenced this issue Jun 20, 2023
…es or links

If a non-empty directory gets replaced with something other than a
directory (e.g. file or symlink), the files in that directory also get
deleted. However, there should not be any whiteout files for them in the
layer. If there were, the resulting tar file would not be extractable.

Fixes GoogleContainerTools#2308
@aaron-prindle aaron-prindle added this to the v1.12.0 milestone Jun 20, 2023
andreasf added a commit to andreasf/kaniko that referenced this issue Jun 21, 2023
…es or links

If a non-empty directory gets replaced with something other than a
directory (e.g. file or symlink), the files in that directory also get
deleted. However, there should not be any whiteout files for them in the
layer. If there were, the resulting tar file would not be extractable.

Fixes GoogleContainerTools#2308
@ptempier
Copy link
Author

@andreasf
If the fix is very different, i would concentrate on the symlink issue and not the file issue.
Because it's a common occurence to want to place the config or data in a different place than the default one (in a volume). Which can be done via a symlink. This allows to use a single volume for all the folders that are using volume data , instead of tons of volumes.
Meanwhile i am not sure what the use case for the files would be.

@andreasf
Copy link
Contributor

@ptempier the problem and the fix are actually the same in both cases. The fix will not affect the ability to use symlinks, it is more about how file system changes are represented in layer tar files, something that users shouldn't have to worry about (assuming everything works):

Layer 1: two directories and a file in foo
/
├─ foo/
│  ├─ bar
├─ baz/

Layer 2: directory foo (which had a file in it) is replaced by a file or a symlink
/
├─ foo
├─ baz/

In this case, Kaniko would also add a whiteout file foo/.wh.bar to layer 2's tar file, because the file foo/bar was present in layer 1 but on layer 2. But because in the same tar file, foo exists and is not a directory anymore, there should not be any files with names like foo/.wh.bar which expect that foo is a directory. This contradiction was why the tar files could not be extracted.

@gabyx
Copy link
Contributor

gabyx commented Jun 21, 2023 via email

@ptempier
Copy link
Author

Hi

I resurrected a pipeline that i converted to dind dues to this issue, back to kaniko to test the fixes.

Maybe i tested incorrectly, but with both theses versions :
docker.io/andreasfleig/kaniko-executor:debug (taken from there
: #2308 (comment))
gcr.io/kaniko-project/executor:v1.11.0-debug (Is the bug even supposed to be fixed there ?)

The build and pull seems to work. (seems like an improvement to earlier versions, where i couldn't pull)
Unless i use the cache option, in that case, the build also fails, with a symptom similar to previously.
But then it's less annoying as the pipeline fails and i don't push a corrupt image.
Still annoying as i would need to disable the cache sometime, but better than nothing.

INFO[0013] RUN mkdir -p /data/log/ && mv /var/log/ /var/log.orig && ln -sf /data/log /var/log
INFO[0013] Found cached layer, extracting to filesystem
error building image: error building stage: failed to execute command: extracting fs from image: mkdir /var/log: file exists

or with another image :

INFO[0001] RUN dnf -y install python-dns python-inotify fail2ban && systemctl enable fail2ban
INFO[0001] Found cached layer, extracting to filesystem
INFO[0004] RUN mkdir -p /data/conf/fail2ban/ && mv /etc/fail2ban/ /etc/fail2ban.orig && ln -s /data/conf/fail2ban /etc/fail2ban && mv /var/lib/fail2ban/ /var/lib/fail2ban.orig && ln -s /data/data/fail2ban /var/lib/fail2ban
INFO[0004] Found cached layer, extracting to filesystem
error building image: error building stage: failed to execute command: extracting fs from image: mkdir /etc/fail2ban: file exists

andreasf added a commit to andreasf/kaniko that referenced this issue Jun 21, 2023
…es or links

If a non-empty directory gets replaced with something other than a
directory (e.g. file or symlink), the files in that directory also get
deleted. However, there should not be any whiteout files for them in the
layer. If there were, the resulting tar file would not be extractable.

Fixes GoogleContainerTools#2308
@andreasf
Copy link
Contributor

@ptempier

Is it possible that your cache still contains invalid layers?

I tried to reproduce your second case but it works for me, but of course I don't have a cache with old data:

$ cat Dockerfile 
FROM fedora:38

RUN dnf -y install python-dns python-inotify fail2ban && systemctl enable fail2ban
RUN mkdir -p /data/conf/fail2ban/ && mv /etc/fail2ban/ /etc/fail2ban.orig && ln -s /data/conf/fail2ban /etc/fail2ban && mv /var/lib/fail2ban/ /var/lib/fail2ban.orig && ln -s /data/data/fail2ban /var/lib/fail2ban

$ docker run --rm -ti --volume $PWD:/workspace \
	docker.io/andreasfleig/kaniko-executor:debug \
	--tar-path /workspace/out.tar --no-push --destination dummy:latest
...

$ docker load -i out.tar 
2f71e0079f1f: Loading layer   68.3MB/68.3MB
014234cd1863: Loading layer  230.9MB/230.9MB
e2a38a9c3a32: Loading layer  104.6kB/104.6kB
The image dummy:latest already exists, renaming the old one with ID sha256:3bff1580e6b043d1524f8acd73644583906c106df120bcadf1eb3aaa22fd5c63 to empty string
Loaded image: dummy:latest

@gabyx
Copy link
Contributor

gabyx commented Jun 21, 2023

@andreasf : Sorry I was mistaken, everything works fine, /a symlink and /b are added to the add list in the last layer, all seams correct.

@aaron-prindle aaron-prindle added area/symlinks area/filesystems For all bugs related to kaniko container filesystems (mounting issues etc) labels Jun 21, 2023
@aaron-prindle aaron-prindle added kind/bug Something isn't working priority/p1 Basic need feature compatibility with docker build. we should be working on this next. works-with-docker labels Jun 21, 2023
@ptempier
Copy link
Author

I specified a different registry for cache and it works with the cache as well.
So i guess a bad layer was still in cache. (in gitlab registry deletion is not instant).

@TobiX
Copy link

TobiX commented Jun 22, 2023

@andreasf Found some time to test your fixed image (docker.io/andreasfleig/kaniko-executor:debug) with the Dockerfile which lead me here and it fixes the issue for me 🎉

aaron-prindle pushed a commit that referenced this issue Jun 22, 2023
…es or links (#2590)

If a non-empty directory gets replaced with something other than a
directory (e.g. file or symlink), the files in that directory also get
deleted. However, there should not be any whiteout files for them in the
layer. If there were, the resulting tar file would not be extractable.

Fixes #2308
kylecarbs pushed a commit to coder/kaniko that referenced this issue Jul 12, 2023
…es or links (GoogleContainerTools#2590)

If a non-empty directory gets replaced with something other than a
directory (e.g. file or symlink), the files in that directory also get
deleted. However, there should not be any whiteout files for them in the
layer. If there were, the resulting tar file would not be extractable.

Fixes GoogleContainerTools#2308
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/filesystems For all bugs related to kaniko container filesystems (mounting issues etc) area/symlinks kind/bug Something isn't working priority/p1 Basic need feature compatibility with docker build. we should be working on this next. works-with-docker
Projects
None yet
6 participants