Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure files and links have missing parent dirs created. Support hard links. #1179

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
62 changes: 60 additions & 2 deletions internal/engine/engine.go
Original file line number Diff line number Diff line change
Expand Up @@ -412,6 +412,13 @@ func untar(ctx context.Context, dst string, r io.Reader) error {

// if it's a file create it
case tar.TypeReg:
// If the file's parent dir doesn't exist, create it.
dirname := filepath.Dir(target)
if _, err := os.Stat(dirname); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now happening in every branch. Perhaps it should be outside the switch?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you mean, but I think it has to happen in the switch because it conditionally happens in some of the cases (e.g. Symlinks), and must happen (though we could probably make it work) in others (e.g. Directories)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't conditionally happen though. It happens in every case. The only thing that changes is that target is either the real thing in the case of tar.TypeDir, or just the Dir portion of the path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synced up with @bcrochet but to document it here: Symlinks are skipped if they refer to a path outside of the tarball root once fully resolved (e.g. by way of ../../../../). We skip them without writing anything. For this reason, we cannot write before the case statement.

if err := os.MkdirAll(dirname, 0o755); err != nil {
return err
}
}
f, err := os.OpenFile(target, os.O_CREATE|os.O_RDWR, os.FileMode(header.Mode))
if err != nil {
return err
Expand All @@ -429,15 +436,66 @@ func untar(ctx context.Context, dst string, r io.Reader) error {

// if it's a link create it
case tar.TypeSymlink:
err := os.Symlink(header.Linkname, filepath.Join(dst, header.Name))
nobaseLinkname, nobaseName := resolveLinkPaths(header.Linkname, header.Name)
fullLinkname := filepath.Join(dst, nobaseLinkname)
fullName := filepath.Join(dst, nobaseName)
// Safeguard for cases where we're trying to link to something
// outside of our base fs.
if !strings.HasPrefix(fullLinkname, dst) {
logger.V(log.DBG).Info("Error processing symlink. Symlink would reach outside of the image archive. Skipping this link", "link", header.Name, "linkedTo", header.Linkname, "resolvedTo", fullLinkname)
continue
}
// Create the new link's directory if it doesn't exist.
dirname := filepath.Dir(fullName)
if _, err := os.Stat(dirname); err != nil {
if err := os.MkdirAll(dirname, 0o755); err != nil {
return err
}
}
err := os.Symlink(fullLinkname, fullName)
if err != nil {
logger.V(log.DBG).Info(fmt.Sprintf("Error creating symlink: %s. Ignoring.", header.Name), "link", fullName, "linkedTo", fullLinkname, "reason", err)
continue
}
case tar.TypeLink:
// We assume hard links will not contain relative pathing in the archive.
original := filepath.Join(dst, header.Linkname)
// Create the new link's directory if it doesn't exist.
dirname := filepath.Dir(target)
if _, err := os.Stat(dirname); err != nil {
if err := os.MkdirAll(dirname, 0o755); err != nil {
return err
}
}
err := os.Link(original, target)
if err != nil {
logger.V(log.DBG).Info(fmt.Sprintf("Error creating link: %s. Ignoring.", header.Name))
logger.V(log.DBG).Info(fmt.Sprintf("Error creating hard link: %s. Ignoring.", header.Name), "link", target, "linkedTo", original, "reason", err)
continue
}
}
}
}

// resolveLinkPaths determines if oldname is an absolute path or a relative
// path, and returns oldname relative to newname if necessary.
func resolveLinkPaths(oldname, newname string) (string, string) {
if filepath.IsAbs(oldname) {
return oldname, newname
}

linkDir := filepath.Dir(newname)
// If the newname is at the root of the filesystem, but the oldname is
// relative, we'll swap out the value we get from filepath.Dir for a / to
// allow relative pathing to resolve. This strips `..` references given the
// link exists at the very base of the filesystem. In effect, it converts
// oldname to an absolute path
if linkDir == "." {
linkDir = "/"
}

return filepath.Join(linkDir, oldname), newname
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is incorrect for hard links: their targets are relative to the root directory of the layer archive. For confirmation, an example of an image with such hard links is docker.io/library/busybox.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why busybox would be relevant here, since all our images must be based off of UBI...but if you look at busybox...it doesn't have an os-release file, hence it's failure when ran against preflight.

~/busybox/etc 
❯ ll
total 20
-rw-r--r--. 1 acornett acornett 306 May 18  2023 group
-rw-r--r--. 1 acornett acornett 114 May 18  2023 localtime
drwxr-xr-x. 1 acornett acornett  82 May 18  2023 network
-rw-r--r--. 1 acornett acornett 494 May 18  2023 nsswitch.conf
-rw-r--r--. 1 acornett acornett 340 May 18  2023 passwd
-rw-------. 1 acornett acornett 136 May 18  2023 shadow

~/busybox 
❯ find . -name os-release
(no-output)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The busybox image is relevant in that it provides an example of how hard links are stored in tar archives and container image layers, and it was an image that I remembered included hard links in its layers. If it has to be UBI-based to matter, feel free to build this and examine the added layers:

FROM registry.access.redhat.com/ubi8
RUN echo hello > /usr/local/hello.txt
RUN ln /usr/local/hello.txt /usr/local/hello2.txt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 @nalind, thanks. I misread my test tar archives - I only treated link origins with .. as relative, and everything else as absolute (or, rather, relative to /). The inverse seems to be what I want (everything not explicitly absolute should be treated as a relative path).

Latest push should implement this.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what was previously done (passing the value from the Linkname field to os.Symlink()) was correct for symbolic links. I've only seen "Linkname values are always relative to the root of the archive" with hard links.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nalind I believe this resolves correctly now for both symlinks and hardlinks. From what I see, an image that looks like this:

FROM registry.access.redhat.com/ubi8

RUN ln -v /etc/os-release /os-release
RUN ln -sf /etc/os-release /os-release-soft

Has no issue linking the symlinks and hard links. E.g.

[root@2656a86bb595 fs]# ls -l
total 24
lrwxrwxrwx. 1 root root   35 Jun 27 15:52 bin -> /tmp/preflight-472685243/fs/usr/bin
drwxr-xr-x. 1 root root    0 Jun 27 15:52 boot
drwxr-xr-x. 1 root root    0 Jun 27 15:52 dev
drwxr-xr-x. 1 root root 1842 Jun 27 15:52 etc
-rw-r--r--. 1 root root    0 Jun 27 15:52 foo
drwxr-xr-x. 1 root root    0 Jun 27 15:52 home
lrwxrwxrwx. 1 root root   35 Jun 27 15:52 lib -> /tmp/preflight-472685243/fs/usr/lib
lrwxrwxrwx. 1 root root   37 Jun 27 15:52 lib64 -> /tmp/preflight-472685243/fs/usr/lib64
drwxr-xr-x. 1 root root    0 Jun 27 15:52 lost+found
drwxr-xr-x. 1 root root    0 Jun 27 15:52 media
drwxr-xr-x. 1 root root    0 Jun 27 15:52 mnt
drwxr-xr-x. 1 root root    0 Jun 27 15:52 opt
lrwxrwxrwx. 1 root root   46 Jun 27 15:52 os-release -> /tmp/preflight-472685243/fs/usr/lib/os-release
lrwxrwxrwx. 1 root root   42 Jun 27 15:52 os-release-soft -> /tmp/preflight-472685243/fs/etc/os-release
drwxr-xr-x. 1 root root    0 Jun 27 15:52 proc
drwxr-xr-x. 1 root root  254 Jun 27 15:52 root
drwxr-xr-x. 1 root root    8 Jun 27 15:52 run
lrwxrwxrwx. 1 root root   36 Jun 27 15:52 sbin -> /tmp/preflight-472685243/fs/usr/sbin
drwxr-xr-x. 1 root root    0 Jun 27 15:52 srv
drwxr-xr-x. 1 root root    0 Jun 27 15:52 sys
drwxr-xr-x. 1 root root   72 Jun 27 15:52 tmp
drwxr-xr-x. 1 root root  100 Jun 27 15:52 usr
drwxr-xr-x. 1 root root  166 Jun 27 15:52 var

You might notice /os-release is hardlinked to /etc/os-release, which itself is a link. I see some issues with links to links to links (spanning link types), but GNU tar seems to exhibit the same issue, and the running container based on the image also sees the broken link. To that end, I'm thinking this is close enough to what we expect to be written to disk for most tasks Preflight needs to execute.

Let me know if you see something different.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think that trying to mix the handling of symbolic links with the handling of hard links is adding more complexity than it's removing. For symbolic links, the oldname to pass to os.Symlink() is just the Linkname. For hard links and os.Link(), it's filepath.Join(dst, header.Linkname).

If it's possible that any of the images being analyzed will attempt tricks by having the Linkname refer to something outside of dst, it can get more complicated than that, but I don't know if that's a concern here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough. I've split the two, and as discussed, added a few safeguards around the writing of symlinks that may be broken as present in the archive, but may point to things outside of our extraction base directory.

I was unable to get rid of the link resolution code even with splitting it out, but overall I think the hardlink logic functions a bit better (I saw a hardlink testcase resolve that was previously broken).

PTAL.

}

// writeCertImage takes imageRef and writes it to disk as JSON representing a pyxis.CertImage
// struct. The file is written at path certification.DefaultCertImageFilename.
//
Expand Down
15 changes: 15 additions & 0 deletions internal/engine/engine_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -379,6 +379,21 @@ var _ = Describe("Check Name Queries", func() {
})
})

var _ = Describe("Link Path Resolution", func() {
DescribeTable(
"Link targets should resolve correctly",
func(old, new, expectedOld, expectedNew string) {
resO, resN := resolveLinkPaths(old, new)
Expect(resO).To(Equal(expectedOld))
Expect(resN).To(Equal(expectedNew))
},
Entry("Link at root with relative origin", "../usr/lib/file", "file", "/usr/lib/file", "file"),
Entry("Origin is absolute", "/usr/lib/file", "file", "/usr/lib/file", "file"),
Entry("Link in dir with relative origin", "../usr/lib/file", "etc/file", "usr/lib/file", "etc/file"),
Entry("Link in dir with relative origin and up multiple levels", "../../cfg/file", "etc/foo/file", "cfg/file", "etc/foo/file"),
)
})

// writeTarball writes a tar archive to out with filename containing contents at the base path
// with extra bytes written at the end of length extraBytes.
// note: this should only be used as a helper function in tests
Expand Down
Loading