Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes inability to use /dev/null when inside a container #3623

Merged
merged 1 commit into from
Oct 12, 2022

Conversation

evanphx
Copy link
Contributor

@evanphx evanphx commented Oct 3, 2022

This is a forward port of #3620

The original code depended on the origin filesystem to have /dev/{block,char} populated. This is done by udev normally and while is very common non-containerized systemd installs, it's very easy to start systemd in a container created by runc itself and not have /dev/{block,char} populated. When this occurs, the following error output is observed:

$ docker run hello-world
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error reopening /dev/null inside container: open /dev/null: operation not permitted: unknown.

/dev/null can't be opened because it was not added to the deviceAllowList, as there was no /dev/char directory. The change here utilizes the fact that when sysfs in in use, there is a /sys/dev/{block,char} that is kernel maintained that we can check.

Signed-off-by: Evan Phoenix evan@phx.io

testPath, err := securejoin.SecureJoin("/sys", entry.Path)
if err != nil {
logrus.Errorf("error joining entry path: %s", err)
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's an error, I guess we should not continue but return it.

Copy link
Contributor

@kolyshkin kolyshkin Oct 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe we should not use securejoin here, since this is merely a check that such device exists, and we do not use testPath in any other way. Worst case scenario, a bad guy will trick the system into adding a line for a device that does not exist into systemd unit properties.

So, I guess, using plain old path.Join is sufficient here.

@cyphar WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original patch didn't have secure join, but I saw @AkihiroSuda preferred having it (not sure if it's really needed here indeed); see #3620 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AkihiroSuda WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SGTM, but probably the reason not to use securejoin should be recorded in the comment lines in the source

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, I removed securejoin and added a comment why it's not used.

Copy link
Member

@cyphar cyphar Oct 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah securejoin is not necessary here (these are host paths anyway).

This is a forward port of opencontainers#3620

The original code depended on the origin filesystem to have
/dev/{block,char} populated. This is done by udev normally and while is
very common non-containerized systemd installs, it's very easy to start
systemd in a container created by runc itself and not have
/dev/{block,char} populated. When this occurs, the following error
output is observed:

$ docker run hello-world
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error reopening /dev/null inside container: open /dev/null: operation not permitted: unknown.

/dev/null can't be opened because it was not added to the
deviceAllowList, as there was no /dev/char directory. The change here
utilizes the fact that when sysfs in in use, there is a
/sys/dev/{block,char} that is kernel maintained that we can check.

Signed-off-by: Evan Phoenix <evan@phx.io>
Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kolyshkin kolyshkin added kind/bug area/systemd backport/todo/1.1 A PR in main branch which needs to be backported to release-1.1 labels Oct 11, 2022
Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though I do wonder if we could just always use /sys/dev/....

@cyphar cyphar closed this in 56b01e7 Oct 12, 2022
@cyphar cyphar merged commit 56b01e7 into opencontainers:main Oct 12, 2022

// We don't bother with securejoin here because we create entry.Path
// right above here, so we know it's safe.
if _, err := os.Stat("/sys" + entry.Path); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realised we probably want to do entry.Path = "/sys" + entry.Path here, don't we? Otherwise systemd will get a /dev/... path that doesn't exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I’m not entirely clear on what systemd does with these values but I can tell you that in my setup, those /dev/char entries don’t exist and systemd still does the right thing. Maybe it parses the path and uses the major:minor directly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, the /sys/dev/{block,char}/<major>:<minor> is a directory, not a device file, so we can't use it for systemd.

Perhaps systemd has a set of built-in devices it can recognize. I'll take a look.

@kolyshkin
Copy link
Contributor

backport to 1.1 is done in #3620

@kolyshkin kolyshkin added backport/done/1.1 A PR in main branch which was backported to release-1.1 and removed backport/todo/1.1 A PR in main branch which needs to be backported to release-1.1 labels Apr 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/systemd backport/done/1.1 A PR in main branch which was backported to release-1.1 kind/bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants