Youki fails when some unused capability is missing #1999

Closed
jprendes opened this issue Jun 2, 2023 · 4 comments · Fixed by #2000

Comments

Contributor

jprendes commented Jun 2, 2023

I am trying to run youki in a container context (think Docker in Docker), and it fails to run if the container doesn't have all the capabilities enabled.

In particular, this also happens if some capabilities added in kernel 5.8 / 5.9 are not present (like CAP_BPF or CAP_CHECKPOINT_RESTORE). I haven't tested in a post-5.3 and pre-5.9 kernel, but I think it would hit the same issue.

Note that runc does work in the same situation.

Please see the attachment for reproduction steps: reproduction.zip

jprendes changed the title from "Youki fails some (non-requested) capability is missing" to "Youki fails when some unused capability is missing" on Jun 2, 2023
Contributor Author

jprendes commented Jun 2, 2023

The issue is caused by reset_effective setting all the capabilities listed in caps::all(), which includes the capabilities added in kernel 5.8 / 5.9.
I don't understand why this is needed in either container_init_process or set_id.
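
The failure mode described above can be illustrated with a small model. This is only a sketch, not youki's code: `mask` and `set_effective` are hypothetical helpers, the "all" list is a shortened stand-in for a hardcoded capability list, and the check is a simplification of the capset(2) rule that the effective set must be a subset of the permitted set. The capability numbers are those from linux/capability.h.

```rust
// Simplified model of the capset(2) subset check; not youki's actual code.
// Capability numbers as defined in linux/capability.h.
const CAP_KILL: u32 = 5;
const CAP_NET_BIND_SERVICE: u32 = 10;
const CAP_AUDIT_WRITE: u32 = 29;
const CAP_BPF: u32 = 39; // added in Linux 5.8
const CAP_CHECKPOINT_RESTORE: u32 = 40; // added in Linux 5.9

/// Hypothetical helper: build a bitmask from a list of capability numbers.
fn mask(caps: &[u32]) -> u64 {
    caps.iter().fold(0u64, |m, &c| m | (1u64 << c))
}

/// Hypothetical helper modelling the kernel rule: the requested effective set
/// must be a subset of the permitted set. Returns the new effective set, or
/// the bits that would be refused (the kernel reports this case as EPERM).
fn set_effective(permitted: u64, wanted: u64) -> Result<u64, u64> {
    let missing = wanted & !permitted;
    if missing == 0 {
        Ok(wanted)
    } else {
        Err(missing)
    }
}

fn main() {
    // Shortened stand-in for a hardcoded "all capabilities" list.
    let all = mask(&[
        CAP_KILL,
        CAP_NET_BIND_SERVICE,
        CAP_AUDIT_WRITE,
        CAP_BPF,
        CAP_CHECKPOINT_RESTORE,
    ]);
    // A restricted container: the two newer capabilities are not permitted.
    let permitted = mask(&[CAP_KILL, CAP_NET_BIND_SERVICE, CAP_AUDIT_WRITE]);

    match set_effective(permitted, all) {
        Ok(_) => println!("effective set raised"),
        Err(missing) => println!("refused, missing bits: {missing:#x}"),
    }
}
```

In this model the request fails even though the container only ever needs the three capabilities it was granted, which matches the "unused capability is missing" symptom in the title.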

Collaborator

yihuaf commented Jun 2, 2023

@jprendes Thank you for the bug report. dind is a special case that we have not tested before, so I am not surprised that it does not work out of the box. Also, thank you for the PR, but I remember that resetting the capabilities has a purpose. However, it's been a while since I touched that part of the code, so I don't recall the exact reason off the top of my head. I will have to take a deeper look.

I believe the first step to address this issue is to set up an integration test like we do for containerd and k8s. Then we can start to explore how to consistently support the dind use case.

In addition, it would be good to understand why runc works whereas we fail in this case.

> The issue is caused by reset_effective setting all the capabilities listed in caps::all(), which includes the capabilities added in kernel 5.8 / 5.9.

This sounds like a bug to me. Maybe caps::all() is not the right choice here.

Contributor Author

jprendes commented Jun 2, 2023

@yihuaf thanks for looking into this.
I've simplified the reproduction so that it no longer involves docker: just plain youki and plain runc running inside a container.
reproduction.zip

Additionally, running strace -f --trace=capset ... on youki and runc, I see that youki makes 4 calls to capset:

capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=ALL, permitted=ALL, inheritable=0}) = 0
capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_KILL|1<<CAP_NET_BIND_SERVICE|1<<CAP_AUDIT_WRITE, permitted=ALL, inheritable=0}) = 0
capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_KILL|1<<CAP_NET_BIND_SERVICE|1<<CAP_AUDIT_WRITE, permitted=1<<CAP_KILL|1<<CAP_NET_BIND_SERVICE|1<<CAP_AUDIT_WRITE, inheritable=0}) = 0
capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_KILL|1<<CAP_NET_BIND_SERVICE|1<<CAP_AUDIT_WRITE, permitted=1<<CAP_KILL|1<<CAP_NET_BIND_SERVICE|1<<CAP_AUDIT_WRITE, inheritable=1<<CAP_KILL|1<<CAP_NET_BIND_SERVICE|1<<CAP_AUDIT_WRITE}) = 0

while runc makes only one call (the same as the last call from youki):

capset({version=_LINUX_CAPABILITY_VERSION_3, pid=0}, {effective=1<<CAP_KILL|1<<CAP_NET_BIND_SERVICE|1<<CAP_AUDIT_WRITE, permitted=1<<CAP_KILL|1<<CAP_NET_BIND_SERVICE|1<<CAP_AUDIT_WRITE, inheritable=1<<CAP_KILL|1<<CAP_NET_BIND_SERVICE|1<<CAP_AUDIT_WRITE}) = 0

Contributor Author

jprendes commented Jun 5, 2023

> Maybe caps::all() is not the right choice here.

Yeah, I think this is the way to go.

The problem is that caps::all() returns a hardcoded list. The thread can only acquire capabilities in its permitted set, so the list should come from caps::read(None, CapSet::Permitted); requesting anything beyond that results in an error.

I've updated the PR to reflect this.
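
To see what the permitted set looks like inside a restricted container, here is a minimal std-only sketch. It does not use the caps crate: instead of caps::read(None, CapSet::Permitted) it reads the CapPrm line of /proc/self/status, which reports the same set as a hexadecimal bitmask. The helper names `parse_cap_prm` and `cap_numbers` are hypothetical, and the approach is Linux-only.

```rust
use std::fs;

/// Hypothetical helper: extract the `CapPrm` bitmask from the text of a
/// /proc/<pid>/status file. Returns None if the line is absent or malformed.
fn parse_cap_prm(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("CapPrm:"))
        .and_then(|hex| u64::from_str_radix(hex.trim(), 16).ok())
}

/// Hypothetical helper: list the capability numbers whose bits are set.
fn cap_numbers(mask: u64) -> Vec<u32> {
    (0u32..64).filter(|&bit| (mask & (1u64 << bit)) != 0).collect()
}

fn main() {
    // /proc/self/status describes the calling process (Linux only).
    match fs::read_to_string("/proc/self/status") {
        Ok(status) => match parse_cap_prm(&status) {
            Some(mask) => println!("permitted capabilities: {:?}", cap_numbers(mask)),
            None => println!("no CapPrm line found"),
        },
        Err(err) => println!("cannot read /proc/self/status: {err}"),
    }
}
```

Any capability number missing from this list cannot be raised into the effective set, which is why a list derived from the permitted set is safe to request where a hardcoded one is not.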
