Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seccomp: initial small additions to the rules #2040

Merged
merged 9 commits into from
Jun 7, 2024

Conversation

martinetd
Copy link
Contributor

This is the first part of the seccomp policy work described in #1988: the small things that probably won't be too controversial -- sorry for the delay.

I have finished the rest all the way to unshare/mount blocked as per docker policies in my seccomp-hardened branch, this one has a few more restrictions that are in place on docker so probably would not cause much trouble but still are more restrictive, so left out for now (that branch just modified the same filter, but it'll be a separate file when ready to open as a PR; I was just using it to test ; I've noticed a few more syscalls I'll want to look at closer and maybe block so there's a bit of work left first)

Each commit is small and self-contained, so please refer to commit messages for individual changes -- most of the new blockages are obsolete or never existed from what I can tell; the exceptions that might be worth discussing are:

  • reboot, the syscall itself should do the capability check anyway
  • pivot_root, it's not allowed even with CAP_SYS_ADMIN for docker so I don't think there are any users of it in containers, if someone thinks it might be a problem I'll move that commit for the restricted profile later PR
  • syscall, having this really makes the seccomp profile moot so I think we need to disable it; as far as I know there are no valid users of this syscall I've only seen it used in exploits (to be extra clear, it doesn't block the syscall assembly instruction (int 0x80) so syscall(__NR_foo, ..) will still work, it blocks syscall(__NR_syscall, __NR_foo, ..))

The rest should be strictly more permissive than currently unless I missed something.

martinetd added 5 commits June 5, 2024 12:42
These have been replaced by the rt_sigaction family, and have not been
compiled in on most kernels since linux v3.9 (2013)

Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
timerfd as a syscall does not seem to have ever existed,
remove it from allowed syscalls list.

Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
The following syscalls have been added in recent kernels and considered
for this list:
 - cachestat, prints information about cache misses; it is less accurate
   than userfaultfd so probably safe but deny it until a clear need
   shows up
 - io_pgetevents_time64: io_pgetevents is already blocked, so block this
   variant as well. Note these are pretty close to io_getenvents, so we
   should probably block that as well, but since it is currently allowed
   keep that where it is.
 - map_shadow_stack: this allows creating a new shadow stack, required for
   user-space threading if shadow stack verification is enabled (prctl
   PR_SET_SHADOW_STACK_STATUS with PR_SHADOW_STACK_ENABLE); this might
   be required in the future but delay this decision until someone
   requests it
 - futex_* new interface is primarily intended for io_uring which we
   disallow, and does not have any known user yet so likewise block
   until someone requests it.
 - quotactl_fd: this is identical to quotactl, so only allow for
   SYS_ADMIN like quotactl.

Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
swapcontext seems to be used for coroutines in some languages (at least
ruby), enough to have been added to other major engines by an actual user.

Link: moby/moby#43092
Link: systemd/systemd#9487
Link: containerd/containerd#6411
Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
apparently harmless and used

Link: systemd/systemd#25018
Link: containerd/containerd#6882
Link: moby/moby#43553
Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
@Luap99
Copy link
Member

Luap99 commented Jun 5, 2024

@giuseppe PTAL

@@ -293,7 +294,6 @@ func DefaultProfile() *Seccomp {
"pidfd_send_signal",
"pipe",
"pipe2",
"pivot_root",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't pivot_root used by container runtimes so this breaks the nested container case I guess?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we need this otherwise we break nested containers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, sorry I should have tested this, I was somehow convinced it wasn't used. I've removed that commit.

Comment on lines 650 to 658
Names: []string{
"reboot",
},
Action: ActAllow,
Args: []*Arg{},
Includes: Filter{
Caps: []string{"CAP_SYS_BOOT"},
},
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this additional check in the profile since it is already in place in the kernel

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the point of these is to stack layers so if the kernel gets a check wrong we had this safety, but I just had a look at the reboot syscall definition and checking the capability is literally the first thing done so there isn't much room for mistake here;
ok, removing this.

"io_pgetevents",
"io_pgetevents_time64",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets block io_getenvents,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we block io_getevents too I think programs using libaio will stop working; this is just listing a syscall that was already blocked.
I agree the hardened profile can definitely block the io_{setup,destroy,cancel,getevents,submit} family of syscalls like we have io_uring blocked, but I'm not sure we should block them here.

Perhaps for podman 6.0 defaults?

@rhatdan
Copy link
Member

rhatdan commented Jun 5, 2024

Great work @martinetd Thanks for this effort
A couple of changes requested and then we can move forward.
Glad to see security increasing, even if it is minimal.

@rhatdan
Copy link
Member

rhatdan commented Jun 5, 2024

/approve

@openshift-ci openshift-ci bot added the approved label Jun 5, 2024
martinetd added 4 commits June 6, 2024 11:23
the readdir syscall hasn't existed forever (wasn't present in linux
2.6.12 initial import into git), remove it and don't even bother adding
it to the list of EPERM syscalls

Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
This does not deny anything new (bpf is currently allowed for sys admin)

Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
This doesn't deny anything new (perf_event_open is currently allowed for
SYS_ADMIN)

Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
syscall() emulates all other syscalls, so having this allowed makes no
sense as far as seccomp filters go.

This is a breaking change, but this probably will not break much.

Signed-off-by: Dominique Martinet <dominique.martinet@atmark-techno.com>
Copy link
Contributor Author

@martinetd martinetd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reviews, I've replied/updated as pointed out.
I'll do some more tests (including podman in podman!) with this later today, but I think it should be good to go.

@@ -293,7 +294,6 @@ func DefaultProfile() *Seccomp {
"pidfd_send_signal",
"pipe",
"pipe2",
"pivot_root",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, sorry I should have tested this, I was somehow convinced it wasn't used. I've removed that commit.

Comment on lines 650 to 658
Names: []string{
"reboot",
},
Action: ActAllow,
Args: []*Arg{},
Includes: Filter{
Caps: []string{"CAP_SYS_BOOT"},
},
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the point of these is to stack layers so if the kernel gets a check wrong we had this safety, but I just had a look at the reboot syscall definition and checking the capability is literally the first thing done so there isn't much room for mistake here;
ok, removing this.

"io_pgetevents",
"io_pgetevents_time64",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we block io_getevents too I think programs using libaio will stop working; this is just listing a syscall that was already blocked.
I agree the hardened profile can definitely block the io_{setup,destroy,cancel,getevents,submit} family of syscalls like we have io_uring blocked, but I'm not sure we should block them here.

Perhaps for podman 6.0 defaults?

@rhatdan
Copy link
Member

rhatdan commented Jun 6, 2024

LGTM

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Contributor

openshift-ci bot commented Jun 7, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, martinetd, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 9331830 into containers:main Jun 7, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants