seccomp: refactor flags support; add flags to features, set SPEC_ALLOW by default #3588

kolyshkin · 2022-09-02T23:38:53Z

~~Currently a draft pending #3580 merge. When reviewing as a draft, please skip the first 2 commits (they are from #3580).~~

This PR reworks seccomp flags support:

adds new functions KnownFlags(), SupportedFlags() and FlagSupported() to seccomp pkg;
adds showing of known and supported seccomp filter flags to runc features;
consolidates/deduplicates code that handles flags (as much as possible);
changes the default (when no flags are set) to have SPEC_ALLOW, if supported (crun does that).

~~TODO: fix the seccomp flags test to use runc features and not depend on kernel version (maybe in a separate PR).~~

kolyshkin · 2022-09-02T23:53:07Z

@rata @alban I'll be much obliged if you can PTAL.

rata · 2022-09-05T14:45:06Z

@kolyshkin thanks for the cc! I'll be on and off for a few weeks. Is this something we want for the next runc release and I should try to prioritize its review? Or can it wait a few weeks maybe?

alban · 2022-09-07T11:20:48Z

changes the default (when no flags are set) to have SPEC_ALLOW, if supported (crun does that).

It is done here:
https://github.com/containers/crun/blob/0b3086af0926146ac13ebdb8639bbbd3c93a2617/src/libcrun/seccomp.c#L198-L200

The man page says:

SECCOMP_FILTER_FLAG_SPEC_ALLOW (since Linux 4.17)
Disable Speculative Store Bypass mitigation.

Does it mean it is less secure with this flag?

kolyshkin · 2022-09-08T19:41:36Z

changes the default (when no flags are set) to have SPEC_ALLOW, if supported (crun does that).

It is done here: https://github.com/containers/crun/blob/0b3086af0926146ac13ebdb8639bbbd3c93a2617/src/libcrun/seccomp.c#L198-L200

The man page says:

SECCOMP_FILTER_FLAG_SPEC_ALLOW (since Linux 4.17)
Disable Speculative Store Bypass mitigation.

Does it mean it is less secure with this flag?

It sort of is -- meaning, if someone will find a way to use seccomp ebpf program to exploit the hardware SSB issue. Practically, this seems highly unlikely, since the seccomp ebpf code is generated by libseccomp. IOW a user do not have direct control over ebpf code being loaded into the kernel, they can only influence is via the seccomp rules, which makes it much harder to exploit the vulnerability.

kolyshkin · 2022-09-08T19:49:00Z

Also, it seems that in kernel v5.16 this (disabling the mitigations for seccomp) is the default (according to https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=for-next/seccomp&id=2f46993d83ff4abb310ef7b4beced56ba96f0d9d, which made its way into v5.16). By the way, the commit mistakenly claims runc sets this flag by default -- it does not (until this PR is merged, that is).

rata

@kolyshkin Thanks, on a cursory look, it LGTM.

I have not tried hard to see if more code simplifications are possible. But the PR seems reasonable to me :)

Left some coments, but cosmetical

rata · 2022-09-27T09:41:05Z

libcontainer/seccomp/patchbpf/enosys_linux.go

@@ -665,6 +665,9 @@ func filterFlags(config *configs.Seccomp, filter *libseccomp.ScmpFilter) (flags
 }

 func sysSeccompSetFilter(flags uint, filter []unix.SockFilter) (fd int, err error) {
+	// This debug output is validated in tests/integration/seccomp.bats
+	// by the SECCOMP_FILTER_FLAG_* test.
+	logrus.Debugf("seccomp filter flags: %d", flags)


Any reason to not use %v? I find it more clear to use %v unless you want something else. Isn't %v printing the same here?

So, %d says "print argument as an integer", and %v means "print argument using the default formatting for its type". Since the type is integer, in this case %v will result in the same output as %d. The only difference (I guess but not entirely sure), is %v incurs slightly more runtime overhead, and the only benefit of using %v here would be that we won't have to change it if we change the flags type.

Since this is clearly an integer, I don't see why we shouldn't use %d. Personally, I find it more clear, as I can see we're printing an integer without looking at the flags type.

So I have to ask the same question to you -- why not use %d?

OK, it seems that you have ignored the description and reviewed the first two commits, which are actually from #3580 and were merged in there.

Oh, yes, I missed the description. Sorry!

I see reasons to one or the other, I personally prefer the other and was wondering if this was on purpose. SGTM if you prefer it :)

rata · 2022-09-27T09:45:33Z

libcontainer/seccomp/seccomp_linux.go

+}
+
+func (e *unknownFlagError) Error() string {
+	return "seccomp flag " + string(e.flag) + " is not known to runc"


not known? Not supported seems a more standard way to say this to the user

I think I copy-pasted that from the earlier code. And yes, I agree that "not supported" seems more common language, OTOH "more known" is more specific in this particular case.

"Not supported" is sort of generic, and can mean multiple things, such as the support for this feature or flags is

not yet added;

dropped;

not compiled it;

is impossible because of the current hardware or the kernel;
and so on.

"Not known" is more specific, meaning this runc version is not aware that such flag exists (so, if it is exists, this is a clear hint that the runc version used is too old).

So, if you take a look at the code adding flags to runc features, you'll see there are "known" and "supported" set of flags (with supported being a subset of known). This is why I'm using the word "known" here.

rata · 2022-09-27T09:48:46Z

tests/integration/seccomp.bats

+	declare -A FLAGS=(
+		['REMOVE']=0 # No setting, use built-in default.
+		['EMPTY']=0  # Empty set of flags.
+		['"SECCOMP_FILTER_FLAG_LOG"']=2
+		['"SECCOMP_FILTER_FLAG_SPEC_ALLOW"']=4
+		['"SECCOMP_FILTER_FLAG_TSYNC"']=0 # tsync flag is ignored.
+		['"SECCOMP_FILTER_FLAG_LOG","SECCOMP_FILTER_FLAG_SPEC_ALLOW"']=6
+		['"SECCOMP_FILTER_FLAG_LOG","SECCOMP_FILTER_FLAG_TSYNC"']=2
+		['"SECCOMP_FILTER_FLAG_SPEC_ALLOW","SECCOMP_FILTER_FLAG_TSYNC"']=4
+		['"SECCOMP_FILTER_FLAG_LOG","SECCOMP_FILTER_FLAG_SPEC_ALLOW","SECCOMP_FILTER_FLAG_TSYNC"']=6


A comment explaining where you got these values seems useful for future collaborators.

Also, the tsync ignored comment can either be in all instances of tsync being used or a more visible comment on the top, maybe?

I wanted to rework this test so that I won't have to add all the combinations manually. Guess this is what I'll do now :)

Reworked the test (see the last commit). Beware: hardcode bash.

rata

LGTM, one minor question

rata · 2022-09-28T11:01:47Z

tests/integration/seccomp.bats

+	# Filter out flags not supported by the current seccomp/kernel,
+	# so this test can be run on older systems.


I'm sure I'm missing something, but why hardcode the vars in the flags array and then ignore the not supported ones? Wouldn't be the same to just test all the supported ones? (and we don't harcode any flag here)

I see we might get unlucky in the CI and maybe some flag is never tested, and that is interesting to know. But we are not detecting that here either, right? Maybe that can be a different test for that (well, not sure it is easy to detect that... and seems unlikely)?

But I'm sure I'm missing something obvious. My brain is off today. Sorry if this is indeed noise :)

Since this test is actually checking that the flags are set as expected (the check was added by commit 26dc55e after we found out in #3580 that SPEC_ALLOW is just ignored), we need to know their numeric values.

Surely, we can just get the list of supported flags (which we should test) from runc features, but then again we still need their values.

Let me see if I can refactor the code to simplify it more (use supported flags as received from runc directly, and only use the hardcoded ones for numeric values).

Oh, and wouldn't make sense for runc features to also print the numeric values (if we can easily parse it here, not worth the pain if this is hard)?

But the code is okay as it is now too from my POV :)

Oh, and wouldn't make sense for runc features to also print the numeric values (if we can easily parse it here, not worth the pain if this is hard)?

Well, no one ever needs the numeric values (except for this test), and runc features target audience is runc users, not our tests.

Thus we have to hardcode it (unfortunately there is no easy way to see if a seccomp flag is set or not, so the test have to rely on debug output and compare the numbers).

Ohh, ok. I thought it might have been present on some errors as hex or something maybe. But makes sense, in any case it is not so useful either

kolyshkin · 2022-10-04T18:59:04Z

not ok 104 runc run [seccomp] (SECCOMP_FILTER_FLAG_*)
# (in test file tests/integration/seccomp.bats, line 104)
#   `mapfile -td, flags < <(__runc features | jq -c '.linux.seccomp.supported_flags' | tr -d '[]\n')' failed with status 2
# runc spec (status=0):
# 
# /tmp/bats-run-14469/bats.26308.src: line 104: mapfile: -d: invalid option

Seems I've used a feature of bash which is relatively new.

Rewrote without using -d option for mapfile

kolyshkin · 2022-10-04T18:59:53Z

@AkihiroSuda @cyphar @thaJeztah PTAL

types/features/features.go

AkihiroSuda · 2022-10-04T22:59:46Z

types/features/features.go

+	// Nil value means "unknown", not "no support for any flag".
+	KnownFlags []string `json:"known_flags,omitempty"`
+
+	// Flags is the list of the supported filter flags, e.g., "SECCOMP_FILTER_FLAG_LOG".


Doesn't match the variable name

How is "supported" different from "recognized"?

Having two concepts doesn't seem consistent with other properties in this JSON

Doesn't match the variable name

oops, fixed

In this context "recognized" or "known" means runc is aware of the flag.

OTOH "supported" means this flag is not merely known to (recognized by) runc, but is also supported by the currently used libseccomp and the running kernel.

If you don't like such distinction, we can only print one set -- choose which one makes more sense. To me, "supported" makes more sense than "known" because, despite the flag being known, one can't use it anyway. The downside is one can't know if more flags would be supported after e.g. a kernel or libseccomp upgrade.

@AkihiroSuda PTAL 🙏🏻

@AkihiroSuda let me add some more context here.

All the "features" that we have in runc features are "known features", meaning they are recognized by this runc build. Some of those (like mount flags, for example) can not be used because the kernel version is too old, some library version is old, or a binary is missing. So, the feature is known but not supported.

For the sake of the seccomp flags test case, I need to have seccomp flags that are supported by the current runtime (meaning runc + libraries + kernel), so I added a distinction between "known" and "supported" flags.

I guess that the use case for this (distinguishing between known and supported flags) can also be used by the upper level (container orchestration) to query the runtime platform capabilities. Say, some mount attributes (like "rro") might be known but not supported due to older kernel.

I agree that this distinction doesn't really fit into current output of runc features, since we print known (rather than supported) features and flags.

Do you have any idea how to add such a distinction (between known and supported), if we ever want to add it? Something like runc features --supported (and maybe --known for the sake of completeness?).

It is really hard to write a seccomp flags test (in bash, that is) without relying on runc features to print the list of supported seccomp flags, but I guess this is a separate issue. Perhaps I can rewrite the test in Go (and make it the first runc integration test written in Go).

kolyshkin · 2022-10-15T02:31:28Z

@mrunalp addressed your review comments

kolyshkin · 2022-10-18T23:31:52Z

@cyphar @AkihiroSuda @thaJeztah PTAL

kolyshkin · 2022-10-27T00:20:33Z

@cyphar @AkihiroSuda @thaJeztah PTAL

Fix a few copy-paste errors. Fixes: 520702d Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Amend runc features to print seccomp flags. Two set of flags are added: * known flags are those that this version of runc is aware of; * supported flags are those that can be set; normally, this is the same set as known flags, but due to older version of kernel and/or libseccomp, some known flags might be unsupported. This commit also consolidates three different switch statements dealing with flags into one, in func setFlag. A note is added to this function telling what else to look for when adding new flags. Unfortunately, it also adds a list of known flags, that should be kept in sync with the switch statement. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

If no seccomps flags are set in OCI runtime spec (not even the empty set), set SPEC_ALLOW as the default (if it's supported). Otherwise, use the flags as they are set (that includes no flags for empty seccomp.Flags array). This mimics the crun behavior, and makes runc seccomp performance on par with crun. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

This test (initially added by commit 58ea21d and later amended in commit 26dc55e) currently has two major deficiencies: 1. All possible flag combinations, and their respective numeric values, have to be explicitly listed. Currently we support 3 flags, so there is only 2^3 - 1 = 7 combinations, but adding more flags will become increasingly difficult (for example, 5 flags will result in 31 combinations). 2. The test requires kernel 4.17 (for SECCOMP_FILTER_FLAG_SPEC_ALLOW), and not doing any tests when running on an older kernel. This, too, will make it more difficult to add extra flags in the future. Both issues can be solved by using runc features which now prints all known and supported runc flags. We still have to hardcode the numeric values of all flags, but most of the other work is coded now. In particular: * The test only uses supported flags, meaning it can be used with older kernels, removing the limitation (2) above. * The test calculates the powerset (all possible combinations) of flags and their numeric values. This makes it easier to add more flags, removing the limitation (1) above. * The test will fail (in flags_value) if any new flags will be added to runc but the test itself is not amended. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

kolyshkin · 2022-12-15T23:32:54Z

@opencontainers/runc-maintainers anything I can do to move this forward?

AkihiroSuda

Thanks, sorry for delay in review.

LGTM but needs #3700

AkihiroSuda · 2023-01-20T10:27:46Z

Can't click the merge button until getting one more LGTM 😞

Opened a proposal to uncheck the Dismiss stale pull request approvals when new commits are pushed checkbox

Uncheck Dismiss stale pull request approvals when new commits are pushed #3706

dqminh · 2023-01-20T11:28:22Z

LGTM !

bobbypage · 2023-02-07T01:05:43Z

Is there any thoughts around backporting this change to runc 1.1?

The change in ac04154 is especially important in context of k8s graduating secomp to default in 1.27 (kubernetes/enhancements#3718).

The change in ac04154 is critical for performance as otherwise there is very significant perf degradation with seccomp enabled. The only alternative is for users to use 5.16 kernel (which is not a long term release; only 5.15 is) or alternatively tune their kernel flags.

bobbypage · 2023-02-07T01:06:50Z

/cc @saschagrunert

rata · 2023-02-22T12:15:29Z

@bobbypage I'll be fine with it. Although, I'd prefer if we can make a smaller change to just enable SPEC_ALLOW by default (that is what you need, right?) and have the rest of this patch land in runc 1.2. That can be potentially way simpler (i hope :)). I've not tried to see if it is easy to create a backport a simpler patch just for the SPEC_ALLOW by default case, though.

Any maintainers has thoughts on this?

kolyshkin requested review from AkihiroSuda and cyphar and removed request for AkihiroSuda September 2, 2022 23:39

kolyshkin force-pushed the seccomp-flags-rework branch from 7548ef4 to 924ce7a Compare September 2, 2022 23:40

kolyshkin mentioned this pull request Sep 2, 2022

seccomp: set SPEC_ALLOW by default #3581

Closed

kolyshkin added area/seccomp kind/refactor refactoring impact/changelog labels Sep 2, 2022

kolyshkin added this to the 1.2.0 milestone Sep 2, 2022

rata previously approved these changes Sep 27, 2022

View reviewed changes

kolyshkin dismissed rata’s stale review via 8f9d08a September 28, 2022 01:48

kolyshkin force-pushed the seccomp-flags-rework branch from 924ce7a to 8f9d08a Compare September 28, 2022 01:48

rata previously approved these changes Sep 28, 2022

View reviewed changes

kolyshkin dismissed rata’s stale review via ca9bf61 September 28, 2022 21:21

kolyshkin force-pushed the seccomp-flags-rework branch 3 times, most recently from 560aa07 to 57dc492 Compare October 3, 2022 23:17

kolyshkin marked this pull request as ready for review October 4, 2022 18:59

kolyshkin requested a review from rata October 4, 2022 18:59

AkihiroSuda reviewed Oct 4, 2022

View reviewed changes

types/features/features.go Outdated Show resolved Hide resolved

AkihiroSuda reviewed Oct 4, 2022

View reviewed changes

types/features/features.go Outdated Show resolved Hide resolved

AkihiroSuda reviewed Oct 4, 2022

View reviewed changes

kolyshkin force-pushed the seccomp-flags-rework branch from 57dc492 to f89f2df Compare October 5, 2022 02:03

kolyshkin requested a review from mrunalp October 18, 2022 18:52

mrunalp previously approved these changes Oct 18, 2022

View reviewed changes

kolyshkin force-pushed the seccomp-flags-rework branch from f510e5f to 80639c4 Compare October 18, 2022 23:32

kolyshkin dismissed mrunalp’s stale review via 80639c4 October 27, 2022 18:29

kolyshkin requested a review from mrunalp November 2, 2022 17:11

kolyshkin force-pushed the seccomp-flags-rework branch 2 times, most recently from 485b562 to a0aa305 Compare November 2, 2022 22:02

mrunalp previously approved these changes Nov 21, 2022

View reviewed changes

kolyshkin added 4 commits November 29, 2022 17:24

types/features: fix docstrings

ab84808

Fix a few copy-paste errors. Fixes: 520702d Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

kolyshkin force-pushed the seccomp-flags-rework branch from a0aa305 to 19a9d9f Compare November 30, 2022 01:24

kolyshkin dismissed mrunalp’s stale review via 19a9d9f December 8, 2022 20:39

AkihiroSuda approved these changes Jan 17, 2023

View reviewed changes

AkihiroSuda requested a review from mrunalp January 17, 2023 23:36

dqminh approved these changes Jan 20, 2023

View reviewed changes

AkihiroSuda merged commit a2f27f0 into opencontainers:main Jan 20, 2023

AkihiroSuda mentioned this pull request Jan 25, 2023

Add features.md to formalize the runc features JSON opencontainers/runtime-spec#1130

Merged

mrunalp mentioned this pull request Feb 3, 2023

KEP-2413: Graduate SeccompDefault to stable kubernetes/enhancements#3718

Merged

cyphar mentioned this pull request Mar 14, 2024

release 1.2.0-rc.1 #4221

Merged

		# Filter out flags not supported by the current seccomp/kernel,
		# so this test can be run on older systems.

seccomp: refactor flags support; add flags to features, set SPEC_ALLOW by default #3588

seccomp: refactor flags support; add flags to features, set SPEC_ALLOW by default #3588

Conversation

kolyshkin commented Sep 2, 2022 • edited Loading

kolyshkin commented Sep 2, 2022

rata commented Sep 5, 2022

alban commented Sep 7, 2022

kolyshkin commented Sep 8, 2022

kolyshkin commented Sep 8, 2022

rata left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rata Sep 27, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rata left a comment

Choose a reason for hiding this comment

rata Sep 28, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kolyshkin commented Oct 4, 2022

kolyshkin commented Oct 4, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kolyshkin Oct 5, 2022 • edited Loading

Choose a reason for hiding this comment

kolyshkin Oct 5, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kolyshkin commented Oct 15, 2022

kolyshkin commented Oct 18, 2022

kolyshkin commented Oct 27, 2022

kolyshkin commented Dec 15, 2022

AkihiroSuda left a comment

Choose a reason for hiding this comment

AkihiroSuda commented Jan 20, 2023

dqminh commented Jan 20, 2023

bobbypage commented Feb 7, 2023 • edited Loading

bobbypage commented Feb 7, 2023

rata commented Feb 22, 2023

kolyshkin commented Sep 2, 2022 •

edited

Loading

rata left a comment •

edited

Loading

rata Sep 27, 2022 •

edited

Loading

rata Sep 28, 2022 •

edited

Loading

kolyshkin Oct 5, 2022 •

edited

Loading

kolyshkin Oct 5, 2022 •

edited

Loading

bobbypage commented Feb 7, 2023 •

edited

Loading