Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

seccomp: refactor flags support; add flags to features, set SPEC_ALLOW by default #3588

Merged
merged 4 commits into from
Jan 20, 2023

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Sep 2, 2022

Currently a draft pending #3580 merge. When reviewing as a draft, please skip the first 2 commits (they are from #3580).

This PR reworks seccomp flags support:

  • adds new functions KnownFlags(), SupportedFlags() and FlagSupported() to seccomp pkg;
  • adds showing of known and supported seccomp filter flags to runc features;
  • consolidates/deduplicates code that handles flags (as much as possible);
  • changes the default (when no flags are set) to have SPEC_ALLOW, if supported (crun does that).

TODO: fix the seccomp flags test to use runc features and not depend on kernel version (maybe in a separate PR).

@kolyshkin
Copy link
Contributor Author

@rata @alban I'll be much obliged if you can PTAL.

@rata
Copy link
Member

rata commented Sep 5, 2022

@kolyshkin thanks for the cc! I'll be on and off for a few weeks. Is this something we want for the next runc release and I should try to prioritize its review? Or can it wait a few weeks maybe?

@alban
Copy link
Contributor

alban commented Sep 7, 2022

changes the default (when no flags are set) to have SPEC_ALLOW, if supported (crun does that).

It is done here:
https://github.com/containers/crun/blob/0b3086af0926146ac13ebdb8639bbbd3c93a2617/src/libcrun/seccomp.c#L198-L200

The man page says:

SECCOMP_FILTER_FLAG_SPEC_ALLOW (since Linux 4.17)
Disable Speculative Store Bypass mitigation.

Does it mean it is less secure with this flag?

@kolyshkin
Copy link
Contributor Author

changes the default (when no flags are set) to have SPEC_ALLOW, if supported (crun does that).

It is done here: https://github.com/containers/crun/blob/0b3086af0926146ac13ebdb8639bbbd3c93a2617/src/libcrun/seccomp.c#L198-L200

The man page says:

SECCOMP_FILTER_FLAG_SPEC_ALLOW (since Linux 4.17)
Disable Speculative Store Bypass mitigation.

Does it mean it is less secure with this flag?

It sort of is -- meaning, if someone will find a way to use seccomp ebpf program to exploit the hardware SSB issue. Practically, this seems highly unlikely, since the seccomp ebpf code is generated by libseccomp. IOW a user do not have direct control over ebpf code being loaded into the kernel, they can only influence is via the seccomp rules, which makes it much harder to exploit the vulnerability.

@kolyshkin
Copy link
Contributor Author

Also, it seems that in kernel v5.16 this (disabling the mitigations for seccomp) is the default (according to https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/commit/?h=for-next/seccomp&id=2f46993d83ff4abb310ef7b4beced56ba96f0d9d, which made its way into v5.16). By the way, the commit mistakenly claims runc sets this flag by default -- it does not (until this PR is merged, that is).

rata
rata previously approved these changes Sep 27, 2022
Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kolyshkin Thanks, on a cursory look, it LGTM.

I have not tried hard to see if more code simplifications are possible. But the PR seems reasonable to me :)

Left some coments, but cosmetical

@@ -665,6 +665,9 @@ func filterFlags(config *configs.Seccomp, filter *libseccomp.ScmpFilter) (flags
}

func sysSeccompSetFilter(flags uint, filter []unix.SockFilter) (fd int, err error) {
// This debug output is validated in tests/integration/seccomp.bats
// by the SECCOMP_FILTER_FLAG_* test.
logrus.Debugf("seccomp filter flags: %d", flags)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to not use %v? I find it more clear to use %v unless you want something else. Isn't %v printing the same here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, %d says "print argument as an integer", and %v means "print argument using the default formatting for its type". Since the type is integer, in this case %v will result in the same output as %d. The only difference (I guess but not entirely sure), is %v incurs slightly more runtime overhead, and the only benefit of using %v here would be that we won't have to change it if we change the flags type.

Since this is clearly an integer, I don't see why we shouldn't use %d. Personally, I find it more clear, as I can see we're printing an integer without looking at the flags type.

So I have to ask the same question to you -- why not use %d?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it seems that you have ignored the description and reviewed the first two commits, which are actually from #3580 and were merged in there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes, I missed the description. Sorry!

I see reasons to one or the other, I personally prefer the other and was wondering if this was on purpose. SGTM if you prefer it :)

}

func (e *unknownFlagError) Error() string {
return "seccomp flag " + string(e.flag) + " is not known to runc"
Copy link
Member

@rata rata Sep 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not known? Not supported seems a more standard way to say this to the user

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I copy-pasted that from the earlier code. And yes, I agree that "not supported" seems more common language, OTOH "more known" is more specific in this particular case.

"Not supported" is sort of generic, and can mean multiple things, such as the support for this feature or flags is

  • not yet added;
  • dropped;
  • not compiled it;
  • is impossible because of the current hardware or the kernel;
    and so on.

"Not known" is more specific, meaning this runc version is not aware that such flag exists (so, if it is exists, this is a clear hint that the runc version used is too old).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, if you take a look at the code adding flags to runc features, you'll see there are "known" and "supported" set of flags (with supported being a subset of known). This is why I'm using the word "known" here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines 82 to 91
declare -A FLAGS=(
['REMOVE']=0 # No setting, use built-in default.
['EMPTY']=0 # Empty set of flags.
['"SECCOMP_FILTER_FLAG_LOG"']=2
['"SECCOMP_FILTER_FLAG_SPEC_ALLOW"']=4
['"SECCOMP_FILTER_FLAG_TSYNC"']=0 # tsync flag is ignored.
['"SECCOMP_FILTER_FLAG_LOG","SECCOMP_FILTER_FLAG_SPEC_ALLOW"']=6
['"SECCOMP_FILTER_FLAG_LOG","SECCOMP_FILTER_FLAG_TSYNC"']=2
['"SECCOMP_FILTER_FLAG_SPEC_ALLOW","SECCOMP_FILTER_FLAG_TSYNC"']=4
['"SECCOMP_FILTER_FLAG_LOG","SECCOMP_FILTER_FLAG_SPEC_ALLOW","SECCOMP_FILTER_FLAG_TSYNC"']=6
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment explaining where you got these values seems useful for future collaborators.

Also, the tsync ignored comment can either be in all instances of tsync being used or a more visible comment on the top, maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to rework this test so that I won't have to add all the combinations manually. Guess this is what I'll do now :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworked the test (see the last commit). Beware: hardcode bash.

rata
rata previously approved these changes Sep 28, 2022
Copy link
Member

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one minor question

Comment on lines 104 to 105
# Filter out flags not supported by the current seccomp/kernel,
# so this test can be run on older systems.
Copy link
Member

@rata rata Sep 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure I'm missing something, but why hardcode the vars in the flags array and then ignore the not supported ones? Wouldn't be the same to just test all the supported ones? (and we don't harcode any flag here)

I see we might get unlucky in the CI and maybe some flag is never tested, and that is interesting to know. But we are not detecting that here either, right? Maybe that can be a different test for that (well, not sure it is easy to detect that... and seems unlikely)?

But I'm sure I'm missing something obvious. My brain is off today. Sorry if this is indeed noise :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this test is actually checking that the flags are set as expected (the check was added by commit 26dc55e after we found out in #3580 that SPEC_ALLOW is just ignored), we need to know their numeric values.

Surely, we can just get the list of supported flags (which we should test) from runc features, but then again we still need their values.

Let me see if I can refactor the code to simplify it more (use supported flags as received from runc directly, and only use the hardcoded ones for numeric values).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and wouldn't make sense for runc features to also print the numeric values (if we can easily parse it here, not worth the pain if this is hard)?

But the code is okay as it is now too from my POV :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and wouldn't make sense for runc features to also print the numeric values (if we can easily parse it here, not worth the pain if this is hard)?

Well, no one ever needs the numeric values (except for this test), and runc features target audience is runc users, not our tests.

Thus we have to hardcode it (unfortunately there is no easy way to see if a seccomp flag is set or not, so the test have to rely on debug output and compare the numbers).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh, ok. I thought it might have been present on some errors as hex or something maybe. But makes sense, in any case it is not so useful either

@kolyshkin
Copy link
Contributor Author

not ok 104 runc run [seccomp] (SECCOMP_FILTER_FLAG_*)
# (in test file tests/integration/seccomp.bats, line 104)
#   `mapfile -td, flags < <(__runc features | jq -c '.linux.seccomp.supported_flags' | tr -d '[]\n')' failed with status 2
# runc spec (status=0):
# 
# /tmp/bats-run-14469/bats.26308.src: line 104: mapfile: -d: invalid option

Seems I've used a feature of bash which is relatively new.

Rewrote without using -d option for mapfile

@kolyshkin kolyshkin marked this pull request as ready for review October 4, 2022 18:59
@kolyshkin kolyshkin requested a review from rata October 4, 2022 18:59
@kolyshkin
Copy link
Contributor Author

@AkihiroSuda @cyphar @thaJeztah PTAL

// Nil value means "unknown", not "no support for any flag".
KnownFlags []string `json:"known_flags,omitempty"`

// Flags is the list of the supported filter flags, e.g., "SECCOMP_FILTER_FLAG_LOG".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't match the variable name

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is "supported" different from "recognized"?

Having two concepts doesn't seem consistent with other properties in this JSON

Copy link
Contributor Author

@kolyshkin kolyshkin Oct 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't match the variable name

oops, fixed

Copy link
Contributor Author

@kolyshkin kolyshkin Oct 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this context "recognized" or "known" means runc is aware of the flag.

OTOH "supported" means this flag is not merely known to (recognized by) runc, but is also supported by the currently used libseccomp and the running kernel.

If you don't like such distinction, we can only print one set -- choose which one makes more sense. To me, "supported" makes more sense than "known" because, despite the flag being known, one can't use it anyway. The downside is one can't know if more flags would be supported after e.g. a kernel or libseccomp upgrade.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AkihiroSuda PTAL 🙏🏻

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AkihiroSuda let me add some more context here.

All the "features" that we have in runc features are "known features", meaning they are recognized by this runc build. Some of those (like mount flags, for example) can not be used because the kernel version is too old, some library version is old, or a binary is missing. So, the feature is known but not supported.

For the sake of the seccomp flags test case, I need to have seccomp flags that are supported by the current runtime (meaning runc + libraries + kernel), so I added a distinction between "known" and "supported" flags.

I guess that the use case for this (distinguishing between known and supported flags) can also be used by the upper level (container orchestration) to query the runtime platform capabilities. Say, some mount attributes (like "rro") might be known but not supported due to older kernel.

I agree that this distinction doesn't really fit into current output of runc features, since we print known (rather than supported) features and flags.

Do you have any idea how to add such a distinction (between known and supported), if we ever want to add it? Something like runc features --supported (and maybe --known for the sake of completeness?).

It is really hard to write a seccomp flags test (in bash, that is) without relying on runc features to print the list of supported seccomp flags, but I guess this is a separate issue. Perhaps I can rewrite the test in Go (and make it the first runc integration test written in Go).

@kolyshkin
Copy link
Contributor Author

@mrunalp addressed your review comments

mrunalp
mrunalp previously approved these changes Oct 18, 2022
@kolyshkin
Copy link
Contributor Author

@cyphar @AkihiroSuda @thaJeztah PTAL

@kolyshkin
Copy link
Contributor Author

@cyphar @AkihiroSuda @thaJeztah PTAL

mrunalp
mrunalp previously approved these changes Nov 21, 2022
Fix a few copy-paste errors.

Fixes: 520702d
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Amend runc features to print seccomp flags. Two set of flags are added:
 * known flags are those that this version of runc is aware of;
 * supported flags are those that can be set; normally, this is the same
   set as known flags, but due to older version of kernel and/or
   libseccomp, some known flags might be unsupported.

This commit also consolidates three different switch statements dealing
with flags into one, in func setFlag. A note is added to this function
telling what else to look for when adding new flags.

Unfortunately, it also adds a list of known flags, that should be
kept in sync with the switch statement.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
If no seccomps flags are set in OCI runtime spec (not even the empty
set), set SPEC_ALLOW as the default (if it's supported).

Otherwise, use the flags as they are set (that includes no flags for
empty seccomp.Flags array).

This mimics the crun behavior, and makes runc seccomp performance on par
with crun.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
This test (initially added by commit 58ea21d and later amended in
commit 26dc55e) currently has two major deficiencies:

1. All possible flag combinations, and their respective numeric values,
   have to be explicitly listed. Currently we support 3 flags, so
   there is only 2^3 - 1 = 7 combinations, but adding more flags will
   become increasingly difficult (for example, 5 flags will result in
   31 combinations).

2. The test requires kernel 4.17 (for SECCOMP_FILTER_FLAG_SPEC_ALLOW),
   and not doing any tests when running on an older kernel. This, too,
   will make it more difficult to add extra flags in the future.

Both issues can be solved by using runc features which now prints all
known and supported runc flags. We still have to hardcode the numeric
values of all flags, but most of the other work is coded now.

In particular:

 * The test only uses supported flags, meaning it can be used with
   older kernels, removing the limitation (2) above.

 * The test calculates the powerset (all possible combinations) of
   flags and their numeric values. This makes it easier to add more
   flags, removing the limitation (1) above.

 * The test will fail (in flags_value) if any new flags will be added
   to runc but the test itself is not amended.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin
Copy link
Contributor Author

@opencontainers/runc-maintainers anything I can do to move this forward?

Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, sorry for delay in review.

LGTM but needs #3700

@AkihiroSuda
Copy link
Member

Can't click the merge button until getting one more LGTM 😞

Opened a proposal to uncheck the Dismiss stale pull request approvals when new commits are pushed checkbox

@dqminh
Copy link
Contributor

dqminh commented Jan 20, 2023

LGTM !

@bobbypage
Copy link

bobbypage commented Feb 7, 2023

Is there any thoughts around backporting this change to runc 1.1?

The change in ac04154 is especially important in context of k8s graduating secomp to default in 1.27 (kubernetes/enhancements#3718).

The change in ac04154 is critical for performance as otherwise there is very significant perf degradation with seccomp enabled. The only alternative is for users to use 5.16 kernel (which is not a long term release; only 5.15 is) or alternatively tune their kernel flags.

@bobbypage
Copy link

/cc @saschagrunert

@rata
Copy link
Member

rata commented Feb 22, 2023

@bobbypage I'll be fine with it. Although, I'd prefer if we can make a smaller change to just enable SPEC_ALLOW by default (that is what you need, right?) and have the rest of this patch land in runc 1.2. That can be potentially way simpler (i hope :)). I've not tried to see if it is easy to create a backport a simpler patch just for the SPEC_ALLOW by default case, though.

Any maintainers has thoughts on this?

@cyphar cyphar mentioned this pull request Mar 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants