Add `features.md` to formalize the `runc features` JSON #1130

AkihiroSuda · 2021-12-09T09:04:26Z

Add features.md and features-linux.md, to formalize the runc features JSON that was introduced in runc v1.1.0.

A runtime caller MAY use this JSON to detect the features implemented by the runtime.

The spec corresponds to https://github.com/opencontainers/runc/blob/v1.1.0/types/features/features.go
(opencontainers/runc#3296, opencontainers/runc#3310)

Differences from runc v1.1.0

Add .linux.intelRdt.enabled field
Add .linux.cgroup.rdma field
Add .linux.seccomp.knownFlags and .linux.seccomp.supportedFlags fields (Implemented in seccomp: refactor flags support; add flags to features, set SPEC_ALLOW by default runc#3588)

flouthoc · 2021-12-09T09:32:40Z

features-linux.md

@@ -0,0 +1,183 @@
+# <a name="linuxFeatures" />Linux Features Document


Is there a missing feature flag for IntelRDT , some runtimes supports restricting bandwidth using that. Also runtime-spec has a definition for it.

Not sure maybe something like

"intelrdt": { "supported": true }

Added .linux.intelRdt.enabled field

flouthoc · 2021-12-09T09:37:44Z

features-linux.md

+
+## <a name="linuxFeaturesCgroup" />Cgroup
+
+**`cgroup`** (object, OPTIONAL) represents the runtime's implementation status of cgroup managers.


Should there be any field to list down which controllers are implemented by runtime for instance its not necessary that a given runtime must or must not implement support for a specific controller. Example RDMA is not implemented by all runtimes so entry for RDMA becomes no-op.

Again not sure about just adding it here.

It's hard to define the set of the controller names.

e.g., "freezer" is a real controller in v1 but not a real controller in v2. And the name of the block I/O controller differs across V1 ("blkio") and V2 ("io").

@AkihiroSuda hmm I agree and fair point but just a small doubt if there is a feature info, container manager should be able to delegate task to the right runtime.

A practical use case would be.

Container Manger supports delegating container generation to multiple runtimes but if spec requests setting a particular controller example RDMA it queries all supported runtime and delegate spec to the runtime which implements RDMA for every other spec it keeps using the default runtime.

How about using an array to specify which controllers are supported by a cgroup manager instead? This would solve that the controller names are different. Also not all controllers might be supported by all cgroup manager implementations.

"cgroup": { "v1": ["cpu", "cpuset", "blkio"], "v2": ["io", "memory", "devices"] "systemd": ["cpu", "memory", "cpuset"], "systemdUser": ["cpu", "memory", "cpuset"] }

In youki we already have the concept of a pseudo controller for devices, unified and freezer, so in my opinion a name in that array must not always map to a real controller in all implementations.

Updated PR to add rdma property.

How about using an array to specify which controllers are supported by a cgroup manager instead?

v1 is dying and probably won't have new controllers, so I don't think we need to track v1 controller list (except rdma)

v2 may have new controllers, but probably we will support such new controllers via the unified struct without defining a new dedicated struct, so we don't need to track v2 controller list, either.

could we also add a "disabled": "true|false" to indicate whether the runtime supports disabling cgroups? crun does it with crun --cgroup-manager=disabled .... That is useful when running as a systemd service to reuse the existing cgroup.

flouthoc · 2021-12-09T09:44:01Z

features-linux.md

+
+```json
+"apparmor": {
+  "enabled": true


Not sure but should this be supported instead of enabled.

Suggested change

"enabled": true

"supported": true

The JSON is already implemented in runc (experimentally, though), so I'm reluctant to change the field names.

IIRC, I avoided "supported" to avoid confusion with enterprise commercial "support".
An enterprise distributor may provide commercial "support" to subset of the "enabled" features.

Oh i see. This is a really small nit and i don't have a strong opinion on this. 😃

To me it just feels that apparmor is only enabled for a particular spec if spec defines apparmorProfile but by default its only supported by a runtime and not enabled for every spec.

Again i am thinking this too much and its not a big thing. 😅

I think enabled is okay, though I think that we shouldn't care what runc has developed at the moment -- runc features was only merged a day or two ago, so literally nobody is using at the moment (and we can change it until runc 1.2).

IIRC, I avoided "supported" to avoid confusion with enterprise commercial "support". An enterprise distributor may provide commercial "support" to subset of the "enabled" features.

Perhaps "available" ? (IMO "enabled" feels like a configuration toggle where most/many of these will really be capabilities that are either usable or they aren't)

Ah, I guess https://github.com/opencontainers/runtime-spec/pull/1130/files#diff-445e6d0dc97b93a0f04d914ad7816114ddc175dc462c7eec7961921af17ff0d0R6-R7 is the context I was missing:

The features document is irrelevant to the actual availability of the features in the host operating system. Hence, the content of the feature document SHOULD be determined on the compilation time of the runtime, not on the execution time.

I redact my suggestion ("enabled" seems fine for what this describes), but then also wonder which specific problem this functionality is solving. 🙈

wonder which specific problem this functionality is solving.

Examples of specific problems solved in this PR:

A high-level engine can't be aware whether the low-level runtime supports seccomp, so a high-level engine can't avoid "seccomp not supported" error unless the user supplies a custom configuration.

A high-level engine can't be aware whether the low-level runtime supports rro mounts, so a high-level engine may accidentally have non-RRO mounts against the user's will. This may result in a severe vulnerability.

BTW we can have availability property that corresponds to actual availability of the feature on the host OS/hardware, in addition to the enabled property proposed in this PR, but that will be another PR.

As a maintainer of containerd/Moby and its relevant projects, I don't see necessity of having availability detection in the low-level runtime, though, because high-level engine implementations are already capable of detecting such feature availability on the host OS/hardware.

BTW we can have availability property that corresponds to actual availability of the feature on the host OS/hardware, in addition to the enabled property proposed in this PR, but that will be another PR.

As a maintainer of containerd/Moby and its relevant projects, I don't see necessity of having availability detection in the low-level runtime, though, because high-level engine implementations are already capable of detecting such feature availability on the host OS/hardware.

My original doubt was to replace enabled with supported or availability I don't think adding extra key just for availability would be good idea. So I agree with @AkihiroSuda we should not need extra key for this.

+1 for keeping enabled: true

features-linux.md

AkihiroSuda · 2021-12-16T11:39:59Z

Updated PR.

Differences from runc v1.1.0-rc.1

Add .linux.intelRdt.enabled field
Add .linux.cgroup.rdma field

utam0k · 2021-12-21T06:03:13Z

features.md

+A [runtime](glossary.md#runtime) MAY provide a JSON document about its implemented features to [runtime callers](glossary.md#runtime-caller).
+This JSON document is called ["features document"](glossary.md#features-document).
+
+The features document is irrelevant to the actual availability of the features in the host operating system.


Is it the role of the caller of the oci container runtime to ensure that it is actually available on the actual host? I felt it would be nice to check if the feature is even available on the host. What do you think?

Is it the role of the caller of the oci container runtime to ensure that it is actually available on the actual host?

Yes

I felt it would be nice to check if the feature is even available on the host. What do you think?

Out of the scope of this PR (and I don't see necessity)
#1130 (comment)

AkihiroSuda · 2022-01-12T05:50:36Z

@giuseppe Could you take a look from crun's perspective?

giuseppe · 2022-01-12T08:37:19Z

@giuseppe Could you take a look from crun's perspective?

from crun's perspective, it looks good.

What could also be useful for us, would have a way to know what log-driver is supported by a runtime (the --log-format option used by runc and crun), but that is not part of the OCI specs, so I am not sure if it could be added here?

AkihiroSuda · 2022-01-12T08:44:48Z

What could also be useful for us, would have a way to know what log-driver is supported by a runtime (the --log-format option used by runc and crun), but that is not part of the OCI specs, so I am not sure if it could be added here?

That should be probably a custom Annotations field like "org.opencontainers.runc.log-formats":"[\"text\",\"json\"]".

giuseppe · 2022-01-12T10:14:26Z

That should be probably a custom Annotations field like "org.opencontainers.runc.log-formats":"[\"text\",\"json\"]".

the problem with custom annotations is that it won't really help to understand if a generic OCI runtime $FOO has support for --log-driver and what options are available because you need to know the field in advance.

giuseppe · 2022-01-12T10:15:34Z

the problem with custom annotations is that it won't really help to understand if a generic OCI runtime $FOO has support for --log-driver and what options are available because you need to know the field in advance.

or are you suggesting that other OCI runtimes would use the same "org.opencontainers.runc.log-formats" key?

AkihiroSuda · 2022-01-12T10:23:09Z

@giuseppe

The runc CLI flags like --log-format are currently out of the scope of the OCI Specs, so they have to be defined as annotations under org.opencontainers.runc.* namespace.

Only runtimes with runc-compatible CLI should use these org.opencontainers.runc.* annotations.

We can add proper fields for log formats when the CLI behaviors get defined in the OCI Specs.

kolyshkin · 2022-01-28T22:57:00Z

This can potentially eliminate podman's containers.conf(5), parameters runtime_supports_*. Currently they have

# List of the OCI runtimes that support --format=json.  When json is supported
# engine will use it for reporting nicer errors.
#
#runtime_supports_json = ["crun", "runc", "kata", "runsc", "krun"]

# List of the OCI runtimes that supports running containers with KVM Separation.
#
#runtime_supports_kvm = ["kata", "krun"]

# List of the OCI runtimes that supports running containers without cgroups.
#
#runtime_supports_nocgroups = ["crun", "krun"]

As pointed out by @giuseppe above, runc (and crun) can add an annotation telling that --log-format is supported.

Something similar could be done with other properties from the above config.

AkihiroSuda · 2022-04-01T07:14:25Z

@opencontainers/runtime-spec-maintainers

Could you take a look?
Is there any remaining task?

AkihiroSuda · 2023-02-01T01:55:18Z

Rebased.

tianon · 2023-02-01T05:42:29Z

Do you have an example of how you'd expect to see this used by a higher-level runtime like containerd or moby? Should they invoke this on every container creation? Once at their daemon startup and cache it for the lifetime? Periodically refresh their cache?

AkihiroSuda · 2023-02-01T06:01:28Z

Do you have an example of how you'd expect to see this used by a higher-level runtime like containerd or moby? Should they invoke this on every container creation? Once at their daemon startup and cache it for the lifetime? Periodically refresh their cache?

I'd suggest invoking this on every container creation, but the implementation can opt-in to cache it too.

tianon · 2023-02-01T06:15:10Z

Isn't that a bit heavy just for a slightly better/earlier error message? (#1130 (comment))

Wouldn't it be better to update the requirements of something like Moby to a higher version of runc that does support rro and then just implement them, especially since that's a potential security issue? I'm sorry, but I still feel like I'm missing something here. 🙈

AkihiroSuda · 2023-02-01T06:39:12Z

Isn't that a bit heavy just for a slightly better/earlier error message? (#1130 (comment))

"Check the error message and disable the unsupported feature" would work fine for just testing a single feature, but doesn't scale for checking multiple features.

Also, how do we propagate the machine-readable error message to the caller? Not sure we can just jsonify the stderr.

Wouldn't it be better to update the requirements of something like Moby to a higher version of runc that does support rro and then just implement them, especially since that's a potential security issue? I'm sorry, but I still feel like I'm missing something here. 🙈

Checking the version number is fine if the runtime caller only supports runc, but doesn't work well for supporting crun, youki, etc.

tianon · 2023-02-01T06:46:59Z

Wouldn't most of these things be things that the user asked for explicitly, so erroring all the way up is the appropriate response? Do you have an example of a workflow where something like Moby or containerd would want to enable a feature, check this API, and then dynamically no longer want to enable that feature? Doesn't that create a bit of an unexpected/inconsistent user experience?

AkihiroSuda · 2023-02-01T09:46:31Z

Wouldn't most of these things be things that the user asked for explicitly, so erroring all the way up is the appropriate response? Do you have an example of a workflow where something like Moby or containerd would want to enable a feature, check this API, and then dynamically no longer want to enable that feature? Doesn't that create a bit of an unexpected/inconsistent user experience?

Moby, containerd, etc. already have this sort of inconsistency; e.g., they conditionally enable Apparmor depending on its availability on the host.
Same is highly likely to happen for other LSMs in future, such as Landlocks, when they are supported by the kernel and the OCI runtime.

Also, these high-level container engines may potentially have a feature to automatically translate ro mounts to rro to harden the "readonly" mounts, when it is supported by the kernel and the OCI runtime.

AkihiroSuda · 2023-02-01T10:27:23Z

The features should be also useful for calling VM-based OCI runtimes. https://github.com/opencontainers/runtime-spec/blob/v1.1.0-rc.1/config-vm.md
The high-level container engine may conditionally choose the VM disk image file, depending on the disk formats (raw, qcow2, etc.) and the firmwares (BIOS, UEFI) supported by the OCI runtime.

(The current PR does not cover the VM-based runtimes though. Can be covered in a separate PR).

AkihiroSuda · 2023-02-09T07:39:03Z

This KEP may also depend on the runc features command:

KEP-3857: Recursive Read-only (RRO) mounts kubernetes/enhancements#3858 (comment)

AkihiroSuda · 2023-02-28T07:00:57Z

@opencontainers/runtime-spec-maintainers

We have 3 LGTMs. Can we merge this?
Also, let's release the next RC right after deciding to merge this in v1.1 or not.

tianon · 2023-03-07T00:07:40Z

In case this is blocked on my comments (:heart:): I still don't feel like I actually understand fully what the use case here is (and why they're worth potentially invoking runc features for every container startup), but clearly several other maintainers do and I don't feel strongly enough to block it. 🙇

AkihiroSuda · 2023-03-07T05:49:52Z

what the use case here

The runtime features are expected to be eventually propagated to Kubernetes node annotations
(in Kube's own format) so that kube-scheduler can avoid using nodes that lack the features required by the Pods.

e.g., when a Pod is launched with recursivelyReadOnly: Required (proposed in kubernetes/enhancements#3858), the scheduler should avoid using nodes that lacks support for rro mount types.

why they're worth potentially invoking runc features for every container startup

No need to invoke runc features for every container startup.
High-level engines can just cache it, but they may face an inconsistency issue when the cache is not invalidated on upgrading the runc binary.

Add `features.md` and `features-linux.md`, to formalize the `runc features` JSON that was introduced in runc v1.1.0. A runtime caller MAY use this JSON to detect the features implemented by the runtime. The spec corresponds to https://github.com/opencontainers/runc/blob/v1.1.0/types/features/features.go (opencontainers/runc PR 3296, opencontainers/runc PR 3310) Differences since runc v1.1.0: - Add `.linux.intelRdt.enabled` field - Add `.linux.cgroup.rdma` field - Add `.linux.seccomp.knownFlags` and `.linux.seccomp.supportedFlags` fields (Implemented in runc PR 3588) Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>

giuseppe

just a nit, otherwise LGTM

glossary.md

giuseppe

LGTM

giuseppe · 2023-06-22T20:16:29Z

features-linux.md

+  "archs": [
+    "SCMP_ARCH_AARCH64",
+    "SCMP_ARCH_ARM",
+    "SCMP_ARCH_MIPS",
+    "SCMP_ARCH_MIPS64",
+    "SCMP_ARCH_MIPS64N32",
+    "SCMP_ARCH_MIPSEL",
+    "SCMP_ARCH_MIPSEL64",
+    "SCMP_ARCH_MIPSEL64N32",
+    "SCMP_ARCH_PPC",
+    "SCMP_ARCH_PPC64",
+    "SCMP_ARCH_PPC64LE",
+    "SCMP_ARCH_S390",
+    "SCMP_ARCH_S390X",
+    "SCMP_ARCH_X32",
+    "SCMP_ARCH_X86",
+    "SCMP_ARCH_X86_64"
+  ],


how is the runtime supposed to query this list from libseccomp? Is it expected to hardcode the list?

hardcode

Yes: https://github.com/opencontainers/runc/blob/v1.1.7/libcontainer/seccomp/config.go#L54-L71
But I guess it is not difficult to add a new function to libseccomp.

You can also keep the list nil.

AkihiroSuda requested review from crosbymichael, cyphar, dqminh, giuseppe, hqhq, mrunalp, tianon and vbatts as code owners December 9, 2021 09:04

AkihiroSuda mentioned this pull request Dec 9, 2021

Add runc features command opencontainers/runc#3296

Merged

AkihiroSuda force-pushed the features branch 3 times, most recently from ad7f9e4 to 0b75bf9 Compare December 9, 2021 09:29

flouthoc reviewed Dec 9, 2021

View reviewed changes

features-linux.md Show resolved Hide resolved

AkihiroSuda mentioned this pull request Dec 11, 2021

CHANGELOG: add an in-repo changelog file opencontainers/runc#3320

Merged

AkihiroSuda force-pushed the features branch from 0b75bf9 to 98ea9db Compare December 16, 2021 11:09

This was referenced Dec 20, 2021

Proposal: define a "ALL_CAPS" pseudo-capability to grant all capabilities #1071

Open

libct/capabilities: create capabilities map based on current environment opencontainers/runc#3240

Open

utam0k reviewed Jan 3, 2022

View reviewed changes

utam0k mentioned this pull request Apr 3, 2022

Support for feature subcommand youki-dev/youki#815

Open

AkihiroSuda force-pushed the features branch from 294269e to 40ac03c Compare February 1, 2023 01:54

AkihiroSuda added this to the v1.1.0 milestone Feb 1, 2023

AkihiroSuda mentioned this pull request Feb 10, 2023

config: clarify Linux mount options #1181

Merged

utam0k mentioned this pull request Mar 7, 2023

[Do Not Merge]: Track runc missing options and commands youki-dev/youki#1613

Closed

AkihiroSuda mentioned this pull request Mar 17, 2023

Prepare / Tag v1.1.0 release #1052

Closed

giuseppe approved these changes Mar 22, 2023

View reviewed changes

glossary.md Outdated Show resolved Hide resolved

giuseppe mentioned this pull request Mar 22, 2023

implement "crun features" containers/crun#1177

Closed

AkihiroSuda force-pushed the features branch from 40ac03c to 689874f Compare March 22, 2023 08:34

AkihiroSuda requested a review from giuseppe March 24, 2023 12:56

giuseppe approved these changes Mar 27, 2023

View reviewed changes

giuseppe merged commit dbab843 into opencontainers:main Mar 27, 2023

AkihiroSuda mentioned this pull request Mar 28, 2023

Release v1.1.0-rc.2 #1192

Merged

giuseppe reviewed Jun 22, 2023

View reviewed changes

AkihiroSuda mentioned this pull request Jun 26, 2023

Release v1.1.0 #1213

Merged

12 tasks

thaJeztah mentioned this pull request Sep 16, 2023

nsexec: cloned binary rework opencontainers/runc#3987

Merged

cyphar mentioned this pull request Sep 17, 2023

[rfc] how best to reduce runc's effects on cgroup memory limits opencontainers/runc#4021

Open

thaJeztah mentioned this pull request Sep 29, 2023

config-linux: add schemata field to IntelRdt #1230

Open

		@@ -0,0 +1,183 @@
		# <a name="linuxFeatures" />Linux Features Document


		## <a name="linuxFeaturesCgroup" />Cgroup

		`cgroup` (object, OPTIONAL) represents the runtime's implementation status of cgroup managers.

Add features.md to formalize the runc features JSON #1130

Add features.md to formalize the runc features JSON #1130

Conversation

AkihiroSuda commented Dec 9, 2021 • edited Loading

Differences from runc v1.1.0

flouthoc Dec 9, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

flouthoc Dec 9, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AkihiroSuda Dec 16, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AkihiroSuda commented Dec 16, 2021

Differences from runc v1.1.0-rc.1

utam0k Dec 21, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

AkihiroSuda commented Jan 12, 2022

giuseppe commented Jan 12, 2022

AkihiroSuda commented Jan 12, 2022

giuseppe commented Jan 12, 2022

giuseppe commented Jan 12, 2022

AkihiroSuda commented Jan 12, 2022

kolyshkin commented Jan 28, 2022

AkihiroSuda commented Apr 1, 2022

AkihiroSuda commented Feb 1, 2023

tianon commented Feb 1, 2023

AkihiroSuda commented Feb 1, 2023

tianon commented Feb 1, 2023

AkihiroSuda commented Feb 1, 2023 • edited Loading

tianon commented Feb 1, 2023

AkihiroSuda commented Feb 1, 2023

AkihiroSuda commented Feb 1, 2023

AkihiroSuda commented Feb 9, 2023

AkihiroSuda commented Feb 28, 2023

tianon commented Mar 7, 2023

AkihiroSuda commented Mar 7, 2023

giuseppe left a comment

Choose a reason for hiding this comment

giuseppe left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Add `features.md` to formalize the `runc features` JSON #1130

Add `features.md` to formalize the `runc features` JSON #1130

AkihiroSuda commented Dec 9, 2021 •

edited

Loading

flouthoc Dec 9, 2021 •

edited

Loading

flouthoc Dec 9, 2021 •

edited

Loading

AkihiroSuda Dec 16, 2021 •

edited

Loading

utam0k Dec 21, 2021 •

edited

Loading

AkihiroSuda commented Feb 1, 2023 •

edited

Loading