-
Notifications
You must be signed in to change notification settings - Fork 553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add features.md
to formalize the runc features
JSON
#1130
Conversation
ad7f9e4
to
0b75bf9
Compare
@@ -0,0 +1,183 @@ | |||
# <a name="linuxFeatures" />Linux Features Document |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a missing feature flag for IntelRDT
, some runtimes supports restricting bandwidth using that. Also runtime-spec
has a definition for it.
Not sure maybe something like
"intelrdt": {
"supported": true
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added .linux.intelRdt.enabled
field
|
||
## <a name="linuxFeaturesCgroup" />Cgroup | ||
|
||
**`cgroup`** (object, OPTIONAL) represents the runtime's implementation status of cgroup managers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be any field to list down which controllers are implemented by runtime
for instance its not necessary that a given runtime must or must not implement support for a specific controller. Example RDMA
is not implemented by all runtimes so entry for RDMA
becomes no-op
.
Again not sure about just adding it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's hard to define the set of the controller names.
e.g., "freezer" is a real controller in v1 but not a real controller in v2. And the name of the block I/O controller differs across V1 ("blkio") and V2 ("io").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AkihiroSuda hmm I agree and fair point but just a small doubt if there is a feature info, container manager
should be able to delegate task to the right runtime.
A practical use case would be.
Container Manger
supports delegating container generation to multiple runtimes but ifspec
requests setting a particularcontroller
exampleRDMA
it queries all supported runtime and delegatespec
to theruntime
which implementsRDMA
for every otherspec
it keeps using thedefault runtime
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using an array to specify which controllers are supported by a cgroup manager instead? This would solve that the controller names are different. Also not all controllers might be supported by all cgroup manager implementations.
"cgroup": {
"v1": ["cpu", "cpuset", "blkio"],
"v2": ["io", "memory", "devices"]
"systemd": ["cpu", "memory", "cpuset"],
"systemdUser": ["cpu", "memory", "cpuset"]
}
In youki we already have the concept of a pseudo controller for devices, unified and freezer, so in my opinion a name in that array must not always map to a real controller in all implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated PR to add rdma
property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using an array to specify which controllers are supported by a cgroup manager instead?
-
v1 is dying and probably won't have new controllers, so I don't think we need to track v1 controller list (except
rdma
) -
v2 may have new controllers, but probably we will support such new controllers via the
unified
struct without defining a new dedicated struct, so we don't need to track v2 controller list, either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we also add a "disabled": "true|false"
to indicate whether the runtime supports disabling cgroups? crun does it with crun --cgroup-manager=disabled ...
. That is useful when running as a systemd service to reuse the existing cgroup.
|
||
```json | ||
"apparmor": { | ||
"enabled": true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure but should this be supported
instead of enabled
.
"enabled": true | |
"supported": true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JSON is already implemented in runc (experimentally, though), so I'm reluctant to change the field names.
IIRC, I avoided "supported" to avoid confusion with enterprise commercial "support".
An enterprise distributor may provide commercial "support" to subset of the "enabled" features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh i see. This is a really small nit
and i don't have a strong opinion on this. 😃
To me it just feels that apparmor
is only enabled for a particular spec
if spec
defines apparmorProfile
but by default its only supported by a runtime and not enabled for every spec
.
Again i am thinking this too much and its not a big thing. 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think enabled
is okay, though I think that we shouldn't care what runc has developed at the moment -- runc features
was only merged a day or two ago, so literally nobody is using at the moment (and we can change it until runc 1.2).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, I avoided "supported" to avoid confusion with enterprise commercial "support". An enterprise distributor may provide commercial "support" to subset of the "enabled" features.
Perhaps "available" ? (IMO "enabled" feels like a configuration toggle where most/many of these will really be capabilities that are either usable or they aren't)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I guess https://github.com/opencontainers/runtime-spec/pull/1130/files#diff-445e6d0dc97b93a0f04d914ad7816114ddc175dc462c7eec7961921af17ff0d0R6-R7 is the context I was missing:
The features document is irrelevant to the actual availability of the features in the host operating system. Hence, the content of the feature document SHOULD be determined on the compilation time of the runtime, not on the execution time.
I redact my suggestion ("enabled" seems fine for what this describes), but then also wonder which specific problem this functionality is solving. 🙈
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wonder which specific problem this functionality is solving.
Examples of specific problems solved in this PR:
- A high-level engine can't be aware whether the low-level runtime supports seccomp, so a high-level engine can't avoid "seccomp not supported" error unless the user supplies a custom configuration.
- A high-level engine can't be aware whether the low-level runtime supports
rro
mounts, so a high-level engine may accidentally have non-RRO mounts against the user's will. This may result in a severe vulnerability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW we can have availability
property that corresponds to actual availability of the feature on the host OS/hardware, in addition to the enabled
property proposed in this PR, but that will be another PR.
As a maintainer of containerd/Moby and its relevant projects, I don't see necessity of having availability
detection in the low-level runtime, though, because high-level engine implementations are already capable of detecting such feature availability on the host OS/hardware.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW we can have
availability
property that corresponds to actual availability of the feature on the host OS/hardware, in addition to theenabled
property proposed in this PR, but that will be another PR.As a maintainer of containerd/Moby and its relevant projects, I don't see necessity of having
availability
detection in the low-level runtime, though, because high-level engine implementations are already capable of detecting such feature availability on the host OS/hardware.
My original doubt was to replace enabled
with supported
or availability
I don't think adding extra key just for availability
would be good idea. So I agree with @AkihiroSuda we should not need extra key for this.
+1 for keeping enabled: true
0b75bf9
to
98ea9db
Compare
Updated PR. Differences from runc v1.1.0-rc.1
|
A [runtime](glossary.md#runtime) MAY provide a JSON document about its implemented features to [runtime callers](glossary.md#runtime-caller). | ||
This JSON document is called ["features document"](glossary.md#features-document). | ||
|
||
The features document is irrelevant to the actual availability of the features in the host operating system. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the role of the caller of the oci container runtime to ensure that it is actually available on the actual host? I felt it would be nice to check if the feature is even available on the host. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the role of the caller of the oci container runtime to ensure that it is actually available on the actual host?
Yes
I felt it would be nice to check if the feature is even available on the host. What do you think?
Out of the scope of this PR (and I don't see necessity)
#1130 (comment)
@giuseppe Could you take a look from crun's perspective? |
from crun's perspective, it looks good. What could also be useful for us, would have a way to know what log-driver is supported by a runtime (the |
That should be probably a custom |
the problem with custom annotations is that it won't really help to understand if a generic OCI runtime $FOO has support for |
or are you suggesting that other OCI runtimes would use the same |
The runc CLI flags like Only runtimes with runc-compatible CLI should use these We can add proper fields for log formats when the CLI behaviors get defined in the OCI Specs. |
This can potentially eliminate podman's containers.conf(5), parameters
As pointed out by @giuseppe above, runc (and crun) can add an annotation telling that --log-format is supported. Something similar could be done with other properties from the above config. |
@opencontainers/runtime-spec-maintainers Could you take a look? |
Rebased. |
Do you have an example of how you'd expect to see this used by a higher-level runtime like containerd or moby? Should they invoke this on every container creation? Once at their daemon startup and cache it for the lifetime? Periodically refresh their cache? |
I'd suggest invoking this on every container creation, but the implementation can opt-in to cache it too. |
Isn't that a bit heavy just for a slightly better/earlier error message? (#1130 (comment)) Wouldn't it be better to update the requirements of something like Moby to a higher version of |
"Check the error message and disable the unsupported feature" would work fine for just testing a single feature, but doesn't scale for checking multiple features. Also, how do we propagate the machine-readable error message to the caller? Not sure we can just jsonify the stderr.
Checking the version number is fine if the runtime caller only supports runc, but doesn't work well for supporting crun, youki, etc. |
Wouldn't most of these things be things that the user asked for explicitly, so erroring all the way up is the appropriate response? Do you have an example of a workflow where something like Moby or containerd would want to enable a feature, check this API, and then dynamically no longer want to enable that feature? Doesn't that create a bit of an unexpected/inconsistent user experience? |
Moby, containerd, etc. already have this sort of inconsistency; e.g., they conditionally enable Apparmor depending on its availability on the host. Also, these high-level container engines may potentially have a feature to automatically translate |
The (The current PR does not cover the VM-based runtimes though. Can be covered in a separate PR). |
This KEP may also depend on the |
@opencontainers/runtime-spec-maintainers We have 3 LGTMs. Can we merge this? |
In case this is blocked on my comments (:heart:): I still don't feel like I actually understand fully what the use case here is (and why they're worth potentially invoking |
The runtime features are expected to be eventually propagated to Kubernetes node annotations e.g., when a Pod is launched with
No need to invoke |
Add `features.md` and `features-linux.md`, to formalize the `runc features` JSON that was introduced in runc v1.1.0. A runtime caller MAY use this JSON to detect the features implemented by the runtime. The spec corresponds to https://github.com/opencontainers/runc/blob/v1.1.0/types/features/features.go (opencontainers/runc PR 3296, opencontainers/runc PR 3310) Differences since runc v1.1.0: - Add `.linux.intelRdt.enabled` field - Add `.linux.cgroup.rdma` field - Add `.linux.seccomp.knownFlags` and `.linux.seccomp.supportedFlags` fields (Implemented in runc PR 3588) Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a nit, otherwise LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
"archs": [ | ||
"SCMP_ARCH_AARCH64", | ||
"SCMP_ARCH_ARM", | ||
"SCMP_ARCH_MIPS", | ||
"SCMP_ARCH_MIPS64", | ||
"SCMP_ARCH_MIPS64N32", | ||
"SCMP_ARCH_MIPSEL", | ||
"SCMP_ARCH_MIPSEL64", | ||
"SCMP_ARCH_MIPSEL64N32", | ||
"SCMP_ARCH_PPC", | ||
"SCMP_ARCH_PPC64", | ||
"SCMP_ARCH_PPC64LE", | ||
"SCMP_ARCH_S390", | ||
"SCMP_ARCH_S390X", | ||
"SCMP_ARCH_X32", | ||
"SCMP_ARCH_X86", | ||
"SCMP_ARCH_X86_64" | ||
], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is the runtime supposed to query this list from libseccomp? Is it expected to hardcode the list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hardcode
Yes: https://github.com/opencontainers/runc/blob/v1.1.7/libcontainer/seccomp/config.go#L54-L71
But I guess it is not difficult to add a new function to libseccomp.
You can also keep the list nil.
Add
features.md
andfeatures-linux.md
, to formalize therunc features
JSON that was introduced in runc v1.1.0.A runtime caller MAY use this JSON to detect the features implemented by the runtime.
The spec corresponds to https://github.com/opencontainers/runc/blob/v1.1.0/types/features/features.go
(opencontainers/runc#3296, opencontainers/runc#3310)
Differences from runc v1.1.0
.linux.intelRdt.enabled
field.linux.cgroup.rdma
field.linux.seccomp.knownFlags
and.linux.seccomp.supportedFlags
fields (Implemented in seccomp: refactor flags support; add flags to features, set SPEC_ALLOW by default runc#3588)