Add conformance tests to validate OCI devices are disallowed #2973
Comments
/assign @dgerd |
Looking at how this works today, Kubernetes does not support passing devices directly through to the OCI interface (see kubernetes/kubernetes#5607). There is support for specifying devices using Device Plugins (https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/#examples), but these seem to be enabled on the kubelet rather than on the Pod. Device Plugins expose custom ResourceNames that can be requested in the resources block, similar to cpu and memory. However, judging from examples, devices may still be made available to the container even when the resource is not requested (https://github.com/NVIDIA/k8s-device-plugin#running-gpu-jobs). This raises the following questions about the statement in the runtime contract:
My current understanding of the problem and the work to meet the requirements is:
@evankanderson Any thoughts here? https://github.com/opencontainers/runtime-spec/blob/master/config-linux.md#default-devices |
Hmm, I'm wondering whether we should change that MUST to a SHOULD (thinking specifically about the GPU accelerator case). In any case, I think the developer "MUST" indicates that containers which violate those expectations may experience undefined behavior (aka "nasal demons"). |
The problem I have with nasal demons on MUST/SHOULD NOT statements is that they become extremely hard to test in the context of conformance. If an instance of Knative allows this behavior, is it still conformant? My assumption is that it is, but I cannot come up with a useful test here, because allowing and disallowing both seem valid. If we believe that GPUs are an important use case to cover, then I am not sure that adding a test or a webhook validation is the right step forward. Instead, I think this issue should become either removing the statement from the runtime contract or moving it to a new document that contains developer best practices. |
I'm not sure how to capture this in a testable way, but the intent of the developer requirements is to place "outer bounds" on what conformance should test. I.e. a conformance test could validate the following devices per the OCI spec: /dev/
It would be unreasonable for a test to attempt to run CUDA code, as a conformant implementation might not expose the appropriate device (or even have access to the appropriate nvidia hardware). I'd certainly be open to rephrasing this area and other areas of the specification if you have a suggestion about how to make this testable. |
Looking at your example, it seems like we could turn the statement around to express requirements of the platform rather than requirements of the developer. Statements about the platform seem easier to grok and programmatically verify in the context of conformance. Is there a difference in your mind between the two statements below, given the context above:
OCI "Default" Devices MUST be provided
and
Developers SHOULD NOT use OCI devices to request additional devices beyond the OCI specification "Default Devices".
Given that we will not actively block or limit device specification in our API, both statements to me leave the requirement of exposing additional OCI devices up to the platform provider. |
I'd change the second to also be a platform provider constraint.
Operators and platform providers MUST ensure that the OCI Linux "Default Devices" are available to the container. Additional device nodes beyond the default are OPTIONAL and SHOULD be documented by the platform provider.
(Sent by email in reply to Dan Gerdesmeier's comment of Fri, Mar 15, 2019, 12:50 PM, which appears in full above.)
Evan Anderson |
This change makes numerous cleanups to the runtime contract in an attempt to improve the readability of the document and make it more useful for the intended audience.
* Moves developer-facing statements to a new `runtime-user-guide`. Focuses `runtime-contract` on the operator/platform provider.
* Adds links to the conformance tests that test runtime contract statements.
* Corrects, updates, or removes statements to more accurately represent today's Knative runtime.
* Changes most untestable statements to informative, or removes them.
* Copies in important OCI runtime requirements we previously referenced.
* Removes references to the OCI specification that did not bring new requirements.
Ref: knative#2539, knative#2973, knative#4014, knative#4027
#4035 updates this to the wording suggested by Evan. We now need to implement the tests that check the presence of the default devices. |
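A presence check of the kind described here could look roughly like the sketch below. It is an illustration only, not the test that was added to the Knative conformance suite. The device list is taken from the "Default Devices" section of the OCI runtime spec linked earlier in the thread, and the names `OCI_DEFAULT_DEVICES`, `check_device`, and `missing_default_devices` are hypothetical. Keeping the paths in one table means adding another path is a one-line change.

```python
import os
import stat

# Device nodes listed under "Default Devices" in the OCI runtime spec
# (config-linux.md). /dev/console is left out because the spec only
# requires it when a terminal is enabled for the container.
OCI_DEFAULT_DEVICES = (
    "/dev/null",
    "/dev/zero",
    "/dev/full",
    "/dev/random",
    "/dev/urandom",
    "/dev/tty",
    "/dev/ptmx",
)


def check_device(path, stat_fn=os.stat):
    """Return None if path is a character device, else a short error string.

    stat_fn is injectable so the check can be exercised without real devices.
    os.stat follows symlinks, so a /dev/ptmx symlink to pts/ptmx is accepted.
    """
    try:
        mode = stat_fn(path).st_mode
    except OSError as err:
        return f"{path}: {err.strerror or err}"
    if not stat.S_ISCHR(mode):
        return f"{path}: not a character device"
    return None


def missing_default_devices(paths=OCI_DEFAULT_DEVICES, stat_fn=os.stat):
    """Return one error string per default device that is absent or wrong."""
    errors = (check_device(p, stat_fn) for p in paths)
    return [e for e in errors if e is not None]


if __name__ == "__main__":
    # Run inside the container under test; no output means all checks passed.
    for problem in missing_default_devices():
        print(problem)
```

Consistent with the revised wording, the sketch only asserts that the default devices are present. It makes no claim about additional device nodes, which remain optional for the platform provider.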
This adds checks for the default OCI devices to our conformance test for filesystem validation. This change also refactors where the file paths to check are located to reduce the number of transformations and simplify adding additional paths. Fixes knative#2973
* Add additional filesystem checks for OCI devices. This adds checks for the default OCI devices to our conformance test for filesystem validation. This change also refactors where the file paths to check are located to reduce the number of transformations and simplify adding additional paths. Fixes #2973
* Fix comments
* Code review comments
Expected Behavior
Only default OCI devices are allowed by the Knative runtime contract. We should return an error when a non-default device is requested.
https://github.com/knative/serving/blob/master/docs/runtime-contract.md#devices
Actual Behavior
We need a conformance test to validate this behavior.