
KEP-4639: OCI VolumeSource

Release Signoff Checklist

Items marked with (R) are required prior to targeting to a milestone / release.

  • (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
  • (R) KEP approvers have approved the KEP status as implementable
  • (R) Design details are appropriately documented
  • (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
    • e2e Tests for all Beta API Operations (endpoints)
    • (R) Ensure GA e2e tests meet requirements for Conformance Tests
    • (R) Minimum Two Week Window for GA e2e tests to prove flake free
  • (R) Graduation criteria is in place
  • (R) Production readiness review completed
  • (R) Production readiness review approved
  • "Implementation History" section is up-to-date for milestone
  • User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
  • Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

Summary

The proposed enhancement adds a new VolumeSource to Kubernetes that supports OCI images and/or OCI artifacts. This allows users to package files and share them among containers in a pod without including them in the main image, thereby reducing vulnerabilities and simplifying image creation.

While OCI images are well supported by Kubernetes and the CRI, extending support to OCI artifacts involves recognizing additional media types within container runtimes, implementing custom lifecycle management, resolving the registry referrers pattern for artifacts, and ensuring appropriate validation and security measures.

Motivation

Supporting OCI images and artifacts directly as a VolumeSource keeps Kubernetes aligned with OCI standards and allows any content to be stored and distributed using OCI registries. This allows the project to grow into use cases which go beyond running particular images.

Goals

  • Introduce a new VolumeSource type that allows mounting OCI images and/or artifacts.
  • Simplify the process of sharing files among containers in a pod.
  • Provide a guideline for runtimes on how artifact files and directories should be mounted.

Non-Goals

  • This proposal does not aim to replace existing VolumeSource types.
  • This proposal does not address other use cases for OCI objects beyond directory sharing among containers in a pod.
  • Mounting thousands of images and artifacts in a single pod.
  • The enhancement leaves the single-file use case out for now and restricts the mount output to directories.
  • The runtimes (CRI-O, containerd, others) will have to agree on the implementation of how artifacts are manifested as directories. We don't want to over-spec on selecting based on media types or other attributes now and can consider that for later.
    • We don't want to tie too strongly to how models are hosted on a particular provider so we are flexible to adapt to different ways models and their configurations are stored.
    • If some file, say a VM format such as a qcow file, is stored as an artifact, we don't want the runtime to be the entity responsible for interpreting and correctly processing it to its final consumable state. That could be delegated to the consumer or perhaps to some hooks and is out of scope for alpha.
  • Manifest list use cases are left out for now and will be restricted to matching architecture like we do today for images. In the future (if there are use cases) we will consider support for lists with items separated by quantization or format or other attributes. However, that is out of scope for now as it is easily worked around by creating a different image/artifact for each instance/format/quantization of a model.

Proposal

We propose to add a new VolumeSource that supports OCI images and/or artifacts. This VolumeSource will allow users to mount an OCI object directly into a pod, making the files within it accessible to the containers without including them in the main image, while allowing the files to be hosted in OCI-compatible registries.

User Stories (Optional)

Story 1

As a Kubernetes user, I want to share a configuration file among multiple containers in a pod without including the file in my main image, so that I can minimize security risks and image size.

Besides that, I want:

  • to package this file in an OCI object to take advantage of OCI distribution.
  • the image to be downloaded with the same credentials that the kubelet uses for other images.
  • to be able to use image pull secrets when downloading the image, if it comes from a registry that requires them.
  • to be able to update the configuration if the artifact is referenced by a moving tag like latest. To achieve that, I just have to specify a pullPolicy of Always and restart the pod.
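This story could look like the following pod manifest (a sketch only; the registry, tag, and mount paths are placeholders, not values from this KEP):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: config-consumer
spec:
  volumes:
  - name: shared-config
    image:
      # Moving tag: combined with pullPolicy Always, restarting the pod
      # picks up the newest artifact. The reference is a placeholder.
      reference: "example.com/config-artifact:latest"
      pullPolicy: Always
  containers:
  - name: app-a
    image: busybox
    volumeMounts:
    - mountPath: /etc/shared
      name: shared-config
  - name: app-b
    image: busybox
    volumeMounts:
    - mountPath: /etc/shared
      name: shared-config
```

Both containers see the same read-only directory under /etc/shared without the file being baked into either image.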

Story 2

As a DevOps engineer, I want to package and distribute binary artifacts using OCI image and distribution specification standards and mount them directly into my Kubernetes pods, so that I can streamline my CI/CD pipeline. This allows me to maintain a small set of base images by attaching the CI/CD artifacts to them. Besides that, I want to package the artifacts in an OCI object to take advantage of OCI distribution.

Story 3

As a data scientist, MLOps engineer, or AI developer, I want to mount large language model weights or machine learning model weights in a pod alongside a model-server, so that I can efficiently serve them without including them in the model-server container image. I want to package these in an OCI object to take advantage of OCI distribution and ensure efficient model deployment. This allows separating the model specifications/content from the executables that process them.

Story 4

As a security engineer, I want to use a public image for a malware scanner and mount in a volume of private (commercial) malware signatures, so that I can load those signatures without baking my own combined image (which might not be allowed by the copyright on the public image). Those files work regardless of the OS or version of the scanning software.

Notes/Constraints/Caveats (Optional)

  • This enhancement assumes that the cluster has access to the OCI registry.
  • The implementation must handle image pull secrets and other registry authentication mechanisms.
  • Performance considerations must be taken into account, especially for large images or artifacts.

Vocabulary: OCI Images, Artifacts, and Objects

OCI Image (spec):

  • A container image that conforms to the Open Container Initiative (OCI) Image Specification. It includes a filesystem bundle and metadata required to run a container.
  • Consists of multiple layers (each layer being a tarball), a manifest (which lists the layers), and a config file (which provides configuration data such as environment variables, entry points, etc.).
  • Use Case: Used primarily for packaging and distributing containerized applications.

OCI Artifact (guidance):

  • An artifact describes any content that is stored and distributed using the OCI image format. It includes not just container images but also other types of content like Helm charts, WASM modules, machine learning models, etc.
  • Artifacts use the same image manifest and layer structure but may contain different types of data within those layers. The artifact manifest can have media types that differ from those in standard container images.
  • Use Case: Allows the distribution of non-container content using the same infrastructure and tools developed for OCI images.

OCI Object:

  • Umbrella term encompassing both OCI images and OCI artifacts. It represents any object that conforms to the OCI specifications for storage and distribution and can be represented as file or filesystem by an OCI container runtime.

Risks and Mitigations

  • Security Risks:
    • Allowing direct mounting of OCI objects introduces potential attack vectors. Mitigation includes thorough security reviews and limiting access to trusted registries. Limiting to OCI artifacts (non-runnable content) and read-only mode will lessen the security risk.
    • Path traversal attacks are a high risk for introducing security vulnerabilities. Container runtimes should reuse their existing implementations for merging layers and securely joining symbolic links within container storage to prevent such issues.
  • Compatibility Risks: Existing webhooks that watch the images used by a pod and apply policies to them will need to be updated to also expect images specified as a VolumeSource.
  • Performance Risks: Large images or artifacts could impact performance. Mitigation includes optimizations in the implementation and providing guidance on best practices for users.

Design Details

The new VolumeSource will be defined in the Kubernetes API, and the implementation will involve updating components (CRI, Kubelet) to support this source type. Key design aspects include:

  • API changes to introduce the new VolumeSource type.
  • Modifications to the Kubelet to handle mounting OCI images and artifacts.
  • Handling image pull secrets and registry authentication.
  • The regular OCI images (that are used to create a container rootfs today) can be set up similarly: as a directory that is mounted as a volume source.
  • For OCI artifacts, we want to convert and represent them as a directory with files. A single file could also be nested inside a directory.

Kubernetes API

The following code snippet illustrates the proposed API change:

apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  volumes:
  - name: oci-volume
    image:
      reference: "example.com/my-image:latest"
      pullPolicy: IfNotPresent
  containers:
  - name: my-container
    image: busybox
    volumeMounts:
    - mountPath: /data
      name: oci-volume

This means we extend the VolumeSource by:

// Represents the source of a volume to mount.
// Only one of its members may be specified.
type VolumeSource struct {
	// …

	// image …
	Image *ImageVolumeSource `json:"image,omitempty" protobuf:"bytes,30,opt,name=image"`
}

And add the corresponding ImageVolumeSource type:

// ImageVolumeSource represents an image volume resource.
type ImageVolumeSource struct {
	// Required: Image or artifact reference to be used.
	// …
	Reference string `json:"reference" protobuf:"bytes,1,opt,name=reference"`

	// Policy for pulling OCI objects.
	// …
	PullPolicy PullPolicy `json:"pullPolicy,omitempty" protobuf:"bytes,2,opt,name=pullPolicy,casttype=PullPolicy"`
}

The same will apply to pkg/apis/core/types.VolumeSource, which is the internal API compared to the external one from staging. The API validation will be extended to disallow the subPath/subPathExpr fields as well as to make the reference mandatory:

// …

if source.Image != nil {
	if numVolumes > 0 {
		allErrs = append(allErrs, field.Forbidden(fldPath.Child("image"), "may not specify more than 1 volume type"))
	} else {
		numVolumes++
		allErrs = append(allErrs, validateImageVolumeSource(source.Image, fldPath.Child("image"))...)
	}
}

// …
func validateImageVolumeSource(image *core.ImageVolumeSource, fldPath *field.Path) field.ErrorList {
	allErrs := field.ErrorList{}
	if len(image.Reference) == 0 {
		allErrs = append(allErrs, field.Required(fldPath.Child("reference"), ""))
	}
	allErrs = append(allErrs, validatePullPolicy(image.PullPolicy, fldPath.Child("pullPolicy"))...)
	return allErrs
}
// …

// Disallow subPath/subPathExpr for image volumes
if v, ok := volumes[mnt.Name]; ok && v.Image != nil {
	if mnt.SubPath != "" {
		allErrs = append(allErrs, field.Invalid(idxPath.Child("subPath"), mnt.SubPath, "not allowed in image volume sources"))
	}
	if mnt.SubPathExpr != "" {
		allErrs = append(allErrs, field.Invalid(idxPath.Child("subPathExpr"), mnt.SubPathExpr, "not allowed in image volume sources"))
	}
}

// …

Kubelet and Container Runtime Interface (CRI) support for OCI artifacts

Kubelet and the Container Runtime Interface (CRI) currently handle OCI images. To support OCI artifacts, potential enhancements may be required:

Extended Media Type Handling in the container runtime:

  • Update container runtimes to recognize and handle new media types associated with OCI artifacts.
  • Ensure that pulling and storing these artifacts is as efficient and secure as with OCI images.

Lifecycling and Garbage Collection:

  • Reuse the existing kubelet logic for managing the lifecycle of OCI objects.
  • Extend the existing image garbage collection so that it does not remove an OCI volume image while a pod still references it.

Artifact-Specific Configuration:

  • Introduce new configuration options to handle the unique requirements of different types of OCI artifacts.

Artifacts as Subject Referrers:

  • Introduce new options to refer to an image and to filter by artifact type for the artifact(s) to be mounted.
  • Certain types of OCI artifacts include a subject reference. That reference identifies the artifact/image to which this artifact refers. For example, a signature artifact could refer to a platform index (certifying the platform images), or an SBOM artifact could refer to a platform-matched image. These artifacts may or may not be located in the same registry/repository. The new referrers API allows discovering artifacts from a requested repository.
  • How Kubernetes and especially runtimes should support OCI referrers is not part of the alpha feature and will be considered in future graduations.

Validation:

  • Extend validation and security checks to cover new artifact types.
  • Disallow subPath/subPathExpr mounting through the API validation.

Storage Optimization in the container runtime:

  • Develop optimized storage solutions tailored for different artifact types, potentially integrating with existing storage solutions or introducing new mechanisms.

kubelet

While the container runtime will be responsible for pulling and storing the OCI objects in the same way as for images, the kubelet still has to manage their full lifecycle. This means that some parts of the existing kubelet code can be reused, for example:

Pull Policy

While imagePullPolicy works at the container level, the introduced pullPolicy is a pod-level construct. This means we can support the same values (IfNotPresent, Always and Never), but will only pull once per pod.

Technically, this means we need to pull OCI objects at the pod level in SyncPod, and not for each container during EnsureImageExists before the containers get started.

If users want to re-pull artifacts when referencing moving tags like latest, then they need to restart / evict the pod.
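The pod-level pull decision can be sketched as follows (purely illustrative; the helper name, its signature, and the plain string policies are assumptions for this example, not kubelet API):

```go
package main

import "fmt"

// shouldPullImageVolume sketches the pod-level pull decision for image
// volumes. It would be evaluated once per pod in SyncPod, not once per
// container. Illustrative only: this is not actual kubelet code.
func shouldPullImageVolume(pullPolicy string, presentOnNode bool) bool {
	switch pullPolicy {
	case "Always":
		return true // re-pull on every pod (re)start, e.g. for moving tags
	case "Never":
		return false // only pre-pulled objects may be used
	default: // "IfNotPresent"
		return !presentOnNode
	}
}

func main() {
	// Already present on the node with IfNotPresent: no pull needed.
	fmt.Println(shouldPullImageVolume("IfNotPresent", true))
	// Always: pull even if present, so moving tags are refreshed on restart.
	fmt.Println(shouldPullImageVolume("Always", true))
}
```

This also illustrates why re-pulling a moving tag requires a pod restart: the decision is made once during pod sync, not while the pod is running.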

The AlwaysPullImages admission plugin needs to respect the pull policy as well and has to set the field accordingly.

Registry authentication

For registry authentication purposes the same logic will be used as for the container image.

CRI

The CRI API is already capable of managing container images via the ImageService. Those RPCs will be re-used for managing OCI artifacts, while the Mount message will be extended to mount an OCI object using the existing ImageSpec on container creation:

// Mount specifies a host volume to mount into a container.
message Mount {
    // …

    // Mount an image reference (image ID, with or without digest), which is a
    // special use case for image volume mounts. If this field is set, then
    // host_path should be unset. All OCI mounts are read-only per the
    // feature definition. The kubelet does a PullImage RPC and evaluates the
    // returned
    // PullImageResponse.image_ref value, which is then set to the
    // ImageSpec.image field. Runtimes are expected to mount the image as
    // required.
    // Introduced in the OCI Volume Source KEP: https://kep.k8s.io/4639
    ImageSpec image = 9;
}

This allows re-using the existing kubelet logic for managing the OCI objects, with the caveat that the new VolumeSource won't be isolated in a dedicated plugin as part of the existing volume manager.

Runtimes are already aware of the correct SELinux parameters during container creation and will re-use them for the OCI object mounts.

The kubelet uses the PullImageResponse.image_ref returned on pull and sets it to Mount.image.image together with the other Mount.image fields. The runtime then mounts the OCI object directly on container creation, assuming it is already present on disk. The runtime also manages the lifecycle of the mount, for example removing the OCI bind mount on container removal as well as the object mount on the RemoveImage RPC.

The kubelet tracks which OCI object is used by which sandbox and therefore manages their lifecycle for garbage collection purposes.

The overall flow for container creation will look like this:

sequenceDiagram
    participant K as kubelet
    participant C as Container Runtime
    Note left of K: During pod sync
    Note over K,C: CRI
    K->>+C: RPC: PullImage
    Note right of C: Pull OCI object
    C-->>-K: PullImageResponse.image_ref
    Note left of K: Add mount points<br/> to container<br/>creation request
    K->>+C: RPC: CreateContainer
    Note right of C: Mount OCI object
    Note right of C: Add OCI bind mounts<br/>from OCI object<br/>to container
    C-->>-K: CreateContainerResponse
  1. Kubelet Initiates Image Pull:

    • During pod setup, the kubelet initiates the pull for the OCI object based on the volume source.
  2. Runtime Handles Mounting:

    • The runtime returns the image reference information to the kubelet.
  3. Redirecting of the Mountpoint:

    • The kubelet uses the returned image reference to build the container creation request for each container using that mount.
    • The kubelet initiates the container creation and the runtime creates the required OCI object mount as well as bind mounts to the target location. This is the current implemented behavior for all other mounts and should require no actual container runtime code change.
  4. Lifecycle Management:

    • The container runtime manages the lifecycle of the mounts, ensuring they are created during pod setup and cleaned up upon sandbox removal.
  5. Tracking and Coordination:

    • During image garbage collection, the runtime provides the kubelet with the necessary mount information to ensure proper cleanup.
  6. SELinux Context Handling:

    • The runtime applies SELinux labels to the volume mounts based on the security context provided by the kubelet, ensuring consistent enforcement of security policies.
  7. Pull Policy Implementation:

    • The pullPolicy at the pod level will determine when the OCI object is pulled, with options for IfNotPresent, Always, and Never.
    • IfNotPresent: Prevents redundant pulls and uses existing images when available.
    • Always: Ensures the latest images are used, for example, with development and testing environments.
    • Never: Ensures only pre-pulled images are used, for example, in air-gapped or controlled environments.
  8. Security and Performance Optimization:

    • Implement thorough security checks to mitigate risks such as path traversal attacks.
    • Optimize performance for handling large OCI artifacts, including caching strategies and efficient retrieval methods.

Container Runtimes

Container runtimes need to support the new Mount.image field, otherwise the feature cannot be used. Pods using the new VolumeSource combined with an unsupported container runtime version will fail to run on the node, because the Mount.host_path field is not set for those mounts.

For security reasons, volume mounts should set the noexec and ro (read-only) options by default.

Filesystem representation

Container runtimes are expected to manage a mountpoint, which is a single directory containing the unpacked (in case of tarballs) and merged layer files from the image or artifact. If an OCI artifact has multiple layers (in the same way as for container images), then the runtime is expected to merge them together. Duplicate files from distinct layers are overwritten by the file from the higher-indexed layer.

Runtimes are expected to be able to handle layers as tarballs (like they do for images right now) as well as plain single files. How the runtimes implement the expected output and which media types they want to support is deferred to them for now. Kubernetes only defines the expected output as a single directory containing the (unpacked) content.
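The merge semantics described above can be illustrated with a simplified file-tree merge (a sketch only; real runtimes operate on unpacked layer contents on disk, not on in-memory maps):

```go
package main

import "fmt"

// mergeLayers flattens ordered OCI layers into a single file tree, with
// later (higher-indexed) layers overwriting duplicate paths — mirroring
// the expected mountpoint output described in this KEP. Each layer is
// modeled as a map of relative path to file content for illustration.
func mergeLayers(layers []map[string]string) map[string]string {
	merged := map[string]string{}
	for _, layer := range layers {
		for path, content := range layer {
			merged[path] = content // higher-indexed layer wins on conflict
		}
	}
	return merged
}

func main() {
	layer0 := map[string]string{"dir/file": "layer0", "shared": "from-layer0"}
	layer1 := map[string]string{"file": "layer1", "shared": "from-layer1"}
	merged := mergeLayers([]map[string]string{layer0, layer1})
	fmt.Println(merged["shared"]) // from-layer1
}
```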

Example using ORAS

Assuming the following directory structure:

./
├── dir/
│  └── file
└── file
$ cat dir/file
layer0

$ cat file
layer1

Then we can manually create two distinct layers by:

tar cfvz layer0.tar dir
tar cfvz layer1.tar file

We also need a config.json, ideally indicating the requested architecture:

jq --null-input '.architecture = "amd64" | .os = "linux"' > config.json

Now using ORAS to push the distinct layers:

oras push --config config.json:application/vnd.oci.image.config.v1+json \
    localhost:5000/image:v1 \
    layer0.tar:application/vnd.oci.image.layer.v1.tar+gzip \
    layer1.tar:application/vnd.oci.image.layer.v1.tar+gzip
✓ Uploaded  layer1.tar                                                                                                                               129/129  B 100.00%   73ms
  └─ sha256:0c26e9128651086bd9a417c7f0f3892e3542000e1f1fe509e8fcfb92caec96d5
✓ Uploaded  application/vnd.oci.image.config.v1+json                                                                                                   47/47  B 100.00%  126ms
  └─ sha256:4a2128b14c6c3699084cd60f24f80ae2c822f9bd799b24659f9691cbbfccae6b
✓ Uploaded  layer0.tar                                                                                                                               166/166  B 100.00%  132ms
  └─ sha256:43ceae9994ffc73acbbd123a47172196a52f7d1d118314556bac6c5622ea1304
✓ Uploaded  application/vnd.oci.image.manifest.v1+json                                                                                               752/752  B 100.00%   40ms
  └─ sha256:7728cb2fa5dc31ad8a1d05d4e4259d37c3fc72e1fbdc0e1555901687e34324e9
Pushed [registry] localhost:5000/image:v1
ArtifactType: application/vnd.oci.image.config.v1+json
Digest: sha256:7728cb2fa5dc31ad8a1d05d4e4259d37c3fc72e1fbdc0e1555901687e34324e9

The resulting manifest looks like:

oras manifest fetch localhost:5000/image:v1 | jq .
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:4a2128b14c6c3699084cd60f24f80ae2c822f9bd799b24659f9691cbbfccae6b",
    "size": 47
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:43ceae9994ffc73acbbd123a47172196a52f7d1d118314556bac6c5622ea1304",
      "size": 166,
      "annotations": {
        "org.opencontainers.image.title": "layer0.tar"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:0c26e9128651086bd9a417c7f0f3892e3542000e1f1fe509e8fcfb92caec96d5",
      "size": 129,
      "annotations": {
        "org.opencontainers.image.title": "layer1.tar"
      }
    }
  ],
  "annotations": {
    "org.opencontainers.image.created": "2024-06-14T07:49:06Z"
  }
}

ORAS (and other tools) are also able to push multiple files or directories within a single layer. This should be supported by container runtimes in the same way.

SELinux

Traditionally, the container runtime is responsible for applying SELinux labels to volume mounts, which are inherited from the securityContext of the pod or container on container creation. The same will apply to OCI volume mounts.

Test Plan

[ ] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.

Prerequisite testing updates
Unit tests
  • <package>: <date> - <test coverage>
Integration tests
  • :
e2e tests
  • :

Graduation Criteria

Upgrade / Downgrade Strategy

Version Skew Strategy

Production Readiness Review Questionnaire

Feature Enablement and Rollback

How can this feature be enabled / disabled in a live cluster?
  • Feature gate (also fill in values in kep.yaml)
    • Feature gate name: ImageVolume
    • Components depending on the feature gate:
      • kube-apiserver (API validation)
      • kubelet (volume mount)
Does enabling the feature change any default behavior?

Yes, it makes the new VolumeSource API functional.

Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes, by disabling the feature gate. Existing workloads will not be affected by the change.

To clear old volumes, all workloads using the VolumeSource need to be recreated after restarting the kubelets. The kube-apiserver only performs the API validation, whereas the kubelets serve the implementation. This means that a restart of the kubelet as well as a recreation of the workload is enough to disable the feature functionality.

What happens if we reenable the feature if it was previously rolled back?

It will make the API functional again. If the feature gets re-enabled for only a subset of kubelets and a user runs a scalable deployment or daemonset, then the volume source will only be available for some pod instances.

Are there any tests for feature enablement/disablement?

Yes, unit tests for the alpha release for each component. End-to-end (serial node) tests will be targeted for beta.

Rollout, Upgrade and Rollback Planning

How can a rollout or rollback fail? Can it impact already running workloads?
What specific metrics should inform a rollback?
Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

Monitoring Requirements

How can an operator determine if the feature is in use by workloads?
How can someone using this feature know that it is working for their instance?
  • Events
    • Event Reason:
  • API .status
    • Condition name:
    • Other field:
  • Other (treat as last resort)
    • Details:
What are the reasonable SLOs (Service Level Objectives) for the enhancement?
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
  • Metrics
    • Metric name:
    • [Optional] Aggregation method:
    • Components exposing the metric:
  • Other (treat as last resort)
    • Details:
Are there any missing metrics that would be useful to have to improve observability of this feature?

Dependencies

Does this feature depend on any specific services running in the cluster?

Scalability

Will enabling / using this feature result in any new API calls?
Will enabling / using this feature result in introducing new API types?
Will enabling / using this feature result in any new calls to the cloud provider?
Will enabling / using this feature result in increasing size or count of the existing API objects?
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?

Troubleshooting

How does this feature react if the API server and/or etcd is unavailable?
What are other known failure modes?
What steps should be taken if SLOs are not being met to determine the problem?

Implementation History

Drawbacks

Alternatives

No enhancement

Currently, a shared volume approach can be used. This involves packaging the files to share in an image that includes a shell in its base layer. An init container then copies the files from that image into a shared volume using shell commands, and the volume is made accessible to all containers in the pod.

An OCI VolumeSource eliminates the need for a shell and an init container by allowing the direct mounting of OCI objects as volumes, making it easier to modularize. For example, in the case of LLMs and model-servers, it is useful to package them in separate images, so various models can plug into the same model-server image. An OCI VolumeSource not only simplifies file copying but also allows container native distribution, authentication, and version control for files.

The volume-populators API extension allows you to populate a volume with data from an external data source when the volume is created. This is a good solution for restoring a volume from a snapshot or initializing a volume with data from a database backup. However, it does not address the desire to use OCI distribution, versioning, and signing for mounted data.

The proposed in-tree OCI VolumeSource provides a direct and integrated approach to mount OCI artifacts, leveraging the existing OCI infrastructure for packaging, distribution, and security.

Custom CSI Plugin

See https://github.com/warm-metal/container-image-csi-driver

An out-of-tree CSI plugin can provide flexibility and modularity, but there are trade-offs to consider:

  • Complexity of managing an external CSI plugin. This includes handling the installation, configuration, and updates of the CSI driver, which adds an additional operational burden. For a generic, vendor-agnostic, and widely-adopted solution this would not make sense.
  • Supporting image pull secrets as well as the credential providers would be tricky and would need to be reimplemented with separate API calls.
  • External CSI plugins implement their own lifecycle management and garbage collection mechanisms, yet these already exist in-tree for OCI images.
    • The kubelet has a max-parallel-image-pulls constant to maintain a reasonable load on disk and network. This would not be respected by a CSI driver; the only point of integration may be to move this constant down into the runtime.
    • The kubelet has GC logic that does not clean up images immediately in case they will be reused. The GC logic also has its own thresholds and behavior on eviction. It would be nice to have those integrated.
    • The kubelet exposes metrics on image pulls, and we have a KEP in place to improve them even further. Having CSI expose those metrics would require customers to integrate with one more source of data.
  • Performance: There is additional overhead with an out-of-tree CSI plugin, especially in scenarios requiring frequent image pulls or large volumes of data.

Advantages of In-Tree OCI VolumeSource

  1. Leverage Existing Mechanisms:

    • No New Data Types or Objects: OCI images are already a core part of the Kubernetes ecosystem. By extending support to OCI artifacts, many of the same mechanisms can be reused. This ensures consistency and reduces complexity, as both adhere to the same OCI image format.
    • Existing Lifecycle Management and Garbage Collection: Kubernetes has efficient lifecycle management and garbage collection mechanisms for volumes and container images. The in-tree OCI VolumeSource will utilize these existing mechanisms.
  2. Integration with Kubernetes:

    • Optimal Performance: Deep integration with the scheduler and kubelet ensures optimal performance and resource management. This integration allows the OCI VolumeSource to benefit from all existing optimizations and features.
    • Unified Interface: Users interact with a consistent and unified interface for managing volumes, reducing the learning curve and potential for configuration errors.
  3. Simplified Maintenance and Updates:

    • Core Project Maintenance: In-tree features are maintained and updated as part of the core project. It makes sense for widely-used and vendor agnostic features to utilize the core testing infrastructure, release cycles, and security updates.

Conclusion

The in-tree implementation of an OCI VolumeSource offers significant advantages by leveraging existing core mechanisms, ensuring deep integration, and simplifying management. This approach avoids the complexity, duplication, and other potential inefficiencies of out-of-tree CSI plugins, providing a more reliable solution for mounting OCI images and artifacts.