Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for CDI #77

Merged
merged 1 commit into from
Sep 26, 2023
Merged

Add support for CDI #77

merged 1 commit into from
Sep 26, 2023

Conversation

almaslennikov
Copy link
Contributor

@almaslennikov almaslennikov force-pushed the cdi-support branch 3 times, most recently from 5c02bd9 to dad2e4e Compare August 4, 2023 09:35
@almaslennikov
Copy link
Contributor Author

@abdallahyas We don't have any copyrights in the repo, should we copyright all files or drop the check?

@abdallahyas
Copy link
Contributor

@abdallahyas We don't have any copyrights in the repo, should we copyright all files or drop the check?

@almaslennikov we can not drop because it is the company policy, so we can copy the copy right message from any other project and add it to only the newly added files. The CI will only check newly added files, and would ignore the modified files

cmd/k8s-rdma-shared-dp/main.go Outdated Show resolved Hide resolved
pkg/cdi/cdi.go Outdated Show resolved Hide resolved
@almaslennikov
Copy link
Contributor Author

@e0ne addressed your comments and rebased the PR, PTAL

Copy link
Collaborator

@e0ne e0ne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. thanks for addressing my comments

Copy link
Collaborator

@adrianchiris adrianchiris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @almaslennikov appologies for the delayed review !

go.mod Show resolved Hide resolved
images/cdi/k8s-rdma-shared-dev-plugin-cdi-ds.yaml Outdated Show resolved Hide resolved
pkg/cdi/cdi.go Show resolved Hide resolved
pkg/cdi/cdi.go Show resolved Hide resolved
pkg/cdi/cdi.go Outdated Show resolved Hide resolved
pkg/resources/server.go Outdated Show resolved Hide resolved
pkg/resources/server.go Show resolved Hide resolved
pkg/resources/server.go Show resolved Hide resolved
pkg/resources/server.go Outdated Show resolved Hide resolved
pkg/resources/server_test.go Show resolved Hide resolved
@adrianchiris
Copy link
Collaborator

i have deployed the changes, something is off with the cdi spec, expect further feedback :)

@adrianchiris
Copy link
Collaborator

adrianchiris commented Aug 20, 2023

I have deployed device plugin in cdi mode.

the cdi specification generated is as follows:

cdiVersion: 0.5.0
containerEdits: {}
devices:
- containerEdits:
    deviceNodes:
    - hostPath: /dev/infiniband/issm0
      path: /dev/infiniband/issm0
      permissions: rw
    - hostPath: /dev/infiniband/umad0
      path: /dev/infiniband/umad0
      permissions: rw
    - hostPath: /dev/infiniband/uverbs0
      path: /dev/infiniband/uverbs0
      permissions: rw
    - hostPath: /dev/infiniband/rdma_cm
      path: /dev/infiniband/rdma_cm
      permissions: rw
    - hostPath: /dev/infiniband/issm1
      path: /dev/infiniband/issm1
      permissions: rw
    - hostPath: /dev/infiniband/umad1
      path: /dev/infiniband/umad1
      permissions: rw
    - hostPath: /dev/infiniband/uverbs1
      path: /dev/infiniband/uverbs1
      permissions: rw
    - hostPath: /dev/infiniband/rdma_cm
      path: /dev/infiniband/rdma_cm
      permissions: rw
  name: "0"
- containerEdits:
    deviceNodes:
    - hostPath: /dev/infiniband/issm0
      path: /dev/infiniband/issm0
      permissions: rw
    - hostPath: /dev/infiniband/umad0
      path: /dev/infiniband/umad0
      permissions: rw
    - hostPath: /dev/infiniband/uverbs0
      path: /dev/infiniband/uverbs0
      permissions: rw
    - hostPath: /dev/infiniband/rdma_cm
      path: /dev/infiniband/rdma_cm
      permissions: rw
    - hostPath: /dev/infiniband/issm1
      path: /dev/infiniband/issm1
      permissions: rw
    - hostPath: /dev/infiniband/umad1
      path: /dev/infiniband/umad1
      permissions: rw
    - hostPath: /dev/infiniband/uverbs1
      path: /dev/infiniband/uverbs1
      permissions: rw
    - hostPath: /dev/infiniband/rdma_cm
      path: /dev/infiniband/rdma_cm
      permissions: rw
  name: "1"
kind: rdma/net-rdma

the list of devices goes on up until rdmaHcaMax num is reached.

A couple of comments on this one:

  1. the device kind is composed of vendor prefix and device class. in our case we should use nvidia.com as vendor prefix.
    later on it may be configurable if desired. currently, resource prefix is used as vendor and the default is rdma. this does not really fit in CDI kind.

  2. device class used in this PR net-rdma fits well.

  3. devices here are eventually PCI devices. so i think CDI device spec should reflect the actual device, its name should be the PCI device address. having duplicate devices (notice also there is duplicate mounts per device) here does not make much sense since eventually we can provide the same CDI device name in multiple device plugin Allocate calls. that means, we would a CDI device per PCI device selected. on Allocate call we may return either one or multiple annotations pointing to one or several CDI devices depending on the actual devices that are part of the pool.

  4. current code does not support more than one resource pool since you are writing to the same file /var/run/cdi/rdma.yaml
    filename can be a combination of resource prefix + resource pool name. this will ensure uniqueness on the node as prefix+pool should be unique and a device may only be part of a single pool.

example of a cdi spec file as i envision it:

a resource pool with 2 devices:

cdiVersion: 0.5.0
containerEdits: {}
devices:
- containerEdits:
    deviceNodes:
    - hostPath: /dev/infiniband/issm0
      path: /dev/infiniband/issm0
      permissions: rw
    - hostPath: /dev/infiniband/umad0
      path: /dev/infiniband/umad0
      permissions: rw
    - hostPath: /dev/infiniband/uverbs0
      path: /dev/infiniband/uverbs0
      permissions: rw
    - hostPath: /dev/infiniband/rdma_cm
      path: /dev/infiniband/rdma_cm
      permissions: rw
  name: "0000:03:00.0"
- containerEdits:
    deviceNodes:
    - hostPath: /dev/infiniband/issm1
      path: /dev/infiniband/issm1
      permissions: rw
    - hostPath: /dev/infiniband/umad1
      path: /dev/infiniband/umad1
      permissions: rw
    - hostPath: /dev/infiniband/uverbs1
      path: /dev/infiniband/uverbs1
      permissions: rw
    - hostPath: /dev/infiniband/rdma_cm
      path: /dev/infiniband/rdma_cm
      permissions: rw
  name: "0000:03:00.1"
kind: nvidia.com/net-rdma

@almaslennikov almaslennikov force-pushed the cdi-support branch 3 times, most recently from 1e36e9a to 2a25547 Compare September 1, 2023 19:00
pkg/cdi/cdi.go Outdated Show resolved Hide resolved
pkg/resources/resources_manager.go Show resolved Hide resolved
@almaslennikov almaslennikov force-pushed the cdi-support branch 3 times, most recently from e37dc5f to c19a164 Compare September 19, 2023 07:16
deployment/k8s/overlay/cdi.yaml Outdated Show resolved Hide resolved
https://github.com/container-orchestrated-devices/container-device-interface

Add copyrights to all edited files
Switch to kustomize for deployment artifacts
Update Readme

Signed-off-by: amaslennikov <amaslennikov@nvidia.com>
@e0ne e0ne merged commit fe7f371 into Mellanox:master Sep 26, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants