Add NodeFeatureRules to label Nvidia NICs
This patch does the following:
 - Add an "Nvidia NICs PCI" NodeFeatureRule (NFR) that labels the node with `"pci-15b3.present": "true"`
   if any Nvidia NIC is present
 - Add an `nfd.deployNodeFeatureRules` flag, set to true by default, that controls creation of the NFR above
 - Remove the networking PCI class from the NFD source configuration
 - Improve the README documentation

Signed-off-by: Waleed Mousa <waleedm@nvidia.com>
root authored and wmousa committed Jul 16, 2023
1 parent 802bb33 commit dec8fc4
Showing 5 changed files with 33 additions and 11 deletions.
5 changes: 2 additions & 3 deletions README.md
@@ -57,16 +57,15 @@ NFD is used to label nodes with the following labels:
- RDMA capability
- GPU features*

>__NOTE__: We use NodeFeatureRules to label nodes by PCI vendor and device. This is enabled by default through the `nfd.deployNodeFeatureRules` flag.
__Example NFD worker configurations:__

```yaml
config:
  sources:
    pci:
      deviceClassWhitelist:
        - "02"
        - "0200"
        - "0207"
        - "0300"
        - "0302"
      deviceLabelFields:
```
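Once NFD and the rule are deployed, the resulting label can be checked from the cluster side. A minimal sketch, assuming `kubectl` access to the cluster and that NFD applies its default `feature.node.kubernetes.io/` prefix to the unprefixed `pci-15b3.present` label defined in the rule:

```
$ kubectl get nodefeaturerules nvidia-nics-rules -o yaml
$ kubectl get nodes -l feature.node.kubernetes.io/pci-15b3.present=true
```

An empty node list means that either no node matched the rule or the label prefix differs in the NFD version in use.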
15 changes: 13 additions & 2 deletions deployment/network-operator/README.md
@@ -101,9 +101,19 @@ $ kubectl -n network-operator get pods
By default the network operator
deploys [Node Feature Discovery (NFD)](https://github.com/kubernetes-sigs/node-feature-discovery)
in order to perform node labeling in the cluster to allow proper scheduling of Network Operator resources. If the nodes
were already labeled by other means, it is possible to disable the deployment of NFD by setting
`nfd.enabled=false` chart parameter.
were already labeled by other means (for example, NFD deployed from master or as part of another deployment), it is possible to disable the deployment of NFD by setting
the `nfd.enabled=false` chart parameter. In that case, make sure that the installed NFD version is `v0.13.2` or newer and has the NodeFeature API enabled.

##### Deploy NFD from master with NodeFeatureApi enabled
```
$ export NFD_NS=node-feature-discovery
$ helm repo add nfd https://kubernetes-sigs.github.io/node-feature-discovery/charts
$ helm repo update
$ helm install nfd/node-feature-discovery --namespace $NFD_NS --create-namespace --generate-name --set enableNodeFeatureApi='true'
```
For additional information, refer to the official [NFD deployment with Helm](https://kubernetes-sigs.github.io/node-feature-discovery/v0.13/deployment/helm.html) documentation.

##### Deploy Network Operator without Node Feature Discovery
```
$ helm install --set nfd.enabled=false -n network-operator --create-namespace --wait network-operator mellanox/network-operator
```
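Before setting `nfd.enabled=false`, the externally installed NFD can be sanity-checked. A rough sketch, assuming the upstream CRD names `nodefeatures.nfd.k8s-sigs.io` and `nodefeaturerules.nfd.k8s-sigs.io`: the presence of the CRDs alone does not prove that the NodeFeature API is enabled, but with it enabled nfd-worker is expected to publish `NodeFeature` objects, so an empty second listing suggests it is not.

```
$ kubectl get crd nodefeatures.nfd.k8s-sigs.io nodefeaturerules.nfd.k8s-sigs.io
$ kubectl get nodefeatures -A
```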
@@ -335,6 +345,7 @@ parameters.
| Name | Type | Default | Description |
|------------------------------------------------------|--------|------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `nfd.enabled` | bool | `True` | deploy Node Feature Discovery |
| `nfd.deployNodeFeatureRules` | bool | `True` | deploy Node Feature Rules to label the nodes |
| `sriovNetworkOperator.enabled` | bool | `False` | deploy SR-IOV Network Operator |
| `upgradeCRDs` | bool | `True` | enable CRDs upgrade with helm pre-install and pre-upgrade hooks |
| `sriovNetworkOperator.configDaemonNodeSelectorExtra` | object | `{"node-role.kubernetes.io/worker": ""}` | Additional nodeSelector for sriov-network-operator config daemon. These values will be added in addition to default values managed by the network-operator. |
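The new flag can be turned off while still deploying NFD, for example if the nodes are labeled by other means. A minimal values override sketch (the file name `values-override.yaml` is hypothetical):

```yaml
# values-override.yaml (hypothetical file name)
nfd:
  enabled: true
  # skip creating the nvidia-nics-rules NodeFeatureRule
  deployNodeFeatureRules: false
```

It can be passed with `helm install -f values-override.yaml ...`, or equivalently with `--set nfd.deployNodeFeatureRules=false`. Because this commit also drops the `02` classes from the NFD worker `deviceClassWhitelist`, the `pci-15b3.present` label is then presumably no longer produced by the bundled NFD configuration.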
16 changes: 16 additions & 0 deletions deployment/network-operator/templates/nodefeaturerules.yaml
@@ -0,0 +1,16 @@
```yaml
{{- if .Values.nfd.deployNodeFeatureRules }}
apiVersion: nfd.k8s-sigs.io/v1alpha1
kind: NodeFeatureRule
metadata:
  name: nvidia-nics-rules
spec:
  rules:
    - name: "Nvidia NICs PCI"
      labels:
        "pci-15b3.present": "true"
      matchFeatures:
        - feature: pci.device
          matchExpressions:
            vendor: {op: In, value: ["15b3"]}
            class: {op: In, value: ["0200", "0207"]}
{{- end }}
```
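What the rule above matches can be approximated on a single host with a small shell sketch. It is not part of this commit and not how NFD is implemented; it assumes that NFD's `class` value corresponds to the first four hex digits of the sysfs `class` file (class and subclass, e.g. `0207` for InfiniBand, `0200` for Ethernet), and `PCI_DEVICES_DIR` and `nvidia_nic_present` are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical local check (not part of this commit): approximates the
# "Nvidia NICs PCI" NodeFeatureRule by scanning sysfs for a PCI device with
# vendor 15b3 and class 0200 (Ethernet) or 0207 (InfiniBand).
# PCI_DEVICES_DIR is an assumed override, handy for testing against a fake tree.
PCI_DEVICES_DIR="${PCI_DEVICES_DIR:-/sys/bus/pci/devices}"

# Prints "true" if any matching device exists, otherwise "false".
nvidia_nic_present() {
  for dev in "$PCI_DEVICES_DIR"/*; do
    # Skip entries without readable vendor/class files (also covers an unmatched glob).
    if [ ! -r "$dev/vendor" ] || [ ! -r "$dev/class" ]; then
      continue
    fi
    vendor=$(cat "$dev/vendor")   # e.g. 0x15b3
    class=$(cat "$dev/class")     # e.g. 0x020700 (class, subclass, prog-if)
    vendor=${vendor#0x}
    class=$(printf '%.4s' "${class#0x}")   # keep class+subclass, e.g. 0207
    if [ "$vendor" = "15b3" ]; then
      case "$class" in
        0200|0207)
          echo true
          return 0
          ;;
      esac
    fi
  done
  echo false
}

nvidia_nic_present
```

On a node with, for example, a ConnectX adapter in Ethernet or InfiniBand mode, the sketch should print `true`, which corresponds to the nodes that end up carrying the `pci-15b3.present` label.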
4 changes: 1 addition & 3 deletions deployment/network-operator/values.yaml
@@ -18,6 +18,7 @@

```yaml
nfd:
  enabled: true
  deployNodeFeatureRules: true

psp:
  enabled: false
```
@@ -52,9 +53,6 @@ node-feature-discovery:
```yaml
sources:
  pci:
    deviceClassWhitelist:
      - "02"
      - "0200"
      - "0207"
      - "0300"
      - "0302"
    deviceLabelFields:
```
4 changes: 1 addition & 3 deletions hack/templates/values/values.template
@@ -18,6 +18,7 @@

```yaml
nfd:
  enabled: true
  deployNodeFeatureRules: true

psp:
  enabled: false
```
@@ -52,9 +53,6 @@ node-feature-discovery:
```yaml
sources:
  pci:
    deviceClassWhitelist:
      - "02"
      - "0200"
      - "0207"
      - "0300"
      - "0302"
    deviceLabelFields:
```