Add support in Helm and docs
Signed-off-by: Yury Kulazhenkov <ykulazhenkov@nvidia.com>
ykulazhenkov committed Nov 1, 2023
1 parent 461668f commit 05bcf3e
Showing 7 changed files with 120 additions and 41 deletions.
52 changes: 33 additions & 19 deletions deployment/network-operator/README.md
@@ -128,7 +128,7 @@ $ helm install --set nfd.enabled=false -n network-operator --create-namespace --
> __Note:__ The labels which Network Operator depends on may change between releases.
> __Note:__ By default the operator is deployed without an instance of `NicClusterPolicy` and `MacvlanNetwork`
> custom resources. The user must create them later with a configuration matching the cluster, or use chart parameters to deploy them together with the operator.
#### Deploy development version of Network Operator

@@ -411,23 +411,37 @@ imagePullSecrets:

#### Mellanox OFED driver

| Name | Type | Default | Description |
| ---- | ---- | ------- | ----------- |
| `ofedDriver.deploy` | bool | `false` | Deploy the Mellanox OFED driver container |
| `ofedDriver.repository` | string | `mellanox` | Mellanox OFED driver image repository |
| `ofedDriver.image` | string | `mofed` | Mellanox OFED driver image name |
| `ofedDriver.version` | string | `5.9-0.5.6.0` | Mellanox OFED driver version |
| `ofedDriver.initContainer.enable` | bool | `true` | Deploy the init container |
| `ofedDriver.initContainer.repository` | string | `ghcr.io/mellanox` | Init container image repository |
| `ofedDriver.initContainer.image` | string | `network-operator-init-container` | Init container image name |
| `ofedDriver.initContainer.version` | string | `v0.0.1` | Init container image version |
| `ofedDriver.imagePullSecrets` | list | `[]` | An optional list of references to secrets to use for pulling the Mellanox OFED driver image |
| `ofedDriver.env` | list | `[]` | An optional list of [environment variables](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#envvar-v1-core) passed to the Mellanox OFED driver image |
| `ofedDriver.repoConfig.name` | string | `""` | Name of the ConfigMap with the private mirror repository configuration |
| `ofedDriver.certConfig.name` | string | `""` | Name of the ConfigMap with the custom TLS key/certificate configuration |
| `ofedDriver.terminationGracePeriodSeconds` | int | `300` | Mellanox OFED termination grace period in seconds |
| `ofedDriver.startupProbe.initialDelaySeconds` | int | `10` | Mellanox OFED startup probe initial delay in seconds |
| `ofedDriver.startupProbe.periodSeconds` | int | `20` | Mellanox OFED startup probe interval in seconds |
| `ofedDriver.livenessProbe.initialDelaySeconds` | int | `30` | Mellanox OFED liveness probe initial delay in seconds |
| `ofedDriver.livenessProbe.periodSeconds` | int | `30` | Mellanox OFED liveness probe interval in seconds |
| `ofedDriver.readinessProbe.initialDelaySeconds` | int | `10` | Mellanox OFED readiness probe initial delay in seconds |
| `ofedDriver.readinessProbe.periodSeconds` | int | `30` | Mellanox OFED readiness probe interval in seconds |
| `ofedDriver.upgradePolicy.autoUpgrade` | bool | `false` | Global switch for the automatic upgrade feature |
| `ofedDriver.upgradePolicy.maxParallelUpgrades` | int | `1` | Maximum number of nodes that can be upgraded in parallel; `0` means no limit (all nodes are upgraded in parallel) |
| `ofedDriver.upgradePolicy.safeLoad` | bool | `false` | Cordon and drain (if enabled) a node before loading the driver on it |
| `ofedDriver.upgradePolicy.drain.enable` | bool | `true` | Drain a node before restarting the driver |
| `ofedDriver.upgradePolicy.drain.force` | bool | `false` | Use force drain (see the `kubectl drain` documentation for details) |
| `ofedDriver.upgradePolicy.drain.podSelector` | string | `""` | Drain only Pods matching this label selector |
| `ofedDriver.upgradePolicy.drain.timeoutSeconds` | int | `300` | Timeout for the drain operation in seconds |
| `ofedDriver.upgradePolicy.drain.deleteEmptyDir` | bool | `false` | Continue the drain even if there are Pods using emptyDir volumes |
| `ofedDriver.upgradePolicy.waitForCompletion.podSelector` | string | not set | Label selector for the Pods that must complete before the driver upgrade starts |
| `ofedDriver.upgradePolicy.waitForCompletion.timeoutSeconds` | int | not set | Time in seconds to wait for the workload to finish before giving up; `0` means wait indefinitely |
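
As a sketch of how the new `ofedDriver.initContainer.*` parameters could be overridden, the values file below pulls the init container image from a private mirror. The file name `values-override.yaml` and the registry `registry.example.com/mellanox` are hypothetical placeholders; the other keys and values come from the table above.

```yaml
# values-override.yaml (hypothetical file name)
ofedDriver:
  deploy: true
  initContainer:
    enable: true
    # Hypothetical private mirror; replace with a registry reachable from the cluster
    repository: registry.example.com/mellanox
    image: network-operator-init-container
    version: v0.0.1
```

Such a file could be passed to `helm install` or `helm upgrade` with the `-f` (`--values`) flag. Because the chart template only checks whether `ofedDriver.initContainer` is set, setting `enable: false` instead still renders the `initContainer` block in the generated `NicClusterPolicy`, with `enable: false`.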

#### RDMA Device Plugin

@@ -605,7 +619,7 @@ optionally deployed components:
| `nvIpam.enableWebhook` | bool | `false` | Enable deployment of the validation webhook for the IPPool CRD |

> __Note__: A supported X.509 certificate management system must be available in the cluster to enable the validation webhook.
> Currently supported systems are [cert-manager](https://cert-manager.io/) and
> [OpenShift certificate management](https://docs.openshift.com/container-platform/4.13/security/certificates/service-serving-certificate.html).


@@ -34,6 +34,13 @@ spec:
    image: {{ .Values.ofedDriver.image }}
    repository: {{ .Values.ofedDriver.repository }}
    version: {{ .Values.ofedDriver.version }}
    {{- if .Values.ofedDriver.initContainer }}
    initContainer:
      enable: {{ .Values.ofedDriver.initContainer.enable }}
      repository: {{ .Values.ofedDriver.initContainer.repository }}
      image: {{ .Values.ofedDriver.initContainer.image }}
      version: {{ .Values.ofedDriver.initContainer.version }}
    {{- end }}
    {{- if .Values.ofedDriver.env }}
    env:
      {{ toYaml .Values.ofedDriver.env | nindent 6 }}
@@ -61,12 +68,20 @@ spec:
    upgradePolicy:
      autoUpgrade: {{ .Values.ofedDriver.upgradePolicy.autoUpgrade | default false }}
      maxParallelUpgrades: {{ .Values.ofedDriver.upgradePolicy.maxParallelUpgrades | default 0 }}
      safeLoad: {{ .Values.ofedDriver.upgradePolicy.safeLoad | default false }}
      {{- if .Values.ofedDriver.upgradePolicy.drain }}
      drain:
        enable: {{ .Values.ofedDriver.upgradePolicy.drain.enable | default true }}
        force: {{ .Values.ofedDriver.upgradePolicy.drain.force | default false }}
        podSelector: {{ .Values.ofedDriver.upgradePolicy.drain.podSelector | quote }}
        timeoutSeconds: {{ .Values.ofedDriver.upgradePolicy.drain.timeoutSeconds }}
        deleteEmptyDir: {{ .Values.ofedDriver.upgradePolicy.drain.deleteEmptyDir | default false }}
      {{- end }}
      {{- if .Values.ofedDriver.upgradePolicy.waitForCompletion }}
      waitForCompletion:
        podSelector: {{ .Values.ofedDriver.upgradePolicy.waitForCompletion.podSelector | default "" }}
        timeoutSeconds: {{ .Values.ofedDriver.upgradePolicy.waitForCompletion.timeoutSeconds | default 0 }}
      {{- end }}
    {{- end }}
  {{- end }}
  {{- if .Values.rdmaSharedDevicePlugin.deploy }}
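For illustration, the fragment below is a sketch of what the `initContainer` part of this template might render to with the chart's default `initContainer` values from `values.yaml`. It assumes the OFED driver section is rendered at all (for example with `ofedDriver.deploy=true`) and that `ofedDriver` sits directly under `spec` with the indentation shown; all other fields of the generated `NicClusterPolicy` are omitted.

```yaml
# Sketch of rendered output (assumes ofedDriver.deploy=true and the
# default image/initContainer values from values.yaml)
spec:
  ofedDriver:
    image: mofed
    repository: nvcr.io/nvstaging/mellanox
    version: 23.10-0.4.1.0
    initContainer:
      enable: true
      repository: ghcr.io/mellanox
      image: network-operator-init-container
      version: v0.0.1
```

With these defaults the `waitForCompletion` block is not rendered: `values.yaml` declares `waitForCompletion:` with all of its keys commented out, so the value is empty and the `{{- if .Values.ofedDriver.upgradePolicy.waitForCompletion }}` guard evaluates to false. A related point when editing these values: Sprig's `default` function treats `false` as empty, so `drain.enable: false` combined with `| default true` still renders `enable: true`.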
13 changes: 12 additions & 1 deletion deployment/network-operator/values.yaml
@@ -168,6 +168,11 @@ ofedDriver:
  image: mofed
  repository: nvcr.io/nvstaging/mellanox
  version: 23.10-0.4.1.0
  initContainer:
    enable: true
    repository: ghcr.io/mellanox
    image: network-operator-init-container
    version: v0.0.1
  # imagePullSecrets: []
  # env, if defined, passes environment variables to the OFED container
  # env:
@@ -180,7 +185,6 @@ ofedDriver:
  # Custom ssl key/certificate configuration
  certConfig:
    name: ""
  startupProbe:
    initialDelaySeconds: 10
    periodSeconds: 20
@@ -197,6 +201,8 @@ ofedDriver:
    # how many nodes can be upgraded in parallel (default: 1)
    # 0 means no limit, all nodes will be upgraded in parallel
    maxParallelUpgrades: 1
    # cordon and drain (if enabled) a node before loading the driver on it
    safeLoad: false
    # options for node drain (`kubectl drain`) before the driver reload
    # if auto upgrade is enabled but drain.enable is false,
    # then driver POD will be reloaded immediately without
@@ -208,6 +214,11 @@ ofedDriver:
      # It's recommended to set a timeout to avoid an infinite drain in case a non-fatal error keeps happening on retries
      timeoutSeconds: 300
      deleteEmptyDir: false
    waitForCompletion:
      # specifies a label selector for the pods to wait for completion
      # podSelector: "app=myapp"
      # how long, in seconds, to wait for the workload to finish before giving up; zero means infinite
      # timeoutSeconds: 300

rdmaSharedDevicePlugin:
  deploy: true
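A minimal sketch of a values override that turns on the two new upgrade options, `safeLoad` and `waitForCompletion`, assuming automatic upgrades are wanted at all. The `podSelector` and `timeoutSeconds` values are the examples from the commented-out `waitForCompletion` entries in `values.yaml`; `app=myapp` is a placeholder label, not a real workload.

```yaml
ofedDriver:
  upgradePolicy:
    autoUpgrade: true
    maxParallelUpgrades: 1
    # Cordon and drain (if enabled) each node before loading the driver on it
    safeLoad: true
    drain:
      enable: true
      timeoutSeconds: 300
    waitForCompletion:
      # Placeholder selector from the values.yaml comment; match your own workload
      podSelector: "app=myapp"
      # Give the matching pods up to 300 seconds to finish; 0 would wait indefinitely
      timeoutSeconds: 300
```

Helm merges override values over the chart defaults, so keys left out here (for example `drain.force` or `drain.deleteEmptyDir`) keep their `values.yaml` values. Because `waitForCompletion` now has content, the template's guard on it is true and both of its fields are rendered into the `NicClusterPolicy`.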