Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Network Operator deployment support in vanilla K8S air gap environment #375

Merged
merged 2 commits into from
Jul 7, 2022

Conversation

e0ne
Copy link
Collaborator

@e0ne e0ne commented Jul 3, 2022

No description provided.

@e0ne
Copy link
Collaborator Author

e0ne commented Jul 3, 2022

/retest-nic_operator_kind

Copy link
Collaborator

@ykulazhenkov ykulazhenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think overall logic is fine, I added few nit comments.

Q: do we need to support use-case when user updates content of configMap with repo or CA configuration?

api/v1alpha1/nicclusterpolicy_types.go Show resolved Hide resolved
pkg/state/state_ofed.go Outdated Show resolved Hide resolved
pkg/state/state_ofed.go Show resolved Hide resolved
pkg/state/state_ofed.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@adrianchiris adrianchiris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@e0ne should we have under docs/ a document about disconnected env covering these configurations ? or do you think its better to just have it in the official docs ? as its not regarded as a "common" flow

for filename := range cm.Data {
filenames = append(filenames, filename)
}
// sort so volume mounts are added to spec in deterministic order
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you extend this comment to explain why this is needed ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

api/v1alpha1/nicclusterpolicy_types.go Show resolved Hide resolved
pkg/state/state_ofed.go Show resolved Hide resolved
@@ -339,6 +339,8 @@ imagePullSecrets:
| `ofedDriver.version` | string | `5.5-1.0.3.2` | Mellanox OFED driver version |
| `ofedDriver.imagePullSecrets` | list | `[]` | An optional list of references to secrets to use for pulling any of the Mellanox OFED driver image |
| `ofedDriver.env` | list | `[]` | An optional list of [environment variables](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/#envvar-v1-core) passed to the Mellanox OFED driver image |
| `ofedDriver.repoConfig.configMapName` | string | `` | Private mirror repository configuration configMap name |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i feel like these two should be named the same : either name or configMapName WDYT ?

need to align thoroughout the PR (helm, CRD, code)

Copy link
Collaborator

@ykulazhenkov ykulazhenkov Jul 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is to align with NVIDIA gpu-operator, they have such names for parameters

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these two parameters are very similar, and should be named the same for consistency.

we do not have a strict requirement to have the same api as gpu operator. where we can improve / provide a more consistent api , i think we should.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to 'name' in both places

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@e0ne please also change here in the README

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -295,7 +295,7 @@ func (s *stateOFED) getManifestObjects(
if err != nil {
return nil, fmt.Errorf("ERROR: failed to create ConfigMap VolumeMounts for custom repo config: %v", err)
}
volume := s.createConfigMapVolume(cr.Spec.OFEDDriver.CertConfig.Name, itemsToInclude)
volume := s.createConfigMapVolume(cr.Spec.OFEDDriver.RepoConfig.ConfigMapName, itemsToInclude)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to be in previous commit

@adrianchiris
Copy link
Collaborator

adrianchiris commented Jul 5, 2022

do we need to support use-case when user updates content of configMap with repo or CA configuration

ATM this is not a defined use-case.
however, keeping in mind that these are used only during driver compilation (pre-step to install kernel-headers) then even if these change and driver is already compiled (or rather after kernel-header install phase) then it will have no effect.

in case driver container/node restarted, it would have new data (repo/creds)

@e0ne e0ne marked this pull request as draft July 5, 2022 12:04
These options are broken at the moment and was never used.

To pass environment variable to OFED driver you can use
ofedDriver.env parameters.

HTTP_PROXY/HTTPS_PROXY environment variabes are not confiurable
for operator and nv-peer-mem pod anymore.

Signed-off-by: Ivan Kolodiazhny <ikolodiazhny@nvidia.com>
@@ -405,6 +412,12 @@ spec:
- initialDelaySeconds
- periodSeconds
type: object
repoConfig:
description: 'Optional: Custom repo configuration for OFED container'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: here you use OFED container and above you use driver container. pick one :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -69,6 +81,10 @@ type OFEDDriverSpec struct {
Env []v1.EnvVar `json:"env,omitempty"`
// Ofed auto-upgrade settings
OfedUpgradePolicy *OfedUpgradePolicySpec `json:"upgradePolicy,omitempty"`
// Optional: Custom certificates configuration for driver container
CertConfig *DriverCertConfigSpec `json:"certConfig,omitempty"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General question.

can this also be used for package manager related certificates (e.g when signing repo packages with custom gpg key)

i guess not, so if later on we will need to support it, then we can add RepoCertConfig.

i think the description here should be updated to explicitly state that these are SSL certificates

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -55,6 +55,18 @@ type PodProbeSpec struct {
PeriodSeconds int `json:"periodSeconds"`
}

// DriverCertConfigSpec defines custom certificates configuration for driver container
type DriverCertConfigSpec struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two are essentially the same.

i.e they (*ConfigSpec) point to a configmap

maybe define single type: ConfigMapSpec
an use it as the type of both CertConfig and RepoConfig below

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, let's rename it

@@ -405,6 +412,12 @@ spec:
- initialDelaySeconds
- periodSeconds
type: object
repoConfig:
description: 'Optional: Custom repo configuration for OFED container'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above , use either "OFED container" or "driver container"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

pkg/state/state_ofed.go Outdated Show resolved Hide resolved
pkg/state/state_ofed.go Outdated Show resolved Hide resolved
pkg/state/state_ofed.go Outdated Show resolved Hide resolved

err = s.mountAdditionalVolumesFromConfigMap(&additionalVolMounts, cr.Spec.OFEDDriver.RepoConfig.Name, destinationDir)
if err != nil {
return nil, fmt.Errorf("failed to custom repositories configuration: %v", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@e0ne e0ne force-pushed the proxy branch 2 times, most recently from d012260 to b201d8e Compare July 6, 2022 12:33
@@ -152,6 +239,22 @@ func (s *stateOFED) GetWatchSources() map[string]*source.Kind {
return wr
}

func (s *stateOFED) mountAdditionalVolumesFromConfigMap(
Copy link
Collaborator

@adrianchiris adrianchiris Jul 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this method doesnt actually mount anything but rather generate volumes and volume mounts. also it mainly revolves around additionalVolumeMounts type so perhaps we should have a method in that struct.

suggestion:

define:
(am *additionalMounts) FromConfigMap(cm *v1.ConfigMap, destDir) error

which will populate additionalMounts from configmap.

then you just need to rename this method to handleAdditionalMounts which will get configmap and call am.FromConfigMap(....)

or this logic (Getting config map and populating additionalMount struct can be called directly from getManifestObjects

wdyt ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also please add docstring to added methods above :) if we go this route.

@adrianchiris
Copy link
Collaborator

@e0ne PTAL

@e0ne e0ne marked this pull request as draft July 7, 2022 10:55
@e0ne e0ne marked this pull request as ready for review July 7, 2022 11:02
func (a *additionalVolumeMounts) FromConfigMap(configMap *v1.ConfigMap, destDir string) error {
volumeMounts, itemsToInclude, err := a.createConfigMapVolumeMounts(configMap, destDir)
if err != nil {
return fmt.Errorf("ERROR: failed to create ConfigMap VolumeMounts for custom repo config: %v", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think its clear its an error so no need for ERROR: prefix.

also error should not reference custom repo config but rather something generic like:
"failed to create Volume mounts from configmap. %v, err"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

objKey := client.ObjectKey{Namespace: namespace, Name: configMapName}
err := s.client.Get(context.TODO(), objKey, configMap)
if err != nil {
return fmt.Errorf("ERROR: could not get ConfigMap %s from client: %v", configMapName, err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Drop ERROR: prefix the object returned here is an error


err = volMounts.FromConfigMap(configMap, destDir)
if err != nil {
return fmt.Errorf("ERROR: could not create volume mounts for ConfigMap: %s", configMapName)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Drop ERROR: prefix the object returned here is an error

This patch adds certConfig and custom repo configuration
options for OFED driver.

Signed-off-by: Ivan Kolodiazhny <ikolodiazhny@nvidia.com>
@adrianchiris
Copy link
Collaborator

LGTM, thanks @e0ne !

just need CI to pass and this can be merged

@e0ne e0ne merged commit 4a67542 into Mellanox:master Jul 7, 2022
@e0ne e0ne mentioned this pull request Jul 8, 2022
28 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants