chore: update all dependencies for the operator #685

Merged: 2 commits merged into Mellanox:master on Nov 27, 2023


ykulazhenkov (Collaborator) commented on Nov 21, 2023

This PR includes the following changes:

  • all dependencies are upgraded to their latest versions
  • the controller-runtime package changed its API; this PR also migrates the code to the new API
  • the k8s-operator-libs package changed its API; this PR removes logic that is no longer required and migrates to the new API
  • the deprecated wait.Poll function is replaced with wait.PollUntilContextTimeout

ykulazhenkov changed the title from "Update all dependencies for the operator" to "chore: update all dependencies for the operator" on Nov 21, 2023
ykulazhenkov (Collaborator, Author) commented:

This PR is a prerequisite for #600, which requires the k8s-operator-libs update.

ykulazhenkov (Collaborator, Author) commented:

/retest-nic_operator_helm

ykulazhenkov added the "on hold" label ("This enhancement is currently on hold pending additional clarification and evaluation") on Nov 22, 2023
Review thread on pkg/consts/consts.go (outdated, resolved)
adrianchiris (Collaborator) left a review:


Overall LGTM.

One small comment regarding the comment on the new const; once that is addressed, this can be merged.

Signed-off-by: Yury Kulazhenkov <ykulazhenkov@nvidia.com>
The deprecated wait.Poll function is replaced with wait.PollUntilContextTimeout.

Signed-off-by: Yury Kulazhenkov <ykulazhenkov@nvidia.com>
ykulazhenkov removed the "on hold" label on Nov 27, 2023
adrianchiris merged commit a255cf7 into Mellanox:master on Nov 27, 2023
15 checks passed
e0ne added a commit that referenced this pull request Dec 5, 2023
On Node startup, the OFED container takes some time to compile and load the driver. During that time, workloads might get scheduled on that Node. When OFED is loaded, all existing Pods that use NVIDIA NICs lose their network interfaces, and some of these Pods might silently fail or hang.
To avoid this, the Node should be cordoned and drained before OFED is loaded, so that all workloads are rescheduled. The Node should be uncordoned once the driver is ready on it.

The safe driver loading feature is implemented as part of the upgrade flow: safe driver loading is a special scenario of the upgrade procedure, in which we upgrade from the inbox driver to the containerized OFED.

depends on: #685

3 participants