
Requeue if pod not available in AutoUpgrade #387

Merged
1 commit merged into Mellanox:master from unavailable on Jul 18, 2022

Conversation

@rollandf (Member) commented on Jul 12, 2022

In the auto upgrade flow, the upgrade state of the nodes is built
based on the OFED pods.

However, there can be a stage where pods are not available because
the upgrade flow is deleting them and new ones have not been created
yet.

In that case, the corresponding node will not be considered in the
upgrade logic at the specific time when the pod has not yet been re-created.

This can affect the "MaxParallelUpgrades" feature.

With this commit, if an OFED DaemonSet has one or more unavailable
pods, the upgrade logic will not run and the notification will be
requeued to be handled later.

Signed-off-by: Fred Rolland frolland@nvidia.com
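
For illustration, a minimal sketch of the check described above, written against the Kubernetes apps/v1 and controller-runtime APIs; the function name, package name, and the 30-second requeue delay are assumptions for this sketch, not the exact code added by this PR:

package controllers

import (
	"time"

	appsv1 "k8s.io/api/apps/v1"
	ctrl "sigs.k8s.io/controller-runtime"
)

// requeueIfUnavailable is a sketch of the check this PR describes: it reports
// whether any driver DaemonSet still has unavailable pods. If so, the caller
// should skip the upgrade-state logic for this reconcile and return the
// Result, so the notification is handled again later, once the DaemonSet
// controller has re-created the deleted pods.
func requeueIfUnavailable(daemonSets []appsv1.DaemonSet) (ctrl.Result, bool) {
	for _, ds := range daemonSets {
		if ds.Status.NumberUnavailable > 0 {
			// Assumed delay; Requeue: true or another RequeueAfter would also work.
			return ctrl.Result{RequeueAfter: 30 * time.Second}, true
		}
	}
	return ctrl.Result{}, false
}

A caller such as the operator's Reconcile would then return the Result instead of continuing to build the node upgrade state.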

@rollandf (Member Author) commented:

From DaemonSet documentation:

// The number of nodes that should be running the
// daemon pod and have none of the daemon pod running and available
// (ready for at least spec.minReadySeconds)
// +optional
NumberUnavailable int32 `json:"numberUnavailable,omitempty" protobuf:"varint,8,opt,name=numberUnavailable"`

@rollandf force-pushed the unavailable branch 2 times, most recently from 181616e to 230227f, on July 12, 2022 09:16
@e0ne (Collaborator) left a comment:

LGTM

@@ -152,6 +153,13 @@ func (r *UpgradeReconciler) BuildState(ctx context.Context) (*upgrade.ClusterUpg

r.Log.V(consts.LogLevelDebug).Info("Got driver daemon sets", "length", len(daemonSets))

for _, ds := range daemonSets {
@adrianchiris (Collaborator) commented on Jul 13, 2022:

As discussed offline, the DS status may not yet be updated by the k8s DaemonSet controller, so we may be called after the pod delete but before the DS got a chance to update its status, leading to the same issue.

I think we should rely on the DS status DesiredNumberScheduled (which maps to the number of nodes that should run a DS pod), as this is not expected to change (scale up/down) in the middle of an active upgrade flow.

Then we can list the pods and count the ones associated with the DaemonSet (having an owner reference) and without a deletion timestamp set, which means the DS has for sure claimed the pod.

If the two numbers are the same, it means that the pods that were deleted got rescheduled by the DaemonSet.

The DaemonSet controller has some logic around updating the status fields, so I suspect we will not always see the actual state.

https://github.com/kubernetes/kubernetes/blob/a1c8e9386af844757333733714fa1757489735b3/pkg/controller/daemon/daemon_controller.go#L1118
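
For reference, a rough sketch of the counting approach suggested here, using controller-runtime's client and apps/v1 types; the helper name and the namespace-wide pod listing are assumptions for illustration, not code from this repository:

package controllers

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// allPodsClaimed is a sketch of the approach described above: it compares
// Status.DesiredNumberScheduled with the number of pods that are owned by
// the DaemonSet and do not have a deletion timestamp. If the two match,
// every pod deleted by the upgrade flow has already been re-created and
// claimed by the DaemonSet controller, independent of whether the DaemonSet
// status fields have caught up yet.
func allPodsClaimed(ctx context.Context, c client.Client, ds *appsv1.DaemonSet) (bool, error) {
	podList := &corev1.PodList{}
	if err := c.List(ctx, podList, client.InNamespace(ds.Namespace)); err != nil {
		return false, err
	}

	var owned int32
	for i := range podList.Items {
		pod := &podList.Items[i]
		if pod.DeletionTimestamp != nil {
			// Pod is being deleted; it has not been re-created yet.
			continue
		}
		if metav1.IsControlledBy(pod, ds) {
			owned++
		}
	}
	return owned == ds.Status.DesiredNumberScheduled, nil
}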

@adrianchiris (Collaborator) left a comment:

Please see the comment above; I think we should make some changes.

@rollandf (Member Author) commented:

/retest-all

1 similar comment
@rollandf (Member Author) commented:

/retest-all

@adrianchiris (Collaborator) left a comment:

small comment, otherwise LGTM

controllers/upgrade_controller.go (review comment resolved)
@adrianchiris merged commit c841259 into Mellanox:master on Jul 18, 2022
@e0ne mentioned this pull request on Jul 18, 2022
@rollandf deleted the unavailable branch on February 28, 2024 08:11