Increase termination grace period for raft members
It's important to give them a good chance to tell the leader that they
are leaving. If the leader is briefly offline (for example, because its
pod was deleted), the leave message may not have enough time to
propagate and be committed to the leader's database within the default
grace period of 30 seconds, which may leave the cluster in a divergent
state.
booxter committed Mar 20, 2024
1 parent 8642d07 commit 5eee807
Showing 1 changed file with 15 additions and 1 deletion.
pkg/ovndbcluster/statefulset.go: 16 changes (15 additions, 1 deletion)
@@ -115,6 +115,19 @@ func StatefulSet(
 		volumeMounts = append(volumeMounts, svc.CreateVolumeMounts(serviceName)...)
 	}
 
+	// NOTE(ihar) ovndb pods leave the raft cluster on delete; it's important
+	// that they are not interrupted and have a good chance to propagate the
+	// leave message to the leader. In the general case, this should happen near
+	// instantly. But if the leader pod is itself down / restarting, it may take
+	// it some time to recover and start processing messages from other members.
+	// The default value of 30 seconds is sometimes not enough. In local testing,
+	// 60 seconds seems enough, but we'll take a significantly more conservative
+	// approach here and set it to 5 minutes.
+	//
+	// If the leader is not back even after 5 minutes, we'll give up
+	// nevertheless, and manual cluster recovery will be needed.
+	terminationGracePeriodSeconds := int64(300)
+
 	statefulset := &appsv1.StatefulSet{
 		ObjectMeta: metav1.ObjectMeta{
 			Name: serviceName,
@@ -133,7 +146,8 @@ func StatefulSet(
 					Labels: labels,
 				},
 				Spec: corev1.PodSpec{
-					ServiceAccountName: instance.RbacResourceName(),
+					TerminationGracePeriodSeconds: &terminationGracePeriodSeconds,
+					ServiceAccountName:            instance.RbacResourceName(),
 					Containers: []corev1.Container{
 						{
 							Name: serviceName,
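In the diff, the field is set through `&terminationGracePeriodSeconds` because `TerminationGracePeriodSeconds` in `corev1.PodSpec` is an `*int64`. A nil pointer means "unset", which the API server defaults to 30 seconds, and Go does not allow taking the address of a conversion such as `int64(300)` directly. The sketch below uses a hypothetical, trimmed-down `podSpec` type and `effectiveGracePeriod` helper (neither exists in the operator or in `k8s.io/api`) to contrast the pod spec before and after this commit without pulling in Kubernetes dependencies.

```go
package main

import "fmt"

// podSpec is a hypothetical, trimmed-down stand-in for corev1.PodSpec; only
// the field touched by this commit is modeled. As in the real type, it is a
// pointer so that "unset" (nil) can be told apart from an explicit value.
type podSpec struct {
	TerminationGracePeriodSeconds *int64
}

// defaultGracePeriodSeconds is the value Kubernetes applies when the field is unset.
const defaultGracePeriodSeconds int64 = 30

// effectiveGracePeriod is a hypothetical helper returning the grace period a
// pod would end up with; in a real cluster the API server does this defaulting.
func effectiveGracePeriod(spec podSpec) int64 {
	if spec.TerminationGracePeriodSeconds == nil {
		return defaultGracePeriodSeconds
	}
	return *spec.TerminationGracePeriodSeconds
}

func main() {
	// &int64(300) does not compile, so an addressable local variable is used,
	// exactly as in the diff.
	terminationGracePeriodSeconds := int64(300)

	before := podSpec{}
	after := podSpec{TerminationGracePeriodSeconds: &terminationGracePeriodSeconds}

	fmt.Println(effectiveGracePeriod(before)) // 30
	fmt.Println(effectiveGracePeriod(after))  // 300
}
```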
