Resolve errors for StatefulSet restart with updateStrategy: OnDelete #876
@@ -19,6 +19,9 @@ def status
     end

     def deploy_succeeded?
+      success = observed_generation == current_generation &&
+        desired_replicas == status_data['readyReplicas'].to_i &&
+        status_data['currentRevision'] == status_data['updateRevision']
       if update_strategy == ONDELETE
         # Gem cannot monitor update since it doesn't occur until delete
         unless @success_assumption_warning_shown
@@ -27,11 +30,10 @@ def deploy_succeeded?
             "Consider switching to rollingUpdate.")
           @success_assumption_warning_shown = true
         end
+      else
+        success &= desired_replicas == status_data['currentReplicas'].to_i
Edit: It turns out …

How was the CI test in this PR passing?

@timothysmith0609 Great question! I had to look into this a bit. The brief answer is certain combinations of operations can cause the … I've pushed a version of this test that hangs (in the same way I saw in real life) here: 5c4acd0. You can observe locally the state this sequence of events leaves the stateful set in; notice the absence of …

I did not push this version of the test here because it's hanging due to the previously mentioned bug, so the only way to run this test is with --no-verify-result. If you prefer, I can drop these changes from this PR, since IRL I will have to run without verify result anyway for now.

Thank you for the explanation! May as well keep the changes for now, to avoid repeating the same work later.
       end
-      observed_generation == current_generation &&
-        status_data['currentRevision'] == status_data['updateRevision'] &&
-        desired_replicas == status_data['readyReplicas'].to_i &&
-        desired_replicas == status_data['currentReplicas'].to_i
+      success
     end

     def deploy_failed?
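To reproduce the state discussed in the thread above, here is a rough standalone Ruby sketch (not part of this PR) that dumps the status fields deploy_succeeded? compares, using the kubeclient gem and local kubeconfig credentials; the StatefulSet name and namespace are placeholders rather than values from the fixtures:

require 'kubeclient'

# Standalone sketch: print the StatefulSet status fields deploy_succeeded? reads.
config = Kubeclient::Config.read(File.join(Dir.home, '.kube', 'config'))
context = config.context
apps_client = Kubeclient::Client.new(
  "#{context.api_endpoint}/apis/apps", 'v1',
  ssl_options: context.ssl_options, auth_options: context.auth_options
)

sts = apps_client.get_stateful_set('my-stateful-set', 'my-namespace') # placeholders
status = sts.status
puts({
  observedGeneration: status.observedGeneration,
  readyReplicas: status.readyReplicas,
  currentReplicas: status.currentReplicas, # can be stale or absent with OnDelete
  currentRevision: status.currentRevision,
  updateRevision: status.updateRevision,
}.inspect)

With updateStrategy: OnDelete and the upstream bug referenced below (kubernetes/kubernetes#106059), currentReplicas and currentRevision may never converge to updateRevision, which is why the unconditional check removed in this diff could hang.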
@@ -213,7 +213,7 @@ def patch_daemonset_with_restart(record)
     def delete_statefulset_pods(record)
       pods = kubeclient.get_pods(namespace: record.metadata.namespace)
       pods.select! do |pod|
-        pod.metadata.ownerReferences.find { |ref| ref.uid == record.metadata.uid }
+        pod.metadata&.ownerReferences&.find { |ref| ref.uid == record.metadata.uid }
Fixes the …

🤦 Thank you for fixing that
       end
       pods.each { |pod| kubeclient.delete_pod(pod.metadata.name, pod.metadata.namespace) }
     end
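Why the safe navigation matters here: pods that no controller owns, such as the bare pod the updated test below deploys, have no ownerReferences, so calling .find on that nil value raises NoMethodError. A minimal runnable sketch, with plain OpenStructs standing in for the objects kubeclient returns (hypothetical data, not from this PR):

require 'ostruct'

# Stand-ins for kubeclient pod objects: one owned by the StatefulSet, one bare.
owned_pod = OpenStruct.new(metadata: OpenStruct.new(
  name: 'web-0',
  ownerReferences: [OpenStruct.new(uid: 'sts-uid-123')]
))
bare_pod = OpenStruct.new(metadata: OpenStruct.new(name: 'unmanaged-pod'))
sts_uid = 'sts-uid-123'

pods = [owned_pod, bare_pod]
# Without `&.`, bare_pod.metadata.ownerReferences is nil and .find raises NoMethodError.
pods.select! do |pod|
  pod.metadata&.ownerReferences&.find { |ref| ref.uid == sts_uid }
end
puts pods.map { |p| p.metadata.name }.inspect # => ["web-0"]

This is exactly the path the unmanaged-pod-1.yml.erb fixture added to the integration test exercises.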
@@ -36,7 +36,8 @@ def test_restart_by_annotation
   end

   def test_restart_statefulset_on_delete_restarts_child_pods
-    result = deploy_fixtures("hello-cloud", subset: "stateful_set.yml") do |fixtures|
+    result = deploy_fixtures("hello-cloud", subset: ["configmap-data.yml", "unmanaged-pod-1.yml.erb",
Deploying a bare pod in the integration test without the fix produces the expected error.
+      "stateful_set.yml"], render_erb: true) do |fixtures|
       statefulset = fixtures["stateful_set.yml"]["StatefulSet"].first
       statefulset["spec"]["updateStrategy"] = { "type" => "OnDelete" }
     end
This check is currently broken in upstream Kubernetes for OnDelete, but if we remove the check the method will return immediately, before the pods are actually restarted successfully. We could (re)implement some of the controller logic to check for individual pods, but I believe running with --no-verify-result until the upstream fix is the better option. Upstream fix: kubernetes/kubernetes#106059
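For reference, a rough sketch of the "(re)implement some of the controller logic to check for individual pods" idea mentioned above. It assumes a core and an apps/v1 kubeclient like the ones the restart task holds, and relies on the controller-revision-hash label the StatefulSet controller puts on its pods; the helper name and structure are hypothetical, not part of this PR:

# Hypothetical helper: treat an OnDelete restart as finished once every pod owned
# by the StatefulSet is Ready and carries the StatefulSet's updateRevision in its
# controller-revision-hash label.
def statefulset_pods_on_update_revision?(core_client, apps_client, record)
  sts = apps_client.get_stateful_set(record.metadata.name, record.metadata.namespace)
  update_revision = sts.status&.updateRevision
  return false if update_revision.nil?

  pods = core_client.get_pods(namespace: record.metadata.namespace).select do |pod|
    pod.metadata&.ownerReferences&.find { |ref| ref.uid == record.metadata.uid }
  end
  return false unless pods.size == sts.spec.replicas.to_i

  pods.all? do |pod|
    labels = pod.metadata.labels
    ready = pod.status&.conditions&.any? { |c| c.type == 'Ready' && c.status == 'True' }
    labels && labels['controller-revision-hash'] == update_revision && ready
  end
end

Polling something like this from deploy_succeeded? would sidestep the stale status fields, at the cost of duplicating controller behavior that could drift from upstream, so running with --no-verify-result until the upstream fix lands seems like the pragmatic choice.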