Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove shutdown-manager liveness probe #4967

Merged
merged 3 commits into from
Jan 10, 2023

Conversation

skriss
Copy link
Member

@skriss skriss commented Jan 9, 2023

The probe can currently cause problems when it fails by causing the shutdown-manager container to be restarted by itself, which then results in the envoy container getting stuck in a "DRAINING" state indefinitely.

Not having the probe is less bad overall because envoy pods are less likely to get stuck in "DRAINING", and the worst case without it is that shutdown-manager is truly unresponsive during a pod termination, in which case the envoy container will simply terminate without first draining active connections.

Updates #4851.

Signed-off-by: Steve Kriss krisss@vmware.com

The probe can currently cause problems when it fails
by causing the shutdown-manager container to be restarted
by itself, which then results in the envoy container
getting stuck in a "DRAINING" state indefinitely.

Not having the probe is less bad overall because envoy pods
are less likely to get stuck in "DRAINING", and the
worst case without it is that shutdown-manager is truly
unresponsive during a pod termination, in which case
the envoy container will simply terminate without first
draining active connections.

Updates projectcontour#4851.

Signed-off-by: Steve Kriss <krisss@vmware.com>
@skriss skriss added the release-note/minor A minor change that needs about a paragraph of explanation in the release notes. label Jan 9, 2023
@skriss skriss requested a review from a team as a code owner January 9, 2023 19:56
@skriss skriss requested review from tsaarni, stevesloka and sunjayBhatia and removed request for a team January 9, 2023 19:56
Signed-off-by: Steve Kriss <krisss@vmware.com>
@codecov
Copy link

codecov bot commented Jan 9, 2023

Codecov Report

Merging #4967 (fd854c8) into main (56ae876) will decrease coverage by 0.03%.
The diff coverage is n/a.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4967      +/-   ##
==========================================
- Coverage   77.63%   77.60%   -0.04%     
==========================================
  Files         140      140              
  Lines       16885    16871      -14     
==========================================
- Hits        13109    13093      -16     
- Misses       3519     3521       +2     
  Partials      257      257              
Impacted Files Coverage Δ
...nternal/provisioner/objects/dataplane/dataplane.go 84.81% <ø> (-0.52%) ⬇️
internal/sorter/sorter.go 97.95% <0.00%> (-1.03%) ⬇️

This change is to mitigate a problem where when the liveness probe fails, the shutdown-manager container is restarted by itself.
This ultimately has the unintended effect of causing the envoy container to be stuck indefinitely in a "DRAINING" state and not serving traffic.

Overall, not having the liveness probe on the shutdown-manager container is less bad because envoy pods are less likely to get stuck in "DRAINING", and the worst case without it is that shutdown-manager is truly unresponsive during a pod termination, in which case the envoy container will simply terminate without first draining active connections.
Copy link
Member

@sunjayBhatia sunjayBhatia Jan 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit on the wording, might be good to explicitly add that the termination/restart of the container caused by the lack of liveness probe will ensure the envoy pod is back in the set of ready envoys to load balance traffic to

Signed-off-by: Steve Kriss <krisss@vmware.com>
@skriss skriss merged commit de4c25c into projectcontour:main Jan 10, 2023
@skriss skriss deleted the pr-rm-sdm-liveness branch January 13, 2023 19:36
yangyy93 pushed a commit to projectsesame/contour that referenced this pull request Feb 16, 2023
The probe can currently cause problems when it fails
by causing the shutdown-manager container to be restarted
by itself, which then results in the envoy container
getting stuck in a "DRAINING" state indefinitely.

Not having the probe is less bad overall because envoy pods
are less likely to get stuck in "DRAINING", and the
worst case without it is that shutdown-manager is truly
unresponsive during a pod termination, in which case
the envoy container will simply terminate without first
draining active connections.

Updates projectcontour#4851.

Signed-off-by: Steve Kriss <krisss@vmware.com>
Signed-off-by: yy <yang.yang@daocloud.io>
yangyy93 pushed a commit to projectsesame/contour that referenced this pull request Feb 16, 2023
The probe can currently cause problems when it fails
by causing the shutdown-manager container to be restarted
by itself, which then results in the envoy container
getting stuck in a "DRAINING" state indefinitely.

Not having the probe is less bad overall because envoy pods
are less likely to get stuck in "DRAINING", and the
worst case without it is that shutdown-manager is truly
unresponsive during a pod termination, in which case
the envoy container will simply terminate without first
draining active connections.

Updates projectcontour#4851.

Signed-off-by: Steve Kriss <krisss@vmware.com>
Signed-off-by: yy <yang.yang@daocloud.io>
vmw-yingy pushed a commit to vmw-yingy/contour that referenced this pull request Feb 28, 2023
The probe can currently cause problems when it fails
by causing the shutdown-manager container to be restarted
by itself, which then results in the envoy container
getting stuck in a "DRAINING" state indefinitely.

Not having the probe is less bad overall because envoy pods
are less likely to get stuck in "DRAINING", and the
worst case without it is that shutdown-manager is truly
unresponsive during a pod termination, in which case
the envoy container will simply terminate without first
draining active connections.

Updates projectcontour#4851.

Signed-off-by: Steve Kriss <krisss@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/minor A minor change that needs about a paragraph of explanation in the release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants