feat: remove cc_deployment_updater. #1637

mook-as · 2020-12-01T20:50:52Z

Description

The cc_deployment_updater job has an incorrect readiness probe; when the job runs in HA, it can get jammed and never become ready (see below). Rather than properly fixing it (by moving it into a separate instance group and marking it as active/passive), this change drops the job completely, as it does not appear to be necessary.

Motivation and Context

(Partially) fixes #1589 — we'll also want to pull in #1632 for upgrades.

How Has This Been Tested?

Finished 🐱 locally.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My code has security implications.
My code follows the code style of this project.
My change requires a change to the documentation.
I have updated the documentation accordingly.

Additional Information:

Details on the readiness probe problems:

The probe was unaware that the job was supposed to be active/passive; it just checked the logs to ensure all instances were sleeping once every five seconds. This only happened to work on passive nodes because the test incorrectly succeeded when an instance had never been ready (jq happily accepted empty input and returned success). As a result, if a pod was ever active and then turned passive again (e.g. because it was being restarted and another instance became active), it would have a stale log line that matched the probe, and would therefore start being marked as failed.
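To make the failure mode concrete, here is a minimal Python sketch of the logic described above. It is not the actual probe (which was a jq-based check): the log record shape, the `sleeping` message, the function name `naive_probe`, and the five-second freshness window are all assumptions made for illustration.

```python
import json

# Assumed freshness window; the real probe's threshold is not shown in this PR.
MAX_AGE_SECONDS = 5.0


def naive_probe(log_lines, now):
    """Hypothetical stand-in for the flawed readiness check.

    Looks at the most recent "sleeping" log record and requires it to be fresh.
    If there is no such record at all, the check passes vacuously, which mirrors
    jq returning success on empty input.
    """
    records = [json.loads(line) for line in log_lines if line.strip()]
    sleeping = [r for r in records if r.get("message") == "sleeping"]
    latest = sleeping[-1:]  # empty list when the instance never logged "sleeping"
    # all() over an empty list is True, so a never-active instance looks ready.
    return all(now - r["timestamp"] <= MAX_AGE_SECONDS for r in latest)


# A passive instance that was never active has no matching line: passes.
never_active = []
# An instance that was active earlier and is now passive keeps a stale line: fails.
stale = ['{"timestamp": 10.0, "message": "sleeping"}']
# The currently active instance keeps logging fresh lines: passes.
fresh = ['{"timestamp": 98.0, "message": "sleeping"}']

print(naive_probe(never_active, now=100.0))
print(naive_probe(stale, now=100.0))
print(naive_probe(fresh, now=100.0))
```

Under these assumptions, the passive instance only passes by accident, and the first stale line it accumulates flips it to permanently not ready.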


jandubois

"Delete all the things" is my new metal band!

mook-as added the "Type: Bug" label Dec 1, 2020

mook-as requested a review from jandubois December 1, 2020 20:50

jandubois approved these changes Dec 1, 2020


mook-as mentioned this pull request Dec 2, 2020

feat: mechanism for hooking and fix for deployment updater upgrade #1632

Merged

7 tasks

jandubois merged commit 024937c into cloudfoundry-incubator:master Dec 2, 2020

mook-as mentioned this pull request Dec 7, 2020

Find out whether we should remove the cc deployment updater #1634

Closed

gaktive added this to the 2.7.0 milestone Dec 10, 2020

mook-as deleted the marky/delete-deployment-updater branch May 17, 2022 22:28
