This repository has been archived by the owner on Oct 22, 2021. It is now read-only.

feat: remove cc_deployment_updater. #1637

@mook-as mook-as commented Dec 1, 2020

Description

The cc_deployment_updater job has an incorrect readiness probe; when the job is deployed in HA, it can get jammed and never become ready (see below). Rather than fixing it properly (by moving it into a separate instance group and marking it as active/passive), this PR drops the job completely, as it does not appear to be necessary.

Motivation and Context

(Partially) fixes #1589 — we'll also want to pull in #1632 for upgrades.

How Has This Been Tested?

Finished 🐱 locally.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code has security implications.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

Additional Information:

Details on the readiness probe problems:

The probe was unaware that the job was supposed to be active/passive; it just checked the logs to ensure that all instances were sleeping once every five seconds. This only happened to work on passive nodes because the test incorrectly succeeded when an instance had never been ready (jq happily accepted empty input and returned success). As a result, if a pod was ever active and then turned passive again (e.g. because it was being restarted and another instance became active), it would have a stale line that matched the probe and would therefore start getting marked as failed.
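
The failure mode can be illustrated with a small Python sketch. This is not the actual probe (which used jq against the job logs and is not shown in this PR); the log record shape, the `faulty_probe` name, and the five-second freshness window are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical log record: (unix timestamp, message).
# The real log format is not shown in this PR.
LogLine = Tuple[float, str]

# Assumed freshness window, based on the "every five seconds" wording.
FRESHNESS_SECONDS = 5.0


def faulty_probe(log: List[LogLine], now: float) -> bool:
    """Mimic the described bug: no matching log line means 'ready'."""
    sleeping = [ts for ts, msg in log if "sleeping" in msg]
    # all() over an empty list is True, analogous to jq accepting
    # empty input and exiting successfully.
    return all(now - ts <= FRESHNESS_SECONDS for ts in sleeping[-1:])


now = 100.0
never_active = faulty_probe([], now)                     # passive from the start
currently_active = faulty_probe([(99.0, "sleeping")], now)
formerly_active = faulty_probe([(40.0, "sleeping")], now)  # stale line remains
```

Under these assumptions, a pod that was never active passes only by accident, while a pod that was once active keeps a stale matching line and fails the freshness check even though being passive is a healthy state.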

@mook-as mook-as added the Type: Bug Something isn't working label Dec 1, 2020

@jandubois jandubois left a comment

"Delete all the things" is my new metal band!

@jandubois jandubois merged commit 024937c into cloudfoundry-incubator:master Dec 2, 2020
@gaktive gaktive added this to the 2.7.0 milestone Dec 10, 2020
@mook-as mook-as deleted the marky/delete-deployment-updater branch May 17, 2022 22:28
Successfully merging this pull request may close these issues.

HA upgrades failing