This repository has been archived by the owner on Oct 22, 2021. It is now read-only.

fix: kubecf upgrade failure due to can't find multi-az scheduler #1663

Merged: 4 commits, Jan 5, 2021

Conversation


@ShuangMen commented Dec 21, 2020

Description

Upgrading kubecf with multi-az enabled from version v2.6.1 to v2.7.1 fails.

Motivation and Context

#1662

How Has This Been Tested?

Ran helm upgrade for kubecf with the updated code; the upgrade proceeds and all scheduler statefulsets upgrade successfully.

$ k get pod -n kubecf
NAME                                     READY   STATUS                  RESTARTS   AGE
api-z0-0                                 17/17   Running                 1          17m
api-z1-0                                 17/17   Running                 8          17m
auctioneer-0                             6/6     Running                 1          18m
bosh-dns-55f949b56d-6vbbq                1/1     Running                 0          4d20h
bosh-dns-55f949b56d-tgg5w                1/1     Running                 0          4d20h
cc-worker-z0-0                           6/6     Running                 0          18m
cc-worker-z1-0                           6/6     Running                 0          18m
cf-apps-dns-59f9f659f5-t94mh             1/1     Running                 0          27m
coredns-quarks-6db68476bd-ks6cj          1/1     Running                 0          3h32m
coredns-quarks-6db68476bd-pz5dn          1/1     Running                 0          3h32m
database-0                               2/2     Running                 0          27m
database-seeder-7a19efc54ebbb714-pqtbg   0/2     Completed               0          31d
database-seeder-d49344d80353dd73-gmljj   0/2     Completed               0          31d
diego-api-z0-0                           9/9     Running                 2          17m
diego-api-z1-0                           9/9     Running                 2          17m
diego-cell-z0-0                          0/12    Init:CrashLoopBackOff   6          17m
diego-cell-z1-0                          0/12    Init:CrashLoopBackOff   7          17m
doppler-z0-0                             6/6     Running                 0          18m
doppler-z1-0                             6/6     Running                 0          18m
log-api-z0-0                             9/9     Running                 0          17m
log-api-z1-0                             9/9     Running                 0          18m
log-cache-0                              10/10   Running                 0          17m
nats-z0-0                                7/7     Running                 0          18m
nats-z1-0                                7/7     Running                 0          18m
router-z0-0                              7/7     Running                 0          18m
router-z1-0                              7/7     Running                 4          18m
scheduler-z0-0                           12/12   Running                 1          17m
scheduler-z1-0                           12/12   Running                 1          17m
singleton-blobstore-z0-0                 8/8     Running                 0          18m
uaa-z0-0                                 8/8     Running                 0          18m
uaa-z1-0                                 8/8     Running                 0          18m

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code has security implications.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@ShuangMen ShuangMen changed the title fix upgrade can't find multi-az scheduler fix: kubecf upgrade failure due to can't find multi-az scheduler Dec 21, 2020
@ShuangMen
Contributor Author

Hi, could someone review this? @mook-as

@ShuangMen
Copy link
Contributor Author

Added a check for the existence of the scheduler statefulset to handle the multi-az and multi-cluster kubecf cases.

@jandubois jandubois self-requested a review January 4, 2021 22:47
@@ -18,8 +18,18 @@ spec:
- name: cc-deployment-updater-cc-deployment-updater
readinessProbe: ~
'
set +o pipefail
scheduler_list=$(kubectl get statefulsets --namespace "$NAMESPACE" | grep scheduler | cut -d " " -f 1)
set -o pipefail
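The set +o pipefail guard matters here because grep exits nonzero when it finds no match; with pipefail enabled, the whole pipeline then reports failure, which would abort a hook script running under set -e on a cluster with no matching statefulsets. A minimal bash sketch with synthetic pod names (not the actual hook code):

```shell
# grep returns 1 on no match; pipefail propagates that status to the pipeline.
set -o pipefail
printf 'api-z0-0\nrouter-z0-0\n' | grep scheduler | cut -d ' ' -f 1
echo "with pipefail: $?"      # prints "with pipefail: 1"

set +o pipefail
printf 'api-z0-0\nrouter-z0-0\n' | grep scheduler | cut -d ' ' -f 1
echo "without pipefail: $?"   # prints "without pipefail: 0" (cut's status)
```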
Member

I would prefer to use the kubectl command to specify exactly the output we need instead of post-processing it with grep/awk/sed/cut etc, which can break when the default output format changes. E.g.

scheduler_list=$(kubectl get statefulsets \
    --namespace "${NAMESPACE}" \
    --selector quarks.cloudfoundry.org/quarks-statefulset-name=scheduler \
    --no-headers=true \
    --output custom-columns=:metadata.name \
)

Member

@ShuangMen I just realized that quarks.cloudfoundry.org/quarks-statefulset-name might not be the correct selector; I don't have a multi-az cluster to test with.

I think quarks.cloudfoundry.org/instance-group-name is actually the correct label.

Sorry about the confusion!

Contributor Author

@jandubois, thanks. I've updated the code based on your comments.

exit 0
fi

for i in ${scheduler_list}; do
Member

I don't care too much, but if you are changing the PR anyway, why not use a more descriptive name than i? I think scheduler would be an obvious choice.

Contributor Author

@jandubois, thanks. I've updated the code based on your comments.

fi

for i in ${scheduler_list}; do
kubectl patch statefulset --namespace "$NAMESPACE" $i --patch "$patch"
Member

Please always run make lint before submitting a PR (it will not pass CI if it fails the lint step).

In ./chart/hooks/pre-upgrade/remove-deployment-updater-readiness.sh line 31:
  kubectl patch statefulset --namespace "$NAMESPACE" $i --patch "$patch"
                                                     ^-- SC2086: Double quote to prevent globbing and word splitting.

In addition we generally put variables in curly braces as well (although that is not enforced by shellcheck, so we often miss it). The final style would ideally be:

for scheduler in ${scheduler_list}; do
  kubectl patch statefulset --namespace "${NAMESPACE}" "${scheduler}" --patch "${patch}"
done
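The SC2086 warning is about word splitting: if a statefulset name ever contained whitespace or glob characters, an unquoted variable would expand into multiple arguments. A small illustration with a hypothetical name (real statefulset names cannot contain spaces, so this is purely to show the mechanism):

```shell
# Hypothetical name with a space, to show why quoting matters (SC2086).
name="scheduler z1"
set -- $name          # unquoted: word splitting yields two arguments
echo "unquoted: $#"   # prints "unquoted: 2"
set -- "$name"        # quoted: passed as a single argument
echo "quoted: $#"     # prints "quoted: 1"
```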

Contributor Author

@ShuangMen commented Jan 5, 2021

@jandubois, thanks for your good advice, I've updated the code.
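Putting the review feedback together, the patched hook logic would plausibly look like the sketch below. This is an illustration assembled from the diff and comments above, not the literal merged code; it is wrapped in a function for clarity, and NAMESPACE, patch, and the instance-group-name selector are assumed from the discussion:

```shell
# Sketch of the post-review hook loop; names and selector assumed from the PR.
patch_schedulers() {
  local scheduler_list scheduler
  # List scheduler statefulsets by label instead of grepping table output,
  # so the script does not break if kubectl's default format changes.
  scheduler_list=$(kubectl get statefulsets \
      --namespace "${NAMESPACE}" \
      --selector quarks.cloudfoundry.org/instance-group-name=scheduler \
      --no-headers=true \
      --output custom-columns=:metadata.name \
  ) || return 0
  # Nothing to patch when no scheduler statefulsets exist.
  [ -z "${scheduler_list}" ] && return 0
  for scheduler in ${scheduler_list}; do
    kubectl patch statefulset --namespace "${NAMESPACE}" "${scheduler}" --patch "${patch}"
  done
}
```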

@jandubois jandubois merged commit 7a5aa1c into cloudfoundry-incubator:master Jan 5, 2021