This repository has been archived by the owner on Oct 22, 2021. It is now read-only.

fix: kubecf upgrade failure due to can't find multi-az scheduler #1663

Merged: 4 commits, Jan 5, 2021

Conversation


@ShuangMen commented Dec 21, 2020

Description

Upgrading kubecf with multi-az enabled from version v2.6.1 to v2.7.1 fails.

Motivation and Context

#1662

How Has This Been Tested?

Ran helm upgrade for kubecf with the updated code; the upgrade proceeds and all scheduler statefulsets upgrade successfully.

$ k get pod -n kubecf
NAME                                     READY   STATUS                  RESTARTS   AGE
api-z0-0                                 17/17   Running                 1          17m
api-z1-0                                 17/17   Running                 8          17m
auctioneer-0                             6/6     Running                 1          18m
bosh-dns-55f949b56d-6vbbq                1/1     Running                 0          4d20h
bosh-dns-55f949b56d-tgg5w                1/1     Running                 0          4d20h
cc-worker-z0-0                           6/6     Running                 0          18m
cc-worker-z1-0                           6/6     Running                 0          18m
cf-apps-dns-59f9f659f5-t94mh             1/1     Running                 0          27m
coredns-quarks-6db68476bd-ks6cj          1/1     Running                 0          3h32m
coredns-quarks-6db68476bd-pz5dn          1/1     Running                 0          3h32m
database-0                               2/2     Running                 0          27m
database-seeder-7a19efc54ebbb714-pqtbg   0/2     Completed               0          31d
database-seeder-d49344d80353dd73-gmljj   0/2     Completed               0          31d
diego-api-z0-0                           9/9     Running                 2          17m
diego-api-z1-0                           9/9     Running                 2          17m
diego-cell-z0-0                          0/12    Init:CrashLoopBackOff   6          17m
diego-cell-z1-0                          0/12    Init:CrashLoopBackOff   7          17m
doppler-z0-0                             6/6     Running                 0          18m
doppler-z1-0                             6/6     Running                 0          18m
log-api-z0-0                             9/9     Running                 0          17m
log-api-z1-0                             9/9     Running                 0          18m
log-cache-0                              10/10   Running                 0          17m
nats-z0-0                                7/7     Running                 0          18m
nats-z1-0                                7/7     Running                 0          18m
router-z0-0                              7/7     Running                 0          18m
router-z1-0                              7/7     Running                 4          18m
scheduler-z0-0                           12/12   Running                 1          17m
scheduler-z1-0                           12/12   Running                 1          17m
singleton-blobstore-z0-0                 8/8     Running                 0          18m
uaa-z0-0                                 8/8     Running                 0          18m
uaa-z1-0                                 8/8     Running                 0          18m

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code has security implications.
  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.

@ShuangMen ShuangMen changed the title fix upgrade can't find multi-az scheduler fix: kubecf upgrade failure due to can't find multi-az scheduler Dec 21, 2020
@ShuangMen
Contributor Author

Hi, could someone review this? @mook-as

@ShuangMen
Copy link
Contributor Author

Added a check for the existence of the scheduler statefulset to handle the multi-az and multi-cluster kubecf cases.

@jandubois jandubois self-requested a review January 4, 2021 22:47
@@ -18,8 +18,18 @@ spec:
- name: cc-deployment-updater-cc-deployment-updater
readinessProbe: ~
'
set +o pipefail
scheduler_list=$(kubectl get statefulsets --namespace "$NAMESPACE" | grep scheduler | cut -d " " -f 1)
set -o pipefail
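The set +o pipefail guard matters here because grep exits nonzero when it finds no match; with pipefail enabled, the whole pipeline then reports failure, which would abort a hook script running under set -e on a cluster with no matching statefulsets. A minimal bash sketch with synthetic pod names (not the actual hook code):

```shell
# grep returns 1 on no match; pipefail propagates that status to the pipeline.
set -o pipefail
printf 'api-z0-0\nrouter-z0-0\n' | grep scheduler | cut -d ' ' -f 1
echo "with pipefail: $?"      # prints "with pipefail: 1"

set +o pipefail
printf 'api-z0-0\nrouter-z0-0\n' | grep scheduler | cut -d ' ' -f 1
echo "without pipefail: $?"   # prints "without pipefail: 0" (cut's status)
```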
Member

I would prefer to use the kubectl command to specify exactly the output we need instead of post-processing it with grep/awk/sed/cut etc, which can break when the default output format changes. E.g.

scheduler_list=$(kubectl get statefulsets \
    --namespace "${NAMESPACE}" \
    --selector quarks.cloudfoundry.org/quarks-statefulset-name=scheduler \
    --no-headers=true \
    --output custom-columns=:metadata.name \
)

Member

@ShuangMen I just realized that quarks.cloudfoundry.org/quarks-statefulset-name might not be the correct selector; I don't have a multi-az cluster to test with.

I think quarks.cloudfoundry.org/instance-group-name is actually the correct label.

Sorry about the confusion!

Contributor Author

@jandubois, thanks. I've updated the code based on your comments.

exit 0
fi

for i in ${scheduler_list}; do
Member

I don't care too much, but if you are changing the PR anyway, why not use a more descriptive name than i? I think scheduler would be an obvious choice.

Contributor Author

@jandubois, thanks. I've updated the code based on your comments.

fi

for i in ${scheduler_list}; do
kubectl patch statefulset --namespace "$NAMESPACE" $i --patch "$patch"
Member

Please always run make lint before submitting a PR (it will not pass CI if it fails the lint step).

In ./chart/hooks/pre-upgrade/remove-deployment-updater-readiness.sh line 31:
  kubectl patch statefulset --namespace "$NAMESPACE" $i --patch "$patch"
                                                     ^-- SC2086: Double quote to prevent globbing and word splitting.

In addition we generally put variables in curly braces as well (although that is not enforced by shellcheck, so we often miss it). The final style would ideally be:

for scheduler in ${scheduler_list}; do
  kubectl patch statefulset --namespace "${NAMESPACE}" "${scheduler}" --patch "${patch}"
done
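The SC2086 warning is about word splitting: if a statefulset name ever contained whitespace or glob characters, an unquoted variable would expand into multiple arguments. A small illustration with a hypothetical name (real statefulset names cannot contain spaces, so this is purely to show the mechanism):

```shell
# Hypothetical name with a space, to show why quoting matters (SC2086).
name="scheduler z1"
set -- $name          # unquoted: word splitting yields two arguments
echo "unquoted: $#"   # prints "unquoted: 2"
set -- "$name"        # quoted: passed as a single argument
echo "quoted: $#"     # prints "quoted: 1"
```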

Contributor Author

@ShuangMen commented Jan 5, 2021

@jandubois, thanks for your good advice, I've updated the code.
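Putting the review feedback together, the patched hook logic would plausibly look like the sketch below. This is an illustration assembled from the diff and comments above, not the literal merged code; it is wrapped in a function for clarity, and NAMESPACE, patch, and the instance-group-name selector are assumed from the discussion:

```shell
# Sketch of the post-review hook loop; names and selector assumed from the PR.
patch_schedulers() {
  local scheduler_list scheduler
  # List scheduler statefulsets by label instead of grepping table output,
  # so the script does not break if kubectl's default format changes.
  scheduler_list=$(kubectl get statefulsets \
      --namespace "${NAMESPACE}" \
      --selector quarks.cloudfoundry.org/instance-group-name=scheduler \
      --no-headers=true \
      --output custom-columns=:metadata.name \
  ) || return 0
  # Nothing to patch when no scheduler statefulsets exist.
  [ -z "${scheduler_list}" ] && return 0
  for scheduler in ${scheduler_list}; do
    kubectl patch statefulset --namespace "${NAMESPACE}" "${scheduler}" --patch "${patch}"
  done
}
```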

@jandubois jandubois merged commit 7a5aa1c into cloudfoundry-incubator:master Jan 5, 2021