Add back the scripted wait for the webhook (#169)
When we upgraded the operator-sdk to 1.18.0, we thought we were getting a
fix for the webhook readiness probe, so we removed the scripted wait and
instead relied on helm's --wait option. However, we still saw
cases where the helm command returned after the wait but the webhook
still wasn't 100% ready. We are adding back the scripted wait to
stabilize things until we can investigate this further.
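The scripted wait boils down to polling a command until it succeeds or a timeout expires. As a minimal sketch, assuming GNU coreutils `timeout` is available (the script itself relies on it), the pattern can be written as a reusable bash function. The name `wait_until` is hypothetical and is not part of the repository.

```shell
#!/bin/bash
# wait_until (hypothetical helper): re-run a command every 0.1s until it
# succeeds or the timeout (in seconds) expires. Mirrors the script's
#   timeout $TIMEOUT bash -c "while ! <cmd>; do sleep 0.1; done"
# pattern. Returns 0 on success, or 124 (from `timeout`) if time runs out.
wait_until() {
    local timeout_secs=$1
    shift
    # The command is passed as positional arguments to the inner bash
    # ("_" fills $0), so it is never re-parsed as a string.
    timeout "$timeout_secs" bash -c 'until "$@"; do sleep 0.1; done' _ "$@"
}
```

Passing the command as arguments, rather than interpolating it into the `bash -c` string as the script does, avoids quoting problems when the command contains spaces or quotes. For example, `wait_until 60 kubectl get -n "$NS" svc verticadb-operator-webhook-service` would wait up to 60 seconds for the service to appear, because `kubectl get` exits non-zero when the object does not exist (`NS` is a placeholder).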
spilchen committed Mar 10, 2022
1 parent 8a86b26 commit da07f9d
Showing 2 changed files with 102 additions and 0 deletions.
2 changes: 2 additions & 0 deletions Makefile
@@ -390,8 +390,10 @@ uninstall: manifests kustomize ## Uninstall CRDs from the K8s cluster specified
 deploy-operator: manifests kustomize ## Using helm or olm, deploy the operator in the K8s cluster
 ifeq ($(DEPLOY_WITH), helm)
 	helm install --wait -n $(NAMESPACE) $(HELM_RELEASE_NAME) $(OPERATOR_CHART) --set image.name=${OPERATOR_IMG} --set logging.dev=${DEV_MODE} $(HELM_OVERRIDES)
+	scripts/wait-for-webhook.sh -n $(NAMESPACE) -t 60
 else ifeq ($(DEPLOY_WITH), olm)
 	scripts/deploy-olm.sh -n $(NAMESPACE) $(OLM_TEST_CATALOG_SOURCE)
+	scripts/wait-for-webhook.sh -n $(NAMESPACE) -t 60
 else ifeq ($(DEPLOY_WITH), random)
 ifeq ($(shell (( $$RANDOM % 2 )); echo $$?),0)
 	DEPLOY_WITH=helm $(MAKE) deploy-operator
100 changes: 100 additions & 0 deletions scripts/wait-for-webhook.sh
@@ -0,0 +1,100 @@
#!/bin/bash

# (c) Copyright [2021-2022] Micro Focus or one of its affiliates.
# Licensed under the Apache License, Version 2.0 (the "License");
# You may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# A script that waits for the webhook to be fully set up. There is a small
# timing window where the pod with the webhook is up and ready, but the webhook
# is not yet able to accept connections. See this issue for more details:
# https://github.com/vertica/vertica-kubernetes/issues/30

SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
REPO_DIR=$(dirname $SCRIPT_DIR)
TIMEOUT=30

function usage() {
    echo "usage: $(basename $0) [-n <namespace>] [-t <seconds>]"
    echo
    echo "Options:"
    echo "  -n <namespace>  Check the webhook in this namespace."
    echo "  -t <seconds>    Specify the timeout in seconds [default: $TIMEOUT]"
    exit 1
}

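# The leading ':' puts getopts in silent mode so that $OPTARG holds the
# offending option letter for the error messages below.
while getopts ":n:t:h" opt
do
    case $opt in
        n)
            NAMESPACE_OPT="-n $OPTARG"
            ;;
        t)
            TIMEOUT=$OPTARG
            ;;
        h)
            usage
            ;;
        :)
            echo "ERROR: option -$OPTARG requires an argument"
            usage
            ;;
        \?)
            echo "ERROR: unrecognized option: -$OPTARG"
            usage
            ;;
    esac
done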

# First ensure the service object for the webhook exists.
trap "echo 'Failed waiting for webhook service object to exist'" 0 2 3 15
set -o errexit
timeout $TIMEOUT bash -c -- "\
    while ! kubectl get $NAMESPACE_OPT svc --no-headers verticadb-operator-webhook-service 2> /dev/null | grep -cq 'service'; \
    do \
        sleep 0.1; \
    done"
set +o errexit
# The service exists, so clear the failure-message trap.
trap - 0 2 3 15

# Next, to validate the webhook exists, we will continually create/delete a
# VerticaDB. If it succeeds, then we assume the webhook is up and running.
# This depends on the webhook config having the 'failurePolicy: Fail' set.

SELECTOR_KEY=vertica.com/use
SELECTOR_VAL=wait-for-webhook
SELECTOR=$SELECTOR_KEY=$SELECTOR_VAL

MANIFEST=$(mktemp)

cat <<EOF > $MANIFEST
apiVersion: vertica.com/v1beta1
kind: VerticaDB
metadata:
  generateName: wait-for-webhook-
  labels:
    $SELECTOR_KEY: $SELECTOR_VAL
spec:
  image: "vertica/vertica-k8s:latest"
  initPolicy: ScheduleOnly
  subclusters:
  - name: sc1
    size: 1
EOF

# Delete any VerticaDBs left over from a previous run. They likely won't exist,
# so ignore the error.
kubectl delete $NAMESPACE_OPT vdb -l $SELECTOR 2> /dev/null 1> /dev/null || :

trap "kubectl delete $NAMESPACE_OPT vdb -l $SELECTOR; rm $MANIFEST" 0 2 3 15 # Ensure deletion on script exit

timeout $TIMEOUT bash -c -- "\
    while ! kubectl create $NAMESPACE_OPT -f $MANIFEST 2> /dev/null; \
    do \
        sleep 0.1; \
    done" &
pid=$!
wait $pid
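The script runs the `kubectl create` loop in the background and then blocks in `wait $pid` instead of running the loop in the foreground. The commit does not say why; one plausible reason is signal handling. Bash does not run a trap while a foreground command is executing, but the `wait` builtin returns (with a status greater than 128) as soon as a signal with a trap arrives, so the cleanup trap can fire promptly. The self-contained sketch below demonstrates that behavior, with `sleep` standing in for the polling loop; it does not contact a cluster.

```shell
#!/bin/bash
# Sketch: a background command plus `wait` lets a trap fire promptly.
# Bash defers a trap until a *foreground* child exits, but the `wait`
# builtin returns (status > 128) as soon as a trapped signal arrives.

interrupted=0
trap 'interrupted=1' TERM

# Stand-in for the long-running `kubectl create` polling loop.
sleep 30 &
pid=$!

# Simulate an external SIGTERM (for example, a cancelled CI job) after 1s.
# Inside the subshell, $$ still expands to the PID of the main script.
( sleep 1; kill -TERM $$ ) &

status=0
wait "$pid" || status=$?

# The sleep is still running; stop it so the demo exits cleanly.
kill "$pid" 2> /dev/null || true
echo "wait returned $status, interrupted=$interrupted"
```

In the real script the trap also deletes the probe VerticaDB and removes the temporary manifest, so reacting promptly to an interrupt helps avoid leaving that object behind in the cluster.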
