This repository has been archived by the owner on Nov 7, 2018. It is now read-only.

Ensure cluster is in a green state before stopping a pod #134

Open
wants to merge 2 commits into master

Conversation

deimosfr
Contributor

The timeout is set to 8h before releasing the hook and forcing the ES node to shut down.
@pires
Owner

pires commented Sep 26, 2017

What happens if I'm deleting the deployment?

@deimosfr
Contributor Author

Good question, I didn't test. Anyway, I think you can force a delete to bypass hooks.

@otrosien

👍 .. any plans merging this?

@deimosfr
Contributor Author

Works for me

Owner

@pires pires left a comment


Please rebase master.

preStop:
  httpGet:
    path: /_cluster/health?wait_for_status=green&timeout=28800s
    port: 9300


shouldn't this be port: 9200?

Owner


Yes.



Without terminationGracePeriodSeconds: 28800 the preStop command will be terminated after 30s, even if it's still waiting for the Elasticsearch API endpoint to time out.
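
For illustration, a sketch of what this combination would look like (values taken from the diff above, with the port corrected to 9200 as discussed; the container name es-data is only a placeholder):

spec:
  terminationGracePeriodSeconds: 28800   # give the preStop hook up to 8h before force-killing
  containers:
  - name: es-data
    lifecycle:
      preStop:
        # Block termination until the cluster reports green, or until the ES-side 8h timeout expires.
        httpGet:
          path: /_cluster/health?wait_for_status=green&timeout=28800s
          port: 9200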

@otrosien

@deimosfr what about actively deallocating shards off that node as part of the lifecycle hook? (e.g. setting exclude._ip and waiting for the node to become empty)?
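
For illustration, a rough sketch of that idea (assumptions: the node's IP is available to the script as ${NODE_IP}, e.g. via the Downward API, and the API is reachable on plain http://localhost:9200; a fuller _name-based example appears further down in this thread):

#!/bin/bash
# Tell Elasticsearch to move all shards off this node by excluding its IP.
curl -s -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/_cluster/settings' -d "{
  \"transient\": {
    \"cluster.routing.allocation.exclude._ip\": \"${NODE_IP}\"
  }
}"
# Wait until _cat/shards no longer lists any shard on this node's IP.
while curl -s 'http://localhost:9200/_cat/shards' | grep -q "${NODE_IP}"; do
  sleep 5
done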

@pires
Owner

pires commented Apr 17, 2018

With validation webhooks, it may be possible but it's a far-fetched thing to do here. Maybe an operator feature request?

@psalaberria002

psalaberria002 commented May 18, 2018

Regarding the deallocation of shards in the preStop hook, does anyone have a working example? It would be a nice feature to have.

Could something like https://github.com/kayrus/elk-kubernetes/blob/master/docker/elasticsearch/pre-stop-hook.sh be used?

@zhujinhe

zhujinhe commented Aug 1, 2018

It is not working in my case: the data pods scaled from 3 to 1 without waiting for the status to be "green".

@mat1010

mat1010 commented Aug 1, 2018

@psalaberria002 @zhujinhe
There are multiple ways to achieve this.

1. Relocate all shards off a node before proceeding with the next one, using preStop and postStart lifecycle hooks.

Here's my - slightly modified - working example which I originally took from https://github.com/helm/charts/blob/5cc1fd6c37f834949cf67c89fe23cf654a9bef77/incubator/elasticsearch/templates/configmap.yaml#L118

It's modified because we are using the X-Pack security features and therefore need encryption and authentication. You can remove the encryption (use http:// instead of https://localhost) and the authentication part (-u ${SOME_USER}:${SOME_PASSWORD}).

Depending on network performance and the amount of data in the cluster, this approach can take very long and significantly decreases cluster performance, because all shards are relocated prior to each restart.

configmap.yaml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Values.data.name }}-cm
  labels:
    app: {{ .Values.data.name }}
data:
  pre-stop-hook.sh: |-
    #!/bin/bash
    set -uo pipefail
    echo "Prepare to migrate data of the node ${NODE_NAME}"
    echo "Move all data from node ${NODE_NAME}"
    curl -k -u ${SOME_USER}:${SOME_PASSWORD} -s -XPUT -H 'Content-Type: application/json' 'https://localhost:9200/_cluster/settings' -d "{
      \"transient\" :{
          \"cluster.routing.allocation.exclude._name\" : \"${NODE_NAME}\"
      }
    }"
    echo ""
    while true ; do
      echo -e "Wait for node ${NODE_NAME} to become empty"
      SHARDS_ALLOCATION=$(curl -k -u ${SOME_USER}:${SOME_PASSWORD} -s -XGET 'https://localhost:9200/_cat/shards')
      if ! echo "${SHARDS_ALLOCATION}" | grep -E "${NODE_NAME}" | grep -v '\.security-'; then
        echo -e "${NODE_NAME} has been evacuated"
        break
      fi
      sleep 1
    done
  post-start-hook.sh: |-
    #!/bin/bash
    set -uo pipefail
    while true; do
      curl -k -u ${SOME_USER}:${SOME_PASSWORD} -XGET "https://localhost:9200/_cluster/health"
      if [[ "$?" == "0" ]]; then
        break
      fi
      echo -e "${NODE_NAME} not reachable, retrying ..."
      sleep 1
    done
    echo ""
    CLUSTER_SETTINGS=$(curl -k -u ${SOME_USER}:${SOME_PASSWORD} -s -XGET "https://localhost:9200/_cluster/settings")
    if echo "${CLUSTER_SETTINGS}" | grep -E "${NODE_NAME}"; then
      echo -e "Activate node ${NODE_NAME}"
      curl -k -u elastic:${ES_BOOTSTRAP_PW} -s -XPUT -H 'Content-Type: application/json' "https://localhost:9200/_cluster/settings" -d "{
        \"transient\" :{
          \"cluster.routing.allocation.exclude._name\" : null
        }
      }"
    fi
    echo -e "Node ${NODE_NAME} is ready to be used"

deployment.yaml:

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  namespace: {{ .Release.Namespace }}
  name: {{ .Values.data.name }}
  labels:
    app: {{ .Values.data.name }}
spec:
  serviceName: {{ .Values.data.name }}
  replicas: {{ .Values.data.deployment.replicas }}
  revisionHistoryLimit: {{ .Values.data.deployment.revisionHistoryLimit }}
  podManagementPolicy: {{ .Values.data.deployment.podManagementPolicy }}
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: {{ .Values.data.name }}
      annotations:
    spec:
      serviceAccount: {{ .Values.serviceAccount }}
      securityContext:
        runAsUser: {{ .Values.userId }}
        fsGroup: {{ .Values.groupId }}
      imagePullSecrets:
        - name: {{ .Values.data.deployment.imagePullSecretName }}
      initContainers:
        - name: {{ .Values.data.deployment.initContainers.increaseMapCount.name }}
          image: "{{ .Values.image.os.repository }}:{{ .Values.image.os.tag }}"
          imagePullPolicy: {{ .Values.image.os.pullPolicy }}
          command:
            - sh
            - -c
            - 'echo 262144 > /proc/sys/vm/max_map_count'
          securityContext:
            privileged: {{ .Values.data.deployment.initContainers.increaseMapCount.securityContext.privileged }}
            runAsUser: {{ .Values.data.deployment.initContainers.increaseMapCount.securityContext.runAsUser }}
      containers:
      - name: {{ .Values.data.shortName }}
        image: "{{ .Values.image.elasticsearch.repository }}:{{ .Values.image.elasticsearch.tag }}"
        imagePullPolicy: {{ .Values.image.elasticsearch.pullPolicy }}
        securityContext:
          capabilities:
            add:
              - IPC_LOCK
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        readinessProbe:
          exec:
            command:
            - /bin/bash
            - -c
            - /usr/bin/curl -k -u ${USERNAME}:${PASSWORD} "https://localhost:9200/_cluster/health?local=true"
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        lifecycle:
          preStop:
            exec:
              command: ["/bin/bash","/pre-stop-hook.sh"]
          postStart:
            exec:
              command: ["/bin/bash","/post-start-hook.sh"]
        volumeMounts:
        - name: lifecycle-hooks
          mountPath: /pre-stop-hook.sh
          subPath: pre-stop-hook.sh
        - name: lifecycle-hooks
          mountPath: /post-start-hook.sh
          subPath: post-start-hook.sh
      terminationGracePeriodSeconds: 86400
      volumes:
      - name: lifecycle-hooks
        configMap:
          name: {{ .Values.data.name }}-cm

The deployment.yaml is not the full file; it contains only the parts required for the lifecycle hooks.

2. Just ensure containers are only stopped while the cluster is in a green state

Use a readiness probe or a preStop hook.

readinessProbe:

readinessProbe:
  exec:
    command:
    - /bin/bash
    - -c
    - /usr/bin/curl -k -u ${SOME_USER}:${SOME_PASSWORD} "https://localhost:9200/_cluster/health?wait_for_status=green&timeout=30s" | grep -v '"timed_out":true'
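  # Note (addition): because the command above intentionally blocks for up to 30s waiting
  # for green, the probe's own timeout must be raised above that; illustrative values:
  timeoutSeconds: 35
  periodSeconds: 60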

preStop:

        lifecycle:
          preStop:
            exec:
              command:
              - /bin/bash
              - -c
              - /usr/bin/curl -k -u ${SOME_USER}:${SOME_PASSWORD} "https://localhost:9200/_cluster/health?wait_for_status=green&timeout=28800s"

It's important to also set terminationGracePeriodSeconds: 28800; otherwise the container will be killed after 30s, which is the default grace period.
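
To sanity-check either variant, something along these lines can be used (a sketch only; the namespace and the pod names es-data-0 / es-data-1 are placeholders):

# Trigger a graceful shutdown of one data pod; the preStop hook / probe should hold it until green.
kubectl -n elasticsearch delete pod es-data-0

# Meanwhile, from another data pod, watch cluster health and shard movement.
kubectl -n elasticsearch exec es-data-1 -- \
  curl -s -k -u ${SOME_USER}:${SOME_PASSWORD} "https://localhost:9200/_cat/health?v"
kubectl -n elasticsearch exec es-data-1 -- \
  curl -s -k -u ${SOME_USER}:${SOME_PASSWORD} "https://localhost:9200/_cat/shards?v"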
