feat: refactor the opsDefinition to support task orchestration. #6735

Merged
wangyelei merged 4 commits into main from feature/improve_opsdef
Mar 5, 2024

Conversation

wangyelei
Contributor

@wangyelei wangyelei commented Mar 4, 2024

This PR refactors the OpsDefinition to support task orchestration.

It provides the following three action operators (more may be added later):

  1. workload: creates a workload to execute the action. Available types: [Job, Pod]
  2. exec: runs a pod that performs kubectl exec against the selected pod.
  3. resourceModifier: patches the CR you provide. (TODO)
    ...

This code is still missing some test cases; I will add them in the next PR.

Here's an example for switchover with lorry:

1. The switchover OpsDefinition:

apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsDefinition
metadata:
  name: switchover
spec:
  preConditions:
    - rule:
        expression: '{{ or (eq .component.status.phase "Running") (eq .component.status.phase "Abnormal") }}'
        message: "Component is not in Running/Abnormal status."
  targetPodTemplates:
    - name: availablePod
      podSelector:
        availability: Available
        selectionPolicy: Any
      vars:
        - name: TARGET_POD_IP
          valueFrom:
            envRef:
              envName: KB_POD_IP
        - name: LORRY_HTTP_PORT
          valueFrom:
            envRef:
              envName: LORRY_HTTP_PORT
  parametersSchema:
    openAPIV3Schema:
      properties:
        primary:
          description: "old primary instance name(pod Name)."
          type: string
        candidate:
          description: |
            candidate instance name(pod Name). if candidate is not empty, will promote it to primary. 
            otherwise promote a randomly selected pod to primary.
          type: string
      type: object
  actions:
    - name: switchover
      failurePolicy: Fail
      parameters:
        - primary
        - candidate
      workload:
        targetPodTemplate: availablePod
        type: Job
        backoffLimit: 0
        podSpec:
          containers:
          - name: switchover
            image: docker.io/apecloud/kubeblocks-tools:latest
            imagePullPolicy: IfNotPresent
            command:
              - sh
              - -c
              - |
                set -e
                # do switchover
                url="http://${TARGET_POD_IP}:${LORRY_HTTP_PORT}/v1.0/switchover" 
                params="{\"parameters\": {\"primary\":\"${primary}\",\"candidate\":\"${candidate}\"}}"
                echo "curl ${url}, parameters: ${params}"
                res=`curl -s -X POST -H 'Content-Type: application/json' "${url}" -d "${params}"`
                echo "curl result: ${res}"

                # check if switchover successfully.
                echo "INFO: start to check if switchover successfully, timeout is 60s"
                executedUnix=$(date +%s)
                while true; do
                  sleep 5
                  if [ -n "${candidate}" ]; then
                     # if candidate specified, only check it
                     role=$(kubectl get pod ${candidate} -ojson | jq -r '.metadata.labels["kubeblocks.io/role"]')
                     if [ "$role" = "primary" ] || [ "$role" = "leader" ] || [ "$role" = "master" ]; then
                        echo "INFO: switchover successfully, ${candidate} is ${role}"
                        exit 0
                     fi
                  else
                    # check whether any instance other than the old primary has been promoted to primary
                    pods=$(kubectl get pod -l apps.kubeblocks.io/component-name=${KB_COMP_NAME},app.kubernetes.io/instance=${KB_CLUSTER_NAME} | awk 'NR > 1 {print $1}')
                    for podName in ${pods}; do
                       if [ "${podName}" != "${primary}" ];then
                         role=$(kubectl get pod ${podName} -ojson | jq -r '.metadata.labels["kubeblocks.io/role"]')
                         if [ "$role" = "primary" ] || [ "$role" = "leader" ] || [ "$role" = "master" ]; then
                            echo "INFO: switchover successfully, ${podName} is ${role}"
                            exit 0
                         fi
                       fi
                    done
                  fi
                  currentUnix=$(date +%s)
                  diff_time=$((${currentUnix}-${executedUnix}))
                  if [ ${diff_time} -ge 60 ]; then
                    echo "ERROR: switchover failed."
                    exit 1
                  fi
                done
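
The polling logic in the script above could be factored into small helpers. The following is only a sketch under assumptions: the function names is_primary_role, pod_role and wait_for_primary are hypothetical and not part of this PR, and unlike the original loop the sketch checks the role once before the first sleep. It relies on kubectl and jq being available in the container, as the original script does.

```sh
#!/bin/sh
# Hypothetical helper: succeeds when the role value is one of the three
# role names the original script treats as "primary".
is_primary_role() {
  case "$1" in
    primary|leader|master) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: prints the kubeblocks.io/role label of a pod.
# Uses the same kubectl/jq pipeline as the original script.
pod_role() {
  kubectl get pod "$1" -ojson | jq -r '.metadata.labels["kubeblocks.io/role"]'
}

# Hypothetical helper: polls a single pod until it reports a primary role
# or the timeout (seconds, default 60) elapses; interval defaults to 5s.
wait_for_primary() {
  pod="$1"; timeout="${2:-60}"; interval="${3:-5}"
  start=$(date +%s)
  while :; do
    role=$(pod_role "$pod") || role=""
    if is_primary_role "$role"; then
      echo "INFO: ${pod} is ${role}"
      return 0
    fi
    now=$(date +%s)
    if [ $((now - start)) -ge "$timeout" ]; then
      echo "ERROR: ${pod} did not become primary within ${timeout}s"
      return 1
    fi
    sleep "$interval"
  done
}
```

With helpers like these, the candidate branch of the script reduces to a single call such as wait_for_primary "${candidate}" 60 5, and the fallback branch can loop over the other pods calling is_primary_role on each.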

2. Run the switchover on the mysql cluster:

apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  finalizers:
    - opsrequest.kubeblocks.io/finalizer
  generateName: ops-switchover-
spec:
  clusterRef: mysql
  customSpec:
    opsDefinitionRef: switchover
    # serviceAccountName: your-sa
    components:
      - name: mysql
        parameters:
          - name: primary
            value: "mysql-mysql-0"
          - name: candidate
            value: "mysql-mysql-2"
  type: Custom
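
For repeated switchovers it may be convenient to generate this manifest from a few arguments. The sketch below is an illustration only: render_switchover_ops is a hypothetical helper name, and the manifest fields are copied from the example above without any additions.

```sh
#!/bin/sh
# Hypothetical helper: prints an OpsRequest manifest for the switchover
# OpsDefinition. Arguments: cluster, component, old primary pod, candidate pod.
render_switchover_ops() {
  cluster="$1"; component="$2"; primary="$3"; candidate="$4"
  cat <<EOF
apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  finalizers:
    - opsrequest.kubeblocks.io/finalizer
  generateName: ops-switchover-
spec:
  clusterRef: ${cluster}
  customSpec:
    opsDefinitionRef: switchover
    components:
      - name: ${component}
        parameters:
          - name: primary
            value: "${primary}"
          - name: candidate
            value: "${candidate}"
  type: Custom
EOF
}
```

A possible invocation is render_switchover_ops mysql mysql mysql-mysql-0 mysql-mysql-2 | kubectl create -f -. Because the manifest uses generateName rather than a fixed name, it has to be submitted with kubectl create; kubectl apply rejects objects that have no metadata.name.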

Note: in a workload action, you can run sh /scripts/patch-extras-status.sh '[{"name":"test"}]' to record custom information in opsRequest.status.extras.
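
As a rough sketch of how a workload script might build that argument, the snippet below assembles a one-entry JSON array. The helper name extras_payload is hypothetical, and it assumes the key and value contain no double quotes or backslashes, since no JSON escaping is performed.

```sh
#!/bin/sh
# Hypothetical helper: prints a JSON array with a single key/value object,
# in the same shape as the '[{"name":"test"}]' example above.
# Assumption: neither argument needs JSON escaping.
extras_payload() {
  printf '[{"%s":"%s"}]' "$1" "$2"
}

payload=$(extras_payload name test)
echo "$payload"
# Inside a workload action container, the payload would then be passed on:
#   sh /scripts/patch-extras-status.sh "$payload"
```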

@github-actions github-actions bot added the size/XXL Denotes a PR that changes 1000+ lines. label Mar 4, 2024
@wangyelei wangyelei linked an issue Mar 4, 2024 that may be closed by this pull request

codecov bot commented Mar 4, 2024

Codecov Report

Attention: Patch coverage is 63.29588%, with 98 lines in your changes missing coverage. Please review.

Project coverage is 66.46%. Comparing base (3edae67) to head (2720726).
Report is 3 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| controllers/apps/operations/custom_workflow.go | 55.88% | 37 Missing and 8 partials ⚠️ |
| controllers/apps/opsrequest_controller.go | 43.24% | 16 Missing and 5 partials ⚠️ |
| controllers/apps/operations/custom.go | 82.97% | 10 Missing and 6 partials ⚠️ |
| pkg/controllerutil/pod_utils.go | 0.00% | 10 Missing ⚠️ |
| pkg/common/utils.go | 0.00% | 5 Missing ⚠️ |
| controllers/apps/operations/ops_progress_util.go | 92.30% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6735      +/-   ##
==========================================
+ Coverage   66.37%   66.46%   +0.09%     
==========================================
  Files         307      308       +1     
  Lines       36848    36770      -78     
==========================================
- Hits        24458    24440      -18     
+ Misses      10296    10233      -63     
- Partials     2094     2097       +3     
Flag Coverage Δ
unittests 66.46% <63.29%> (+0.09%) ⬆️


@wangyelei wangyelei merged commit ac1ebe9 into main Mar 5, 2024
47 checks passed
@wangyelei wangyelei deleted the feature/improve_opsdef branch March 5, 2024 04:56
@github-actions github-actions bot added this to the Release 0.8.2 milestone Mar 5, 2024
Labels
area/user-interaction feature refactor size/XXL Denotes a PR that changes 1000+ lines.
Development

Successfully merging this pull request may close these issues.

[Features] OpsDefinition supports task orchestration
4 participants