Merge pull request #380 from ii2day/release-v0.2
Release v0.2.1
weizhoublue authored Nov 29, 2023
2 parents 8b1b53d + 640dde7 commit 1a071cf
Showing 718 changed files with 11,732 additions and 5,167 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/badge.yaml
@@ -42,7 +42,7 @@ jobs:
- name: Create Lines-of-Code-Badge
if: ${{ env.BADGE_CODELINE_ID != '' }}
uses: schneegans/dynamic-badges-action@v1.6.0
uses: schneegans/dynamic-badges-action@v1.7.0
with:
auth: ${{ secrets.WELAN_PAT }}
gistID: ${{ env.BADGE_CODELINE_ID }}
@@ -53,7 +53,7 @@ jobs:

- name: Create Comments-Badge
if: ${{ env.BADGE_COMMENT_LINE != '' }}
uses: schneegans/dynamic-badges-action@v1.6.0
uses: schneegans/dynamic-badges-action@v1.7.0
with:
auth: ${{ secrets.WELAN_PAT }}
gistID: ${{ env.BADGE_COMMENT_LINE }}
@@ -66,7 +66,7 @@ jobs:

- name: Create E2E-Badge
if: ${{ env.BADGE_E2ECOVER_ID != '' }}
uses: schneegans/dynamic-badges-action@v1.6.0
uses: schneegans/dynamic-badges-action@v1.7.0
with:
auth: ${{ secrets.E2ECOVER_II2DAY }}
gistID: ${{ env.BADGE_E2ECOVER_ID }}
2 changes: 1 addition & 1 deletion .github/workflows/build-image-base.yaml
@@ -125,7 +125,7 @@ jobs:

- name: Release build ${{ matrix.name }}
if: ${{ env.RUN_EXIST == 'false' }}
uses: docker/build-push-action@v5.0.0
uses: docker/build-push-action@v5.1.0
continue-on-error: false
id: docker_build_release
with:
2 changes: 1 addition & 1 deletion .github/workflows/call-e2e.yaml
@@ -191,7 +191,7 @@ jobs:
- name: Update Badge
if: ${{ env.RUN_PERFORMANCE_RESULT != '' && inputs.ipfamily == 'dual' && env.PERFORMANCE_BADGE_ID != '' }}
uses: schneegans/dynamic-badges-action@v1.6.0
uses: schneegans/dynamic-badges-action@v1.7.0
with:
auth: ${{ secrets.WELAN_PAT }}
gistID: ${{ env.PERFORMANCE_BADGE_ID }}
4 changes: 2 additions & 2 deletions .github/workflows/call-release-image.yaml
@@ -110,7 +110,7 @@ jobs:
echo "RUN_IMAGE_SUFFIX=${tmp}-${{ matrix.name }}" >> $GITHUB_ENV
- name: Build Image ${{ matrix.name }} and push
uses: docker/build-push-action@v5.0.0
uses: docker/build-push-action@v5.1.0
if: ${{ env.RUN_PUSH == 'true' }}
id: docker_build_and_push
with:
@@ -128,7 +128,7 @@ jobs:
RACE=${{ inputs.race }}
- name: Build Image ${{ matrix.name }} and output docker
uses: docker/build-push-action@v5.0.0
uses: docker/build-push-action@v5.1.0
if: ${{ env.RUN_PUSH != 'true' }}
id: docker_build_and_save
with:
19 changes: 18 additions & 1 deletion .github/workflows/call-release-pages.yaml
@@ -51,8 +51,25 @@ jobs:
fetch-depth: 0
ref: ${{ env.REF }}

- name: Set main branch docs to dev (latest)
id: main_docs
if: ${{ env.REF == 'main' }}
run: |
pip install mkdocs==1.5.2 mike==1.1.2 mkdocs-material==9.2.8
git config user.email "robot@example.com"
git config user.name "robot"
cp ./docs/mkdocs.yml ./
mike deploy --rebase -b ${{ env.MERGE_BRANCH }} dev -t "dev (${{ env.REF }})"
rm -rf ./site && rm -rf ./mkdocs.yml
git checkout -f ${{ env.MERGE_BRANCH }}
rm -rf ./charts && rm -rf ./index.yaml && rm -rf ./changelogs
tar -czvf ./site.tar.gz *
ls
echo "push document version 'dev' from branch ${{ env.REF }}."
- name: Extract Version
id: extract
if: ${{ env.REF != 'main' }}
run: |
if ! grep -E "^[[:space:]]*v[0-9]+.[0-9]+.[0-9]+[[:space:]]*$" VERSION &>/dev/null ; then
echo "not a release version, skip generating doc."
@@ -82,7 +99,7 @@ jobs:
- name: build doc site
id: build_doc
if: ${{ env.SKIP_ALL_JOB != 'true' }}
if: ${{ env.SKIP_ALL_JOB != 'true' && env.REF != 'main' }}
run: |
git checkout ${{ env.REF }}
ls
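The `Extract Version` step above gates doc generation on the contents of `VERSION`. In its `grep -E` pattern the dots are unescaped, so each `.` matches any character, not only a literal dot. A minimal Python sketch of a stricter check, with the dots escaped, might look like the following; `RELEASE_RE` and `is_release_version` are hypothetical names and not part of the repository.

```python
import re

# Same shape as the workflow's pattern, but with the dots escaped so that
# only a literal "." separates the numeric fields.
RELEASE_RE = re.compile(r"[ \t]*v[0-9]+\.[0-9]+\.[0-9]+[ \t]*")


def is_release_version(text: str) -> bool:
    """Return True if any line of `text` looks like a release tag such as v0.2.1."""
    return any(RELEASE_RE.fullmatch(line) for line in text.splitlines())
```

With this check, `v0.2.1` followed by an empty line is accepted, while strings such as `v0x2y1` or `v0.2.1-rc1` are rejected.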
43 changes: 29 additions & 14 deletions README-zh_CN.md
@@ -10,31 +10,46 @@

**Simplified Chinese** | [**English**](./README.md)

## Introduction
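## Introduction

kdoctor is a Kubernetes data-plane testing project that uses pressure injection to actively inspect the functionality and performance of a cluster.
kdoctor is a Kubernetes data-plane testing component based on active pressure injection that tests the functionality and performance of a cluster. By surveying and abstracting the common needs of operations staff, kdoctor implements network, storage, and application operations tasks in a cloud-native way. It also adopts a CRD-based design and can integrate with observability components.

Traditional cluster inspection confirms the state of the cluster and its applications by collecting metrics, logs, application status, and similar information; this is passive inspection. In some special scenarios, however, this approach may not achieve the intended inspection purpose, timeliness, or cluster coverage, so operators must manually inject pressure into the cluster to perform active inspection. When the cluster is large, inspections are frequent, or the inspection process is complex, the manual approach is hard to sustain. These scenarios include:
**kdoctor mainly includes the following three types of tasks:**
* [AppHttpHealthy](./docs/reference/apphttphealthy-zh_CN.md): according to the task configuration, checks connectivity to specified addresses inside or outside the cluster over HTTP or HTTPS, supporting request methods such as PUT, GET, and POST.
* [NetReach](./docs/reference/netreach-zh_CN.md): according to the task configuration, inspects connectivity of Pod IP, ClusterIP, NodePort, LoadBalancer IP, and Ingress IP within the cluster, and even Pod multi-NIC and dual-stack IPs.
* [NetDns](./docs/reference/netdns-zh_CN.md): according to the task configuration, checks connectivity to specified DNS servers inside or outside the cluster, supporting the UDP, TCP, and TCP-TLS protocols.

* After deploying a large-scale cluster, you want to confirm Pod network connectivity between all nodes, rule out a network fault on any single node, and detect occasional packet loss; yet there are many communication channels, including Pod IP, ClusterIP, NodePort, LoadBalancer IP, Ingress IP, and even Pod multi-NIC and dual-stack IPs.
**Advantages of kdoctor over traditional testing components:**
* Inspection tasks are configured by applying CRDs, so users only need to care about the inspection target, inspection frequency, pressure parameters, and expected results.
* Based on the task configuration, pressure-generating agents run as a Deployment or DaemonSet, achieving the effect of multiple load-generating machines.
* According to the task's spec, a task is executed by the default agent or by a newly created agent, enabling resource reuse and per-task resource isolation.
* Agents are bound to the corresponding resource targets, such as Ingress and Service; each agent Pod accesses the bound resources of the others according to the task configuration, and conclusions are drawn from the request results.
* The pressure-generating client has been performance-tuned, greatly reducing resource consumption when sending requests.
* Inspection reports are output through logs, the aggregated API, files written to disk, and other means.

* You want to actively verify that Pods on all nodes can reach the CoreDNS service, and to confirm that CoreDNS's resource configuration and replica count are correct and that its performance can sustain the expected peak load.
## Architecture

* Disks are consumables, and applications such as etcd are sensitive to disk performance. In daily operations, administrators want to periodically confirm that the local disks on all nodes are healthy and that file read/write throughput and latency meet expectations.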
<div style="text-align:center">
<img src="./docs/images/arch.png" alt="Your Image Description">
</div>

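* Actively inject pressure into a service, such as an image registry, MySQL, or the api-server, to help reproduce a bug or to confirm service performance.
Components:
* kdoctor controller: runs permanently as a Deployment and handles CR monitoring, task creation, task report aggregation, and so on.
* kdoctor agent: created dynamically on demand as a Deployment or DaemonSet; it executes the tasks.

kdoctor is a Kubernetes data-plane testing project that grew out of practical scenarios in production operations. It uses pressure injection to actively inspect the functionality and performance of a cluster. kdoctor can be applied to:
## Quick Start

* Deployment checks and daily operations in production environments, removing the burden of manual inspection.
**Install**
* See [Install kdoctor](./docs/usage/install-zh_CN.md) or [kind Quick Start](./docs/usage/get-started-kind-zh_CN.md).

* E2E testing, bug reproduction, chaos testing, and similar work, reducing the programming effort.
**Start a task**
* [Get started with AppHttpHealthy](./docs/usage/apphttphealthy-zh_CN.md)
* [Get started with NetReach](./docs/usage/netreach-zh_CN.md)
* [Get started with NetDNS](./docs/usage/netdns-zh_CN.md)

## Architecture

## Quick Start
## Contributing

## Core Features
See the [development setup guide](./docs/develop/contributing.md).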

## License

45 changes: 28 additions & 17 deletions README.md
@@ -12,33 +12,44 @@

## Introduction

kdoctor is a cloud-native project for data plane testing. Through pressure injection, it actively inspects the function and performance of the cluster.
kdoctor is a Kubernetes data plane testing component that conducts functional and performance tests on clusters using proactive pressure injection. It addresses the operational needs of network, storage, and applications by adopting a cloud-native approach based on extensive research and abstraction. With its CRD design, kdoctor can seamlessly integrate with observability components.

In traditional operations and maintenance, the status of clusters and applications is confirmed by collecting information such as metrics, logs, and application status,
which can be called passive inspection. However, in some special scenarios, this method may not meet the expected purpose, timeliness, or cluster range, so
administrators need to manually inject some pressure into the cluster and check the cluster status, which can be called active inspection.
When the cluster is large, the inspection frequency is high, or the inspection process is complicated, this is hard to do manually. These scenarios include:
**kdoctor mainly offers three types of tasks:**
* [AppHttpHealthy](./docs/reference/apphttphealthy.md): according to the task configuration, perform connectivity checks using HTTP and HTTPS protocols on specified addresses within or outside the cluster, supporting various request methods such as PUT, GET, and POST.
* [NetReach](./docs/reference/netreach.md): conduct connectivity inspections on Pod IP, ClusterIP, NodePort, LoadBalancer IP, Ingress IP, and even Pods with multiple network interfaces or dual-stack IPs.
* [NetDns](./docs/reference/netdns.md): perform connectivity checks on designated DNS servers within or outside the cluster, supporting UDP, TCP, and TCP-TLS protocols.

* After deploying a large-scale cluster, administrators want to confirm the network connectivity between all nodes, in order to find network failures on a certain
node or occasional packet loss. In addition, there are many communication paths, including Pod IP, ClusterIP, NodePort, LoadBalancer IP, Ingress IP, and even Pods with multiple network interfaces or dual-stack IPs.
**Advantages of kdoctor over traditional testing components:**
* By configuring inspection tasks through CRDs, users only need to focus on the inspection targets, frequency, pressure parameters, and expected results.
* Pressure-injecting agents are dynamically run as Deployments or DaemonSets, achieving the effect of multiple pressure-injecting machines.
* The execution of tasks utilizes default agents or newly created agents based on the task's spec configurations, enabling resource reuse and task resource isolation.
* Agents are bound to corresponding resource targets such as ingress and service. Each agent Pod mutually accesses the bound resources according to the task configuration, deriving conclusions from the request results.
* Through performance optimization, the pressure-injecting client significantly reduces resource consumption during requests.
* Inspection reports can be generated through various means, including logging, aggregated APIs, and file storage.

* Administrators want to make sure that Pods on all nodes can access the CoreDNS service, and that the resource configuration and replica count of CoreDNS are enough to support the expected access pressure.
## Architecture

* Disks are consumables, and applications like etcd are sensitive to disk performance. In daily maintenance, administrators want to periodically confirm that the local disk performance of all nodes is normal.
<div style="text-align:center">
<img src="./docs/images/arch.png" alt="Your Image Description">
</div>

* Actively inject pressure on a service such as a registry, MySQL, or the api-server, to help reproduce a bug or to confirm service performance.
Components:
* kdoctor controller: a persistent Deployment responsible for CR monitoring, task creation, and task report aggregation.
* kdoctor agent: dynamically created on-demand as Deployments or DaemonSets to execute tasks.

kdoctor is a cloud-native project for data plane testing, derived from practices in production operation and maintenance. Through pressure injection, it actively inspects the function and performance of the cluster. kdoctor can be applied to the following scenarios:
## Quick Start

* Inspection after creating a new cluster, and daily operation and maintenance.
**Install**
* Refer to [Install kdoctor](./docs/usage/install.md) or [kind Quick Start](./docs/usage/get-started-kind.md)

* E2E testing, bug reproduction, chaos testing.
**Task Get Started**
* [AppHttpHealthy Get Started](./docs/usage/apphttphealthy.md)
* [NetReach Get Started](./docs/usage/netreach.md)
* [NetDNS Get Started](./docs/usage/netdns.md)

## Architecture

## Quick Start
## Contribution

## Feature
Refer to the [Contribution doc](./docs/develop/contributing.md).

## License

3 changes: 2 additions & 1 deletion VERSION
@@ -1 +1,2 @@
v0.2.0
v0.2.1

5 changes: 3 additions & 2 deletions charts/Chart.yaml
@@ -1,14 +1,15 @@
apiVersion: v2
name: kdoctor
home: "https://kdoctor-io.github.io/kdoctor"
icon: https://raw.githubusercontent.com/kdoctor-io/kdoctor/main/docs/images/kdoctor.svg
# application or library
type: application
# no need to modify this version , CI will auto update it with /VERSION
version: 0.2.0
version: 0.2.1
# This field is informational, and has no impact on chart version calculations .
# Leaving it unquoted can lead to parsing issues in some cases
# no need to modify this version , CI will auto update it with /VERSION
appVersion: "0.2.0"
appVersion: "0.2.1"
kubeVersion: ">= 1.16.0-0"
description: kdoctor
sources:
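The comments in `Chart.yaml` say that CI updates `version` and `appVersion` from `/VERSION`. The CI script itself is not part of this diff, so the following is only a sketch of the mapping implied by the two files (`v0.2.1` in `VERSION`, `0.2.1` in the chart); `chart_version` is a hypothetical helper.

```python
def chart_version(version_file_text: str) -> str:
    """Derive the chart version from VERSION contents by dropping the leading 'v'."""
    tag = version_file_text.strip()
    if not tag.startswith("v"):
        # Assumption: VERSION always holds a "v"-prefixed tag, as in this commit.
        raise ValueError(f"unexpected VERSION contents: {tag!r}")
    return tag[1:]
```

The `strip()` call matters here because this commit leaves a trailing blank line in `VERSION`.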
18 changes: 10 additions & 8 deletions charts/README.md
@@ -25,14 +25,16 @@
| ----------------------------------------------------------------------- | ----------------------------------------------------------------------- | ------------------------------------ |
| `feature.enableIPv4` | enable ipv4 | `true` |
| `feature.enableIPv6` | enable ipv6 | `true` |
| `feature.nethttp_defaultRequest_Qps` | qps for kind nethttp | `10` |
| `feature.nethttp_defaultRequest_MaxQps` | qps for kind nethttp | `100` |
| `feature.nethttp_defaultConcurrency` | concurrency for kind nethttp | `50` |
| `feature.nethttp_defaultMaxIdleConnsPerHost` | max idle connect for kind nethttp | `50` |
| `feature.nethttp_defaultRequest_DurationInSecond` | Duration In Second for kind nethttp | `2` |
| `feature.nethttp_defaultRequest_PerRequestTimeoutInMS` | PerRequest Timeout In MS for kind nethttp | `500` |
| `feature.nethttp_defaultFail_MeanDelayInMs` | mean delay in ms for kind nethttp | `2000` |
| `feature.netdns_defaultConcurrency` | concurrency for kind netdns | `50` |
| `feature.netReachRequestMaxQPS` | qps for kind NetReach | `20` |
| `feature.netReachMaxConcurrency` | concurrency for kind NetReach | `10` |
| `feature.appHttpHealthyMaxConcurrency` | concurrency for kind AppHttpHealthy | `20` |
| `feature.appHttpHealthyRequestMaxQPS` | qps for kind AppHttpHealthy | `100` |
| `feature.netHttpDefaultRequestQPS` | qps for kind NetHttp | `10` |
| `feature.netHttpDefaultMaxIdleConnsPerHost` | max idle connect for kind NetHttp | `50` |
| `feature.netHttpDefaultRequestDurationInSecond` | Duration In Second for kind NetHttp | `2` |
| `feature.netHttpDefaultRequestPerRequestTimeoutInMS` | PerRequest Timeout In MS for kind NetHttp | `500` |
| `feature.netDnsMaxConcurrency` | concurrency for kind NetDns | `20` |
| `feature.netDnsRequestMaxQPS` | qps for kind NetDns | `100` |
| `feature.agentDefaultTerminationGracePeriodMinutes` | agent termination after minutes | `60` |
| `feature.taskPollIntervalInSecond` | the interval to poll the task in controller and agent pod | `5` |
| `feature.multusPodAnnotationKey` | the multus annotation key for ip status | `k8s.v1.cni.cncf.io/networks-status` |
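The hunk above replaces the snake-case `feature.nethttp_*` and `feature.netdns_*` values with camel-case keys. Overrides written against the old names would need to be updated. As a rough sketch, the helper below builds Helm `--set` arguments and rejects keys that are not among the rows added in this hunk. `NEW_FEATURE_DEFAULTS` and `helm_set_args` are hypothetical names; the keys and defaults are copied from the table.

```python
# `feature.*` keys and defaults from the charts/README.md rows added by this commit.
NEW_FEATURE_DEFAULTS = {
    "netReachRequestMaxQPS": 20,
    "netReachMaxConcurrency": 10,
    "appHttpHealthyMaxConcurrency": 20,
    "appHttpHealthyRequestMaxQPS": 100,
    "netHttpDefaultRequestQPS": 10,
    "netHttpDefaultMaxIdleConnsPerHost": 50,
    "netHttpDefaultRequestDurationInSecond": 2,
    "netHttpDefaultRequestPerRequestTimeoutInMS": 500,
    "netDnsMaxConcurrency": 20,
    "netDnsRequestMaxQPS": 100,
}


def helm_set_args(overrides: dict) -> list:
    """Build `--set feature.<key>=<value>` arguments, rejecting unknown keys."""
    args = []
    for key, value in sorted(overrides.items()):
        if key not in NEW_FEATURE_DEFAULTS:
            raise KeyError(f"unknown feature key: {key}")
        args += ["--set", f"feature.{key}={value}"]
    return args
```

The returned list can be appended to a `helm install` or `helm upgrade` command line. An old-style key such as `nethttp_defaultRequest_Qps` raises `KeyError` instead of being passed through silently.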
3 changes: 0 additions & 3 deletions charts/crds/kdoctor.io_netdnses.yaml
@@ -1132,12 +1132,10 @@ spec:
type: string
durationInSecond:
default: 2
format: int64
minimum: 1
type: integer
perRequestTimeoutInMS:
default: 5
format: int64
minimum: 1
type: integer
protocol:
@@ -1149,7 +1147,6 @@ spec:
type: string
qps:
default: 5
format: int64
minimum: 1
type: integer
type: object
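The hunks above drop `format: int64` from three integer fields but keep their `default` and `minimum`. As a rough illustration of what the remaining constraints mean for a request spec, the sketch below applies the defaults and enforces the minimums in plain Python. In a real cluster the API server does this from the CRD schema; `REQUEST_SCHEMA` and `apply_request_defaults` are hypothetical names used only here.

```python
# Defaults and minimums copied from the three fields shown in the hunks.
REQUEST_SCHEMA = {
    "durationInSecond": {"default": 2, "minimum": 1},
    "perRequestTimeoutInMS": {"default": 5, "minimum": 1},
    "qps": {"default": 5, "minimum": 1},
}


def apply_request_defaults(request: dict) -> dict:
    """Return a copy of `request` with defaults filled in and minimums checked."""
    result = dict(request)
    for field, rule in REQUEST_SCHEMA.items():
        value = result.setdefault(field, rule["default"])
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{field} must be an integer")
        if value < rule["minimum"]:
            raise ValueError(f"{field} must be >= {rule['minimum']}")
    return result
```

For example, a request that sets only `qps` gets `durationInSecond` and `perRequestTimeoutInMS` filled in from the defaults, and a `qps` of `0` is rejected.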