feat: support specified instance scale down #6958

free6om · 2024-04-02T11:33:59Z

Use Cases

Node Failure

When a physical fault occurs on a specific node, it is necessary to rebuild a replica and subsequently take the affected pod on that node offline.

Data Corruption

When the data of a particular pod is corrupted, it is necessary to rebuild a replica and subsequently take the affected pod offline.

Instance Unavailability

When a pod experiences availability issues such as slow or unresponsive behavior, the best practice is to create a new replica and subsequently take the affected pod offline.

Cluster API

Add the offlineInstances field to spec.componentSpecs in the Cluster API to list the instances to be taken offline.

apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
# ...
spec:
  componentSpecs:
  - name: "foo"
    offlineInstances: [ "foo-2", "foo-3"]
# ...

OpsRequest API

Add the offlineInstances field to Ops to override the field in the Cluster.

apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsRequest
# ...
spec:
  # ...
  horizontalScaling:
  - componentName: "foo"
    replicas: 2
    offlineInstances:
    - "foo-2"
    - "foo-3"
# ...
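
The override relationship between the two APIs can be sketched in Go as follows. This is a minimal illustration, not the code in this PR: the types ComponentSpec and HorizontalScaling and the function applyHorizontalScaling are hypothetical simplifications, and the sketch assumes the list from the OpsRequest replaces the list on the Cluster rather than being merged with it.

```go
package main

import "fmt"

// ComponentSpec is a simplified, hypothetical stand-in for an entry in
// spec.componentSpecs of the Cluster.
type ComponentSpec struct {
	Name             string
	Replicas         int32
	OfflineInstances []string
}

// HorizontalScaling is a simplified, hypothetical stand-in for an entry in
// spec.horizontalScaling of the OpsRequest.
type HorizontalScaling struct {
	ComponentName    string
	Replicas         int32
	OfflineInstances []string
}

// applyHorizontalScaling copies replicas and offlineInstances from the ops
// entry onto the matching component, replacing whatever the Cluster had.
// It reports whether a component with the given name was found.
func applyHorizontalScaling(comps []ComponentSpec, hs HorizontalScaling) bool {
	for i := range comps {
		if comps[i].Name != hs.ComponentName {
			continue
		}
		comps[i].Replicas = hs.Replicas
		// Copy the slice so later changes to the ops object do not leak in.
		comps[i].OfflineInstances = append([]string(nil), hs.OfflineInstances...)
		return true
	}
	return false
}

func main() {
	comps := []ComponentSpec{{Name: "foo", Replicas: 4}}
	ok := applyHorizontalScaling(comps, HorizontalScaling{
		ComponentName:    "foo",
		Replicas:         2,
		OfflineInstances: []string{"foo-2", "foo-3"},
	})
	fmt.Println(ok, comps[0].Replicas, comps[0].OfflineInstances)
}
```

Replacing the list wholesale keeps the OpsRequest the single source of truth for one scaling operation; a merge would be an equally plausible design and is not ruled out by the description above.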

Test

Case 1: Specify Instance Offline

Create a 3-instance cluster, then use an OpsRequest to take the instance with ordinal 1 offline.
Expected result:

The Cluster Spec should include OfflineInstances:

apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
# ...
spec:
  componentSpecs:
  - name: "foo"
    offlineInstances: ["foo-1"]
# ...

The RSM should generate exactly 2 instances, with ordinals 0 and 2.
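
The expected result of Case 1 can be reproduced with a small Go sketch. The function instanceNames is hypothetical and encodes an assumed naming rule (walk ordinals upward from 0, skip offline names, stop at the replica count); it is not the logic in pkg/controller/rsm2.

```go
package main

import "fmt"

// instanceNames returns the names of the instances that should exist for a
// component: it walks ordinals upward from 0, skips every name listed in
// offline, and stops once it has collected `replicas` names.
// This is an illustrative assumption about the naming rule, not the RSM code.
func instanceNames(component string, replicas int, offline []string) []string {
	if replicas <= 0 {
		return nil
	}
	skip := make(map[string]struct{}, len(offline))
	for _, name := range offline {
		skip[name] = struct{}{}
	}
	names := make([]string, 0, replicas)
	for ordinal := 0; len(names) < replicas; ordinal++ {
		name := fmt.Sprintf("%s-%d", component, ordinal)
		if _, isOffline := skip[name]; isOffline {
			continue
		}
		names = append(names, name)
	}
	return names
}

func main() {
	// Case 1: a 3-instance cluster scaled in to 2 replicas with foo-1 offline.
	fmt.Println(instanceNames("foo", 2, []string{"foo-1"})) // prints [foo-0 foo-2]
}
```

With the values from the OpsRequest example (2 replicas, foo-2 and foo-3 offline), the same function yields foo-0 and foo-1.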

codecov · 2024-04-07T15:05:33Z

Codecov Report

Attention: Patch coverage is 65.01767%, with 99 lines in your changes missing coverage. Please review.

Project coverage is 65.77%. Comparing base (1b6ef23) to head (1d29e88).
Report is 4 commits behind head on main.

Files	Patch %	Lines
pkg/controller/rsm2/instance_util.go	78.76%	20 Missing and 11 partials ⚠️
pkg/controller/component/rsm_convertor.go	0.00%	29 Missing ⚠️
controllers/apps/operations/horizontal_scaling.go	60.00%	14 Missing and 4 partials ⚠️
pkg/controller/rsm2/reconciler_revision_update.go	56.52%	6 Missing and 4 partials ⚠️
...g/controller/rsm2/reconciler_instance_alignment.go	80.00%	2 Missing and 2 partials ⚠️
pkg/controller/rsm2/reconciler_update.go	50.00%	3 Missing and 1 partial ⚠️
pkg/controller/builder/builder_component.go	0.00%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6958      +/-   ##
==========================================
- Coverage   65.96%   65.77%   -0.19%     
==========================================
  Files         340      340              
  Lines       41356    41391      +35     
==========================================
- Hits        27279    27225      -54     
- Misses      11754    11835      +81     
- Partials     2323     2331       +8

Flag	Coverage Δ
unittests	`65.77% <65.01%> (-0.19%)`	⬇️

Flags with carried forward coverage won't be shown.

wangyelei · 2024-04-08T02:19:53Z

apis/apps/v1alpha1/cluster_types.go

+	//
+	// The sum of replicas across all InstanceTemplates should not exceed the total number of Replicas specified for the Component.
+	// Any remaining replicas will be generated using the default template and will follow the default naming rules.
+	//
 	// +optional
 	Instances []InstanceTemplate `json:"instances,omitempty"`


Could you add patchStrategy:"merge,retainKeys" patchMergeKey:"name" so that the template name is treated as the unique key?

fixed in: 1d29e88
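
For readers unfamiliar with these struct tags, here is a hedged Go sketch of where they would sit and how they can be read back. The tag values are taken from the review comment above; the trimmed-down ComponentSpec and InstanceTemplate types and the helper mergeKeyOf are hypothetical and only illustrate the idea. Note that the tags declare how list entries are matched during a patch; on their own they do not validate that names are unique.

```go
package main

import (
	"fmt"
	"reflect"
)

// InstanceTemplate is reduced to the one field that matters here.
type InstanceTemplate struct {
	Name string `json:"name"`
}

// ComponentSpec is a hypothetical, trimmed-down holder of the Instances field.
type ComponentSpec struct {
	// +optional
	Instances []InstanceTemplate `json:"instances,omitempty" patchStrategy:"merge,retainKeys" patchMergeKey:"name"`
}

// mergeKeyOf returns the patchMergeKey tag declared on the named field of a
// struct value, or "" if the field does not exist or carries no such tag.
// v must be a struct value (not a pointer).
func mergeKeyOf(v any, field string) string {
	f, ok := reflect.TypeOf(v).FieldByName(field)
	if !ok {
		return ""
	}
	return f.Tag.Get("patchMergeKey")
}

func main() {
	fmt.Println(mergeKeyOf(ComponentSpec{}, "Instances"))
}
```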

free6om · 2024-04-08T07:10:02Z

/cherry-pick release-0.9

github-actions · 2024-04-08T07:10:23Z

🤖 says: cherry pick action finished successfully 🎉!
See: https://github.com/apecloud/kubeblocks/actions/runs/8595993678

(cherry picked from commit f88b651)

free6om added 10 commits March 27, 2024 21:24

feat: support ordinal start in instance template

1fd8e7a

hscale ops done

6c82b2c

fix broken ut

91943bb

fix broken ut

c233f8c

remove validation rule

ea1ef7d

simplify InstanceTemplate API

ae2ebeb

support offline filed

950fdc7

fix broken ut

bf78593

fix broken ut

3cf90b5

fix broken ut

9dca4d0

free6om added this to the Release 0.9.0 milestone Apr 2, 2024

free6om self-assigned this Apr 2, 2024

free6om requested review from nayutah, ldming, heng4fun, wangyelei and Y-Rookie as code owners April 2, 2024 11:34

apecloud-bot added ci feature labels Apr 2, 2024

github-actions bot added the size/XXL Denotes a PR that changes 1000+ lines. label Apr 2, 2024

nayutah reviewed Apr 3, 2024

apis/apps/v1alpha1/cluster_types.go (outdated)

controllers/apps/operations/horizontal_scaling.go (outdated)

controllers/apps/operations/horizontal_scaling.go (outdated)

nayutah reviewed Apr 3, 2024

apis/apps/v1alpha1/cluster_types.go (outdated)

free6om added 8 commits April 3, 2024 17:16

make InstanceTemplate name required

d9f4e86

WIP

37dcb23

update API

eeaba53

update doc

b08c65c

workload done

1b6c592

ops done

3b78e56

fix lint error

f8632cc

make manifests

022a484

free6om added 6 commits April 7, 2024 19:29

Merge branch 'main' into support/specified-pod-scale-in

995cdac

fix empty ns

136f6cc

rename replica to instance

6790fea

fix revisions

fe95826

fix rsm ready status

ea22206

Merge branch 'main' into support/specified-pod-scale-in

7181c7f

current instances

4ca635d

nayutah approved these changes Apr 8, 2024

wangyelei reviewed Apr 8, 2024

free6om added 2 commits April 8, 2024 14:28

move OfflineInstances to Spec

12a1832

add mergeKey

1d29e88

wangyelei approved these changes Apr 8, 2024

free6om merged commit f88b651 into main Apr 8, 2024
57 checks passed

free6om deleted the support/specified-pod-scale-in branch April 8, 2024 07:09

github-actions bot pushed a commit that referenced this pull request Apr 8, 2024

feat: support specified instances scale down (#6958)

63a15fc

(cherry picked from commit f88b651)

free6om mentioned this pull request Apr 10, 2024

[Features] support hscale with --offlineinstances apecloud/kbcli#312

Open

TalktoCrystal pushed a commit that referenced this pull request Apr 11, 2024

feat: support specified instances scale down (#6958)

c255c54

This was referenced Apr 15, 2024

[BUG]kb crash after stop cluster after update monitor #6922

Closed

[BUG]Hscale out pika-group ops is always Running for pods are not ready in Components: [etcd] #6998

Closed
