Suspend `ReplaceUnhealthy` process for AWS tikv auto-scaling-group #962

tennix · 2019-09-27T10:35:02Z

Feature Request

Is your feature request related to a problem? Please describe:

The AWS Terraform script uses an auto-scaling group for every component (pd/tikv/tidb/monitor). When an EC2 instance fails the health check, the instance is replaced. This is helpful for stateless applications and for applications that store their data on EBS volumes.

But the TiKV pod stores its data on the instance store. When the instance is replaced, all data on the instance store is lost, and TiKV has to resync all of it to the newly added instance. Although TiDB is a distributed database and keeps working when a node fails, the cost of resyncing is high if the dataset is large. Besides, the EC2 instance may recover to a healthy status after a reboot.

So for TiKV, it is preferable to disable the auto-scaling group's replace behavior.

According to its documentation, an auto-scaling group's scaling processes can be suspended and resumed, and Terraform supports setting this field on the auto-scaling group.
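
For illustration, a minimal sketch of what this could look like with the `suspended_processes` argument of the Terraform `aws_autoscaling_group` resource. The resource name, sizes, `var.subnet_ids`, and `aws_launch_template.tikv` are hypothetical placeholders, not taken from the actual script, which may define the group through a module instead:

```hcl
# Hypothetical TiKV auto-scaling group; all names and sizes are placeholders.
resource "aws_autoscaling_group" "tikv" {
  name                = "example-tikv"
  min_size            = 3
  max_size            = 3
  desired_capacity    = 3
  vpc_zone_identifier = var.subnet_ids # assumed variable

  launch_template {
    id      = aws_launch_template.tikv.id # assumed launch template
    version = "$Latest"
  }

  # Suspend only the ReplaceUnhealthy process, so an instance that fails
  # its health check is left in place and its instance-store data survives.
  suspended_processes = ["ReplaceUnhealthy"]
}
```

With this setting the group should still launch and terminate instances for scaling, but unhealthy TiKV instances would have to be rebooted or replaced by an operator.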

tennix added the cloud/aws Amazon Web Services label Oct 11, 2019

aylei self-assigned this Oct 15, 2019

aylei mentioned this issue Oct 15, 2019

Suspend ReplaceUnhealthy process for AWS tikv auto-scaling-group #1014

Merged

cofyc closed this as completed in #1014 Oct 18, 2019

sre-bot mentioned this issue Oct 18, 2019

Suspend ReplaceUnhealthy process for AWS tikv auto-scaling-grou… #1027

Merged
