Suspend ReplaceUnhealthy process for AWS tikv auto-scaling-group #962

Closed
tennix opened this issue Sep 27, 2019 · 0 comments · Fixed by #1014
Labels
cloud/aws Amazon Web Services

tennix (Member) commented Sep 27, 2019

Feature Request

Is your feature request related to a problem? Please describe:

The AWS Terraform script uses an auto-scaling group for every component (PD, TiKV, TiDB, and monitor). When an EC2 instance fails its health check, the auto-scaling group replaces it. This is helpful for stateless applications, or for applications that store their data on EBS volumes.

However, TiKV pods store their data on the instance store. When an instance is replaced, all data on its instance store is lost, and TiKV has to resync that data to the newly added instance. Although TiDB is a distributed database and keeps working when a node fails, resyncing is costly when the dataset is large. Besides, an EC2 instance that fails its health check may recover to a healthy state after a reboot.

So for TiKV, it is preferable to disable the auto-scaling group's replace behavior.

According to the AWS documentation, an auto-scaling group's processes (including `ReplaceUnhealthy`) can be suspended and resumed. Terraform also supports this through the `suspended_processes` argument of the `aws_autoscaling_group` resource.
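For reference, a minimal Terraform sketch of the requested behavior, assuming the TiKV group is declared directly as an `aws_autoscaling_group` resource. The resource name `tikv`, the `var.subnets` variable, the `aws_launch_template.tikv` reference, and the sizes are hypothetical placeholders, not the project's actual script; only the `suspended_processes` argument and the `ReplaceUnhealthy` process name come from the Terraform AWS provider and AWS Auto Scaling.

```hcl
# Hypothetical TiKV auto-scaling group; names, sizes, and references are placeholders.
resource "aws_autoscaling_group" "tikv" {
  name                = "example-tikv"     # hypothetical name
  min_size            = 3
  max_size            = 3
  desired_capacity    = 3
  vpc_zone_identifier = var.subnets        # hypothetical list of subnet IDs

  launch_template {
    id      = aws_launch_template.tikv.id  # hypothetical launch template
    version = "$Latest"
  }

  # Health checks keep running, but the group no longer terminates and
  # replaces instances marked unhealthy, so instance-store data is kept.
  suspended_processes = ["ReplaceUnhealthy"]
}
```

With `ReplaceUnhealthy` suspended, an instance that fails its health check is still marked unhealthy but is not terminated and replaced, which leaves room to recover it (for example by rebooting) without losing the TiKV data on its instance store.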
