Multiple clusters management in EKS #616

Merged · 13 commits · Jul 3, 2019
1 change: 1 addition & 0 deletions deploy/aws/.gitignore
@@ -3,3 +3,4 @@ credentials/
terraform.tfstate
terraform.tfstate.backup
.terraform.tfstate.lock.info
kubeconfig_*.yaml
142 changes: 126 additions & 16 deletions deploy/aws/README.md
@@ -41,10 +41,10 @@ Before deploying a TiDB cluster on AWS EKS, make sure the following requirements

The default setup will create a new VPC and a t2.micro instance as the bastion machine, and an EKS cluster with the following EC2 instances as worker nodes:

* 3 m5d.xlarge instances for PD
* 3 i3.2xlarge instances for TiKV
* 2 c4.4xlarge instances for TiDB
* 1 c5.xlarge instance for monitor
* 3 m5.large instances for PD
* 3 c5d.4xlarge instances for TiKV
* 2 c5.4xlarge instances for TiDB
* 1 c5.2xlarge instance for monitor

Use the following commands to set up the cluster:

@@ -76,7 +76,7 @@ monitor_endpoint = http://abd299cc47af411e98aae02938da0762-1989524000.us-east-2.
region = us-east-2
tidb_dns = abd2e3f7c7af411e98aae02938da0762-17499b76b312be02.elb.us-east-2.amazonaws.com
tidb_port = 4000
tidb_version = v3.0.0-rc.1
tidb_version = v3.0.0
```

> **Note:** You can use the `terraform output` command to get the output again.
@@ -86,7 +86,7 @@ tidb_version = v3.0.0-rc.1
To access the deployed TiDB cluster, use the following commands to first `ssh` into the bastion machine, and then connect to it via the MySQL client (replace the `<>` parts with values from the output):

``` shell
ssh -i credentials/k8s-prod-<cluster_name>.pem ec2-user@<bastion_ip>
ssh -i credentials/<cluster_name>.pem ec2-user@<bastion_ip>
mysql -h <tidb_dns> -P <tidb_port> -u root
```

@@ -118,25 +118,25 @@ The initial Grafana login credentials are:

To upgrade the TiDB cluster, edit the `variables.tf` file with your preferred text editor and modify the `tidb_version` variable to a higher version, and then run `terraform apply`.

For example, to upgrade the cluster to version 3.0.0-rc.1, modify the `tidb_version` to `v3.0.0-rc.2`:
For example, to upgrade the cluster to version 3.0.0, modify the `tidb_version` variable to `v3.0.0`:

```
variable "tidb_version" {
description = "tidb cluster version"
default = "v3.0.0-rc.2"
default = "v3.0.0"
}
```

> **Note**: The upgrade doesn't finish immediately. You can watch the upgrade progress with `kubectl --kubeconfig credentials/kubeconfig_<cluster_name> get po -n tidb --watch`.

## Scale

To scale the TiDB cluster, edit the `variables.tf` file with your preferred text editor and modify the `tikv_count` or `tidb_count` variable to your desired count, and then run `terraform apply`.
To scale the TiDB cluster, edit the `variables.tf` file with your preferred text editor and modify the `default_cluster_tikv_count` or `default_cluster_tidb_count` variable to your desired count, and then run `terraform apply`.

For example, to scale out the cluster, you can modify the number of TiDB instances from 2 to 4:

```
variable "tidb_count" {
variable "default_cluster_tidb_count" {
default = 4
}
```
@@ -145,7 +145,7 @@ For example, to scale out the cluster, you can modify the number of TiDB instanc

## Customize

You can change default values in `variables.tf` (such as the cluster name and image versions) as needed.
You can change default values in `variables.tf` (such as the default cluster name and image versions) as needed.
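
For instance, a minimal sketch of such an override in `variables.tf` (the variable name `default_cluster_cluster_name` comes from `aws-tutorial.tfvars` below; the description string here is illustrative, so keep the one already present in your copy):

```hcl
variable "default_cluster_cluster_name" {
  description = "Name of the default TiDB cluster"
  default     = "my-cluster"
}
```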

### Customize AWS related resources

@@ -159,12 +159,91 @@ The TiDB version and component count are also configurable in variables.tf, you

Currently, the instance type of the TiDB cluster components is not configurable, because PD and TiKV rely on the [NVMe SSD instance store](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html), and different instance types have different disks.

### Customize TiDB parameters
### Customize TiDB Cluster

Currently, there are not many customizable TiDB parameters. And there are two ways to customize the parameters:
The values file ([`./tidb-cluster/values/default.yaml`](./tidb-cluster/values/default.yaml)) provides proper defaults for TiDB clusters in EKS. You can specify an overriding values file in [`clusters.tf`](./clusters.tf) for each TiDB cluster. Values in this file will override the defaults.

* Before deploying the cluster, you can directly modify the `templates/tidb-cluster-values.yaml.tpl` file and then deploy the cluster with customized configs.
* After the cluster is running, you must run `terraform apply` again every time you make changes to the `templates/tidb-cluster-values.yaml.tpl` file, or the cluster will still be using old configs.
For example, the default cluster specifies `./default-cluster.yaml` as its overriding values file, and enables the ConfigMap rollout feature in that file.

In EKS, some values are not customizable through the values file as they are elsewhere, including the cluster version, replicas, node selectors, and taints. These variables are controlled by Terraform instead, in favor of consistency. To customize them, edit [`clusters.tf`](./clusters.tf) and change the variables of each `./tidb-cluster` module directly.
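
For example, a minimal sketch of such an edit, assuming the default cluster's module block in `clusters.tf` is named `default-cluster` (check the actual block name in your copy); the variable names match those used in the `example-cluster` block shown later in this document:

```hcl
module "default-cluster" {
  source = "./tidb-cluster"

  # ...leave the other arguments as they are...

  # Controlled by terraform: bump the cluster version and scale out TiKV here
  cluster_version = "v3.0.0"
  tikv_count      = 5
}
```

Run `terraform apply` after editing so the change takes effect.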

### Customize TiDB Operator

You can customize TiDB Operator by specifying a Helm values file through the `operator_values` variable. For example:

```hcl
variable "operator_values" {
description = "The helm values of TiDB Operator"
default = file("operator_values.yaml")
}
```

## Multiple TiDB Cluster Management

An instance of the `./tidb-cluster` module corresponds to a TiDB cluster in the EKS cluster. If you want to add a new TiDB cluster, edit [`clusters.tf`](./clusters.tf) and add a new instance of the `./tidb-cluster` module:

```hcl
module "example-cluster" {
  source = "./tidb-cluster"

  # The target EKS, required
  eks_info = local.default_eks
  # The subnets of node pools of this TiDB cluster, required
  subnets = local.default_subnets
  # TiDB cluster name, required
  cluster_name = "example-cluster"

  # Helm values file
  override_values = file("example-cluster.yaml")
  # TiDB cluster version
  cluster_version = "v3.0.0"
  # SSH key of cluster nodes
  ssh_key_name = module.key-pair.key_name
  # PD replica number
  pd_count = 3
  # PD instance type
  pd_instance_type = "t2.xlarge"
  # TiKV replica number
  tikv_count = 3
  # TiKV instance type
  tikv_instance_type = "t2.xlarge"
  # The storage class used by TiKV; if the TiKV instance type does not have a local SSD, change this to a proper storage class
  # TiDB replica number
  tidb_count = 2
  # TiDB instance type
  tidb_instance_type = "t2.xlarge"
  # Monitor instance type
  monitor_instance_type = "t2.xlarge"
  # The version of the tidb-cluster helm chart
  tidb_cluster_chart_version = "v1.0.0-beta.3"
}

module "other-cluster" {
  source = "./tidb-cluster"

  cluster_name    = "other-cluster"
  override_values = file("other-cluster.yaml")
  # ......
}
```
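
After adding or editing a module block, run `terraform init` again so that Terraform installs the new module instance, and then run `terraform apply` to create the cluster.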

> **Note:**
>
> The `cluster_name` of each cluster must be unique.

You can refer to [./tidb-cluster/variables.tf](./tidb-cluster/variables.tf) for the complete configuration reference of the `./tidb-cluster` module.

You can get the DNS names of the TiDB service and the Grafana service via kubectl. If you want Terraform to print this information, as it does for the `default-cluster`, you can add `output` sections in `outputs.tf`:

```hcl
output "example-cluster_tidb-dns" {
value = module.example-cluster.tidb_dns
}

output "example-cluster_monitor-dns" {
value = module.example-cluster.monitor_dns
}
```
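
After adding the `output` sections, run `terraform apply` again; the new DNS names are printed at the end of the apply and can be retrieved later with `terraform output`.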

## Destroy

@@ -174,4 +253,35 @@ It may take some while to finish destroying the cluster.
$ terraform destroy
```

> **Note:** You have to manually delete the EBS volumes in AWS console after running `terraform destroy` if you do not need the data on the volumes anymore.
> **Note:**
>
> This will destroy your EKS cluster along with all the TiDB clusters you deployed on it.

> **Note:**
>
> You have to manually delete the EBS volumes in the AWS console after running `terraform destroy` if you no longer need the data on the volumes.

## Advanced Guide: Use the tidb-cluster and tidb-operator Modules

Under the hood, this terraform module composes two sub-modules:

- [tidb-operator](./tidb-operator/README.md), which provisions the Kubernetes control plane for TiDB clusters
- [tidb-cluster](./tidb-cluster/README.md), which provisions a TiDB cluster in the target Kubernetes cluster

You can use these modules separately in your own Terraform scripts, by either referencing them locally or publishing them to your own Terraform module registry.

For example, say you create a Terraform module in `/deploy/aws/staging`; you can then reference the tidb-operator and tidb-cluster modules as follows:

```hcl
module "setup-control-plane" {
source = "../tidb-operator"
}

module "tidb-cluster-a" {
source = "../tidb-cluster"
}

module "tidb-cluster-b" {
source = "../tidb-cluster"
}
```
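
As written, the blocks above are only placeholders. Below is a hedged sketch of what a usable `tidb-cluster` instance could look like; it reuses the argument names from the multi-cluster example earlier, while `eks_info` and `subnets` must be wired to the EKS cluster created by your control-plane module (the exact output names are not shown here, so check the `tidb-operator` module's outputs and [./tidb-cluster/variables.tf](./tidb-cluster/variables.tf) before copying):

```hcl
module "tidb-cluster-a" {
  source = "../tidb-cluster"

  # Required: point these at the EKS cluster and subnets created by the
  # control-plane module. The expressions are placeholders, not real output names.
  # eks_info = module.setup-control-plane.<eks_output>
  # subnets  = <subnet_ids>

  cluster_name    = "tidb-cluster-a"
  cluster_version = "v3.0.0"
  override_values = file("tidb-cluster-a.yaml")
  tikv_count      = 3
  tidb_count      = 2
}
```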
43 changes: 43 additions & 0 deletions deploy/aws/aws-key-pair/main.tf
@@ -0,0 +1,43 @@
locals {
  public_key_filename  = "${var.path}/${var.name}.pub"
  private_key_filename = "${var.path}/${var.name}.pem"
}

resource "tls_private_key" "generated" {
  algorithm = "RSA"
}

resource "aws_key_pair" "generated" {
  key_name   = var.name
  public_key = tls_private_key.generated.public_key_openssh

  lifecycle {
    ignore_changes = [key_name]
  }
}

resource "local_file" "public_key_openssh" {
  count    = var.path != "" ? 1 : 0
  content  = tls_private_key.generated.public_key_openssh
  filename = local.public_key_filename
}

resource "local_file" "private_key_pem" {
  count    = var.path != "" ? 1 : 0
  content  = tls_private_key.generated.private_key_pem
  filename = local.private_key_filename
}

resource "null_resource" "chmod" {
  count      = var.path != "" ? 1 : 0
  depends_on = [local_file.private_key_pem]

  triggers = {
    key = tls_private_key.generated.private_key_pem
  }

  provisioner "local-exec" {
    command = "chmod 600 ${local.private_key_filename}"
  }
}

20 changes: 20 additions & 0 deletions deploy/aws/aws-key-pair/outputs.tf
@@ -0,0 +1,20 @@
output "key_name" {
value = aws_key_pair.generated.key_name
}

output "public_key_openssh" {
value = tls_private_key.generated.public_key_openssh
}

output "private_key_pem" {
value = tls_private_key.generated.private_key_pem
}

output "public_key_filepath" {
value = local.public_key_filename
}

output "private_key_filepath" {
value = local.private_key_filename
}

8 changes: 8 additions & 0 deletions deploy/aws/aws-key-pair/variables.tf
@@ -0,0 +1,8 @@
variable "name" {
description = "Unique name for the key, should also be a valid filename. This will prefix the public/private key."
}

variable "path" {
description = "Path to a directory where the public and private key will be stored."
default = ""
}
4 changes: 4 additions & 0 deletions deploy/aws/aws-key-pair/versions.tf
@@ -0,0 +1,4 @@

terraform {
  required_version = ">= 0.12"
}
17 changes: 8 additions & 9 deletions deploy/aws/aws-tutorial.tfvars
@@ -1,11 +1,10 @@
pd_instance_type = "c5d.large"
tikv_instance_type = "c5d.large"
tidb_instance_type = "c4.large"
monitor_instance_type = "c5.large"
default_cluster_pd_instance_type      = "c5d.large"
default_cluster_tikv_instance_type    = "c5d.large"
default_cluster_tidb_instance_type    = "c4.large"
default_cluster_monitor_instance_type = "c5.large"

pd_count = 1
tikv_count = 1
tidb_count = 1
default_cluster_pd_count = 1
default_cluster_tikv_count = 1
default_cluster_tidb_count = 1

cluster_name = "aws_tutorial"
tikv_root_volume_size = "50"
default_cluster_cluster_name = "aws-tutorial"
33 changes: 33 additions & 0 deletions deploy/aws/bastion.tf
@@ -0,0 +1,33 @@
resource "aws_security_group" "ssh" {
name = "${var.eks_name}-bastion"
description = "Allow SSH access for bastion instance"
vpc_id = var.create_vpc ? module.vpc.vpc_id : var.vpc_id
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = var.bastion_ingress_cidr
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}

module "ec2" {
source = "terraform-aws-modules/ec2-instance/aws"

version = "2.3.0"
name = "${var.eks_name}-bastion"
instance_count = var.create_bastion ? 1 : 0
ami = data.aws_ami.amazon-linux-2.id
instance_type = var.bastion_instance_type
key_name = module.key-pair.key_name
associate_public_ip_address = true
monitoring = false
user_data = file("bastion-userdata")
vpc_security_group_ids = [aws_security_group.ssh.id]
subnet_ids = local.default_subnets
}
1 change: 0 additions & 1 deletion deploy/aws/charts/tidb-cluster

This file was deleted.

1 change: 0 additions & 1 deletion deploy/aws/charts/tidb-operator

This file was deleted.
