Gke terraform and guide #493

Merged: 22 commits, merged on May 23, 2019
Commits
77ca582
Initial progress
jlerche May 10, 2019
2b0224c
Adds infrastructure
jlerche May 11, 2019
e073432
Some changes to main.tf and adds manifests and charts
jlerche May 14, 2019
9283c8c
Final touches, everything works
jlerche May 14, 2019
d8d07c9
Adds readme
jlerche May 14, 2019
9fd1927
Cleans up README a little
jlerche May 14, 2019
35b9056
Adds firewall rule to allow ssh from bastion to GKE nodes
jlerche May 15, 2019
adb5854
Fixes name of ssh from bastion firewall rule
jlerche May 15, 2019
cad1678
Adds tweaks and formatting
jlerche May 15, 2019
dfb028b
Adds reclaim policy to delete disks
jlerche May 15, 2019
c9bcbaa
Updates readme to reflect disk reclaim policy
jlerche May 15, 2019
5f300b9
Adds on-destroy to change persistent volume claimpolicy to delete
jlerche May 15, 2019
b73876b
Removes superfluous command changing reclaim policy
jlerche May 17, 2019
f428090
Adds monitor node count variable
jlerche May 17, 2019
25e0d2e
Adds startup script daemonset to properly change open fd on pd and ti…
jlerche May 18, 2019
157f039
Streamlines startup daemonset and adds Linux Guest Environment instal…
jlerche May 18, 2019
2b145ab
Adds note about default set up exceeding GCP default cpu quota
jlerche May 21, 2019
1c5b36d
Adds link to GCP quota documentation page
jlerche May 21, 2019
88f32c6
Fixes formatting
jlerche May 21, 2019
843d3e9
Adds clarifications to README, adds tidb and monitor port to outputs
jlerche May 21, 2019
18cb957
Adds section on how to delete nodes that are automatically replicated…
jlerche May 22, 2019
16a163b
Merge branch 'master' into gke_terraform_and_guide
tennix May 23, 2019
4 changes: 4 additions & 0 deletions deploy/gcp/.gitignore
@@ -0,0 +1,4 @@
.terraform
*.tfstate*
credentials
rendered
126 changes: 126 additions & 0 deletions deploy/gcp/README.md
@@ -0,0 +1,126 @@
# Deploy TiDB Operator and TiDB cluster on GCP GKE

## Requirements:
* [gcloud](https://cloud.google.com/sdk/install)
* [terraform](https://www.terraform.io/downloads.html)
* [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl) >= 1.11
* [helm](https://github.com/helm/helm/blob/master/docs/install.md#installing-the-helm-client) >= 2.9.0
* [jq](https://stedolan.github.io/jq/download/)
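
As a quick sanity check, the following commands print the installed versions of these tools (a minimal sketch; these are the standard version flags for each tool):

```bash
gcloud version
terraform version
kubectl version --client
helm version --client
jq --version
```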

## Configure gcloud

https://cloud.google.com/sdk/docs/initializing
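
If `gcloud` has not been initialized yet, a typical setup looks like the following (a sketch; replace `<project>` with your GCP project ID):

```bash
gcloud init                      # interactive login and default configuration
# or, step by step:
gcloud auth login
gcloud config set project <project>
```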

## Setup

The default setup will create a new VPC, two subnetworks, and an f1-micro instance as a bastion machine. The GKE cluster is created with the following instance types as worker nodes:

* 3 n1-standard-4 instances for PD
* 3 n1-highmem-8 instances for TiKV
* 3 n1-standard-16 instances for TiDB
* 3 n1-standard-2 instances for monitor

> *NOTE*: The number of nodes created depends on how many availability zones there are in the chosen region. Most have 3 zones, but us-central1 has 4. See https://cloud.google.com/compute/docs/regions-zones/ for more information. Please refer to the `Customize` section for information on how to customize node pools in a regional cluster.
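
To check how many zones the chosen region has (and therefore how many nodes each pool will get), you can describe the region with `gcloud`; `us-west1` below is just an example region:

```bash
# The "zones" field of the output lists the availability zones in the region
gcloud compute regions describe us-west1
```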

> *NOTE*: The default setup, as listed above, will exceed the default CPU quota of a GCP project. To increase your project's quota, please follow the instructions [here](https://cloud.google.com/compute/quotas). The default setup requires at least 91 CPUs, and more if you need to scale out.

The Terraform script expects three environment variables. You can let Terraform prompt you for them, or `export` them ahead of time (see the example after this list). If you choose to export them, they are:

* `TF_VAR_GCP_CREDENTIALS_PATH`: Path to a valid GCP credentials file. It is generally a good idea to create a dedicated service account for Terraform to use. See [this page](https://cloud.google.com/iam/docs/creating-managing-service-accounts) for information on managing service accounts, and [this page](https://cloud.google.com/iam/docs/creating-managing-service-account-keys) for creating and managing service account keys; the downloaded key file is the credentials file you need.
* `TF_VAR_GCP_REGION`: The region to create the resources in, for example: `us-west1`
* `TF_VAR_GCP_PROJECT`: The name of the GCP project
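
For example (the values below are illustrative; substitute your own):

```bash
export TF_VAR_GCP_CREDENTIALS_PATH="$HOME/.gcp/credentials.json"
export TF_VAR_GCP_REGION="us-west1"
export TF_VAR_GCP_PROJECT="my-project"
```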



The service account must have adequate permissions to create resources in the project; the `Project Editor` primitive role is sufficient.
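
One way to create such a service account and download its key with `gcloud` is sketched below; the account name `terraform`, the key path, and `<project>` are example values:

```bash
gcloud iam service-accounts create terraform --display-name "Terraform"
gcloud projects add-iam-policy-binding <project> \
    --member "serviceAccount:terraform@<project>.iam.gserviceaccount.com" \
    --role "roles/editor"
gcloud iam service-accounts keys create ~/.gcp/credentials.json \
    --iam-account "terraform@<project>.iam.gserviceaccount.com"
```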

If the GCP project is new, make sure the relevant APIs are enabled:

```bash
gcloud services enable cloudresourcemanager.googleapis.com && \
gcloud services enable cloudbilling.googleapis.com && \
gcloud services enable iam.googleapis.com && \
gcloud services enable compute.googleapis.com && \
gcloud services enable container.googleapis.com
```

Now we can launch the script:

```bash
git clone --depth=1 https://github.com/pingcap/tidb-operator
cd tidb-operator/deploy/gcp
terraform init
terraform apply
```

After `terraform apply` is successful, the TiDB cluster can be accessed by SSHing into the bastion machine and connecting via MySQL:
```bash
gcloud compute ssh bastion --zone <zone>
mysql -h <tidb_ilb_ip> -P 4000 -u root
```
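
The values for `<tidb_ilb_ip>` and the bastion zone come from the Terraform outputs; assuming the configuration exposes outputs such as the TiDB load balancer IP and port (as `data.tf` suggests), they can be listed with:

```bash
# Print all Terraform outputs for this configuration,
# including the TiDB internal load balancer IP and port
terraform output
```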

It is possible to interact with the cluster using `kubectl` and `helm` with the kubeconfig file `credentials/kubeconfig_<cluster_name>`. The default `cluster_name` is `my-cluster`; it can be changed in `variables.tf`.
```bash
# By specifying the --kubeconfig argument
kubectl --kubeconfig credentials/kubeconfig_<cluster_name> get po -n tidb
helm --kubeconfig credentials/kubeconfig_<cluster_name> ls

# Or by setting the KUBECONFIG environment variable
export KUBECONFIG=$PWD/credentials/kubeconfig_<cluster_name>
kubectl get po -n tidb
helm ls
```

When you are done, the infrastructure can be torn down by running `terraform destroy`.


## Upgrade TiDB cluster

To upgrade the TiDB cluster, set the `tidb_version` variable in `variables.tf` to a later version and run `terraform apply`.
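
If you prefer not to edit `variables.tf`, the same variable can be overridden on the command line; `<new-version>` below is a placeholder for the target TiDB version:

```bash
terraform apply -var tidb_version=<new-version>
```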

> *Note*: The upgrade does not finish immediately. You can watch its progress with `watch kubectl --kubeconfig credentials/kubeconfig_<cluster_name> get po -n tidb`

## Scale TiDB cluster

To scale the TiDB cluster, modify `tikv_count`, `tikv_replica_count`, `tidb_count`, and `tidb_replica_count` to the desired counts, and then run `terraform apply`.

> *Note*: Currently, scaling in is not supported, since we cannot determine which node to remove. Scaling out takes a few minutes to complete; you can watch its progress with `watch kubectl --kubeconfig credentials/kubeconfig_<cluster_name> get po -n tidb`

> *Note*: Incrementing a node count will create a new node in each GCP availability zone.
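
As with upgrades, the counts can also be overridden on the command line instead of editing `variables.tf`; the values in angle brackets are placeholders:

```bash
terraform apply \
  -var tikv_count=<tikv-nodes-per-zone> \
  -var tikv_replica_count=<tikv-replicas> \
  -var tidb_count=<tidb-nodes-per-zone> \
  -var tidb_replica_count=<tidb-replicas>
```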

## Customize

### Customize GCP resources

GCP allows attaching a local SSD to any instance type that is `n1-standard-1` or larger, which leaves plenty of room for customization.

### Customize TiDB Parameters

Currently, only a few parameters are exposed for customization. However, you can modify `templates/tidb-cluster-values.yaml.tpl` before deploying. If you modify it after the cluster is created and then run `terraform apply`, the changes will not take effect unless the affected pods are manually deleted (see the example below).
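
For example, to make a pod pick up the modified values, delete it manually and let its controller recreate it; the pod name below is a placeholder:

```bash
kubectl --kubeconfig credentials/kubeconfig_<cluster_name> delete pod <pod-name> -n tidb
```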

### Customizing node pools

The cluster is created as a regional cluster, as opposed to a zonal one. This means that GKE replicates node pools to each availability zone, which is desirable for high availability. However, for monitoring services such as Grafana this is potentially unnecessary, and it is possible to manually remove nodes via `gcloud` if desired.

> *NOTE*: GKE node pools are managed instance groups, so a node deleted by `gcloud compute instances delete` will be automatically recreated and added back to the cluster.

Suppose we wish to delete a node from the monitor pool. We can run:
```bash
$ gcloud compute instance-groups managed list | grep monitor
```
The result will look something like this:
```bash
gke-my-cluster-monitor-pool-08578e18-grp us-west1-b zone gke-my-cluster-monitor-pool-08578e18 0 0 gke-my-cluster-monitor-pool-08578e18 no
gke-my-cluster-monitor-pool-7e31100f-grp us-west1-c zone gke-my-cluster-monitor-pool-7e31100f 1 1 gke-my-cluster-monitor-pool-7e31100f no
gke-my-cluster-monitor-pool-78a961e5-grp us-west1-a zone gke-my-cluster-monitor-pool-78a961e5 1 1 gke-my-cluster-monitor-pool-78a961e5 no
```
The first column is the name of the managed instance group, and the second column is the zone it was created in. We also need the name of an instance in that group, which we can get as follows:
```bash
$ gcloud compute instance-groups managed list-instances gke-my-cluster-monitor-pool-08578e18-grp --zone us-west1-b
NAME ZONE STATUS ACTION INSTANCE_TEMPLATE VERSION_NAME LAST_ERROR
gke-my-cluster-monitor-pool-08578e18-c7vd us-west1-b RUNNING NONE gke-my-cluster-monitor-pool-08578e18
```
Now we can delete the instance:
```bash
$ gcloud compute instance-groups managed delete-instances gke-my-cluster-monitor-pool-08578e18-grp --instances=gke-my-cluster-monitor-pool-08578e18-c7vd --zone us-west1-b
```
1 change: 1 addition & 0 deletions deploy/gcp/charts/tidb-cluster
1 change: 1 addition & 0 deletions deploy/gcp/charts/tidb-operator
31 changes: 31 additions & 0 deletions deploy/gcp/data.tf
@@ -0,0 +1,31 @@
data "template_file" "tidb_cluster_values" {
template = "${file("${path.module}/templates/tidb-cluster-values.yaml.tpl")}"

vars {
cluster_version = "${var.tidb_version}"
pd_replicas = "${var.pd_replica_count}"
tikv_replicas = "${var.tikv_replica_count}"
tidb_replicas = "${var.tidb_replica_count}"
operator_version = "${var.tidb_operator_version}"
}
}

data "external" "tidb_ilb_ip" {
depends_on = ["null_resource.deploy-tidb-cluster"]
program = ["bash", "-c", "kubectl --kubeconfig ${local.kubeconfig} get svc -n tidb tidb-cluster-tidb -o json | jq '.status.loadBalancer.ingress[0]'"]
}

data "external" "monitor_ilb_ip" {
depends_on = ["null_resource.deploy-tidb-cluster"]
program = ["bash", "-c", "kubectl --kubeconfig ${local.kubeconfig} get svc -n tidb tidb-cluster-grafana -o json | jq '.status.loadBalancer.ingress[0]'"]
}

data "external" "tidb_port" {
depends_on = ["null_resource.deploy-tidb-cluster"]
program = ["bash", "-c", "kubectl --kubeconfig ${local.kubeconfig} get svc -n tidb tidb-cluster-tidb -o json | jq '.spec.ports | .[] | select( .name == \"mysql-client\") | {port: .port|tostring}'"]
}

data "external" "monitor_port" {
depends_on = ["null_resource.deploy-tidb-cluster"]
program = ["bash", "-c", "kubectl --kubeconfig ${local.kubeconfig} get svc -n tidb tidb-cluster-grafana -o json | jq '.spec.ports | .[] | select( .name == \"grafana\") | {port: .port|tostring}'"]
}