This Terraform script provisions infrastructure resources for my personal website, as well as other projects.
This repository is part of my personal website project. Also see the other repositories:
- Iriversland2 SPA: the frontend code base, using Angular.
- Iriversland2 Backend API: the backend RESTful API in Django.
- Iriversland2 Kubernetes: (this repository) infrastructure as code provisioning the Kubernetes cluster for the backend server.
- Kafka Connect CDC: the repository for the Kafka Connect Docker image used for real-time, change-data-capture (CDC) sync between Postgres and Elasticsearch.
This repo's CircleCI config dockerizes the repo as an image for other projects to use as a base image (mainly in their CircleCI jobs) to run Terraform and K8s commands.
This repository provisions the entire Kubernetes cluster shown in the image below. We use Terraform to do so, and this repository serves as the big "Terraform" box in the image. It creates the infrastructure for other projects and forms an ecosystem on the cloud, enabling me to quickly deploy production-ready, highly available, scalable services.
- Install via Homebrew: `brew install terraform kubernetes-cli helm`
  - Terraform (version 0.12.6)
  - Kubernetes CLI (version v1.15.2) for Kubernetes CRD resource management support
  - Helm CLI (version v2.16.1) for Helm release resource management support
- Optionally install `doctl`, the DigitalOcean CLI tool: `brew install doctl`
- To let `doctl` generate `kubeconfig.yaml`, run `doctl k8s cluster kubeconfig show project-shaungc-digitalocean-cluster-<random> > kubeconfig.yaml`
- Run `export KUBECONFIG=kubeconfig.yaml`. The Terraform provider `kubernetes` will need this file present.
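With `KUBECONFIG` exported, the `kubernetes` provider block can stay minimal. A sketch only — attribute values here are illustrative, so check this repo's actual provider configuration:

```hcl
# The kubernetes provider picks up the kubeconfig referenced by the
# KUBECONFIG environment variable when no explicit path is given.
provider "kubernetes" {
  # Alternatively, point at the file explicitly (hypothetical path):
  # config_path = "${path.module}/kubeconfig.yaml"
}
```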
Optional, nice to have (useful for debugging):
- Install `doctl`, the CLI tool for DigitalOcean: `brew install doctl`
- Initialize auth for DigitalOcean: `doctl auth init`
- See the DigitalOcean docs: download the K8s credential (YAML).
- Provide the credentials by creating the files below in the project root directory:
- Create `backend-credentials.tfvars`: we use S3 as the Terraform remote state storage backend; specify the following content in the file:

```
access_key = "XXXXXX"
secret_key = "XXXXXXXXYYYYYYYYZZZZZZZ"
region     = "aws-region-x"
```

- Create `credentials.auto.tfvars`: specify the following:

```
do_token        = "digitalocean-token"
aws_access_key  = "AWS key"
aws_secret_key  = "AWS secret"
aws_region      = "AWS region like us-east-2"
docker_email    = "Dockerhub email"
docker_password = "Dockerhub password"
```

- Create `local.auto.tfvars`. This file avoids having to manually input these values every time you run `terraform plan` / `apply` / `destroy`. You can specify:

```
project_name            = "your-project-name, will be prefixed to resources"
letsencrypt_env         = "either prod or staging"
app_container_image_tag = "the-tag-when-you-docker-build"
```
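Each of these input variables needs a matching declaration on the Terraform side. A hypothetical sketch of what the corresponding `variable` blocks could look like (the real definitions live in this repo's `.tf` files):

```hcl
# Hypothetical declarations matching the *.tfvars files above.
variable "do_token" {}
variable "aws_access_key" {}
variable "aws_secret_key" {}
variable "aws_region" {}
variable "docker_email" {}
variable "docker_password" {}

variable "project_name" {}
variable "letsencrypt_env" {
  # A safer default while testing, to avoid Let's Encrypt prod rate limits.
  default = "staging"
}
variable "app_container_image_tag" {}
```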
- Run `init-backend-local.sh` to initialize Terraform. Avoid running `terraform init` yourself.
- Make some changes if needed.
  - If changes involve TLS / Cert Manager, please refer to the section *Terraform: Lifecycle of TLS / Cert Manager / Let's Encrypt Resources* under *Pitfalls and Known Issues* below.
- Run `python release.py [options]` to auto-populate the required image tag variables.
  - Run `python release.py -p` to get the plan.
  - If you make changes to tf files instead of an image tag (e.g. for Kafka, Redis, Postgres, etc.), `python release.py` might not run, since it only runs when it detects a change in an image tag. To apply changes to tf files, force the run with `python release.py -f`.
  - For other options, please see `release.py`.
- After committing and pushing changes to the `master` branch, if confident with the change, it's time to sync the release and destroy branches. Run `gco release && git merge master && gp && gco destroy-release && git merge master && gp && gco master`.
Use `python release.py ...` to update a microservice deployment using the Docker build hash.
For example, to update a new build for Appl Tracky, run `python release.py -at <new hash here>`.
To add a new microservice supported by the release script, add a new entry to `MANIFEST_IMAGE_TAGS` in `release.py`.
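The manifest-driven flag handling could look like the sketch below. This is only an illustration of the pattern — the flag and variable names here are hypothetical, and the real `MANIFEST_IMAGE_TAGS` in `release.py` is the source of truth:

```python
# Hypothetical sketch of how release.py might map CLI flags to the
# Terraform variables that carry each microservice's image tag.
MANIFEST_IMAGE_TAGS = {
    # flag: terraform variable receiving the docker build hash
    "-at": "appl_tracky_image_tag",  # hypothetical variable name
}

def build_var_args(flag: str, image_tag: str) -> list:
    """Translate a CLI flag plus a docker build hash into `-var` args
    suitable for passing to `terraform plan` / `apply`."""
    var_name = MANIFEST_IMAGE_TAGS[flag]
    return ["-var", f"{var_name}={image_tag}"]
```

Adding a new microservice then amounts to adding one dictionary entry, e.g. `build_var_args("-at", "abc123")` yields `["-var", "appl_tracky_image_tag=abc123"]`.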
When an error occurs in `terraform apply` or `terraform destroy` and it's hard to fix, the last resort is to nuke the entire Kubernetes cluster and start over. The process is simple:
- Delete the tfstate file in S3. Currently the script `clean-s3-tf-state.sh` does this.
- Delete the K8s cluster in DigitalOcean
- Delete the firewall in DigitalOcean
- Delete the volumes in DigitalOcean
Check out the issue page for ongoing progress. Below are the achieved goals: what we've done, and how we set it up, in two phases.
- Phase 1: make sure this Terraform repo works for us, running only Terraform in CircleCI.
  - Created a `config.yaml` for CircleCI for this repo.
  - As long as you pass credentials to CircleCI via environment variables (set via the CircleCI web UI), you can access those env vars in jobs and complete the `terraform apply`.
  - The `config.yaml` only does `terraform plan` and will not make any actual changes by default. Passing the CircleCI build roughly verifies that the Terraform backend, all the variables (including credentials), and the resources are working.
- Phase 2: run both your app build (in another repo) AND Terraform (this repo) in CircleCI, sequentially.
  - Refer to the CircleCI doc: sharing data among jobs.
  - Context: continuous deployment often comes in two big parts: docker build, then deploy. This Terraform repo only handles the deploy part, and needs you to provide a Docker build image tag as a Terraform variable input. The image tag plays the same role as what AWS CodePipeline refers to as "artifacts".
  - Complete CI/CD automation: in order to have CircleCI automate build and deploy for us, we need to combine them into one `config.yaml`. The basic idea: you have two repositories, one for your app containing a `Dockerfile`, the other this Terraform repo. You choose either one to host CircleCI's `config.yaml` to start the automation, and in the CircleCI job you retrieve the other repo's code by either `git clone` or `docker pull`, so you have access to both repos in a single `config.yaml`.
  - You can refer to the example in iriversland2-public and look at its CircleCI `config.yaml`. It docker-builds the app, pushes the image to the registry, then uses the Terraform script (from this repo) to update K8s resources and deploy the app changes to the K8s cluster.
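The combined pipeline can be sketched as a two-job CircleCI workflow. Job names, images, and repo URL below are illustrative placeholders, not this repo's actual `config.yaml`:

```yaml
# Hypothetical sketch: build the app image, then run Terraform to deploy it.
version: 2
jobs:
  build_app:
    docker:
      - image: docker:stable
    steps:
      - checkout
      - setup_remote_docker
      - run: docker build -t myorg/myapp:${CIRCLE_SHA1} .
      - run: docker push myorg/myapp:${CIRCLE_SHA1}
  terraform_deploy:
    docker:
      # The base image this repo's CircleCI publishes, with terraform/kubectl/helm
      - image: myorg/terraform-k8s-base:latest
    steps:
      - run: git clone https://github.com/<user>/iriversland2-kubernetes.git
      - run: cd iriversland2-kubernetes && python release.py -f
workflows:
  version: 2
  build_then_deploy:
    jobs:
      - build_app
      - terraform_deploy:
          requires:
            - build_app
```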
- We set up a certificate for the rx domain name using Helm, Jetstack cert-manager, and Let's Encrypt. For ingress resources we use nginx ingress.
- Configured to use the dns01 challenge, so we can register a wildcard certificate and don't have to worry about certificates when creating other apps/services on subdomains of this K8s cluster.
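A `ClusterIssuer` with a dns01 solver is roughly what this setup requires. The sketch below uses the older cert-manager v0.x schema (field names and the `apiVersion` vary across cert-manager versions), and the email, region, keys, and secret names are placeholders:

```yaml
# Illustrative ClusterIssuer using the dns01 challenge via Route53.
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: you@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    dns01:
      providers:
        - name: route53
          route53:
            region: us-east-2
            accessKeyID: AKIAXXXXXXXX
            secretAccessKeySecretRef:
              name: route53-credentials
              key: secret-access-key
```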
TL;DR conclusion: until Terraform provides robust K8s CRD resource support, we use an "always re-create" mechanism via `local-exec` when dealing with changes.
Due to the lack of support for CRDs (K8s custom resources) in Terraform, we are using `null_resource` and the provisioner `local-exec` together to provision custom resources like `ClusterIssuer`.
`null_resource` does not offer much when dealing with change - currently only create and destroy, not modify. To guarantee resources are in the right state, we put all dependencies in the `triggers` block of `null_resource`. How triggers work is quite rigid at this point (Terraform v0.12.6): whenever any value in the `triggers` block changes, it always re-creates, i.e., destroys then creates: it runs the provisioner commands with `when = destroy` (destroy provisioners), then runs the provisioners without `when = destroy` (creation provisioners). This is far from ideal, but at least it makes sure our `local-exec` approach reflects any change correctly.
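The pattern described above can be sketched as follows. File names and trigger keys are illustrative, not this repo's actual resource definitions:

```hcl
# Sketch of the null_resource + local-exec re-create pattern.
resource "null_resource" "cluster_issuer" {
  # Any change to a trigger value forces destroy-then-create,
  # which re-runs both provisioners below.
  triggers = {
    manifest_sha1   = sha1(file("${path.module}/cluster-issuer.yaml"))
    letsencrypt_env = var.letsencrypt_env
  }

  # Creation provisioner: runs on create (and after every re-create).
  provisioner "local-exec" {
    command = "kubectl apply -f ${path.module}/cluster-issuer.yaml"
  }

  # Destroy provisioner: runs before the resource is destroyed.
  provisioner "local-exec" {
    when    = destroy
    command = "kubectl delete -f ${path.module}/cluster-issuer.yaml --ignore-not-found"
  }
}
```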
However, Let's Encrypt, the certificate issuer, has a pretty strict rate limit on requesting production certificates. Changes like a `ClusterIssuer`'s name are definitely not worth requesting a new certificate for, and should just run the creation provisioners (`kubectl apply`) without running the destroy provisioners (`kubectl delete`) beforehand. Such changes should be avoided, or at least made with the Let's Encrypt rate limit in mind. You can always check how many certificates you have requested so far, or use the tool `lectl` suggested in this post.
Still, there are changes that indeed need a certificate renewal, e.g. changes to the Let's Encrypt API endpoint (most likely due to a version update), the tls block in the ingress resource, or the AWS credentials for the Route53 dns challenge. Luckily, these changes are not likely to happen frequently. With the current approach, you change the variable values and the `local-exec` handles the rest for you.
- For useful K8s commands and debugging TLS or `cert-manager` issues, see the TLS Debug README.
  - The README includes commands to monitor ingress and `cert-manager` controller logs in real time.
  - The script `./cert_resources_reset_interactive.sh` provides an interactive way to verify TLS is correctly set up. The script is deprecated; do not run it without first inspecting what it does.
  - The issue where we add TLS to our domain.
  - The issue where we add another microservice domain.
It builds a base image for other projects' CircleCI jobs to run the Terraform scripts included in this repository. It also does `terraform plan` to provide a preliminary test of the Terraform script.
Currently in `config.yaml`, several parts are commented out to minimize accidental changes to the infra, but you should consider uncommenting them in the following cases:
- When you make changes to the Terraform script or Dockerfile: uncomment the docker build part. Once the image is published to Dockerhub, you should re-comment it.
- When you want to run Terraform to provision infra on CircleCI while skipping the app's build process: uncomment the *Test Terraform Apply* step.
  - You should manually change the `-var=...` for `letsencrypt_env` and `app_container_image_tag`.
  - Once the infra is provisioned correctly, you should re-comment this step.
- You have to use `.id` for K8s tags; if you only "dot" to the resource name, you get the whole resource object, which cannot be used to specify a tag. Use either `.name` or `.id`; the Medium post uses `.id`.
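The distinction looks like this in practice. A sketch only — the droplet arguments are elided and the tag name is a placeholder:

```hcl
# Illustrative: referencing a DigitalOcean tag by attribute vs. by resource.
resource "digitalocean_tag" "project" {
  name = "my-project"
}

resource "digitalocean_droplet" "web" {
  # ... image/region/size omitted ...

  # Works: .id (or .name) evaluates to a string usable as a tag.
  tags = [digitalocean_tag.project.id]

  # Fails: `digitalocean_tag.project` alone is the whole resource
  # object, not a string, so it cannot be used as a tag value.
}
```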
When Terraform errors, remember that Terraform keeps track of all successfully created resources and won't re-create them next time (assuming you make no changes). You have two choices:
- Use `terraform apply` to continue working on the rest of the provisioning.
- Use `terraform destroy` to undo all created resources.
If you need any help or have any questions about this repo, feel free to shoot me a message by visiting my website (hosted on DigitalOcean K8s and provisioned by this repo's tf!) and filling out the contact form at the bottom of the home page.
- Our database migration guide
- Kubernetes and `kubectl`
  - Setting `--kubeconfig` (K8s official)
- Route53 Console
- CircleCI Console for Terraform Provisioning
- CircleCI Console for Iriversland
- Advanced Terraform syntax (Gruntwork)