This tutorial walks through provisioning a highly-available HashiCorp Vault cluster on Google Kubernetes Engine using HashiCorp Terraform as the provisioning tool.
This tutorial is based on Kelsey Hightower's Vault on Google Kubernetes Engine, but focuses on codifying the steps in Terraform instead of teaching them individually. If you would like to know how to provision HashiCorp Vault on Kubernetes step-by-step (aka "the hard way"), please follow Kelsey's repository instead.
These configurations work with Terraform 0.11. If you are using Terraform 0.12+, please use the sethvargo/12 branch. These configurations will migrate to 0.12 style by default when Terraform 0.12.5 is released or when a large enough mass of adopters has migrated to Terraform 0.12.
- Vault HA - The Vault cluster is deployed in HA mode backed by Google Cloud Storage.
- Production Hardened - Vault is deployed according to the production hardening guide. Please see the security section for more information.
- Auto-Init and Unseal - Vault is automatically initialized and unsealed at runtime. The unseal keys are encrypted with Google Cloud KMS and stored in Google Cloud Storage.
- Full Isolation - The Vault cluster is provisioned in its own Kubernetes cluster in a dedicated GCP project that is provisioned dynamically at runtime. Clients connect to Vault using only the load balancer, and Vault is treated as a managed external service.
- Audit Logging - Audit logging to Stackdriver can be optionally enabled with minimal additional configuration.
- Download and install Terraform.
- Download, install, and configure the Google Cloud SDK. You will need to configure your default application credentials so Terraform can run. Terraform runs against your default project, but all resources are created in the new project that it creates.
- Install the Kubernetes CLI (aka kubectl).
- Run Terraform:
$ cd terraform/
$ terraform init
$ terraform apply
This operation will take some time as it:
- Creates a new project
- Enables the required services on that project
- Creates a bucket for storage
- Creates a KMS key for encryption
- Creates a service account with the most restrictive permissions to those resources
- Creates a GKE cluster with the configured service account attached
- Creates a public IP
- Generates a self-signed certificate authority (CA)
- Generates a certificate signed by that CA
- Configures Terraform to talk to Kubernetes
- Creates a Kubernetes secret with the TLS file contents
- Configures your local system to talk to the GKE cluster by getting the cluster credentials and Kubernetes context
- Submits the StatefulSet and Service to the Kubernetes API
- Export environment variables:
Vault reads these environment variables for communication. Set Vault's address, the CA to use for validation, and the initial root token.
# Make sure you're in the terraform/ directory
# $ cd terraform/
$ export VAULT_ADDR="https://$(terraform output address)"
$ export VAULT_TOKEN="$(terraform output root_token)"
$ export VAULT_CAPATH="$(cd ../ && pwd)/tls/ca.pem"
- Run some commands:
$ vault kv put secret/foo a=b
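Before running Vault commands, it can help to confirm that the three variables from the export step are actually set in the current shell. The following is a minimal sketch, not part of the repository: the function name check_vault_env is hypothetical, and the only checks it performs are that each variable is non-empty and that the address uses https.

```shell
# Hypothetical helper: verify the variables exported in the tutorial are present.
check_vault_env() {
  rc=0
  [ -n "${VAULT_ADDR:-}" ]   || { echo "VAULT_ADDR is not set" >&2; rc=1; }
  [ -n "${VAULT_TOKEN:-}" ]  || { echo "VAULT_TOKEN is not set" >&2; rc=1; }
  [ -n "${VAULT_CAPATH:-}" ] || { echo "VAULT_CAPATH is not set" >&2; rc=1; }

  # The tutorial builds the address as https://<terraform output address>.
  case "${VAULT_ADDR:-}" in
    https://*) ;;
    *) echo "VAULT_ADDR should start with https://" >&2; rc=1 ;;
  esac

  return "$rc"
}

# Example usage (requires the vault CLI and a running cluster):
#   check_vault_env && vault kv put secret/foo a=b
```

The function only inspects the environment. It does not contact Vault, so a passing check does not prove the token or certificate is valid.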
Audit logging is not enabled in a default Vault installation. To enable audit logging to Stackdriver on Google Cloud, enable the file audit device on stdout:
$ vault audit enable file file_path=stdout
That's it! Vault will now log all audit requests to Stackdriver. Additionally, because the configuration uses an L4 load balancer, Vault does not need to parse X-Forwarded-For headers to extract the client IP, as requests are passed directly to the node.
You may wish to grant the Vault service account additional permissions. This service account is attached to the GKE nodes and will be the "default application credentials" for Vault.
To specify additional permissions, create a terraform.tfvars file with the following:
service_account_custom_iam_roles = [
  "roles/...",
]
To use the GCP auth method with the default application credentials, the Vault server needs the following role:
roles/iam.serviceAccountKeyAdmin
Alternatively, you can create and upload a dedicated service account for the GCP auth method during configuration and restrict the node-level default application credentials.
To use the GCP secrets engine with the default application credentials, the Vault server needs the following roles:
roles/iam.serviceAccountKeyAdmin
roles/iam.serviceAccountAdmin
Additionally, Vault needs the superset of any permissions it will grant. For example, if you want Vault to generate GCP access tokens with access to compute, you must also grant Vault access to compute.
Alternatively, you can create and upload a dedicated service account for the GCP secrets engine during configuration and restrict the node-level default application credentials.
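As a concrete illustration of the secrets engine case, the sketch below writes a terraform.tfvars file containing the two IAM roles listed above plus one extra role. The extra role, roles/compute.viewer, is only an assumed example of "access to compute"; substitute whichever roles cover the permissions Vault will actually grant. The TFVARS_FILE variable is a convenience introduced here, not something the configurations read.

```shell
# Where to write the variables file; defaults to the name Terraform loads automatically.
TFVARS_FILE="${TFVARS_FILE:-terraform.tfvars}"

# Note: ">" overwrites any existing file at this path.
cat > "$TFVARS_FILE" <<'EOF'
# The first two roles are the ones the GCP secrets engine needs.
# The last role is illustrative only: Vault needs a superset of what it grants.
service_account_custom_iam_roles = [
  "roles/iam.serviceAccountKeyAdmin",
  "roles/iam.serviceAccountAdmin",
  "roles/compute.viewer",
]
EOF

echo "wrote $TFVARS_FILE"
```

Run this from the terraform/ directory, then run terraform apply again so the new roles are attached to the service account.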
$ terraform destroy
This set of Terraform configurations is designed to make your life easy. It's a best-practices setup for Vault, but also aids in the retrieval of the initial root token. The decrypted initial root token will be stored in your state file!
As such, you should use a Terraform state backend with encryption enabled, such as Cloud Storage. Alternatively, you can remove the decryption calls in k8s.tf and manually decrypt the root token using gcloud. Terraform auto-generates the command, but you will need to set up the permissions for your local default application credentials.
$ $(terraform output token_decrypt_command)
Just like the Vault root token, additional information is stored in plaintext in the Terraform state. This is not a bug and is the fundamental design of Terraform. You are ultimately responsible for securing access to your Terraform state. As such, you should use a Terraform state backend with encryption enabled, such as Cloud Storage.
- Vault TLS keys - The Vault TLS keys, including the private key, are stored in Terraform state. Terraform created the resources and thus maintains their data.
- Service Account Key - Terraform generates a Google Cloud Service Account key in order to download the initial root token from Cloud Storage. This service account key is stored in the Terraform state.
- OAuth Access Token - In order to communicate with the Kubernetes cluster, Terraform gets an OAuth2 access token. This access token is stored in the Terraform state.
You may be seeing a theme, which is that the Terraform state includes a wealth of information. This is fundamentally part of Terraform's architecture, and you should use a Terraform state backend with encryption enabled, such as Cloud Storage.
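One way to follow this advice is Terraform's gcs backend, which stores state in a Cloud Storage bucket (Cloud Storage encrypts objects at rest). The sketch below writes a backend.tf file for it. The bucket name and prefix are hypothetical placeholders, the bucket must already exist, and you should restrict access to it to the people who are allowed to see the root token.

```shell
# Hypothetical values: replace with a bucket you control.
STATE_BUCKET="${STATE_BUCKET:-my-terraform-state-bucket}"
STATE_PREFIX="${STATE_PREFIX:-vault-on-gke}"
BACKEND_FILE="${BACKEND_FILE:-backend.tf}"

# Write a backend block; the shell fills in the bucket and prefix.
cat > "$BACKEND_FILE" <<EOF
terraform {
  backend "gcs" {
    bucket = "${STATE_BUCKET}"
    prefix = "${STATE_PREFIX}"
  }
}
EOF

echo "wrote $BACKEND_FILE"
```

After creating the file in the terraform/ directory, run terraform init again so Terraform picks up the backend and offers to move any existing local state.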
The Kubernetes cluster is a "private" cluster, meaning nodes do not have publicly exposed IP addresses, and pods are only publicly accessible if exposed through a load balancer service. Additionally, only authorized IP CIDR blocks are able to communicate with the Kubernetes master nodes.
The default allowed CIDR is 0.0.0.0/0 (anyone). You should restrict this CIDR to the IP address(es) that will access the nodes!
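When narrowing the allowed range, a single workstation address is written as a /32 CIDR. The helper below is a small sketch under that assumption: the name normalize_cidr is hypothetical, it does not validate that the input is a well-formed address, and the Terraform variable that receives the value is not shown in this section, so the sketch stops at producing the CIDR string.

```shell
# Hypothetical helper: turn a bare IPv4 address into a /32 CIDR and
# reject the wide-open default.
normalize_cidr() {
  input="$1"
  case "$input" in
    */*) cidr="$input" ;;      # already has a prefix length
    *)   cidr="$input/32" ;;   # single host
  esac

  if [ "$cidr" = "0.0.0.0/0" ]; then
    echo "refusing 0.0.0.0/0: this allows anyone" >&2
    return 1
  fi

  printf '%s\n' "$cidr"
}

# Example usage with documentation-range addresses:
#   normalize_cidr 203.0.113.10      # prints 203.0.113.10/32
#   normalize_cidr 198.51.100.0/24   # prints 198.51.100.0/24
```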
Q: How is this different than kelseyhightower/vault-on-google-kubernetes-engine?
A: Kelsey's tutorial walks through the manual steps of provisioning a cluster, creating all the components, etc. This captures those steps as Terraform configurations, so it's a single command to provision the cluster, service account, IP address, etc. Instead of using cfssl, it uses the built-in Terraform functions.
Q: Why are you using StatefulSets instead of Deployments?
A: StatefulSets ensure that each pod is deployed in order. This is important for
the initial bootstrapping process, otherwise there's a race for which Vault
server initializes first with auto-init.
Q: Why didn't you use the Terraform Kubernetes provider to create the pods? There's this hacky template_file data source instead...
A: StatefulSets are not fully supported in Terraform yet. Should that change,
we can avoid the shellout to kubectl.
Q: I want to deploy without Terraform. Where is the YAML I can just apply?
A: The YAML template is in k8s/vault.yaml in this repository. However, the spec requires information that is only known at runtime. Specifically, you will need to fill in any values with the dollar-brace ${...} syntax before you can apply the spec with kubectl. For example:
 spec:
   type: LoadBalancer
-  loadBalancerIP: ${load_balancer_ip}
+  loadBalancerIP: 124.2.55.3
   externalTrafficPolicy: Local
   selector:
     app: vault
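The substitution shown in the diff can also be scripted. This is a rough sketch that handles only the ${load_balancer_ip} placeholder from the example above; the function names are hypothetical, and the template may contain other ${...} values that you would need to fill in the same way.

```shell
# Replace the ${load_balancer_ip} placeholder in a template and print the result.
#   $1: path to the template   $2: load balancer IP address
render_template() {
  sed 's|\${load_balancer_ip}|'"$2"'|g' "$1"
}

# Succeeds only if no ${...} placeholders remain in the given file.
no_placeholders_left() {
  ! grep -qF '${' "$1"
}

# Example usage (the output file name is arbitrary):
#   render_template k8s/vault.yaml 124.2.55.3 > vault.rendered.yaml
#   no_placeholders_left vault.rendered.yaml && kubectl apply -f vault.rendered.yaml
```

Checking for leftover placeholders before calling kubectl avoids applying a spec that still contains unrendered template values.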