
Procedures and assets to create Enterprise-grade Consul & Vault in production


# hashicorp-services/consul-in-prod = Reference Implementation Book

What's Inside

This book gives customer-facing implementers hands-on, end-to-end instructions and automation for creating secure, highly available production implementations of HashiCorp Consul with Vault, as commonly deployed within Global 2000 enterprises.

The most common use of Consul is as a "Global Network Mesh", which is a superset of the industry concept of a "Service Mesh". Unlike alternatives that operate only in Kubernetes or only in AWS, Consul provides multi-region reliability, multi-cloud flexibility, and multi-platform adaptability.

This document focuses on how to create these capabilities.

Contents:

  1. What's inside - Table of Contents

  2. Audience for this document

  3. Editions of Consul

  4. The App/System Managed by Consul

  5. Global App Network Mesh

  6. Multi-Region Production Scope

  7. Construction Stages (part of Adoption Plan)

  8. Microservices App

  9. Implementations

  10. Clouds

  11. OS Platforms

  12. DevSecOps Workflow

  13. Proof of Production Viability (part of Reliability Plan)

  14. Workshop Resources

Additional pages this summary page links to, alphabetically:


Audience for this document

  • HashiCorp Presales Solution Engineering (Kyle Rarey, Ram Ramhariram, Nathan Pearce)
  • SE SME (Ancil McBarnett)
  • HashiCorp PS (Professional Services)
  • HashiCorp Implementation Services (Austin Workman)
  • HashiCorp CS SEs (Customer Service Engineers) (Josh Wolfer)
  • Consultants within HashiCorp services partners (Gabe)
  • HashiCorp Field CTOs (Jake Lundberg)

This is being collaboratively developed and maintained by the above plus these stakeholders: TODO: Get the org names correct!

  • Domain Architecture (Wilson Mar, Frank Hane)
  • PSE (Iman, John Boero, Jim Sullivan)
  • Education (Tu Nguyen, Daniele Carcasole)
  • Operations Experience
  • (Tony Pulickal)
  • (Matt Peters)
  • Reference Architecture (Chloe Cota)
  • SE (segment leaders in US, EMEA, APJ)
  • Field (Thomas Kula)
  • CSA
  • CSM
  • IS
  • Consul Marketing PMM (Van Phan)
  • Consul Product Management (Usha Kodali, Abhishek Tiwari)
  • Dev Evangelists (Rosemary Wang)

Governance:

  • Hari
  • Joe Weber

NOTE: This document presents best practices and tools which individual practitioners are free to adjust as they see fit for each situation.

During a Consul Accelerator Program (CAP) engagement, this document is tailored to the customer.

QUESTION: Who should be included?

QUESTION: Who grants access to services partners to this org/repo on GitBook?


Editions of Consul

Consul is available in several editions:

  • Free Open-Source (at https://github.com/hashicorp/consul)
  • Licensed Enterprise (installed, configured, and managed by enterprise customers)
  • Licensed HCP cloud Consul (using HVN setup within the customer's app infrastructure)

Because most enterprises want support contracts, this document focuses on enterprise use in production. It does not cover setting up an individual stand-alone Consul cluster for the purpose of learning.

The HashiCorp cloud edition of Consul, peered with Consul Enterprise using Terraform, provides a "Cloud To Ground" peering connection to on-prem servers maintained by enterprise customers.

Multi-Region Production Scope

The baseline Solution Design here assumes, for reliability, use of 5 nodes per Consul datacenter across 3 Availability Zones (each a separate VPC) within each region.
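
The reasoning behind 5 nodes across 3 Availability Zones can be checked with a little arithmetic. The sketch below is illustrative only: it assumes all 5 nodes are Raft voters and are spread as evenly as possible across zones, and the function names are hypothetical helpers, not part of any Consul tooling.

```python
from __future__ import annotations


def quorum(voters: int) -> int:
    """Smallest majority of a Raft voter set."""
    return voters // 2 + 1


def fault_tolerance(voters: int) -> int:
    """Number of voters that can fail while a majority remains."""
    return voters - quorum(voters)


def spread(voters: int, zones: int) -> list[int]:
    """Distribute voters as evenly as possible across zones (assumption)."""
    base, extra = divmod(voters, zones)
    return [base + (1 if i < extra else 0) for i in range(zones)]


def survives_zone_loss(voters: int, zones: int) -> bool:
    """True if losing the most populated zone still leaves a quorum."""
    return voters - max(spread(voters, zones)) >= quorum(voters)


print(quorum(5))                 # 3
print(fault_tolerance(5))        # 2
print(spread(5, 3))              # [2, 2, 1]
print(survives_zone_loss(5, 3))  # True
```

Under these assumptions, losing any single zone removes at most 2 of the 5 voters, so the remaining 3 still form a majority.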

TODO: Add performance nodes to this Consul single-datacenter/region Reference Architecture.

To ensure production-level reliability at Enterprise scale, each implementation here also addresses two regions peered together.

See this HA decision tree from Microsoft:
<a target="_blank" href="https://docs.microsoft.com/en-us/azure/architecture/example-scenario/infrastructure/media/ha-decision-tree.png">Azure HA/DR selection</a>

The App/System Managed by Consul

An app is needed for Consul to manage. This document makes use of the HashiCups sample app maintained by HashiCorp.

Although Consul works with multiple platform technologies, this document uses a Linux-based sample e-commerce application (HashiCups) running in Kubernetes, with a server node for each of these APIs:

  • front-end web server
  • product
  • shipment
  • payment (external)

(Not included are Redis cache, Elasticsearch (to parse logs), mail/SMS, ratings, observability, analytics, etc.)

TODO: Create a diagram to add in the Azure Reference Architecture diagrams with HA/DR

Alternative HA-capable apps:

Magento (e-commerce) in AKS?

https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/app-service-web-app/multi-region

https://docs.microsoft.com/en-us/azure/architecture/example-scenario/infrastructure/multi-tier-app-disaster-recovery-experiment

https://docs.microsoft.com/en-us/azure/architecture/example-scenario/infrastructure/wordpress

A key capability of Consul is that, as each app service is instantiated, Consul detects it and adds it to its Service Registry, then securely routes network traffic to it, even across disparate platforms (via a Consul API Gateway).
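
To make the Service Registry concrete, here is a minimal sketch that builds the JSON body accepted by the Consul agent endpoint `PUT /v1/agent/service/register`. The service names follow the list above, but the ports, the `hashicups` tag, the `-1` ID suffix, and the `/health` path are assumptions made for illustration. In Kubernetes, registration is normally handled automatically rather than by hand.

```python
from __future__ import annotations

import json


def service_registration(name: str, port: int, health_path: str = "/health") -> dict:
    """Build a request body for PUT /v1/agent/service/register."""
    return {
        "ID": f"{name}-1",          # hypothetical instance ID
        "Name": name,
        "Port": port,
        "Tags": ["hashicups"],      # hypothetical tag
        "Check": {
            # Consul polls this URL and marks the service unhealthy on failure.
            "HTTP": f"http://localhost:{port}{health_path}",
            "Interval": "10s",
            "Timeout": "2s",
        },
    }


# Hypothetical ports for the sample app's services.
SERVICES = {"frontend": 3000, "product": 9090, "shipment": 9091, "payment": 8080}

for service_name, service_port in SERVICES.items():
    print(json.dumps(service_registration(service_name, service_port)))
```

Each printed document could be sent to a local Consul agent; nothing here contacts a network.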

Consul provides the "glue" to services across the enterprise.

Consul Global Network Mesh

Within enterprises, each app exists in a sea of other apps and systems.

In each app server node, a Consul sidecar enables a Consul Global Network Mesh, which directs Layer 4 (transport-level) traffic across multiple clouds operating different platform technologies:

    A. Among app nodes within the same app cluster
    B. Database (MySQL, PostgreSQL, Oracle) outside Kubernetes
    C. AWS EC2 image running in AWS
    D. AWS ECS (Elastic Container Service)
    E. AWS EKS (Elastic Kubernetes Service) with Service Mesh
    F. VMware registry image of hypervisor communicating with Kubernetes
    G. Pivotal Cloud Foundry
    H. Red Hat OpenShift
    I. Serverless (AWS Lambda, Azure & GCP Functions) running within clouds
  • The Consul Docker container is retrieved into Kubernetes.

Thus, Consul "future proofs" how enterprises securely manage networking between services.

HashiCorp products and features used

This document covers these implemented features of Consul and Vault:

  1. Centralized identity-based authentication for users and service accounts (with SSO and MFA using Okta OIDC IdP and other Authentication Methods)

  2. Centralized secrets management (using Vault)

  3. Encryption of app data in transit and at rest (using Vault)

  4. Segregation of app data into different data Namespaces to reduce access to data in case of breach (which we assume will occur)

  5. Admin partitions in a centralized service registry in the Consul Key/Value store - (this is the core feature desired by most enterprises)

  6. Segmentation of network traffic to restrict network access in case of breach (which we assume will occur)

  7. Layer 7 Traffic Management for Canary testing, A/B tests, blue/green deploys, and soft multi-tenancy (instead of "East/West" Load Balancers among microservices)

  8. Use of Access Control Lists (ACL tokens) to enforce least-privilege access

  9. Read replicas to ensure performance and reliability as systems scale

  10. Automated backups to ensure quick and reliable recovery in case of disaster at app, cloud Availability Zone, and cloud region levels (for MTTR to meet RPO and RTO standards)

  11. Autopilot Enterprise Redundancy Zones to improve resiliency and scaling by adding “non-voting” servers which will be promoted to voting status in case of voting server failure.

  12. Automated upgrades of versions
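
As an illustration of the Layer 7 Traffic Management feature listed above, the sketch below builds a Consul `service-splitter` config entry that sends a percentage of traffic to a canary subset. The subset names `stable` and `canary` are assumptions; they would have to be defined in a matching `service-resolver` entry, which is not shown.

```python
from __future__ import annotations


def canary_splitter(service: str, canary_percent: float) -> dict:
    """Build a service-splitter config entry for a weighted canary rollout."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "Kind": "service-splitter",
        "Name": service,
        "Splits": [
            # The weights of all splits must add up to 100.
            {"Weight": 100 - canary_percent, "ServiceSubset": "stable"},
            {"Weight": canary_percent, "ServiceSubset": "canary"},
        ],
    }


entry = canary_splitter("product", 10)
print(entry["Splits"])
```

Raising `canary_percent` in steps and re-applying the entry is one way to run a gradual rollout without an extra "East/West" load balancer.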

The objective of implementing Consul and Vault features is to achieve the "Optimal" level defined in the Zero-Trust Maturity Model and to meet SOC2, ISO 27000, and other security frameworks, along with the Well-Architected Framework.


Construction Stages

Manual steps to create each implementation (explained like the Imagina GitBook) are logically organized into these sequential stages (using automated means where applicable):

TODO: Create a flowchart such as this

  1. Project Management - tools and processes to ensure inclusion during organized and fast work

  2. Questions for interviewing each persona, based on the comprehensive Well-Architected Framework

  3. Design solution settings - decide on values of variables for datacenter and region, and on abbreviations (such as Azure's)

  4. Define least-privilege Roles - the actions allowed/disallowed for each persona (in role files)

  5. Establish Auth Methods - Okta, etc. for SSO and MFA by each user

  6. Establish cloud accounts - with special Administrator access used during setup and regular accounts. This being Enterprise, we assume use of multiple cloud accounts.

  7. Establish cloud admin account for managing backup data (assuming breach of other accounts)

  8. Laptop setup - on each builder laptop: Xcode, Homebrew, wget, tree, Jinja, VSCode, Git, GPG, Vault, Consul, Terraform, Packer, Docker, Docker Compose, etc.

  9. GitHub setup - with SSH and GPG keys

  10. Clone GitHub template repos - each of the Reference Implementation components

  11. Establish CI/CD DevSecOps systems

  12. Establish Observability systems (Log/SIEM, Dashboards, etc.) used during troubleshooting and managerial reviews

  13. Establish the App/system and databases described above

  14. Run security scans - on Terraform while on laptop (secret detection, TFSec, etc.)

    Run GitHub Actions or CircleCI CI/CD, which invoke bootstrapping Bash shell scripts and Terraform to load variables, create folders, bootstrap, configure automatic reboot, etc.

  15. Obtain Enterprise licenses - from a HashiCorp Customer Success Solution Engineer

  16. Establish Vault - install and configure

  17. Establish Consul per Features listed above (with segregated namespaces, segmented networks, read replicas, automated backup, etc.)

  18. Define Intentions and ACLs - using Consul to manage the sample application

  19. Estimate costs - (using Terracost)

  20. Prove - that production-grade mechanisms can actually respond effectively to various operational and security stresses

  21. Maintain system

The list above provides a way to measure progress of the entire project.

There are different construction steps for each cloud and platform (defined below).
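
The "Define Intentions and ACLs" stage above can be sketched as follows. The code builds a Consul `service-intentions` config entry that allows named sources and denies everything else, plus a simplified local check of which sources would be allowed. The check is only a model: it assumes an exact source name takes precedence over the `*` wildcard and that unmatched traffic is denied. The service names are taken from the sample app and are otherwise hypothetical.

```python
from __future__ import annotations


def intentions_for(destination: str, allowed_sources: list[str]) -> dict:
    """Build a service-intentions entry: allow listed sources, deny the rest."""
    sources = [{"Name": name, "Action": "allow"} for name in allowed_sources]
    sources.append({"Name": "*", "Action": "deny"})
    return {"Kind": "service-intentions", "Name": destination, "Sources": sources}


def is_allowed(entry: dict, source: str) -> bool:
    """Simplified model: exact match wins over "*"; no match means deny."""
    exact = [s for s in entry["Sources"] if s["Name"] == source]
    wildcard = [s for s in entry["Sources"] if s["Name"] == "*"]
    match = (exact or wildcard or [{"Action": "deny"}])[0]
    return match["Action"] == "allow"


entry = intentions_for("product", ["frontend"])
print(is_allowed(entry, "frontend"))  # True
print(is_allowed(entry, "payment"))   # False
```

Keeping entries like this in version control lets the same review and CI/CD process cover network access rules as well as infrastructure code.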

Clouds

Scripts that install assets for Instruqt labs may be reused in production scripts.

This document presents procedures and automation for creating Consul within each cloud:

  1. AWS is the most popular cloud.

  2. Google (GCP) - HashiCorp's hands-on Instruqt labs run on GCP.

  3. Azure

NOTE: We aim to structure our implementation scripts to make it easier to customize across different clouds.

References:

Slide decks from https://hashicorp.github.io/workshops/ and https://github.com/hashicorp/field-workshops-consul

  1. Intro to Consul Enterprise - A two-hour introductory workshop. (Sales/Consul Team or FM Team)

  2. Life of a Developer with Consul Enterprise - A two-hour container-based progressive application delivery workshop. (Sales/Consul Team)

  3. Network Infrastructure Automation with Consul Enterprise - A two-hour network acceleration workshop with Terraform and Consul-Terraform-Sync. (Sales/Consul Team)

  4. Multi-Cloud Service Networking with Consul Enterprise - An operational half-day multi-product Zero Trust workshop across AWS, Azure, and GCP. (Sales/Consul Team)

OS Platforms

Each implementation has an edition/variation for each technical platform:

  • macOS is commonly used on laptop clients used by developers and administrators.

  • Linux (Ubuntu, Red Hat)

DevSecOps Workflow

Our objective is that, after initial install and configuration, minimal manual intervention be required to keep Consul running.

To save time, enable collaboration, ensure repeatability, and avoid mistakes, the Consul features above are instantiated using modern DevSecOps principles:

  • Self-serve Shift-left for developer efficiency (TFSec on desktops)

  • Version control code with code reviews (in GitHub)

  • Automated CI/CD (GitHub Actions) for speed and comprehensive security scanning. TODO: Adapt example from Azure.

  • Infrastructure-as-Code using HashiCorp Terraform Enterprise Workspaces invoked by Bash shell scripts.

  • Dockerized (Docker Compose) from a Docker Registry

  • Automated policy enforcement to replace long waits for manual approvals

  • Helm charts

  • AWS Landing Zones using Terraform

  • Azure/Terraform-provider-azapi


Proof of Production Viability

We also provide procedures and automation to prove that production-grade mechanisms can actually respond effectively to various stresses (planned and unplanned outages):

  1. ACL blocks access to those not authorized

  2. As each new service comes online, Consul "discovers" it

  3. Each node auto-starts and rejoins its cluster when appropriate

  4. When a service becomes unhealthy, Consul no longer routes traffic to it

  5. When ACL and Intentions are changed, access is immediately restricted or allowed

  6. Data associated with each change is propagated (via Serf Gossip protocol) among nodes

  7. Each change that occurs in Consul results in notification of network management systems (Palo Alto via NIA CTS)

  8. When a Follower node is no longer functional, others continue working

  9. When the Leader node is no longer functional, the Raft protocol establishes a new Leader

  10. Additional cloud resources are added or removed as demand rises or drops significantly

  11. Logs can be filtered for troubleshooting (using Datadog and other observability tools)

  12. Dashboards (using Grafana) display analytics about key trends

  13. Alerts are issued when trouble is recognized (due to injection of troubling events)
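
One of the proofs above, that Consul stops routing traffic to an unhealthy service, could be verified with a check like the sketch below. It filters entries shaped like the response of the Consul endpoint `GET /v1/health/service/:service` and keeps only instances whose checks are all `passing`. The sample data is hypothetical and no Consul agent is contacted.

```python
from __future__ import annotations

# Hypothetical sample shaped like a /v1/health/service/product response.
SAMPLE = [
    {
        "Node": {"Node": "node-a"},
        "Service": {"ID": "product-1", "Service": "product",
                    "Address": "10.0.1.10", "Port": 9090},
        "Checks": [{"CheckID": "serfHealth", "Status": "passing"},
                   {"CheckID": "service:product-1", "Status": "passing"}],
    },
    {
        "Node": {"Node": "node-b"},
        "Service": {"ID": "product-2", "Service": "product",
                    "Address": "10.0.2.10", "Port": 9090},
        "Checks": [{"CheckID": "serfHealth", "Status": "passing"},
                   {"CheckID": "service:product-2", "Status": "critical"}],
    },
]


def passing_instances(entries: list[dict]) -> list[str]:
    """Return address:port for instances whose checks are all passing."""
    healthy = []
    for entry in entries:
        if all(check["Status"] == "passing" for check in entry["Checks"]):
            service = entry["Service"]
            healthy.append(f'{service["Address"]}:{service["Port"]}')
    return healthy


print(passing_instances(SAMPLE))  # ['10.0.1.10:9090']
```

A proof run could inject a failure into one instance, then assert that it disappears from this list while the others remain.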




Other Reference Implementations
