This document aims to provide an opinionated working solution leveraging Kubernetes and proven GitOps techniques to have a resilient, composable and scalable Kubernetes platform.
Nothing outlined below is new or innovative, but it should be at least a good starting point to have a cluster up and running pretty quickly and give you a chance to remain focused and try out new ideas.
Feedback and help are always welcome!
- Kubernetes is a declarative system
- Git can be used to describe infrastructure and applications
- Git repository is the source of truth and represents a cluster
- GitOps is a way to do Continuous Delivery and operate Kubernetes via Git pull requests
- GitOps empowers developers to do operations
- CI pipelines should only run builds, tests and publish images
- In a pull-based approach, an operator deploys new images from inside of the cluster
- You can only observe the actual state of the cluster and react when it diverges from the desired state
In an imperative system, the user knows the desired state, determines the sequence of commands to transition the system to the desired state and supplies a representation of the commands to the system.
By contrast, in a declarative system, the user knows the desired state, supplies a representation of the desired state to the system, then the system reads the current state and determines the sequence of commands to transition the system to the desired state.
Declarative systems have the distinct advantage of being able to react to unintended state changes without further supervision. In the event of an unintended state change leading to a state drift, the system may autonomously determine and apply the set of mitigating actions leading to a state match. This process is called a control loop, a popular choice for the implementation of controllers.
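The control loop described above can be sketched in a few lines of Python. This is only an illustration, not how Kubernetes controllers are implemented: the state model (a mapping from workload name to replica count) and the names `plan` and `reconcile` are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical state model: workload name -> replica count.
State = Dict[str, int]


def plan(desired: State, actual: State) -> List[str]:
    """Compute the actions that would move `actual` to `desired`."""
    actions: List[str] = []
    for name, replicas in sorted(desired.items()):
        if name not in actual:
            actions.append(f"create {name} replicas={replicas}")
        elif actual[name] != replicas:
            actions.append(f"scale {name} {actual[name]}->{replicas}")
    # Anything running that is not declared is drift and gets removed.
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(f"delete {name}")
    return actions


def reconcile(desired: State, actual: State) -> List[str]:
    """One pass of the loop: plan, then apply the plan to `actual` in place."""
    actions = plan(desired, actual)
    actual.clear()
    actual.update(desired)
    return actions


desired = {"web": 3, "api": 2}
actual = {"web": 1, "worker": 1}  # drifted state

first = reconcile(desired, actual)
second = reconcile(desired, actual)
print(first)   # actions needed to remove the drift
print(second)  # empty: the states already match
```

The user only ever supplies `desired`; the sequence of commands is derived by the system, and a second pass is a no-op once the states match.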
GitOps is the art and science of using Git pull requests to manage infrastructure provisioning and software deployment.
The concept of GitOps originated at Weaveworks, whose developers described how they use Git to create a single source of truth. Kubernetes is a declarative system and by using declarative tools, the entire set of configuration files can be version controlled in Git.
More generally, GitOps is a way to do Continuous Delivery and operate Kubernetes via Git.
In a push-based pipeline, the CI system runs builds and tests, followed by a deployment directly to Kubernetes. This is an anti-pattern: a CI server is not an orchestration tool. You need something that continually attempts to make progress until there are no more diffs, because a CI job fails when it encounters a difference and you could end up in a partial and unknown state.
In a pull-based pipeline, a Kubernetes operator deploys new images from inside of the cluster. The operator notices when a new image has been pushed to the registry. Convergence of the cluster state is then triggered and the new image is pulled from the registry, the manifest is automatically updated and the new image is deployed to the cluster.
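As a rough sketch of the pull-based flow, the snippet below models the step where an in-cluster operator compares the image in a manifest with the tags available in the registry and updates the manifest when a newer one exists. Everything here is hypothetical: the manifest shape, the `example.org/guestbook` image name, the dotted numeric tag scheme and the `converge` function are assumptions for illustration, not the behaviour of any specific operator.

```python
from typing import Dict, List, Tuple


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn a tag such as 'v1.10.0' into (1, 10, 0) for numeric comparison."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def newest_tag(tags: List[str]) -> str:
    """Pick the highest version among the tags seen in the registry."""
    return max(tags, key=parse_version)


def converge(manifest: Dict[str, str], registry_tags: List[str]) -> bool:
    """Point the manifest at the newest image; return True if it changed."""
    repo, _, current = manifest["image"].rpartition(":")
    latest = newest_tag(registry_tags)
    if latest == current:
        return False
    manifest["image"] = f"{repo}:{latest}"
    return True


# Hypothetical manifest and registry content.
manifest = {"name": "guestbook-ui", "image": "example.org/guestbook:1.2.0"}
tags = ["1.2.0", "1.10.0", "1.3.1"]

changed = converge(manifest, tags)    # a newer image exists
unchanged = converge(manifest, tags)  # nothing left to do
print(manifest["image"], changed, unchanged)
```

The CI system never talks to the cluster in this model: it only publishes images, and the operator decides when the manifest needs to change.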
A CI pipeline should be used to merge and integrate updates with master, while with GitOps you should rely on Kubernetes or the cluster to internally manage deployments based on those master updates.
You could potentially have multiple clusters pointing to the same GitOps repository, but you won't have a centralized view of them: all the clusters will be independent.
Git provides a source of truth for the desired state of the system and observability provides a source of truth for the actual state of the running system.
You cannot say what the actual state of the cluster is; you can only observe it. This is why diffs are so important.
A system is observable if developers can understand its current state from the outside. Observability is a property of systems like Availability and Scalability. Monitoring, Tracing and Logging are techniques for baseline observations.
Observability is a source of truth for the actual running state of the system right now. You observe the running system in order to understand and control it. The observed state must be compared with the desired state in Git, and usually you want to monitor and alert when the system diverges from the desired state.
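To make the comparison between desired and observed state concrete, here is a minimal sketch of a recursive diff over two nested dictionaries. The function name `diff`, the message wording and the sample data are assumptions for illustration; real tools compare rendered manifests against live Kubernetes objects.

```python
from typing import Any, List


def diff(desired: Any, observed: Any, path: str = "") -> List[str]:
    """List the places where the observed state diverges from the desired one."""
    if isinstance(desired, dict) and isinstance(observed, dict):
        findings: List[str] = []
        for key in sorted(desired.keys() | observed.keys()):
            child = f"{path}.{key}" if path else key
            if key not in observed:
                findings.append(f"{child}: missing in cluster")
            elif key not in desired:
                findings.append(f"{child}: not in Git")
            else:
                findings.extend(diff(desired[key], observed[key], child))
        return findings
    if desired != observed:
        return [f"{path}: desired={desired!r} observed={observed!r}"]
    return []


# Hypothetical desired state (from Git) and observed state (from the cluster).
desired = {"spec": {"replicas": 3, "image": "guestbook:1.0"}}
observed = {"spec": {"replicas": 2, "image": "guestbook:1.0", "paused": True}}

drift = diff(desired, observed)
for finding in drift:
    print(finding)
```

A non-empty result is what you would alert on; an empty list means the running system matches what is declared in Git.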
Resources
- Imperative vs Declarative
- GitOps - Operations by Pull Request (Part 1)
- The GitOps Pipeline (Part 2)
- GitOps - Observability (Part 3)
- GitOps - Application Delivery Compliance and Secure CICD (Part 4)
- Making the Leap from Continuous Integration to Continuous Delivery (Whitepaper)
- What is GitOps really?
- Why is a PULL vs a PUSH pipeline important?
- Kubernetes anti-patterns: Let's do GitOps, not CIOps!
- GitOps: High velocity CICD for Kubernetes
- GitOps - What you need to know
- GitOps for Kubernetes - A DevOps Iteration Focused on Declarative Infrastructure
- Automating continuous delivery with Kubernetes, Google Cloud and Git
- Continuous Delivery the Hard Way
Argo CD is a declarative, GitOps continuous delivery tool for Kubernetes. It automates the deployment of the desired application states in the specified target environments. In this project Kubernetes manifests are specified as Helm charts.
This guide explains how to set up the whole infrastructure in a few steps via GitOps with Argo CD. Note that it's not tightly coupled to any specific vendor and you should be able to easily run it on DigitalOcean, EKS or GKE for example.
Most of the steps have been kept manual on purpose, but they should be automated in a production environment.
- Set up required tools
- Create a Kubernetes cluster locally or with your favourite provider
- Download the cluster configs and test connection
export KUBECONFIG=~/.kube/<CLUSTER_NAME>-kubeconfig.yaml
kubectl get nodes
- TODO Set up secrets (optional)
- Set up Argo CD and all the applications
make bootstrap
- Access Argo CD
# username: admin
# password: (autogenerated) the pod name of the Argo CD API server
kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-server -o name | cut -d'/' -f 2
# port forward the service
kubectl port-forward service/argocd-server -n argocd 8080:443
# from the UI
[open|xdg-open] https://localhost:8080
# from the CLI
argocd login localhost:8080 --username admin
- To access it from Chrome, you might need to enable the flag chrome://flags/#allow-insecure-localhost (Allow invalid certificates for resources loaded from localhost)
- The first time only, sync all the OutOfSync applications
  - manually
  - TODO with a cronjob (optional)
- verify guestbook example
# port forward the service
kubectl port-forward service/guestbook-ui -n guestbook 8081:80
# open browser
[open|xdg-open] http://localhost:8081
This is how it should look on the UI
Applications in this repository are defined in the parent applications chart and are logically split into folders which represent Kubernetes namespaces.
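One way to picture a parent chart that declares one application per namespace folder is to generate Argo CD `Application` resources from a list of folder names. The sketch below is an assumption about the general shape, not the actual templates of this repository: the repository URL, the `applications/<namespace>` path layout and the `build_application` helper are hypothetical. Only the `Application` field names follow the Argo CD resource format.

```python
import json
from typing import Any, Dict, List

# Hypothetical repository URL, used only for illustration.
REPO_URL = "https://example.org/your-org/gitops.git"


def build_application(namespace: str, revision: str = "HEAD") -> Dict[str, Any]:
    """Build an Argo CD Application that deploys one namespace folder."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": namespace, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": REPO_URL,
                "path": f"applications/{namespace}",  # assumed folder layout
                "targetRevision": revision,
            },
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": namespace,
            },
        },
    }


# Namespaces described in this document.
namespaces: List[str] = ["ambassador", "observe", "kube-system"]
applications = [build_application(ns) for ns in namespaces]
print(json.dumps(applications[1], indent=2))
```

Each folder maps to exactly one namespace, so adding an application amounts to adding a folder and one entry in the parent definition.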
ambassador namespace is dedicated to Ambassador, a lightweight Kubernetes-native microservices API gateway built on the Envoy Proxy, which is mainly used for routing and supports canary deployments, traffic shadowing, rate limiting, authentication and more
# retrieve EXTERNAL-IP
kubectl get service ambassador -n ambassador
[open|xdg-open] http://<EXTERNAL-IP>/ambassador
[open|xdg-open] http://<EXTERNAL-IP>/httpbin/
[open|xdg-open] http://<EXTERNAL-IP>/guestbook
# debug ambassador
kubectl port-forward service/ambassador-admins 8877 -n ambassador
[open|xdg-open] http://localhost:8877/ambassador/v0/diag
Ambassador is disabled by default because the recommended way is to use host-based routing, which requires a domain.
For a working example on DigitalOcean using external-dns you can have a look at niqdev/do-k8s.
TODO Service mesh
observe namespace is dedicated to observability, specifically Monitoring, Alerting and Logging
- prometheus-operator provides monitoring and alerting, managing Prometheus, Alertmanager and Grafana
# prometheus
kubectl port-forward service/prometheus-operator-prometheus 8001:9090 -n observe
# alertmanager
kubectl port-forward service/prometheus-operator-alertmanager 8002:9093 -n observe
# grafana
# username: admin
# password: prom-operator
kubectl port-forward service/prometheus-operator-grafana 8003:80 -n observe
- kube-ops-view provides a read-only system dashboard for multiple k8s clusters
kubectl port-forward service/kube-ops-view -n observe 8004:80
EFK stack for logging
- elasticsearch is a distributed, RESTful search and analytics engine and it's used for log storage
kubectl port-forward service/elasticsearch-master 9200:9200 -n observe
- cerebro is an Elasticsearch web admin tool
kubectl port-forward service/cerebro 9000:80 -n observe
- kibana visualizes and queries the log data stored in an Elasticsearch index
kubectl port-forward service/kibana-kibana 9001:5601 -n observe
- fluentbit is a fast and lightweight Log Processor and Forwarder
- elasticsearch-curator or curator helps to curate, or manage, Elasticsearch indices and snapshots
Resources
- Prometheus
- Prometheus Operator - Getting Started Guide
- Grafana - Dashboards
- Fluent Bit
- Logging Best Practices for Kubernetes using Elasticsearch, Fluent Bit and Kibana
- Exporting Kubernetes Logs to Elasticsearch Using Fluent Bit
- Fluentd vs. Fluent Bit: Side by Side Comparison
- Logging & Monitoring of Kubernetes Applications: Requirements & Recommended Toolset
- Loki
kube-system namespace is reserved for Kubernetes system applications
- kubernetes-dashboard is a general purpose, web-based UI for Kubernetes clusters. It allows users to manage applications running in the cluster and troubleshoot them, as well as manage the cluster itself
kubectl port-forward service/kubernetes-dashboard -n kube-system 8000:443
- metrics-server is an add-on which extends the metrics api group and enables the Kubernetes resource HorizontalPodAutoscaler
kubectl top node
kubectl top pod --all-namespaces
- spotify-docker-gc performs garbage collection in the Kubernetes cluster and the default configurations have the gc running once a day which:
  - removes containers that exited more than an hour ago
  - removes images that don't belong to any container
  - removes volumes that are not associated to any remaining container
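The three garbage collection rules above can be expressed as a small selection function. This is a sketch of the rules as stated in this document, not the actual docker-gc implementation: the `Container` data model and the `collect` function are hypothetical, and evaluating images and volumes against the containers that remain after the first rule is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class Container:
    name: str
    image: str
    exited_at: Optional[datetime]  # None while the container is running
    volumes: Tuple[str, ...] = ()


def collect(
    containers: List[Container],
    images: List[str],
    volumes: List[str],
    now: datetime,
    grace: timedelta = timedelta(hours=1),
) -> Tuple[List[str], List[str], List[str]]:
    """Return the names of containers, images and volumes that would be removed."""
    # Rule 1: containers that exited more than `grace` ago.
    expired = [
        c.name
        for c in containers
        if c.exited_at is not None and now - c.exited_at > grace
    ]
    remaining = [c for c in containers if c.name not in expired]
    # Rule 2: images not used by any remaining container (assumption).
    used_images = {c.image for c in remaining}
    # Rule 3: volumes not associated to any remaining container.
    used_volumes = {v for c in remaining for v in c.volumes}
    return (
        expired,
        sorted(set(images) - used_images),
        sorted(set(volumes) - used_volumes),
    )


now = datetime(2020, 1, 1, 12, 0)
containers = [
    Container("old-job", "job:1", now - timedelta(hours=2), ("vol-a",)),
    Container("recent-job", "job:2", now - timedelta(minutes=10), ("vol-b",)),
    Container("web", "web:1", None, ("vol-c",)),
]
removed = collect(
    containers,
    images=["job:1", "job:2", "web:1", "unused:1"],
    volumes=["vol-a", "vol-b", "vol-c"],
    now=now,
)
print(removed)
```

Containers that exited recently are kept, which is why `job:2` and `vol-b` survive even though `recent-job` is no longer running.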
- argocd: example secrets for private charts
- argocd: override default admin.password
- argocd-bootstrap: open source and explain solution of how to sync automatically first time with cronjob
- expose argocd over http i.e. --insecure flag
- configure TLS/cert and authentication on ambassador for all services
- centralize auth on ambassador/istio
- Jaeger tracing
- kube-monkey or chaoskube
- explain how to switch cluster via DNS
- Kafka from public chart + JMX fix
- stateless vs stateful: disaster recovery strategy e.g. S3 backup/restore
- example with multiple providers: DigitalOcean, EKS, GKE
- add prometheus adapter for custom metrics that can be used by the HorizontalPodAutoscaler
- explain how to test a branch i.e. change target revision from the UI
- TODO fix alertmanager: error: unrecognized log format "<nil>", try --help
- add screenshots to readme for each app
- explain how to add grafana dashboards with ConfigMap
- add alerting example on Slack/PagerDuty
- add example of prometheus ServiceMonitor + dashboard
- explain how to init es index on kibana for logging + screenshot
- add kubefwd to docs
- argocd issue: Add support for secrets in Application parameters
- argocd issue: Helm repository as first class Argo CD Application source