-
Notifications
You must be signed in to change notification settings - Fork 0
/
run-k8s-platform-in-production-checklist
34 lines (29 loc) · 1.78 KB
/
run-k8s-platform-in-production-checklist
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
# A checklist of build and run kubernetes platform in production with scale.
## This is what my understanding of running kubernetes as platform in production in Jan.2022
A checklist
- Automated scalability and performance testing
- Vulnerability scan and threat modeling, and CVE monitoring.
- Automated metrics monitoring, alert and on-call.
- Gitops readiness, every aspect of the platform is version controlled through git
- CI/CD readiness to continuously validate and deliver new platform feature/configure.
- Disaster recovery.
- Engineer readiness for operations and continuously development.
- Architecture extensibility.
- Issue tracking system
After spending 6 months at the Intuit Kubernetes team, we not only developed a kubernetes runtime, but also
build a kubernete platform with several hundreds medium size clusters in production, here is my checklist right now (Aug.2022)
- Resilient K8s controlplane, minimally 3 control planes, 3 etcd nodes.
- Node health check and auto repair (i.e AWS cluster auto-scaler)
- Resilient Networking, DNS, Node network, CNI, LB, API GW, Mesh and more.
- Resilient Storage (cloud storage)
- Capacity planning and scaling (namespace, node)
- Pod autoscaling (i.e HPA)
- Pod health and alerting
- Logging and tracing
- Metrics
- Security (define and enforce security policies)
- Process to develop and reliably upgrade clusters
We are solving real challenges that developers are facing, because we talk to those developers every day.
we need to abstract Kubernetes API, expose less API to developers, make those abstracted API portable, predictable, and reliable,
for example, we will not expose Horizontal Pod Autoscaler API to developer directly, rather, we will automate auto-scaling for user,
automatically scale up and down without user intervention and input.