Project status: alpha. Not all planned features are complete. The API, spec, status and other user-facing objects may change, but in a backward-compatible way.
Packaging scripts and deployment instructions are still in progress, and contributors are welcome.
A Kubernetes Operator that automates manual actions, normally documented in application runbooks and executed by Ops or SRE staff, in reaction to an application alert. Simple examples include:
- deleting/restarting a pod on an application error that the liveness/readiness probes do not detect, so the pod is not restarted automatically
- taking a Java thread-dump or enabling a profiler such as async-profiler on a high CPU usage alert
For more detailed examples and use cases see the README in the docs folder.
This project aims to define an API and a controller in Kubernetes to codify project runbooks, automating the actions that are taken manually when an on-call engineer receives an alert.
For example, imagine a Java application whose runbook states that when a high-CPU alert is received, the on-call engineer should take a thread-dump for analysis. Doing this manually can be difficult, depending on how long the high-CPU event lasts, the engineer's availability, and whether the container has the required debug tools.
This project allows the above runbook task to be automated by using an operator, written using the OperatorSDK, and a few CRDs that define the event to monitor and the actions to take.
The operator allows for the deployment of an event source (currently only Prometheus is supported) and a countermeasure that defines one or more actions. The event source publishes events into an internal event bus, from which they are consumed by the countermeasures.
The Kubernetes CounterMeasures Operator uses Ephemeral Containers, which were alpha in Kubernetes 1.22.0, beta in 1.23.0, and stable in >=1.25.0.
Therefore, Kubernetes >=1.25.0 is recommended, although development and testing were done with Kubernetes clusters of version >=1.23.0.
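Because the recommended minimum is Kubernetes 1.25.0, it can help to check the cluster version before installing. The following is a minimal POSIX shell sketch, assuming a `sort` that supports `-V` (version sort, as in GNU coreutils); `version_ge` is a hypothetical helper, not part of this repository.

```shell
# version_ge MIN ACTUAL
# Hypothetical helper: succeeds if version ACTUAL >= version MIN.
# A leading "v" (as in kubectl's "v1.27.3") is stripped, and so is any
# build suffix starting with "+" or "-" (e.g. "+k3s1", "-gke.100").
version_ge() {
  min=${1#v}
  actual=${2#v}
  actual=${actual%%[+-]*}
  # sort -V orders versions ascending; if MIN comes first, ACTUAL >= MIN.
  [ "$(printf '%s\n%s\n' "$min" "$actual" | sort -V | head -n 1)" = "$min" ]
}
```

For example, assuming `jq` is installed: `version_ge 1.25.0 "$(kubectl version -o json | jq -r .serverVersion.gitVersion)" || echo "cluster older than 1.25.0"`.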
A core feature of the Kubernetes CounterMeasures Operator is to watch the Kubernetes API server for changes to specific objects, monitor your application for undesirable conditions, and, when such a condition is detected, take the appropriate actions as a countermeasure. The Operator acts on the following custom resource definitions (CRDs):
- CounterMeasure, which defines a condition to watch for and the actions to take when it occurs.
- Prometheus, which defines an event source that triggers the countermeasures.
The Kubernetes CounterMeasures Operator automatically detects changes in the Kubernetes API server to any of the above objects and ensures that your monitors are updated accordingly.
To learn more about the CRDs introduced by the Kubernetes CounterMeasures Operator have a look at the documentation.
For validation, an admission webhook checks these custom resources when they are created or updated, including during dry runs.
For more information on this feature, see the user guide.
To quickly try out the Kubernetes CounterMeasures Operator inside a Kind cluster, run the following commands:
./hack/start-cluster.sh
make install
make deploy
To run the Operator outside of a cluster, use the following instead of make deploy:
make run
To remove the operator, first delete any custom resources you created in each namespace.
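# Delete every CounterMeasure in every namespace (quoting keeps the shell from touching the JSONPath and namespace names).
for n in $(kubectl get namespaces -o jsonpath='{..metadata.name}'); do
  kubectl delete --all --namespace="$n" countermeasure
done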
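Instead of waiting a fixed amount of time, one option is to poll until no CounterMeasure resources are left. A minimal POSIX shell sketch; `wait_until_empty` is a hypothetical helper, and it assumes `kubectl get` prints nothing on stdout once no resources remain (kubectl writes its "No resources found" message to stderr).

```shell
# wait_until_empty LIMIT_SECONDS CMD [ARGS...]
# Hypothetical helper: runs CMD about once per second and succeeds as soon as
# its stdout is empty; fails if CMD still prints something after LIMIT_SECONDS.
# CMD's stderr is discarded, so a CMD that fails silently also counts as empty.
wait_until_empty() {
  limit=$1
  shift
  elapsed=0
  while [ -n "$("$@" 2>/dev/null)" ]; do
    if [ "$elapsed" -ge "$limit" ]; then
      echo "still non-empty after ${limit}s: $*" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Example (needs a cluster): wait up to 5 minutes for all CounterMeasures to go away.
# wait_until_empty 300 kubectl get countermeasure --all-namespaces --no-headers
```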
After a couple of minutes, you can remove the operator itself:
make undeploy
make uninstall
To build and test the operator, you need:
- a Go development environment
- docker (used for creating container images, etc.)
- kind (optional)
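A quick way to confirm these tools are on `PATH` is a small POSIX shell helper like the sketch below; `missing_tools` is hypothetical, and the command names `go`, `docker` and `kind` are assumed to be the binaries for the prerequisites listed above.

```shell
# missing_tools TOOL...
# Hypothetical helper: prints each TOOL that is not found on PATH, one per
# line, and succeeds only if every TOOL is present.
missing_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return $rc
}

# Example: kind is optional, so it is left out of the required check.
# missing_tools go docker || echo "install the tools listed above first" >&2
```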
Run the tests with:
make test
To debug the controller locally against a running K8s cluster, add this entry to the /etc/hosts file so that the operator can communicate with Prometheus:
##
# Host Database
#
# localhost is used to configure the loopback interface
# when the system is booting. Do not change this entry.
##
127.0.0.1 localhost
# Add for k8s-countermeasures debugging
127.0.0.1 prometheus-operated.monitoring.svc
Then enable port forwarding from the development host to the Prometheus service:
kubectl -n monitoring port-forward service/prometheus-operated 9090:9090
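If you debug often, the /etc/hosts step above can be scripted so that it is safe to run repeatedly. A minimal POSIX shell sketch; `add_hosts_entry` is a hypothetical helper, not part of this repository, and writing to /etc/hosts usually requires root.

```shell
# add_hosts_entry FILE IP HOSTNAME
# Hypothetical helper: appends "IP HOSTNAME" to FILE unless a line already maps
# IP to HOSTNAME (exact field comparison, ignoring trailing "#" comments).
add_hosts_entry() {
  file=$1 ip=$2 host=$3
  if awk -v ip="$ip" -v host="$host" '
      $1 == ip { for (i = 2; i <= NF; i++) { if ($i ~ /^#/) break; if ($i == host) found = 1 } }
      END { exit !found }' "$file"; then
    return 0  # entry already present, nothing to do
  fi
  # If the file does not end with a newline, add one so the entry gets its own line.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    printf '\n' >> "$file"
  fi
  printf '%s %s\n' "$ip" "$host" >> "$file"
}

# Example (from a root shell):
# add_hosts_entry /etc/hosts 127.0.0.1 prometheus-operated.monitoring.svc
```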
Many files (documentation, manifests, ...) in this repository are auto-generated. Before proposing a pull request:
- Commit your changes.
- Run make generate.
- Commit the generated changes.
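To confirm that make generate left nothing uncommitted before you open the pull request, a check built on `git status --porcelain` can help. A minimal POSIX shell sketch; `ensure_clean_tree` is a hypothetical helper, not a target in this repository's Makefile.

```shell
# ensure_clean_tree [DIR]
# Hypothetical helper: succeeds if the git work tree in DIR (default: .) has no
# modified, staged, or untracked files; otherwise lists them on stderr and fails.
# Returns 2 if git itself fails (e.g. DIR is not a repository).
ensure_clean_tree() {
  dir=${1:-.}
  changes=$(git -C "$dir" status --porcelain) || return 2
  if [ -n "$changes" ]; then
    printf 'uncommitted changes after generation:\n%s\n' "$changes" >&2
    return 1
  fi
  return 0
}

# Example, from the repository root:
# make generate && ensure_clean_tree
```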
If you find a security vulnerability related to the Kubernetes CounterMeasures Operator, please do not report it by opening a GitHub issue; instead, send an e-mail to the owner of this project.