Active-Monitor

Motivation

Active-Monitor is a Kubernetes custom resource controller which enables deep cluster monitoring using Argo workflows.

While it is not too difficult to know that all entities in a cluster are running individually, it is often quite challenging to know that they can all coordinate with each other as required for successful cluster operation (network connectivity, volume access, etc.).

Overview

Active-Monitor will create a new health namespace when installed in the cluster. Users can then create and submit HealthCheck objects to the Kubernetes server. A HealthCheck is essentially an instrumented wrapper around an Argo workflow.

The workflow is run periodically, as defined by the repeatAfterSec property in its spec, and watched by the Active-Monitor controller.

Active-Monitor sets the status of the HealthCheck CR to indicate whether the monitoring check succeeded or failed. External systems can query these CRs and take appropriate action when a check has failed.

Typical examples of such workflows include tests for basic Kubernetes object creation/deletion and tests for cluster-wide services such as policy engines, authentication, and authorization.

Examples of HealthChecks one could run with Active-Monitor:

verify namespace and deployment creation
verify AWS resources are using < 80% of their instance limits
verify kube-dns by running DNS lookups on the network
verify kube-dns by running DNS lookups on localhost
verify KIAM agent by running aws sts get-caller-identity on all available nodes

Dependencies

Kubernetes command line tool (kubectl)
Access to a Kubernetes cluster, as specified in ~/.kube/config
Argo Workflows Controller

Installation Guide

# step 0: ensure that all dependencies listed above are installed or present

# step 1: install argo workflow controller
kubectl apply -f https://raw.githubusercontent.com/orkaproj/active-monitor/master/deploy/deploy-argo.yaml

# step 2: install active-monitor controller
kubectl apply -f https://raw.githubusercontent.com/orkaproj/active-monitor/master/config/crd/bases/activemonitor.orkaproj.io_healthchecks.yaml
kubectl apply -f https://raw.githubusercontent.com/orkaproj/active-monitor/master/deploy/deploy-active-monitor.yaml

# step 3: run the controller via docker container (binding a volume and setting envVar for kubeconfig file)
docker run -v ~/.kube/config:/root/.kube/config -e "KUBECONFIG=/root/.kube/config" orkaproj/active-monitor:latest

Alternate Install - using locally cloned code

# step 0: ensure that all dependencies listed above are installed or present

# step 1: install argo workflow-controller
kubectl apply -f deploy/deploy-argo.yaml

# step 2: install active-monitor controller
make install
kubectl apply -f deploy/deploy-active-monitor.yaml

# step 3: run the controller via Makefile target
make run

Usage and Examples

Run example healthchecks

Create a new healthcheck:

kubectl create -f https://raw.githubusercontent.com/orkaproj/active-monitor/master/examples/inlineHello.yaml

OR with local source code:

kubectl create -f examples/inlineHello.yaml

Then, list all healthchecks:

kubectl get healthcheck -n health

OR, using the short name:

kubectl get hc -n health

NAME                 AGE
inline-hello-zz5vm   55s

View additional details/status of a healthcheck:

kubectl describe healthcheck inline-hello-zz5vm -n health

...
Status:
  Failed Count:              0
  Finished At:               2019-08-09T22:50:57Z
  Last Successful Workflow:  inline-hello-4mwxf
  Status:                    Succeeded
  Success Count:             13
Events:                      <none>
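
External systems can read the same status fields programmatically. The following is a minimal sketch, not part of Active-Monitor itself. It assumes the JSON field names are the camelCase forms of the labels printed by kubectl describe (status.status, status.failedCount), which should be verified against the CRD. The function names load_healthchecks and failed_healthchecks are hypothetical.

```python
import json
import subprocess


def load_healthchecks(namespace="health"):
    """Fetch all HealthCheck objects in a namespace as a parsed JSON list."""
    raw = subprocess.run(
        ["kubectl", "get", "healthcheck", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(raw)


def failed_healthchecks(hc_list):
    """Return (name, failedCount) pairs for checks whose last run did not succeed.

    Field names under `status` are assumed from the `kubectl describe` output.
    """
    failed = []
    for item in hc_list.get("items", []):
        status = item.get("status", {})
        # A check with no status yet has not completed a run; skip it.
        if status and status.get("status") != "Succeeded":
            failed.append((item["metadata"]["name"], status.get("failedCount", 0)))
    return failed
```

Calling failed_healthchecks(load_healthchecks()) against a cluster would return the checks that need attention; the parsing function takes plain dictionaries so it can be exercised without a cluster.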

Generated Resources

activemonitor.orkaproj.io/v1alpha1/HealthCheck
argoproj.io/v1alpha1/Workflow

Sample HealthCheck CR:

apiVersion: activemonitor.orkaproj.io/v1alpha1
kind: HealthCheck
metadata:
  generateName: dns-healthcheck-
  namespace: health
spec:
  repeatAfterSec: 60
  description: "Monitor pod dns connections"
  workflow:
    generateName: dns-workflow-
    resource:
      namespace: health
      serviceAccount: activemonitor-controller-sa
      source:
        inline: |
            apiVersion: argoproj.io/v1alpha1
            kind: Workflow
            spec:
              ttlSecondsAfterFinished: 60
              entrypoint: start
              templates:
              - name: start
                retryStrategy:
                  limit: 3
                container: 
                  image: tutum/dnsutils
                  command: [sh, -c]
                  args: ["nslookup www.google.com"]
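
When many similar checks are needed, the manifest can be generated rather than written by hand. The sketch below mirrors the fields of the sample CR above and emits JSON, which kubectl accepts in place of YAML. The function name build_healthcheck and its parameters are hypothetical; only the field names come from the sample.

```python
import json

# Inline Argo Workflow, copied from the sample CR above.
DNS_WORKFLOW = """\
apiVersion: argoproj.io/v1alpha1
kind: Workflow
spec:
  ttlSecondsAfterFinished: 60
  entrypoint: start
  templates:
  - name: start
    retryStrategy:
      limit: 3
    container:
      image: tutum/dnsutils
      command: [sh, -c]
      args: ["nslookup www.google.com"]
"""


def build_healthcheck(name_prefix, workflow_prefix, repeat_after_sec,
                      description, workflow_source):
    """Build a HealthCheck manifest with the same layout as the sample CR."""
    return {
        "apiVersion": "activemonitor.orkaproj.io/v1alpha1",
        "kind": "HealthCheck",
        "metadata": {"generateName": name_prefix, "namespace": "health"},
        "spec": {
            "repeatAfterSec": repeat_after_sec,
            "description": description,
            "workflow": {
                "generateName": workflow_prefix,
                "resource": {
                    "namespace": "health",
                    "serviceAccount": "activemonitor-controller-sa",
                    "source": {"inline": workflow_source},
                },
            },
        },
    }


manifest = build_healthcheck(
    "dns-healthcheck-", "dns-workflow-", 60,
    "Monitor pod dns connections", DNS_WORKFLOW,
)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and submitted with kubectl create -f. Because the manifest uses generateName, kubectl create is the appropriate command rather than kubectl apply.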

Access Workflows on Argo UI

kubectl -n health port-forward deployment/argo-ui 8001:8001

Then visit: http://127.0.0.1:8001

Prometheus Metrics

The Active-Monitor controller also exports metrics in Prometheus format, which can be used for notifications and alerting.

Prometheus metrics are available on :2112/metrics

kubectl -n health port-forward deployment/active-monitor-controller 2112:2112

Then visit: http://localhost:2112/metrics

Active-Monitor, by default, exports the following Prometheus metrics:

healthcheck_success_count - The total number of successful monitor resources
healthcheck_error_count - The total number of errored monitor resources
healthcheck_runtime_seconds - Time taken for the workflow to complete
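
For a quick look at these values without a Prometheus server, the text exposition format served on /metrics can be parsed directly. The sketch below is a simplified parser under stated assumptions: sample lines carry no trailing timestamp, and the labels attached to each metric are not documented here, so the series string is kept whole. The function names fetch_metrics and parse_metrics are hypothetical.

```python
import urllib.request


def fetch_metrics(url="http://localhost:2112/metrics"):
    """Download the raw metrics text (requires the port-forward shown above)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text, prefix="healthcheck_"):
    """Return {series: value} for samples whose name starts with prefix.

    `series` is the metric name plus any label set, exactly as exported.
    Assumes sample lines have the form `<series> <value>` with no timestamp.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the `# HELP` / `# TYPE` comment lines.
        if not line or line.startswith("#"):
            continue
        # Split on the last space so label values containing spaces survive.
        series, _, value = line.rpartition(" ")
        if series.startswith(prefix):
            samples[series] = float(value)
    return samples
```

With the port-forward running, parse_metrics(fetch_metrics()) would return only the healthcheck_ series and drop the Go runtime metrics that the endpoint may also expose.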

Active-Monitor also supports custom metrics. For this to work, your workflow should export a global output parameter. The parameter will be programmatically available in the completed workflow object under: workflow.status.outputs.parameters.

The global output parameter should look like the following:

"{\"metrics\":
  [
    {\"name\": \"custom_total\", \"value\": 123, \"metrictype\": \"gauge\", \"help\": \"custom total\"},
    {\"name\": \"custom_metric\", \"value\": 12.3, \"metrictype\": \"gauge\", \"help\": \"custom metric\"}
  ]
}"
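
A workflow step could produce this value with a few lines of script instead of hand-escaping the quotes. The sketch below is one possible way to build it. It treats the four keys shown in the example (name, value, metrictype, help) as required, which is an assumption drawn from the example rather than a documented rule, and the function name custom_metrics_payload is hypothetical.

```python
import json

# Keys seen in the example above; treating all four as required is an assumption.
REQUIRED_KEYS = {"name", "value", "metrictype", "help"}


def custom_metrics_payload(metrics):
    """Serialize a list of metric dicts into the {"metrics": [...]} JSON shape."""
    for metric in metrics:
        missing = REQUIRED_KEYS - set(metric)
        if missing:
            raise ValueError(f"metric {metric!r} is missing {sorted(missing)}")
    return json.dumps({"metrics": metrics})


payload = custom_metrics_payload([
    {"name": "custom_total", "value": 123, "metrictype": "gauge", "help": "custom total"},
    {"name": "custom_metric", "value": 12.3, "metrictype": "gauge", "help": "custom metric"},
])
print(payload)

# Encoding the payload a second time yields the quoted, backslash-escaped
# form shown above, suitable for embedding inside another quoted string.
print(json.dumps(payload))
```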

❤ Contributing ❤

Please see CONTRIBUTING.md.

License

The Apache 2 license is used in this project. Details can be found in the LICENSE file.

Other Orka Projects

Instance Manager - Kube Forensics - Addon Manager - Upgrade Manager - Minion Manager - Governor
