Skip to content

Latest commit

 

History

History
108 lines (93 loc) · 3.93 KB

README.md

File metadata and controls

108 lines (93 loc) · 3.93 KB

Go Report Card

Kubernetes Node Fence controller

Based on fence controller design proposal

Status

alpha

Goal

The goal is to run controller that monitors partitioned nodes (i.e. not-ready nodes or nodes that raises problem events) Once partitioned is monitored, the controller posts NodeFence CRD object.

Deployment

Based on quay.io/bronhaim/standalone-fence-controller image

kubectl create -f standalone-controller/fence-controller-deployment.yaml

The deployment starts the controller with permissions to read and start agent pods (fence executors)

Agents pods are based centos7 and include all fence agents script (maintained in docker.io/bronhaim/agent-image)

Configuration

The following defines fence config for node1 which runs fence-rhevm in power-management step.

Each step list contains the names of the fence-method to perform with space separation. For each method you should define fence-method config.

The controller moves between steps every grace_period which is defined by the fence-cluster-config (default is 5seconds). For more available configs, such as defining method templates, see examples.

- kind: ConfigMap
  apiVersion: v1
  metadata:
   name: fence-config-node1
   namespace: default
  data:
   config.properties: |-
    node_name=node1
    isolation=
    power_management=fence-rhevm
    recovery=

- kind: ConfigMap
  apiVersion: v1
  metadata:
   name: fence-method-fence-rhevm-node1
   namespace: default
  data:
   method.properties: |
          agent_name=fence-rhevm
          namespace=default
          ip=ovirt.com  # address to the rhevm management
          username=admin@internal
          password-script=/usr/sbin/fetch_passwd
          ssl-insecure=true
          plug=vm-node1  # the vm name
          action=reboot
          ssl=true
          disable-http-filter=true

- kind: Secret
  apiVersion: v1
  metadata:                                 
    name: secret-fence-method-fence-rhevm-node1
  type: Opaque
  data:                                                              
    password: MTIz # ecoded password - follow https://kubernetes.io/docs/concepts/configuration/secret/ for more info

- kind: ConfigMap
  apiVersion: v1
  metadata:
   name: fence-cluster-config
   namespace: default
  data:
   config.properties: |-
    grace_timeout=10
    giveup_retries=5
    roles=

Build

# make
go build -i -o standalone-controller/_output/bin/node-fencing-controller cmd/node-fencing-controller.go

Build Images

To rebuild images after changes:

export REGISTRY=[registry name]
export VERSION=[build version]
# make images

Run

# $ ./standalone-controller/_output/bin/node-fencing-controller -kubeconfig=[config path]
  I1228 10:29:01.180885    6376 controller.go:120] Fence controller starting
  I1228 10:29:01.181089    6376 controller.go:123] Waiting for pod informer initial sync
  I1228 10:29:02.181297    6376 controller.go:133] Waiting for node informer initial sync
  I1228 10:29:03.181527    6376 controller.go:144] Waiting for event informer initial sync
  I1228 10:29:04.335665    6376 controller.go:170] Controller monitor is running every 10 seconds
   ...
  W1228 10:34:55.392293    6376 controller.go:272] Node node1-48dw ready status is unknown
  I1228 10:34:55.547233    6376 controller.go:447] Posted NodeFence CRD object for node node1-48dw - starting Isolation
  ...

Demo

In the demo video we show running example over GCE k8s cluster - enable subtitles to understand the process