Yandex Cloud preemptible instances killer

This is heavily borrowed from the estafette-gke-preemptible-killer.

This small Kubernetes application loop through a given preemptibles node pool and kill a node before the regular 24h life time of a preemptible VM.

Why?

When creating a cluster, all the node are created at the same time and should be deleted after 24h of activity. Even though Yandex Cloud instances are stopped in a random interval of 22 to 24 hours, that still doesn't work well with Kubernetes. To prevent large disruption, the yc-preemptible-killer can be used to kill instances during a random period of time between 12 and 24h. It makes use of the node annotation to store the time to kill value.

How does that work

At a given interval, the application get the list of preemptible nodes and check weither the node should be deleted or not. If the annotation doesn't exist, a time to kill value is added to the node annotation with a random range between 12h and 24h based on the node creation time stamp. When the time to kill time is passed, the Kubernetes node is marked as unschedulable, drained and the instance deleted on GCloud.

Known limitations

Selecting node pool is not supported yet, the code is processing ALL preemptible nodes attached to the cluster, and there is no way to limit it even via taints nor annotations
This tool increases the chances to have many small disruptions instead of one major disruption.
This tool does not guarantee that major disruption is avoided - GCP can trigger large disruption because the way preemptible instances are managed. Ensure your have PDB and enough of replicas, so for better safety just use non-preemptible nodes in different zones. You may also be interested in estafette-gke-node-pool-shifter

Usage

You can either use environment variables or flags to configure the following settings:

Environment variable	Flag	Default	Description
BLACKLIST_HOURS	--blacklist-hours (-b)		List of UTC time intervals in the form of `09:00 - 12:00, 13:00 - 18:00` in which deletion is NOT allowed
DRAIN_TIMEOUT	--drain-timeout	300	Max time in second to wait before deleting a node
FILTERS	--filters (-f)		Label filters in the form of `key1: value1[, value2[, ...]][; key2: value3[, value4[, ...]], ...]`
INTERVAL	--interval (-i)	600	Time in second to wait between each node check
KUBECONFIG	--kubeconfig		Provide the path to the kube config path, usually located in ~/.kube/config. This argument is only needed if you're running the killer outside of your k8s cluster
METRICS_LISTEN_ADDRESS	--metrics-listen-address	:9001	The address to listen on for Prometheus metrics requests
METRICS_PATH	--metrics-path	/metrics	The path to listen for Prometheus metrics requests
WHITELIST_HOURS	--whitelist-hours (-w)		List of UTC time intervals in the form of `09:00 - 12:00, 13:00 - 18:00` in which deletion is allowed and preferred

Create a Google Service Account

In order to have the yc-preemptible-killer instance delete nodes, create a service account and give the compute.instances.delete permissions.

You can either create the service account and associate the role using the GCloud web console or the cli:

$ export project_id=<PROJECT>
$ gcloud iam --project=$project_id service-accounts create preemptible-killer \
    --display-name preemptible-killer
$ gcloud iam --project=$project_id roles create preemptible_killer \
    --project $project_id \
    --title preemptible-killer \
    --description "Delete compute instances" \
    --permissions compute.instances.delete
$ export service_account_email=$(gcloud iam --project=$project_id service-accounts list --filter preemptible-killer --format 'value([email])')
$ gcloud projects add-iam-policy-binding $project_id \
    --member=serviceAccount:${service_account_email} \
    --role=projects/${project_id}/roles/preemptible_killer
$ gcloud iam --project=$project_id service-accounts keys create \
    --iam-account $service_account_email \
    google_service_account.json

Installation

Prepare using Helm:

brew install kubernetes-helm
kubectl -n kube-system create serviceaccount tiller
kubectl create clusterrolebinding tiller --clusterrole=cluster-admin --serviceaccount=kube-system:tiller
helm init --service-account tiller --wait

Then install or upgrade with Helm:

helm repo add estafette https://helm.estafette.io
helm upgrade --install yc-preemptible-killer --namespace estafette estafette/yc-preemptible-killer

Deploy with Kustomize

Create a kustomization.yaml file:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: default
commonLabels:
  app: preemptible-killer
bases:
- github.com/estafette/yc-preemptible-killer//manifests
images:
- name: estafette/yc-preemptible-killer
  newTag: 1.1.21
secretGenerator:
- name: preemptible-killer-secrets
  files:
  - google-service-account.json=google_service_account.json
  type: "Opaque"

Apply manifests:

kubectl apply -k .

Development

To start development run

git clone git@github.com:estafette/estafette-ci-api.git
cd estafette-ci-api

Before committing your changes run

go test ./...
go mod tidy

Testing

In order to test your local changes against an external Kubernetes cluster use the following commands:

# proxy master
kubectl proxy

# in another shell
go build && ./yc-preemptible-killer -i 10

Note: KUBECONFIG=~/.kube/config as environment variable can also be used if you don't want to use the kubectl proxy command.

For an all-in-one script that launches a kind cluster with 3 nodes, runs yc-preemptible-killer and then reports on the kill time, run:

go build && ./scripts/all-in-one-test -i 10

where -i 10 are the arguments to be passed to yc-preemptible-killer, replace with your own test arguments. For safety, it does not remove the kind cluster it leaves behind.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.github/workflows		.github/workflows
helm/yc-preemptible-killer		helm/yc-preemptible-killer
manifests		manifests
.DS_Store		.DS_Store
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
go.mod		go.mod
go.sum		go.sum
helpers.go		helpers.go
helpers_test.go		helpers_test.go
kubernetes_client.go		kubernetes_client.go
kubernetes_client_mock.go		kubernetes_client_mock.go
kubernetes_client_test.go		kubernetes_client_test.go
main.go		main.go
main_test.go		main_test.go
whitelist.go		whitelist.go
whitelist_test.go		whitelist_test.go
ycloud.go		ycloud.go

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Yandex Cloud preemptible instances killer

Why?

How does that work

Known limitations

Usage

Create a Google Service Account

Installation

Deploy with Kustomize

Development

Testing

About

Releases

Packages

Languages

License

blitss/yc-preemptible-killer

Folders and files

Latest commit

History

Repository files navigation

Yandex Cloud preemptible instances killer

Why?

How does that work

Known limitations

Usage

Create a Google Service Account

Installation

Deploy with Kustomize

Development

Testing

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages