- does not respect node labels as specified by pods
- does not respect critical pods
- does not consider Job and DaemonSet pods
- does not respect pods with EBS volumes
- use taints: https://kubernetes.io/docs/user-guide/kubectl/kubectl_taint/, kubernetes/kubernetes#17190
- write proper tests before it can be used anywhere
- Identify unschedulable pods
- Get the nodes
- Identify autoscaling groups
- Kill the necessary number of fake pods and schedule the unschedulable pods in their place
- Create new node by increasing autoscaling group size
- Deploy fake pods (see below) elsewhere
- Compute whether the pods on the least loaded node could theoretically be rescheduled onto other nodes (consider only worker nodes)
- If yes, drain the node
- Terminate the instance
- Observe the pods rescheduling
- If rescheduling fails, scale back up
- This part needs additional thinking
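
The feasibility check in the scale-down steps above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `Pod` and `Node` types, the definition of "least loaded" as requested CPU share, and the first-fit-decreasing placement are all assumptions made for the example, and no Kubernetes API is called.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Pod:
    name: str
    cpu_m: int   # CPU request in millicores
    mem_mb: int  # memory request in MB


@dataclass
class Node:
    name: str
    cpu_m: int   # allocatable CPU in millicores
    mem_mb: int  # allocatable memory in MB
    pods: List[Pod] = field(default_factory=list)

    def free(self) -> Tuple[int, int]:
        """Return (free CPU, free memory) after subtracting pod requests."""
        used_cpu = sum(p.cpu_m for p in self.pods)
        used_mem = sum(p.mem_mb for p in self.pods)
        return self.cpu_m - used_cpu, self.mem_mb - used_mem


def least_loaded(nodes: List[Node]) -> Node:
    # Assumption: "load" is the share of CPU already requested on the node.
    return min(nodes, key=lambda n: sum(p.cpu_m for p in n.pods) / n.cpu_m)


def plan_drain(nodes: List[Node]) -> Optional[Dict[str, str]]:
    """Return a pod -> target node mapping if the least loaded worker node
    can be emptied, or None if some pod does not fit anywhere else."""
    if len(nodes) < 2:
        return None
    victim = least_loaded(nodes)
    free = {n.name: list(n.free()) for n in nodes if n is not victim}
    plan: Dict[str, str] = {}
    # First-fit decreasing: try to place the largest pods first.
    for pod in sorted(victim.pods, key=lambda p: (p.cpu_m, p.mem_mb), reverse=True):
        for name, (cpu, mem) in free.items():
            if pod.cpu_m <= cpu and pod.mem_mb <= mem:
                free[name][0] -= pod.cpu_m
                free[name][1] -= pod.mem_mb
                plan[pod.name] = name
                break
        else:
            return None  # no other node has room for this pod
    return plan
```

A non-`None` plan corresponds to the "If yes, drain the node" branch; `None` means the node should be left alone. The real scheduler also weighs constraints this sketch ignores (labels, volumes, affinity), so the result is only a theoretical possibility.
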
The scale-up check is frequent and runs every minute. The scale-down check is rare and runs every 30 minutes.
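
One simple way to get these two frequencies from a single loop is sketched below. The function and parameter names (`run_loop`, `scale_up`, `scale_down`, `sleep`) are hypothetical, and the injectable `sleep` exists only so the loop can be exercised without waiting.

```python
import time
from typing import Callable

SCALE_UP_INTERVAL_S = 60   # scale-up check runs every minute
SCALE_DOWN_EVERY_N = 30    # scale-down check runs every 30th minute


def run_loop(
    scale_up: Callable[[], None],
    scale_down: Callable[[], None],
    iterations: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run the scale-up check on every tick and the scale-down check on
    every 30th tick, sleeping one minute between ticks."""
    for tick in range(1, iterations + 1):
        scale_up()
        if tick % SCALE_DOWN_EVERY_N == 0:
            scale_down()
        sleep(SCALE_UP_INTERVAL_S)
```
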
- For now, pods with hardcoded resource requests: 500m CPU, 100 MB memory
- Randomly distributed
- Can be killed on demand and recreated on a new node
- Add exponential backoff for the sorter when expanding fails
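
As an illustration of the backoff item above, here is a minimal sketch of capped exponential backoff around a failing expand call. The names `backoff_delays` and `expand_with_backoff`, and the default base, factor, and cap, are assumptions chosen for the example rather than values from this project.

```python
import time
from typing import Callable, Iterator


def backoff_delays(
    base_s: float = 1.0,
    factor: float = 2.0,
    cap_s: float = 300.0,
    attempts: int = 5,
) -> Iterator[float]:
    """Yield one delay per attempt: base, base*factor, ... capped at cap_s."""
    delay = base_s
    for _ in range(attempts):
        yield min(delay, cap_s)
        delay *= factor


def expand_with_backoff(
    expand: Callable[[], bool],
    attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call expand() until it returns True, sleeping after each failure.
    Returns False if every attempt fails."""
    for delay in backoff_delays(attempts=attempts):
        if expand():
            return True
        sleep(delay)
    return False
```

Here `expand` stands in for whatever call grows the autoscaling group; it is expected to return `True` on success.
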