The objective of this strategy is to ensure that workloads are optimally balanced between spot and on-demand instances to reduce costs while maintaining reliability.
- Maximize Spot Instance Utilization: Preferentially schedule workloads on spot instances to cut costs.
- Minimize Service Interruptions: Ensure workload continuity by using on-demand instances as fallbacks in case of spot instance terminations.
- No Change to Existing K8s Scheduler: Avoid replacing or modifying the current scheduler in the cluster.
- Minimal Component Addition: Implement the strategy with the least amount of new components or services to avoid adding complexity.
- On-demand nodes have the label `node.kubernetes.io/capacity: on-demand`.
- Spot nodes have the label `node.kubernetes.io/capacity: spot`.
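For reference, a node carrying the capacity label might look like the following (the node name is illustrative):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-spot-1                      # illustrative node name
  labels:
    node.kubernetes.io/capacity: spot      # use "on-demand" for on-demand nodes
```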
- Deployments: Stateless workloads, which can tolerate rescheduling more easily.
- StatefulSets: Stateful workloads, where maintaining instance availability is critical.
- Use node affinity to prefer scheduling workloads onto spot instances where possible.
- Introduce a secondary affinity for on-demand nodes as a fallback, ensuring critical workloads keep running even if spot instances are terminated.
- Apply pod anti-affinity to spread workloads across spot and on-demand nodes, reducing the risk of all replicas being terminated at once.
- Single Replica Workloads: Prioritize on-demand nodes to ensure availability, using spot instances as secondary options if no on-demand capacity is available.
- Multiple Replica Workloads: Distribute replicas across both spot and on-demand nodes to ensure at least one replica remains available in the event of spot termination.
- Monitor spot instance termination signals through external mechanisms (handled separately from scheduling control).
- Once a termination signal is received, existing components will gracefully drain the spot nodes and reschedule workloads onto available nodes.
- If all spot nodes are preempted or unavailable, on-demand nodes automatically serve as fallback under the affinity rules.
- On-demand instances can be configured with a higher cost tolerance to ensure critical services are uninterrupted.
The strategy is applied by a mutating admission webhook that rewrites pod affinity at admission time.
- Write the webhook in Golang and build it as a container image.
- Deploy it as a Deployment and expose it via a Service.
- Configure a `MutatingWebhookConfiguration` to register the webhook (a registration sketch follows this list).
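A minimal registration sketch, assuming the webhook is served by a Service named `ds-admission-webhook` in `kube-system` on path `/mutate` (these names and the CA bundle are illustrative assumptions); the `namespaceSelector` shows one way to apply the namespace half of the opt-in label described in the assumptions below:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: ds-admission-webhook                             # assumed name
webhooks:
- name: autoschedule.ds-admission-webhook.example.com    # assumed fully qualified name
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Ignore                  # do not block pod creation if the webhook is down
  namespaceSelector:                     # only namespaces opted in via the label
    matchLabels:
      ds-admission-webhook/autoSchedule: "enabled"
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    resources: ["pods"]
    operations: ["CREATE"]
  clientConfig:
    service:
      name: ds-admission-webhook         # assumed Service name
      namespace: kube-system             # assumed namespace
      path: /mutate
    caBundle: <base64-encoded CA>        # placeholder
```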
- Only two kinds of nodes: on-demand and spot.
- All nodes are in the same zone (inter-AZ not considered).
- User-defined affinities take precedence over the affinities injected by the webhook.
- Use labels to control whether a pod follows the webhook scheduling strategy: the webhook is triggered only when both the namespace and the workload carry the label `ds-admission-webhook/autoSchedule: "enabled"` (see the example after this list).
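For illustration, opting a namespace and a workload into the strategy could look like the following (the namespace name is a placeholder and the Deployment spec is trimmed):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demo                                   # placeholder namespace
  labels:
    ds-admission-webhook/autoSchedule: "enabled"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateless-app
  namespace: demo
  labels:
    ds-admission-webhook/autoSchedule: "enabled"
spec:
  # ... rest of the Deployment spec (see the examples below)
```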
- Deployments, single replica: prefer scheduling to spot nodes; if none are available, schedule to on-demand nodes.
- Deployments, multiple replicas: prefer scheduling to spot nodes; if none are available, fall back to on-demand nodes.
- StatefulSets, single replica: schedule directly to on-demand nodes to ensure stability.
- StatefulSets, multiple replicas: schedule 70% of replicas to on-demand nodes and 30% to spot nodes to reduce costs while retaining stability. The ratio can be customized via annotations (an illustrative annotation sketch appears after the YAML examples below), but the on-demand share cannot be 0.
- For multi-replica StatefulSets and Deployments, consider using pod anti-affinity so that replicas do not all land on the same node, reducing risk.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stateless-app
spec:
  replicas: 3                        # change to 1 for a single replica
  selector:
    matchLabels:
      app: stateless-app
  template:
    metadata:
      labels:
        app: stateless-app
    spec:
      containers:
      - name: app
        image: nginx:1.25            # placeholder image
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: "node.kubernetes.io/capacity"
                operator: In
                values:
                - "spot"             # prefer scheduling to spot nodes
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: "node.kubernetes.io/capacity"
                operator: In
                values:
                - "spot"
                - "on-demand"        # fall back to on-demand nodes
```
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: stateful-app-single
spec:
  replicas: 1
  serviceName: stateful-app-single   # placeholder headless Service
  selector:
    matchLabels:
      app: stateful-app-single
  template:
    metadata:
      labels:
        app: stateful-app-single
    spec:
      containers:
      - name: app
        image: nginx:1.25            # placeholder image
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: "node.kubernetes.io/capacity"
                operator: In
                values:
                - "on-demand"        # single replica pinned to on-demand to ensure stability
```
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: stateful-app-multiple
spec:
  replicas: 10
  serviceName: stateful-app-multiple # placeholder headless Service
  selector:
    matchLabels:
      app: stateful-app-multiple
  template:
    metadata:
      labels:
        app: stateful-app-multiple
    spec:
      containers:
      - name: app
        image: nginx:1.25            # placeholder image
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 7
            preference:
              matchExpressions:
              - key: "node.kubernetes.io/capacity"
                operator: In
                values:
                - "on-demand"        # 70% prioritized to on-demand nodes for stability
          - weight: 3
            preference:
              matchExpressions:
              - key: "node.kubernetes.io/capacity"
                operator: In
                values:
                - "spot"             # 30% prioritized to spot nodes to save costs
```
```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: app-name                         # replace with the workload's app label
        topologyKey: "kubernetes.io/hostname"     # spread replicas across different nodes
```
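The on-demand/spot ratio used above is customizable via annotations per the strategy, but the annotation itself is not specified here; the key and format below are purely hypothetical and shown only to illustrate the idea:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: stateful-app-multiple
  annotations:
    # hypothetical annotation: the webhook would parse it and adjust the injected
    # affinity weights accordingly (the on-demand share must stay above 0)
    ds-admission-webhook/onDemandRatio: "50"
spec:
  # ... rest of the StatefulSet spec as above
```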
- Persistent Volume Topology: Based on the pod's PVC (PersistentVolumeClaim), the system locates the StorageClass, identifies the zones available to it, and selects one at random.
- Topology Spread: Using Kubernetes `topologySpreadConstraints`, pods can be distributed across different zones to minimize the impact of potential outages (see the sketch after this list).
- Pre-scheduling and Spot Instance Fallback: Pre-scheduling mechanisms that predict potential spot instance interruptions and apply fallback strategies before the capacity becomes unavailable.
- Spot-Friendly Pod Identification: Identifying pods that are suitable for running on spot instances, ensuring efficient use of such capacity.
- Rolling Updates and Scheduling Consistency: Keeping scheduling decisions consistent during rolling updates, maintaining stable and reliable deployments.
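As a sketch of the topology-spread idea (the zone key, skew, and labels are illustrative), a pod template could carry:

```yaml
spec:
  topologySpreadConstraints:
  - maxSkew: 1                                # allow at most one extra replica per zone
    topologyKey: topology.kubernetes.io/zone  # spread across availability zones
    whenUnsatisfiable: ScheduleAnyway         # soft constraint; do not block scheduling
    labelSelector:
      matchLabels:
        app: app-name                         # replace with the workload's app label
```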