diff --git a/CHANGELOG.md b/CHANGELOG.md
index a620bdb31..7ca3f25cf 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,25 +4,38 @@
 
 ## Isolate resources for best-effort workloads
 
-In Koodinator v0.2.0, we refined the ability to isolate resources for best-effort worklods.
+In Koodinator v0.2.0, we refined the ability to isolate resources for best-effort workloads.
 
-`koordlet` will set the cgroup parameters according to the resources described in the Pod Spec. Currently supports setting CPU Request/Limit, and Memory Limit.
+`koordlet` will set the cgroup parameters according to the resources described in the Pod Spec. Currently, supports
+setting CPU Request/Limit, and Memory Limit.
 
-For CPU resources, only the case of `request == limit` is supported, and the support for the scenario of `request <= limit` will be supported in the next version.
+For CPU resources, only the case of `request == limit` is supported, and the support for the scenario
+of `request <= limit` will be supported in the next version.
 
 ## Active eviction mechanism based on memory safety thresholds
 
-When latency-sensitiv applications are serving, memory usage may increase due to bursty traffic. Similarly, there may be similar scenarios for best-effort workloads, for example, the current computing load exceeds the expected resource Request/Limit.
+When latency-sensitive applications are serving, memory usage may increase due to burst traffic. Similarly, there may be
+similar scenarios for best-effort workloads, for example, the current computing load exceeds the expected resource
+Request/Limit.
 
-These scenarios will lead to an increase in the overall memory usage of the node, which will have an unpredictable impact on the runtime stability of the node side. For example, it can reduce the quality of service of latency-sensitiv applications or even become unavailable. Especially in a co-location environment, it is more challenging.
+These scenarios will lead to an increase in the overall memory usage of the node, which will have an unpredictable
+impact on the runtime stability of the node side. For example, it can reduce the quality of service of latency-sensitive
+applications or even become unavailable. Especially in a co-location environment, it is more challenging.
 
 We implemented an active eviction mechanism based on memory safety thresholds in Koodinator.
 
-`koordlet` will regularly check the recent memory usage of node and Pods to check whether the safty threshold is exceeded. If it exceeds, it will evict some best-effort Pods to release memory. This mechanism can better ensure the stability of node and latency-sensitiv applications.
+`koordlet` will regularly check the recent memory usage of node and Pods to check whether the safety threshold is
+exceeded. If it exceeds, it will evict some best-effort Pods to release memory. This mechanism can better ensure the
+stability of node and latency-sensitive applications.
 
-`koordlet` currently only evicts best-effort Pods, sorted according to the Priority specified in the Pod Spec. The lower the priority, the higher the priority to be evicted, the same priority will be sorted according to the memory usage rate (RSS), the higher the memory usage, the higher the priority to be evicted. This eviction selection algorithm is not static. More dimensions will be considered in the future, and more refined implementations will be implemented for more scenarios to achieve more reasonable evictions.
+`koordlet` currently only evicts best-effort Pods, sorted according to the Priority specified in the Pod Spec. The lower
+the priority, the higher the priority to be evicted, the same priority will be sorted according to the memory usage
+rate (RSS), the higher the memory usage, the higher the priority to be evicted. This eviction selection algorithm is not
+static. More dimensions will be considered in the future, and more refined implementations will be implemented for more
+scenarios to achieve more reasonable evictions.
 
-The current memory utilization safety threshold default value is 70%. You can modify the `memoryEvictThresholdPercent` in ConfigMap `slo-controller-config` according to the actual situation,
+The current memory utilization safety threshold default value is 70%. You can modify the `memoryEvictThresholdPercent`
+in ConfigMap `slo-controller-config` according to the actual situation,
 
 ```yaml
 apiVersion: v1
@@ -46,9 +59,11 @@ data:
 
 ## v0.1.0
 
-### Node Metrics
+### Node Metrics
 
-Koordinator defines the `NodeMetrics` CRD, which is used to record the resource utilization of a single node and all Pods on the node. koordlet will regularly report and update `NodeMetrics`. You can view `NodeMetrics` with the following commands.
+Koordinator defines the `NodeMetrics` CRD, which is used to record the resource utilization of a single node and all
+Pods on the node. koordlet will regularly report and update `NodeMetrics`. You can view `NodeMetrics` with the following
+commands.
 
 ```shell
 $ kubectl get nodemetrics node-1 -o yaml
@@ -78,10 +93,12 @@ status:
 
 ### Colocation Resources
 
-After the Koordinator is deployed in the K8s cluster, the Koordinator will calculate the CPU and Memory resources that have been allocated but not used according to the data of `NodeMetrics`. These resources are updated in Node in the form of extended resources.
+After the Koordinator is deployed in the K8s cluster, the Koordinator will calculate the CPU and Memory resources that
+have been allocated but not used according to the data of `NodeMetrics`. These resources are updated in Node in the form
+of extended resources.
 
-`koordinator.sh/batch-cpu` represents the CPU resources for Best Effort workloads,
-`koordinator.sh/batch-memory` represents the Memory resources for Best Effort workloads.
+`koordinator.sh/batch-cpu` represents the CPU resources for Best Effort workloads,
+`koordinator.sh/batch-memory` represents the Memory resources for Best Effort workloads.
 
 You can view these resources with the following commands.
 
@@ -105,10 +122,11 @@ Allocatable:
   pods: 64
 ```
 
-
 ### Cluster-level Colocation Profile
 
-In order to make it easier for everyone to use Koordinator to co-locate different workloads, we defined `ClusterColocationProfile` to help gray workloads use co-location resources. A `ClusterColocationProfile` is CRD like the one below. Please do edit each parameter to fit your own use cases.
+In order to make it easier for everyone to use Koordinator to co-locate different workloads, we
+defined `ClusterColocationProfile` to help gray workloads use co-location resources. A `ClusterColocationProfile` is CRD
+like the one below. Please do edit each parameter to fit your own use cases.
 
 ```yaml
 apiVersion: config.koordinator.sh/v1alpha1
@@ -128,33 +146,46 @@ spec:
   schedulerName: koord-scheduler
   labels:
     koordinator.sh/mutated: "true"
-  annotations:
+  annotations:
     koordinator.sh/intercepted: "true"
   patch:
     spec:
       terminationGracePeriodSeconds: 30
 ```
 
-Various Koordinator components ensure scheduling and runtime quality through labels `koordinator.sh/qosClass`, `koordinator.sh/priority` and kubernetes native priority.
+Various Koordinator components ensure scheduling and runtime quality through labels `koordinator.sh/qosClass`
+, `koordinator.sh/priority` and kubernetes native priority.
 
-With the webhook mutating mechanism provided by Kubernetes, koord-manager will modify Pod resource requirements to co-located resources, and inject the QoS and Priority defined by Koordinator into Pod.
+With the webhook mutating mechanism provided by Kubernetes, koord-manager will modify Pod resource requirements to
+co-located resources, and inject the QoS and Priority defined by Koordinator into Pod.
 
-Taking the above Profile as an example, when the Spark Operator creates a new Pod in the namespace with the `koordinator.sh/enable-colocation=true` label, the Koordinator QoS label `koordinator.sh/qosClass` will be injected into the Pod. According to the Profile definition PriorityClassName, modify the Pod's PriorityClassName and the corresponding Priority value. Users can also set the Koordinator Priority according to their needs to achieve more fine-grained priority management, so the Koordinator Priority label `koordinator.sh/priority` is also injected into the Pod. Koordinator provides an enhanced scheduler koord-scheduler, so you need to modify the Pod's scheduler name koord-scheduler through Profile.
+Taking the above Profile as an example, when the Spark Operator creates a new Pod in the namespace with
+the `koordinator.sh/enable-colocation=true` label, the Koordinator QoS label `koordinator.sh/qosClass` will be injected
+into the Pod. According to the Profile definition PriorityClassName, modify the Pod's PriorityClassName and the
+corresponding Priority value. Users can also set the Koordinator Priority according to their needs to achieve more
+fine-grained priority management, so the Koordinator Priority label `koordinator.sh/priority` is also injected into the
+Pod. Koordinator provides an enhanced scheduler koord-scheduler, so you need to modify the Pod's scheduler name
+koord-scheduler through Profile.
 
-If you expect to integrate Koordinator into your own system, please learn more about the [core concepts](/docs/core-concepts/architecture).
+If you expect to integrate Koordinator into your own system, please learn more about
+the [core concepts](/docs/core-concepts/architecture).
 
 ### CPU Suppress
 
-In order to ensure the runtime quality of different workloads in co-located scenarios, Koordinator uses the CPU Suppress mechanism provided by koordlet on the node side to suppress workloads of the Best Effort type when the load increases. Or increase the resource quota for Best Effort type workloads when the load decreases.
+In order to ensure the runtime quality of different workloads in co-located scenarios, Koordinator uses the CPU Suppress
+mechanism provided by koordlet on the node side to suppress workloads of the Best Effort type when the load increases.
+Or increase the resource quota for Best Effort type workloads when the load decreases.
 
-When installing through the helm chart, the ConfigMap `slo-controller-config` will be created in the koordinator-system namespace, and the CPU Suppress mechanism is enabled by default. If it needs to be closed, refer to the configuration below, and modify the configuration of the resource-threshold-config section to take effect.
+When installing through the helm chart, the ConfigMap `slo-controller-config` will be created in the koordinator-system
+namespace, and the CPU Suppress mechanism is enabled by default. If it needs to be closed, refer to the configuration
+below, and modify the configuration of the resource-threshold-config section to take effect.
 
 ```yaml
 apiVersion: v1
 kind: ConfigMap
 metadata:
   name: slo-controller-config
-  namespace: {{ .Values.installation.namespace }}
+  namespace: {{.Values.installation.namespace}}
 data:
   ...
   resource-threshold-config: |
@@ -166,4 +197,7 @@ data:
 ```
 
 ### Colocation Resources Balance
-Koordinator currently adopts a strategy for node co-location resource scheduling, which prioritizes scheduling to machines with more resources remaining in co-location to avoid Best Effort workloads crowding together. More rich scheduling capabilities are on the way.
+
+Koordinator currently adopts a strategy for node co-location resource scheduling, which prioritizes scheduling to
+machine with more resources remaining in co-location to avoid Best Effort workloads crowding together. More rich
+scheduling capabilities are on the way.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 3de026b0f..88d445513 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -11,7 +11,7 @@ part of the Koordinator community.
 To be honest, we regard every user of Koordinator as a very kind contributor. After experiencing Koordinator, you may
 have some feedback for the project. Then feel free to open an issue.
 
-There are lot of cases when you could open an issue:
+There are lots of cases when you could open an issue:
 
 - bug report
 - feature request
diff --git a/README.md b/README.md
index 74dcb03a8..dfee2caa6 100644
--- a/README.md
+++ b/README.md
@@ -12,12 +12,12 @@
 
 Koordinator is a QoS based scheduling system for hybrid orchestration workloads on Kubernetes. It aims to improve the
 runtime efficiency and reliability of both latency sensitive workloads and batch jobs, simplify the complexity of
-resource-related configuration tuning, and increase pod deployment density to improve resource utilizations.
+resource-related configuration tuning, and increase pod deployment density to improve resource utilization.
 
 Koordinator enhances the kubernetes user experiences in the workload management by providing the following:
 
 - Well-designed priority and QoS mechanism to co-locate different types of workloads in a cluster, a node.
-- Allowing for resource overcommitments to achieve high resource utilizations but still satisfying the QoS guarantees by
+- Allowing for resource overcommitments to achieve high resource utilization but still satisfying the QoS guarantees by
   leveraging an application profiling mechanism.
 - Fine-grained resource orchestration and isolation mechanism to improve the efficiency of latency-sensitive workloads
   and batch jobs.