Cluster resource modeling #2367

halfrost · 2022-08-12T01:27:00Z

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR introduces the cluster resource modeling feature.

Which issue(s) this PR fixes:
Part of #772

Special notes for your reviewer:
This PR is the code implementation part of issue #772 (cluster resource modeling).

Does this PR introduce a user-facing change?:
None

My solution is the picture below:

The system defaults to 11 grade levels. The boundaries of each grade are shown above. For example, the resource configuration of the grade = 2 model is as follows:

resourceModel:
  - grade:
      cpu: "2"
      memory: 4Gi
    count: 34

grade = 2 means that cpu falls in (2C, 4C] and memory falls in (2G, 4G]. Each interval is left-open and right-closed.
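As a minimal sketch of how a left-open, right-closed grade lookup could work: the boundary values below (powers of two, with an open-ended last grade) are assumptions chosen to match the grade = 2 example, and `gradeOf` and `upperBounds` are hypothetical names, not taken from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// upperBounds holds the right (closed) boundary of every grade except the
// last, in CPU cores. These values are assumed for illustration only.
var upperBounds = []int64{1, 2, 4, 8, 16, 32, 64, 128, 256, 512}

// gradeOf returns the grade whose interval (lower, upper] contains cores.
// Values above the final boundary fall into the last, open-ended grade.
// Non-positive input is not validated in this sketch.
func gradeOf(cores int64) int {
	// sort.Search returns the first index whose boundary is >= cores,
	// which is the grade whose right-closed interval contains the value.
	return sort.Search(len(upperBounds), func(i int) bool {
		return upperBounds[i] >= cores
	})
}

func main() {
	for _, c := range []int64{2, 3, 4, 5, 1000} {
		fmt.Printf("%d cores -> grade %d\n", c, gradeOf(c))
	}
}
```

With these assumed boundaries, 3 and 4 cores both land in grade 2, while 2 cores lands in grade 1 because the left boundary is open.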

Users can also customize the modeling. If the user passes in a custom model, initialization follows the user's configuration. Users can define multiple resource dimensions, such as cpu, memory, and gpu, and the system sorts by them in priority order. For example, if the user defines the order cpu, memory, gpu, the comparison looks at cpu first. If cpu is equal, it compares memory, and finally gpu. Models are ordered by this comparison.
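The priority comparison can be sketched as a lexicographic comparator. This is an illustration only: the `priority` slice, the plain `int64` values, and the name `compareByPriority` are assumptions, not the PR's actual types.

```go
package main

import "fmt"

// priority lists resource names from most to least significant.
// The names and their order are assumptions for this example.
var priority = []string{"cpu", "memory", "gpu"}

// compareByPriority returns -1, 0, or 1. It walks the resources in priority
// order and decides on the first one that differs.
func compareByPriority(a, b map[string]int64) int {
	for _, name := range priority {
		switch {
		case a[name] < b[name]:
			return -1
		case a[name] > b[name]:
			return 1
		}
	}
	return 0
}

func main() {
	x := map[string]int64{"cpu": 2, "memory": 4, "gpu": 0}
	y := map[string]int64{"cpu": 2, "memory": 8, "gpu": 0}
	// cpu ties, so memory decides.
	fmt.Println(compareByPriority(x, y)) // prints -1
}
```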

Implementation:

The data structure shown in the figure is maintained in a one-dimensional array. Each item in the array stores 3 values. One is count, which indicates how many nodes belong to this model. The other 2 are pointers to a doubly linked list and a red-black tree, respectively.

According to a paper, when there are fewer than 6 nodes, lookups in a doubly linked list are faster than in a red-black tree. So a doubly linked list is used for 6 nodes or fewer, and a red-black tree is used for more than 6 nodes. Java's HashMap uses a similar design, switching a bucket between a linked list and a red-black tree based on its size.

While there are 6 nodes or fewer, the doubly linked list inserts nodes in ascending order of model size. Once the number of nodes exceeds 6, the doubly linked list is first converted into a red-black tree, and new nodes are then inserted into the tree. A red-black tree is a self-balancing binary search tree, so it keeps nodes ordered, and insert, update, and delete operations take O(log n) time.

Adding, deleting, or updating a resource model updates the nodes in this data structure, which keeps the count value of each resourceModel up to date.
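A rough sketch of the linked-list half of this structure, using the standard `container/list` package. The `bucket` and `node` types, the single `cpu` key, and `treeThreshold` are hypothetical simplifications. The red-black tree itself is not implemented here: `insert` only reports when the list has grown past the threshold and a conversion would be due.

```go
package main

import (
	"container/list"
	"fmt"
)

// treeThreshold is the list length above which the description above
// switches to a red-black tree.
const treeThreshold = 6

// node is one distinct resource size plus how many times it was inserted.
type node struct {
	cpu      int64
	quantity int
}

// bucket is a hypothetical stand-in for one item of the array.
type bucket struct {
	count int        // total number of nodes counted for this model
	nodes *list.List // distinct sizes, ascending by cpu
}

func newBucket() *bucket { return &bucket{nodes: list.New()} }

// insert adds one node, keeping the list sorted and accumulating duplicates.
// It reports whether the list has grown past treeThreshold.
func (b *bucket) insert(cpu int64) (needsTree bool) {
	b.count++
	for e := b.nodes.Front(); e != nil; e = e.Next() {
		n := e.Value.(*node)
		if n.cpu == cpu {
			n.quantity++ // same size seen again: accumulate on the existing node
			return b.nodes.Len() > treeThreshold
		}
		if n.cpu > cpu {
			b.nodes.InsertBefore(&node{cpu: cpu, quantity: 1}, e)
			return b.nodes.Len() > treeThreshold
		}
	}
	b.nodes.PushBack(&node{cpu: cpu, quantity: 1})
	return b.nodes.Len() > treeThreshold
}

func main() {
	b := newBucket()
	for _, c := range []int64{4, 2, 3, 2} {
		b.insert(c)
	}
	fmt.Println("count:", b.count, "distinct:", b.nodes.Len())
	for e := b.nodes.Front(); e != nil; e = e.Next() {
		n := e.Value.(*node)
		fmt.Printf("cpu=%d quantity=%d\n", n.cpu, n.quantity)
	}
}
```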

karmada-bot · 2022-08-12T01:27:10Z

Welcome @halfrost! It looks like this is your first PR to karmada-io/karmada 🎉

Poor12 · 2022-08-12T01:47:30Z

Hi @halfrost, please check the DCO and other CI failures.

Poor12 · 2022-08-15T08:17:08Z

cc @RainbowMango to start CI

Poor12 · 2022-08-15T09:22:36Z

pkg/modeling/modeling.go

+	// +required
+	Quantity int
+
+	// when the the number of node is less than or equal to six, it will be sorted by linkedlist,


when the number

Please check the grammar in the notes.

Sorry, it's done.

RainbowMango

I just looked at the API part.

We also need to add the API to pkg/apis/cluster/v1alpha1.

RainbowMango · 2022-08-15T10:09:12Z

pkg/apis/cluster/types.go

@@ -104,6 +105,65 @@ type ClusterSpec struct {
 	// any resource that does not tolerate the Taint.
 	// +optional
 	Taints []corev1.Taint
+
+	// The model list of resource modeling in this cluster. Each modeling name and quota can be customized by the user.


Comments should start with ResourceModels xxx

Each modeling name and quota can be customized by the user.

Can the modeling name be customized? I think the name should be restricted to enumerated values.
Otherwise, for a name like my-customized-resource, how do we know what it is?

You are right. Which resource names are needed in the current production environment? This version supports only 4 kinds of resources. If more resource names are needed, I will define a custom enum to support them.

const (
	// CPU, in cores. (500m = .5 cores)
	ResourceCPU ResourceName = "cpu"
	// Memory, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024)
	ResourceMemory ResourceName = "memory"
	// Volume size, in bytes (e,g. 5Gi = 5GiB = 5 * 1024 * 1024 * 1024)
	ResourceStorage ResourceName = "storage"
	// Local ephemeral storage, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024)
	// The resource name for ResourceEphemeralStorage is alpha and it can change across releases.
	ResourceEphemeralStorage ResourceName = "ephemeral-storage"
)

The extension name is defined by the user. Different users may have different custom names, so we cannot predict users' extension names.

RainbowMango · 2022-08-15T10:13:54Z

pkg/apis/cluster/types.go

+	Grade int
+	Count int


Need comments here.

RainbowMango · 2022-08-15T10:17:58Z

Hi @halfrost
@Poor12 created an issue(#2379) to track the whole task, he will work with you to finish it.

Poor12 · 2022-08-15T11:15:31Z

pkg/modeling/modeling.go

+		if diff < 0 {
+			return -1
+		}
+		if diff > 0 {


There is something wrong with the comparison function. For example, if a's limit is 2C 5G and b's is 1C 8G, the CPU diff is > 0 but the memory diff is < 0. So I think it should not return directly.

I also want to discuss this part with you. I compare quotas by priority. Assuming cpu has the highest priority, a smaller cpu value means the resource quota ranks lower.

But there is the situation you describe: cpu is smaller while mem is larger. Should that count as bigger or smaller? What do you think about this case?

Could you share your Slack handle with me? We can talk about it on Slack.
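To make the disagreement concrete, here is a sketch of one possible alternative to returning at the first difference: compare every resource and report "incomparable" when the results conflict. This is not what the PR settled on (the discussion moved to Slack). The `ordering` type and `compareAll` are hypothetical names for illustration.

```go
package main

import "fmt"

// ordering is the result of comparing two resource sets across all resources.
type ordering int

const (
	less ordering = iota
	equal
	greater
	incomparable // some resources are smaller while others are larger
)

// compareAll inspects every named resource instead of returning at the
// first one that differs.
func compareAll(a, b map[string]int64, names []string) ordering {
	sawLess, sawGreater := false, false
	for _, name := range names {
		switch {
		case a[name] < b[name]:
			sawLess = true
		case a[name] > b[name]:
			sawGreater = true
		}
	}
	switch {
	case sawLess && sawGreater:
		return incomparable
	case sawLess:
		return less
	case sawGreater:
		return greater
	}
	return equal
}

func main() {
	names := []string{"cpu", "memory"}
	a := map[string]int64{"cpu": 2, "memory": 5} // 2C 5G
	b := map[string]int64{"cpu": 1, "memory": 8} // 1C 8G
	fmt.Println(compareAll(a, b, names) == incomparable) // prints true
}
```

A priority-based comparison would instead rank a above b here, because cpu is compared first and a has more of it.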

Poor12 · 2022-08-16T01:53:27Z

pkg/apis/cluster/types.go

+
+	// Ranges describes the resource quota ranges.
+	// +optional
+	Ranges []ResourceModelItem


One question: does the order of the ranges represent the order used in the sort that follows?

Ranges holds the quotas for the different resources in a model. For example, assume grade = 2 and the resource for this model has 3 indicators: cpu, gpu, and mem. Ranges is then the value range of each of these three indicators.

Poor12 · 2022-08-16T01:53:55Z

pkg/apis/cluster/types.go

+type ResourceModelItem struct {
+	// Name is the name for the resource that you want to categorize.
+	// +optional
+	Name string


Name should be an enum type.

Let's discuss this line. For now, this structure supports 4 kinds of resources. Are there more resource names in actual production environments? If so, I will define a custom enum to support them.

const (
	// CPU, in cores. (500m = .5 cores)
	ResourceCPU ResourceName = "cpu"
	// Memory, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024)
	ResourceMemory ResourceName = "memory"
	// Volume size, in bytes (e,g. 5Gi = 5GiB = 5 * 1024 * 1024 * 1024)
	ResourceStorage ResourceName = "storage"
	// Local ephemeral storage, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024)
	// The resource name for ResourceEphemeralStorage is alpha and it can change across releases.
	ResourceEphemeralStorage ResourceName = "ephemeral-storage"
)

Poor12 · 2022-08-16T01:55:14Z

pkg/apis/cluster/types.go

+	//     min: 1024 GB
+	//     max: MAXINT
+	// +optional
+	ResourceModels []ResourceModel


types.go in apis/cluster/v1alpha1 also needs to be modified.

halfrost · 2022-08-16T03:13:40Z

Hi @halfrost @Poor12 created an issue(#2379) to track the whole task, he will work with you to finish it.

Sounds great. We will try our best to complete this task together.

halfrost · 2022-08-16T04:39:38Z

@Poor12 I have already split API part out of this pr. Please review API part in this pr #2386

Poor12 · 2022-08-19T02:36:55Z

pkg/modeling/modeling.go

+	// Although the quota of each resource modeling is an interval. But the right boundary of each interval never coincides with the left boundary of the next interval.
+	// If the two overlap, it will cause ambiguity, and the modeling in the overlapping interval will belong to multiple intervals, which will cause an error.
+	// Then we can mark the interval only with the left boundary of each interval.
+	DefaultModel = []ResourceList{


The unit of DefaultModel is incorrect.

Poor12 · 2022-08-19T02:37:44Z

pkg/modeling/modeling.go

+type ResourceName string
+
+// Resource names must be not more than 63 characters, consisting of upper- or lower-case alphanumeric characters,
+// with the -, _, and . characters allowed anywhere, except the first or last character.
+// The default convention, matching that for annotations, is to use lower-case names, with dashes, rather than
+// camel case, separating compound words.
+// Fully-qualified resource typenames are constructed from a DNS-style subdomain, followed by a slash `/` and a name.
+const (
+	// CPU, in cores. (500m = .5 cores)
+	ResourceCPU ResourceName = "cpu"
+	// Memory, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024)
+	ResourceMemory ResourceName = "memory"
+	// Volume size, in bytes (e,g. 5Gi = 5GiB = 5 * 1024 * 1024 * 1024)
+	ResourceStorage ResourceName = "storage"
+	// Local ephemeral storage, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024)
+	// The resource name for ResourceEphemeralStorage is alpha and it can change across releases.
+	ResourceEphemeralStorage ResourceName = "ephemeral-storage"
+)
+
+// ResourceList is a set of (resource name, quantity) pairs.
+type ResourceList map[ResourceName]resource.Quantity
+
+// ResourceModel describes the modeling that you want to statistics.
+type ResourceModel struct {
+	// Grade is the index for the resource modeling.
+	// +optional
+	Grade int
+
+	// Ranges describes the resource quota ranges.
+	// +optional
+	Ranges []ResourceModelItem
+}
+
+// ResourceModelItem describes the detail of each modeling quota that ranges from min to max.
+type ResourceModelItem struct {
+	// Name is the name for the resource that you want to categorize.
+	// +optional
+	Name ResourceName
+
+	// Min is the minimum amount of this resource represented by resource name。
+	// +optional
+	Min resource.Quantity
+
+	// Max is the maximum amount of this resource represented by resource name。
+	// +optional
+	Max resource.Quantity
+}


These values should come from cluster.spec.ResourceModel.

Poor12 · 2022-08-19T02:38:15Z

pkg/modeling/modeling.go

+	// quantity is the the number of this node
+	// Only when the resourceLists are exactly the same can they be counted as the same node.
+	// +required
+	quantity int


Quantity may be unnecessary.

Multiple resource nodes cannot be inserted at once, but if the same resource node is inserted again, it needs to be accumulated on the existing node. For example, if I insert a resource with cpu 2c and memory 4G twice, both insertions need to be accumulated in clusterResourceNode.

Poor12 · 2022-08-19T02:38:27Z

pkg/modeling/modeling.go

+	// It maybe contain cpu, mrmory, gpu...
+	// User can specify which parameters need to be included before the cluster starts
+	// +required
+	resourceList ResourceList


corev1.ResourceList

A custom ResourceList data structure is needed here, because ResourceName is an enum maintained by the API, and the key of ResourceList is ResourceName, which is a custom type.

Then we need to provide a method to convert corev1.ResourceList to ResourceList.

RainbowMango · 2022-08-22T04:02:24Z

pkg/generated/clientset/versioned/fake/register.go

+//	import (
+//	  "k8s.io/client-go/kubernetes"
+//	  clientsetscheme "k8s.io/client-go/kubernetes/scheme"
+//	  aggregatorclientsetscheme "k8s.io/kube-aggregator/pkg/client/clientset_generated/clientset/scheme"
+//	)
 //
-//   kclientset, _ := kubernetes.NewForConfig(c)
-//   _ = aggregatorclientsetscheme.AddToScheme(clientsetscheme.Scheme)
+//	kclientset, _ := kubernetes.NewForConfig(c)
+//	_ = aggregatorclientsetscheme.AddToScheme(clientsetscheme.Scheme)


This change is not expected, and this might be the reason why CI fails.

Poor12

Two questions remaining.

Poor12 · 2022-08-22T09:17:44Z

pkg/modeling/modeling.go

+	for index := 0; index < len(rsList); index++ {
+		for i, name := range rsName {
+			tmpQuantity := rsList[index][name]
+			quantityNum, ok := tmpQuantity.AsInt64()


This method needs to consider the milliquantity of the CPU.
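To illustrate the concern: a CPU value such as 500m is half a core, so any comparison done in whole cores loses it, while a comparison in millicores does not. The sketch below is a simplified stand-in written with the standard library only. `milliCPU` is a hypothetical helper that handles just the two forms shown. In the real code, `resource.Quantity` already offers `MilliValue()` for this purpose.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// milliCPU parses simple CPU strings such as "500m" or "2" into millicores.
// It is an illustrative stand-in and handles only these two forms.
func milliCPU(s string) (int64, error) {
	if strings.HasSuffix(s, "m") {
		// Already expressed in millicores.
		return strconv.ParseInt(strings.TrimSuffix(s, "m"), 10, 64)
	}
	cores, err := strconv.ParseInt(s, 10, 64)
	if err != nil {
		return 0, err
	}
	return cores * 1000, nil
}

func main() {
	half, _ := milliCPU("500m")
	two, _ := milliCPU("2")
	fmt.Println(half, two) // prints 500 2000
}
```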

Poor12 · 2022-08-22T09:18:45Z

pkg/modeling/modeling.go

+			klog.Infof("ModelComparator: Unable to parse the values of %v's quantity2 in the cluster", defaultModelSorting[index])
+		}
+		diff = quantity1 - quantity2
+		if diff < 0 {


Now the value judgment mainly depends on the resource with the highest priority, which may lead to some misjudgments.

You said the internal sorting rule can be ignored.

RainbowMango · 2022-08-22T13:47:59Z

@halfrost I tried to fix the CI failure and pushed the commits to your branch on your forked GitHub repo.

Now the CI failure is due to #2404 and is not available in the period August 22, 12:00 UTC - August 22, 16:00 UTC.
You can rebase your branch and push again, or I'll retrigger it tomorrow.

Signed-off-by: halfrost <ydz627@gmail.com>

Poor12 · 2022-08-23T09:43:03Z

Generally looks good to me, cc @RainbowMango

RainbowMango

Thanks. Generally looks good to me.
Given we have a lot of work to do, I'll move this forward and fix nits in follow-up PRs.
/lgtm
/approve

karmada-bot · 2022-08-23T12:45:37Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [RainbowMango]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

karmada-bot requested review from RainbowMango and XiShanYongYe-Chang August 12, 2022 01:27

karmada-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Aug 12, 2022

halfrost force-pushed the cluster-resource-modeling branch from c351b4d to 548d3b3 Compare August 15, 2022 07:33

Poor12 mentioned this pull request Aug 15, 2022

[umbrella]Karmada cluster resource modeling implementation #2379

Closed

8 tasks

Poor12 reviewed Aug 15, 2022


RainbowMango reviewed Aug 15, 2022


Poor12 reviewed Aug 15, 2022


Poor12 reviewed Aug 16, 2022


halfrost force-pushed the cluster-resource-modeling branch 3 times, most recently from 27aff8a to 56e4db8 Compare August 16, 2022 04:38

chaunceyjiang mentioned this pull request Aug 16, 2022

Add cluster resource modeling api #2386

Merged

Poor12 reviewed Aug 19, 2022


halfrost force-pushed the cluster-resource-modeling branch 10 times, most recently from fb9175c to 3a4a52e Compare August 21, 2022 01:21

halfrost force-pushed the cluster-resource-modeling branch 6 times, most recently from e200948 to 7d37774 Compare August 22, 2022 03:54

RainbowMango reviewed Aug 22, 2022


Poor12 mentioned this pull request Aug 22, 2022

Make changes to cluster-status-controller to adopt cluster resource models #2402

Merged

Poor12 reviewed Aug 22, 2022


RainbowMango force-pushed the cluster-resource-modeling branch from 7d37774 to 30f25ea Compare August 22, 2022 13:41

halfrost force-pushed the cluster-resource-modeling branch 9 times, most recently from f69612d to baae617 Compare August 23, 2022 08:55

Add modeling implement

5b1265e

Signed-off-by: halfrost <ydz627@gmail.com>

halfrost force-pushed the cluster-resource-modeling branch from baae617 to 5b1265e Compare August 23, 2022 08:56

RainbowMango approved these changes Aug 23, 2022


karmada-bot assigned RainbowMango Aug 23, 2022

karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 23, 2022

karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 23, 2022

karmada-bot merged commit aa2419c into karmada-io:master Aug 23, 2022

chaunceyjiang mentioned this pull request Nov 23, 2023

resourceModels supports extended resources #4307

Merged
