Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cluster resource modeling #2367

Merged
merged 1 commit into from
Aug 23, 2022

Conversation

halfrost
Copy link
Contributor

@halfrost halfrost commented Aug 12, 2022

What type of PR is this?
/kind feature

What this PR does / why we need it:
This PR introduced a feature of cluster resource modeling

Which issue(s) this PR fixes:
Part of #772

Special notes for your reviewer:
This pr is for issue #772 cluster resource modeling code implementation part.

Does this PR introduce a user-facing change?:
None

My solution is the picture below:

The system defaults to 11 grade levels. The grading of each grade is as shown above. For example, the resource configuration of grade =2 model is as follows

resourceModel:
  - grade:
      cpu: "2"
      memory: 4Gi
    count: 34

grade = 2, which means cpu belongs to (2C, 4C], memory belongs to (2G, 4G], left open and right closed.

Users can also customize modeling. The user passes in a custom modeling. During initialization, it will be initialized according to the user's configuration. Users can customize multiple levels. Such as cpu, memory, gpu and so on. The system will sort them in order of priority. For example, if the user defines the order of cpu, memoryu, gpu. Then when comparing, the CPU will be compared first. If the cpu is the same, then compare the memory, and finally compare the gpu. modeling in the order of this comparison

Implementation:

The data structure shown in the figure is maintained in a one-dimensional array. There are 3 values ​​stored in item in the array, one value is count. This value indicates how many there are in this model. In addition, 2 pointers are stored. They point to a doubly linked list and a red-black tree, respectively.

In a paper, someone tested that when the number of nodes is less than 6, it is faster to look up nodes in a doubly linked list than in a red-black tree. So less than 6 nodes use double linked list. Red-black trees are used when there are more than 6 nodes. In the implementation of java's red-black tree, there is also such a design.

When the number of nodes is less than 6, the doubly linked list inserts nodes from small to large according to the size of the model. When the node is greater than 6 hours, first convert the double-linked list into a red-black tree, and then insert nodes into the red-black tree. A red-black tree is an AVL tree, which guarantees ordering. So the time complexity of insert, update and delete operations is O(logn)

When adding, deleting, and updating a resource model, insert nodes into this data structure. The data structure will count and maintain the count value of resourceModel.

@karmada-bot
Copy link
Collaborator

Welcome @halfrost! It looks like this is your first PR to karmada-io/karmada 🎉

@karmada-bot karmada-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Aug 12, 2022
@Poor12
Copy link
Member

Poor12 commented Aug 12, 2022

Hi, @halfrost, please check DCO and other CI failure.

@Poor12
Copy link
Member

Poor12 commented Aug 15, 2022

cc @RainbowMango to start CI

// +required
Quantity int

// when the the number of node is less than or equal to six, it will be sorted by linkedlist,
Copy link
Member

@Poor12 Poor12 Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when the number

Please check the grammar in the notes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it's done.

Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just looked at the API part.

We also need to add the API to pkg/apis/cluster/v1alpha1.

@@ -104,6 +105,65 @@ type ClusterSpec struct {
// any resource that does not tolerate the Taint.
// +optional
Taints []corev1.Taint

// The model list of resource modeling in this cluster. Each modeling name and quota can be customized by the user.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments should be started with ResourceModels xxx

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each modeling name and quota can be customized by the user.

Does the modeling name can be customized? I guess the name should be allowed enumerated values.
Otherwise, for the name my-customized-resource, how do we know what it is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. What resource names are needed in the current production environment? In this version, I only support 4 kinds of resources. If there needs more resource name, I will customize an enum structure to support more resource names.

const (
	// CPU, in cores. (500m = .5 cores)
	ResourceCPU ResourceName = "cpu"
	// Memory, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024)
	ResourceMemory ResourceName = "memory"
	// Volume size, in bytes (e,g. 5Gi = 5GiB = 5 * 1024 * 1024 * 1024)
	ResourceStorage ResourceName = "storage"
	// Local ephemeral storage, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024)
	// The resource name for ResourceEphemeralStorage is alpha and it can change across releases.
	ResourceEphemeralStorage ResourceName = "ephemeral-storage"
)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the extension name is defined by user, diffierent user may has different custom names, we can not predict the users' extension name

Comment on lines 290 to 353
Grade int
Count int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need comments here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@RainbowMango
Copy link
Member

Hi @halfrost
@Poor12 created an issue(#2379) to track the whole task, he will work with you to finish it.

if diff < 0 {
return -1
}
if diff > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is something wrong with the comparison function. For example, if a's limit is 2C 5G, b is 1C 8G, that will be CPU diff > 0, memory diff < 0. So I think should not return directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also want to discuss this part with you. I'm comparing the quota by priority. Assuming that the priority of the cpu is high, the cpu is small means that this resource quota's priority is low.

But now there is a situation, as you said. If cpu is small, mem is large. So should this be considered big or small? How do you think about this situation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you share your slack with me? we can talk about it on slack.


// Ranges describes the resource quota ranges.
// +optional
Ranges []ResourceModelItem
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question:whether the order of the range represents the order of the following sort?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ranges is the quota for different modeling resources. For example, assume that grade = 2, the resource corresponding to this modeling has 3 indicators, cpu, gpu, mem. Then Ranges is the value range of these three indicators.

type ResourceModelItem struct {
// Name is the name for the resource that you want to categorize.
// +optional
Name string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name should be enum type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss this line. For now, this structure can support 4 kinds of resources. In the actual production environment, are there more resource names? If there is, I will customize an enum structure to support more resource names.

const (
	// CPU, in cores. (500m = .5 cores)
	ResourceCPU ResourceName = "cpu"
	// Memory, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024)
	ResourceMemory ResourceName = "memory"
	// Volume size, in bytes (e,g. 5Gi = 5GiB = 5 * 1024 * 1024 * 1024)
	ResourceStorage ResourceName = "storage"
	// Local ephemeral storage, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024)
	// The resource name for ResourceEphemeralStorage is alpha and it can change across releases.
	ResourceEphemeralStorage ResourceName = "ephemeral-storage"
)

// min: 1024 GB
// max: MAXINT
// +optional
ResourceModels []ResourceModel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

types.go in apis/cluster/v1alpha1 also needs to be modified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@halfrost
Copy link
Contributor Author

Hi @halfrost @Poor12 created an issue(#2379) to track the whole task, he will work with you to finish it.

Sounds great. We will try our best to complete this task together.

@halfrost halfrost force-pushed the cluster-resource-modeling branch 3 times, most recently from 27aff8a to 56e4db8 Compare August 16, 2022 04:38
@halfrost
Copy link
Contributor Author

@Poor12 I have already split API part out of this pr. Please review API part in this pr #2386

// Although the quota of each resource modeling is an interval. But the right boundary of each interval never coincides with the left boundary of the next interval.
// If the two overlap, it will cause ambiguity, and the modeling in the overlapping interval will belong to multiple intervals, which will cause an error.
// Then we can mark the interval only with the left boundary of each interval.
DefaultModel = []ResourceList{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unit of Default model is uncorrect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 14 to 60
type ResourceName string

// Resource names must be not more than 63 characters, consisting of upper- or lower-case alphanumeric characters,
// with the -, _, and . characters allowed anywhere, except the first or last character.
// The default convention, matching that for annotations, is to use lower-case names, with dashes, rather than
// camel case, separating compound words.
// Fully-qualified resource typenames are constructed from a DNS-style subdomain, followed by a slash `/` and a name.
const (
// CPU, in cores. (500m = .5 cores)
ResourceCPU ResourceName = "cpu"
// Memory, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024)
ResourceMemory ResourceName = "memory"
// Volume size, in bytes (e,g. 5Gi = 5GiB = 5 * 1024 * 1024 * 1024)
ResourceStorage ResourceName = "storage"
// Local ephemeral storage, in bytes. (500Gi = 500GiB = 500 * 1024 * 1024 * 1024)
// The resource name for ResourceEphemeralStorage is alpha and it can change across releases.
ResourceEphemeralStorage ResourceName = "ephemeral-storage"
)

// ResourceList is a set of (resource name, quantity) pairs.
type ResourceList map[ResourceName]resource.Quantity

// ResourceModel describes the modeling that you want to statistics.
type ResourceModel struct {
// Grade is the index for the resource modeling.
// +optional
Grade int

// Ranges describes the resource quota ranges.
// +optional
Ranges []ResourceModelItem
}

// ResourceModelItem describes the detail of each modeling quota that ranges from min to max.
type ResourceModelItem struct {
// Name is the name for the resource that you want to categorize.
// +optional
Name ResourceName

// Min is the minimum amount of this resource represented by resource name。
// +optional
Min resource.Quantity

// Max is the maximum amount of this resource represented by resource name。
// +optional
Max resource.Quantity
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These values should come from cluster.spec.ResourceModel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// quantity is the the number of this node
// Only when the resourceLists are exactly the same can they be counted as the same node.
// +required
quantity int
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quantity may unneccassary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it is impossible to insert multiple resource nodes at once. But if the same resource node is inserted, it needs to be accumulated on this node. For example, if I insert a resource that has cpu 2c memory 4G 2 times, I need to accumulate it in clusterResourceNode.

// It maybe contain cpu, mrmory, gpu...
// User can specify which parameters need to be included before the cluster starts
// +required
resourceList ResourceList
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corev1.ResourceList

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here need to use a custom ResourceList data structure. Because ResourceName is an enum maintained by API. The key of ResourceList is ResourceName, which is a custom data structure.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we need to provide a method to convert corev1.resourcelist to Resourcelist.

@halfrost halfrost force-pushed the cluster-resource-modeling branch 10 times, most recently from fb9175c to 3a4a52e Compare August 21, 2022 01:21
@halfrost halfrost force-pushed the cluster-resource-modeling branch 6 times, most recently from e200948 to 7d37774 Compare August 22, 2022 03:54
Comment on lines 36 to 43
// import (
// "k8s.io/client-go/kubernetes"
// clientsetscheme "k8s.io/client-go/kubernetes/scheme"
// aggregatorclientsetscheme "k8s.io/kube-aggregator/pkg/client/clientset_generated/clientset/scheme"
// )
//
// kclientset, _ := kubernetes.NewForConfig(c)
// _ = aggregatorclientsetscheme.AddToScheme(clientsetscheme.Scheme)
// kclientset, _ := kubernetes.NewForConfig(c)
// _ = aggregatorclientsetscheme.AddToScheme(clientsetscheme.Scheme)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not expected, and this might be the reason why CI fails.

Copy link
Member

@Poor12 Poor12 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions remaining.

for index := 0; index < len(rsList); index++ {
for i, name := range rsName {
tmpQuantity := rsList[index][name]
quantityNum, ok := tmpQuantity.AsInt64()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method needs to consider the milliquantity of the CPU.

klog.Infof("ModelComparator: Unable to parse the values of %v's quantity2 in the cluster", defaultModelSorting[index])
}
diff = quantity1 - quantity2
if diff < 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the value judgment mainly depends on the resource with the highest priority, which may lead to some misjudgments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You said internal sorting rule can be ignored.

@RainbowMango
Copy link
Member

@halfrost I tried to fix the CI failure and pushed the commits to your branch on your forked GitHub repo.

Now the CI failure is due to #2404 and is not available in the period August 22, 12:00 UTC - August 22, 16:00 UTC.
You can rebase your branch and push again, or I'll retrigger it tomorrow.

@halfrost halfrost force-pushed the cluster-resource-modeling branch 9 times, most recently from f69612d to baae617 Compare August 23, 2022 08:55
Signed-off-by: halfrost <ydz627@gmail.com>
@Poor12
Copy link
Member

Poor12 commented Aug 23, 2022

Generallly look good to me,cc @RainbowMango

Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Generally looks good to me.
Given we have a lot of work to do, I'll move this forward and fix nits by following PRs.
/lgtm
/approve

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Aug 23, 2022
@karmada-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 23, 2022
@karmada-bot karmada-bot merged commit aa2419c into karmada-io:master Aug 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants