
[proposal] add a schedule plugin that supports pod expansion and shrinking according to the order of defined logical node sets #475

Open
fjding opened this issue Jan 7, 2023 · 29 comments

Comments

@fjding

fjding commented Jan 7, 2023

Kubernetes has supported pod-deletion-cost since v1.21. In my cloud scenario, users have demands like these:
1. Define multiple logical node sets; a Deployment workload scales out pods according to the node set order and shrinks in the opposite order.
2. At the same time, it also supports a maximum number of schedulable pods per node set.
BTW, I have implemented this feature and want to contribute it to the community. I hope everyone can discuss it together.
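For context, here is a minimal sketch of the pod-deletion-cost mechanism mentioned above, which is what makes reverse-order shrinking possible: the ReplicaSet controller prefers to delete pods with a lower controller.kubernetes.io/pod-deletion-cost value first. The pod name, image, and cost value below are illustrative only.

apiVersion: v1
kind: Pod
metadata:
  name: example-pod                # hypothetical name
  annotations:
    # Lower cost = deleted earlier when the owning ReplicaSet scales in
    # (available since Kubernetes v1.21). A plugin could set a higher cost
    # on pods in earlier node sets so they are removed last, i.e. shrink
    # in the reverse of the scale-out order.
    controller.kubernetes.io/pod-deletion-cost: "100"
spec:
  containers:
  - name: app
    image: nginx                   # placeholder image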

@fjding fjding changed the title add a schedule plugin that support pod expands and shrinks according to the order of the defined logical node set [proposal] add a schedule plugin that support pod expands and shrinks according to the order of the defined logical node set Jan 7, 2023
@KunWuLuan
Contributor

My company also has a similar plugin. We can find a time to discuss it.

@fjding
Author

fjding commented Apr 3, 2023

My company also has a similar plugin. We can find a time to discuss it.

Hi, we can collaborate on this proposal.

@fjding
Author

fjding commented Apr 3, 2023

@Huang-Wei @ffromani @seanmalloy @denkensk
Could you take a look at this proposal? We can discuss whether we need to create a KEP.

@ffromani
Contributor

ffromani commented Apr 3, 2023

@Huang-Wei @ffromani @seanmalloy @denkensk Could you take a look at this proposal? We can discuss whether we need to create a KEP.

I'll have a look later this week (the week beginning April 3, 2023).

@Huang-Wei
Contributor

It would help us understand the motivation(s) if you could elaborate on the real-world use cases.

1. Define multiple logical node sets; a Deployment workload scales out pods according to the node set order and shrinks in the opposite order.

What do you mean by "node set order"? Is that a priority field of the NodeSet CR?

How are a Deployment's replicas expected to be scheduled onto the matching NodeSets? And is the scheduling directive a hard or soft constraint?

2. At the same time, it also supports a maximum number of schedulable pods per node set.

Where is this maximum defined?

@fjding
Author

fjding commented Apr 3, 2023

It would help us understand the motivation(s) if you could elaborate on the real-world use cases.

1. Define multiple logical node sets; a Deployment workload scales out pods according to the node set order and shrinks in the opposite order.

What do you mean by "node set order"? Is that a priority field of the NodeSet CR?

How are a Deployment's replicas expected to be scheduled onto the matching NodeSets? And is the scheduling directive a hard or soft constraint?

2. At the same time, it also supports a maximum number of schedulable pods per node set.

Where is this maximum defined?

Hi, thank you for your attention!
The motivation:
In cloud scenarios, some users prefer to use ECS instances first. When ECS capacity is insufficient, they consider using elastic containers such as Alibaba Cloud's ECI, because the cost of using ECS is lower than the cost of ECI.
@KunWuLuan, can you add your usage scenarios?

We will define a CRD named ResourcePolicy; its CR instance is as follows:
[image: example ResourcePolicy CR with an ecs-pool unit and an eci-pool unit]

Because ecs-pool is ranked before eci-pool, pods will be scheduled to ecs-pool first. If the number of pods scheduled into ecs-pool would exceed 100, the extra pods will be scheduled to eci-pool.
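Since the screenshot above is not reproduced here, the following is only a hypothetical sketch of what such a ResourcePolicy CR might look like for the behaviour just described; the API group, unit names, and the maxReplicas field are assumptions based on this discussion, not the final spec.

apiVersion: scheduling.example.com/v1alpha1   # placeholder group/version
kind: ResourcePolicy
metadata:
  name: ecs-first
  namespace: demo
spec:
  selector:
    app: my-app            # assumed label on the Deployment's pods
  strategy: prefer
  units:
  - name: ecs-pool         # ranked first, so pods land here first
    maxReplicas: 100       # at most 100 pods in this unit
    nodeSelector:
      pool: ecs
  - name: eci-pool         # overflow once ecs-pool holds 100 pods
    nodeSelector:
      pool: eci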

@KunWuLuan
Contributor

KunWuLuan commented Apr 3, 2023

In our company's scenario, customers will deploy both spot instances and pay-as-you-go instances simultaneously. Customers want their business to run on spot instances first to save costs, and when spot instance resources are insufficient, they will run on pay-as-you-go instances. Moreover, during business peak periods, when neither type of instance has resources, the business Pod will be scheduled to ECI nodes.
In this case, they will deploy a ResourcePolicy as follows:

apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: xxx
  namespace: xxx
spec:
  selector:
    key1: value1
  strategy: prefer
  units:
  - resource: ecs
    nodeSelector:
      type: spot
  - resource: ecs
    nodeSelector:
      type: pay-as-you-go
  - resource: eci

@Huang-Wei
Contributor

It seems @KunWuLuan is talking about Alibaba Cloud's feature described here: https://www.alibabacloud.com/help/en/container-service-for-kubernetes/latest/configure-priority-based-resource-scheduling. And @fjding is talking about a similar in-house implementation? (The design of maxReplicas is a bit strange, though.)

I'm open to hosting an abstracted version in scheduler-plugins.

BTW, I'm not sure how you implement the node-pool-based preference in the scoring phase. My feeling is that to support it efficiently we may need to bring some missing machinery to the scheduler framework; you can check my comment in one of the SIG meetings: https://youtu.be/UhZBkFamoAg?t=1694

cc @denkensk

@denkensk
Member

denkensk commented Apr 4, 2023

It seems @KunWuLuan is talking about Alibaba Cloud's feature described here: https://www.alibabacloud.com/help/en/container-service-for-kubernetes/latest/configure-priority-based-resource-scheduling. And @fjding is talking about a similar in-house implementation? (The design of maxReplicas is a bit strange, though.)

Hmmm, I know it. Actually, I am the author of this feature in Alibaba Cloud 😄. It took me a long time to think of the name ResourcePolicy 😄 @fjding Did you reference this implementation before?

@denkensk
Member

denkensk commented Apr 4, 2023

If the number of pods scheduled into ecs-pool would exceed 100, the extra pods will be scheduled to eci-pool.

Can you introduce your scenario for this? And why do you need to schedule 100 pods to ecs-pool first? @fjding

@denkensk
Member

denkensk commented Apr 4, 2023

BTW, I'm not sure how you implement the node-pool-based preference in the scoring phase. My feeling is that to support it efficiently we may need to bring some missing machinery to the scheduler framework; you can check my comment in one of the SIG meetings: https://youtu.be/UhZBkFamoAg?t=1694

Your comment is very useful for a real production environment. I also care about the efficiency and memory usage if we need to memorize some history or state beforehand. @Huang-Wei

@fjding
Author

fjding commented Apr 4, 2023

It seems @KunWuLuan is talking about Alibaba Cloud's feature described here: https://www.alibabacloud.com/help/en/container-service-for-kubernetes/latest/configure-priority-based-resource-scheduling. And @fjding is talking about a similar in-house implementation? (The design of maxReplicas is a bit strange, though.)

I'm open to hosting an abstracted version in scheduler-plugins.

BTW, I'm not sure how you implement the node-pool-based preference in the scoring phase. My feeling is that to support it efficiently we may need to bring some missing machinery to the scheduler framework; you can check my comment in one of the SIG meetings: https://youtu.be/UhZBkFamoAg?t=1694

cc @denkensk

The proposal I provided is being used on ByteDance's Volcano Engine, and its design was inspired by Alibaba Cloud's implementation. However, I personally think that maxReplicas is very useful, as in the following scenario:

[image: a Deployment spread across multiple AZs, each served by a virtual kubelet]

A cluster has multiple AZs (Availability Zones), and each AZ has a VK (virtual kubelet). Users expect a Deployment's Pods to be distributed across the AZs in a certain proportion.
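To make the multi-AZ case concrete, a hypothetical ResourcePolicy (same caveats as above: names and values are illustrative) could cap each AZ unit so that a 30-replica Deployment ends up with 20 pods in one AZ and 10 in another, with any further replicas overflowing to a third unit:

apiVersion: scheduling.example.com/v1alpha1   # placeholder group/version
kind: ResourcePolicy
metadata:
  name: az-proportion
  namespace: demo
spec:
  selector:
    app: my-app
  strategy: prefer
  units:
  - name: az-a
    maxReplicas: 20        # first 20 pods
    nodeSelector:
      topology.kubernetes.io/zone: az-a
  - name: az-b
    maxReplicas: 10        # next 10 pods
    nodeSelector:
      topology.kubernetes.io/zone: az-b
  - name: az-c             # overflow beyond 30 pods
    nodeSelector:
      topology.kubernetes.io/zone: az-c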

@fjding
Author

fjding commented Apr 4, 2023

It seems @KunWuLuan is talking about Alibaba Cloud's feature described here: https://www.alibabacloud.com/help/en/container-service-for-kubernetes/latest/configure-priority-based-resource-scheduling. And @fjding is talking about a similar in-house implementation? (The design of maxReplicas is a bit strange, though.)

Hmmm, I know it. Actually, I am the author of this feature in Alibaba Cloud 😄. It took me a long time to think of the name ResourcePolicy 😄 @fjding Did you reference this implementation before?

@denkensk
Yes, the design was inspired by Alibaba Cloud's implementation; at the same time, some other functions were added.

@fjding
Author

fjding commented Apr 4, 2023

If the number of pods scheduled into ecs-pool would exceed 100, the extra pods will be scheduled to eci-pool.

Can you introduce your scenario for this? And why do you need to schedule 100 pods to ecs-pool first? @fjding

As in the example I gave above, multi-AZ deployment is a good case; OpenKruise also provides some cases (link).

@denkensk
Member

denkensk commented Apr 4, 2023

A cluster has multiple AZs (Availability Zones), and each AZ has a VK (virtual kubelet). Users expect a Deployment's Pods to be distributed across the AZs in a certain proportion.
https://www.volcengine.com/docs/6460/177068

Thanks for your explanation, @fjding. I'm glad that these ideas can be applied to your scenario, and that scheduler-plugins can be used in ByteDance's Volcano Engine.

@denkensk
Member

denkensk commented Apr 4, 2023

I think we also need to clarify the core requirements. If you want to deploy the pods across different AZs, why use Max rather than Must? In my experience, users always want the proportion to be required rather than preferred. @fjding

@KunWuLuan Do you have feedback from other users or more requirements for a "resource policy"? We can discuss it here and make a more generic design together.

@fjding
Author

fjding commented Apr 4, 2023

@denkensk Users often use multi-AZ scenarios for disaster recovery purposes. In elastic container scenarios, such as ByteDance's VCI, users cannot accurately predict the upper limit of VCI capacity. Therefore, they cannot refuse to launch a pod just because resources in one AZ are unavailable.

@fjding
Author

fjding commented Apr 4, 2023

I think we also need to clarify the core requirements. If you want to deploy the pods across different AZs, why use Max rather than Must? In my experience, users always want the proportion to be required rather than preferred. @fjding

@KunWuLuan Do you have feedback from other users or more requirements for a "resource policy"? We can discuss it here and make a more generic design together.

BTW, with the strategy set to required, maxReplicas can meet the "Must" scenario you mentioned.
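For illustration, a hedged sketch (same assumed field names as the earlier examples, not the final spec) of how the "Must" case might be expressed by combining a hard strategy with per-unit caps:

apiVersion: scheduling.example.com/v1alpha1   # placeholder group/version
kind: ResourcePolicy
metadata:
  name: must-proportion
  namespace: demo
spec:
  selector:
    app: my-app
  strategy: required       # hard constraint: pods must fit within the units and caps below
  units:
  - name: az-a
    maxReplicas: 20
    nodeSelector:
      topology.kubernetes.io/zone: az-a
  - name: az-b
    maxReplicas: 10
    nodeSelector:
      topology.kubernetes.io/zone: az-b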

@KunWuLuan
Contributor

Do you have feedback from other users or more requirements for a "resource policy"?

In my cloud scenario, our users use ResourcePolicy to run a fixed number of Pods on ECS nodes (like maxReplicas in this design) and schedule the Pods that are scaled out during peak periods to Spot instances or ECI.

@fjding
Author

fjding commented Apr 7, 2023

@Huang-Wei @denkensk @ffromani
After the above discussion, do you have any other questions? Can we now propose a complete KEP?
cc @KunWuLuan

@fjding
Author

fjding commented Apr 20, 2023

@Huang-Wei
Hi, are there any other issues with this proposal? If not, can we proceed with writing a KEP document?

@Huang-Wei
Contributor

Sure, please go ahead and raise a KEP. We can continue the discussion there. Just keep in mind that this repo focuses more on the scheduling portion and may leave the discussion of CRD spec details outside.

@fjding
Author

fjding commented Apr 21, 2023

Sure, please go ahead and raise a KEP. We can continue the discussion there. Just keep in mind that this repo focuses more on the scheduling portion and may leave the discussion of CRD spec details outside.

Thanks. @KunWuLuan, we can do it together now.

@KunWuLuan
Contributor

@fjding Hi, I have submitted a draft for this feature.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 21, 2024
@KunWuLuan
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 22, 2024
@KunWuLuan
Contributor

This CRD is widely used in both my company and fjding's; in the proposal we have selected the features we both need for our customers. So we think the CRD described in the proposal is a stable version and it will not be updated frequently.
Maybe we can host this CRD in scheduler-plugins instead of elsewhere. WDYT? @fjding
cc @ffromani @Huang-Wei

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 27, 2024
@KunWuLuan
Contributor

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 27, 2024