Proposal for multiple pod template support #5085
base: master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing
Codecov Report: All modified and coverable lines are covered by tests ✅

@@ Coverage Diff @@
## master #5085 +/- ##
==========================================
+ Coverage 28.21% 28.29% +0.08%
==========================================
Files 632 632
Lines 43568 43635 +67
==========================================
+ Hits 12291 12345 +54
- Misses 30381 30388 +7
- Partials 896 902 +6
Flags with carried forward coverage won't be shown.
@RainbowMango - I've added this proposal to the discussion section of the community meeting tomorrow. By chance, would it be possible to move the meeting 30 minutes earlier? I've got a conflict at the moment.

I'm ok with it since this is the only topic for this meeting. I'll send a notice to the mailing group and slack channel, and gather feedback.
docs/proposals/scheduling/crd-scheduling-improvements/crd-scheduling-improvements.md
@RainbowMango Thanks so much for reviewing this with me during the community meeting! Just to add more context here: I heard there is also work being done to support CRDs with multiple pod templates (like FlinkDeployment, or TensorFlow jobs for instance). For the FlinkDeployment, we cannot have replicas for the same job scheduled on different clusters - meaning we either schedule all pods on one cluster, or do not schedule at all. Once we schedule the CRD to a member cluster, all pod scheduling will be taken care of by the Flink operator.

Thinking about this more, I think it makes more sense to approach this by using one of your suggestions, which was to make Components the top-level API definition, and have replicas defined within each individual component. If we need all replicas to be scheduled on one cluster, we can set the spreadConstraints on the related PropagationPolicy.
295c2ae to 8c59ff6
/assign
646c98e to 3f2e4c2
…uling Signed-off-by: mszacillo <mszacillo@bloomberg.net>
3f2e4c2 to c026200
. . .

// The total number of replicas scheduled by this resource. Each replica will be represented by exactly one component of the resource.
TotalReplicas int32 `json:"totalReplicas,omitempty"`
I've included this field as a replacement for the existing Replicas field, which is used very frequently within the Karmada codebase. Even though we are introducing the concept of components, Karmada will still ultimately be scheduling replicas - so I believe this slight refactor will make the implementation of this change less complex. This is again making an assumption that resources with more than 1 component will be scheduled on the same cluster.
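As a rough illustration of this idea, here is a minimal, self-contained Go sketch of what a components-first spec with a derived TotalReplicas field might look like. All type and field names are hypothetical stand-ins (not the actual Karmada ResourceBinding types), and per-replica requests are flattened to plain integers for brevity.

```go
package main

import "fmt"

// Component is a hypothetical sketch of one pod template of a
// multi-template CRD (e.g. a FlinkDeployment's JobManager or TaskManager).
type Component struct {
	Name     string
	Replicas int32
	// Per-replica resource requests, e.g. "cpu" in millicores.
	Requests map[string]int64
}

// ResourceBindingSpec sketch: components become the top-level definition,
// and TotalReplicas replaces the existing Replicas field.
type ResourceBindingSpec struct {
	Components    []Component
	TotalReplicas int32 // sum of Replicas over all components
}

// totalReplicas derives TotalReplicas from the component list.
func totalReplicas(components []Component) int32 {
	var total int32
	for _, c := range components {
		total += c.Replicas
	}
	return total
}

func main() {
	spec := ResourceBindingSpec{
		Components: []Component{
			{Name: "jobmanager", Replicas: 1, Requests: map[string]int64{"cpu": 1000}},
			{Name: "taskmanager", Replicas: 2, Requests: map[string]int64{"cpu": 1000}},
		},
	}
	spec.TotalReplicas = totalReplicas(spec.Components)
	fmt.Println(spec.TotalReplicas) // 3
}
```

Since each replica belongs to exactly one component, the derived total keeps the rest of the replica-oriented codebase working unchanged while the scheduler treats the whole set as a unit.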
For the proposed implementation, please refer to the next section.
### Accurate Estimator Changes |
Apologies if the explanation of the implementation proposal is a little dense. I can format it to be clearer, but otherwise please let me know if you have any questions that I can clarify.

Hi @RainbowMango @whitewindmills, I've gone ahead and updated the proposal doc with a more precise explanation of the implementation details. Please let me know if you have any comments or questions - and apologies in advance if this is a little dense; perhaps I can type this in LaTeX and attach an image of the algorithm-specific sections. :) Quick note - there still needs to be some work done on how multiple components would work from a customized resource modeling perspective. I'll try to add a section on that this weekend.
During maxReplica estimation, we will take the sum of all resource requirements for the CRD.

Total_CPU = component_1.replicas * (component_1.cpu) + component_2.replicas * (component_2.cpu) = (1 * 1) + (2 * 1) = 3 CPU.
Total_Memory = component_1.replicas * (component_1.memory) + component_2.replicas * (component_2.memory) = (1 * 2GB) + (2 * 1GB) = 8GB.
Isn't 4G?
Oops, yes :)
So this design doesn't consider replica division, and it is also not a completely accurate calculation, because the fragmentation issue is unavoidable - am I right?
Yes, this approach would not consider divided replicas, based on the use cases we've compiled (#5115). At the moment, most CRDs get scheduled to a single cluster rather than spread across multiple. In terms of the precision of the calculation, you're right that it's not completely accurate: it's an estimate of how many CRDs could be fully packed on the destination cluster. However, we do guarantee that at least 1 CRD can be scheduled on a member, so there should never be a scenario where we schedule a CRD to a member cluster that does not have sufficient resources to hold it.
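To make the estimation discussed above concrete, here is a hedged Go sketch (hypothetical helper names, not the actual accurate-estimator code): it sums replicas times per-replica requests across components to get the footprint of one whole CRD, then takes the minimum over resources of available/needed. The example uses the corrected totals from the thread above - 3 CPU and 4GB per CRD - and, as noted, ignores fragmentation, so it is an estimate rather than an exact packing result.

```go
package main

import "fmt"

// Component holds a per-replica request for each resource
// (illustrative names, not the actual Karmada estimator types).
type Component struct {
	Replicas int32
	Requests map[string]int64
}

// totalPerCRD sums replicas*requests across components, giving the
// aggregate footprint of one whole CRD instance.
func totalPerCRD(components []Component) map[string]int64 {
	total := map[string]int64{}
	for _, c := range components {
		for res, qty := range c.Requests {
			total[res] += int64(c.Replicas) * qty
		}
	}
	return total
}

// maxCRDs estimates how many whole CRD instances fit in the cluster:
// the minimum over resources of available/needed. Integer division
// rounds down, so a result >= 1 means at least one full CRD fits.
func maxCRDs(available map[string]int64, components []Component) int64 {
	need := totalPerCRD(components)
	var max int64 = -1
	for res, qty := range need {
		if qty == 0 {
			continue
		}
		fit := available[res] / qty
		if max == -1 || fit < max {
			max = fit
		}
	}
	if max < 0 {
		max = 0
	}
	return max
}

func main() {
	// Example from the proposal: jobmanager 1x(1 CPU, 2GB),
	// taskmanager 2x(1 CPU, 1GB) => one CRD needs 3 CPU and 4GB total.
	components := []Component{
		{Replicas: 1, Requests: map[string]int64{"cpu": 1, "memoryGB": 2}},
		{Replicas: 2, Requests: map[string]int64{"cpu": 1, "memoryGB": 1}},
	}
	available := map[string]int64{"cpu": 10, "memoryGB": 10}
	fmt.Println(maxCRDs(available, components)) // 2 (memory-bound: 10GB / 4GB)
}
```

With 10 CPU and 10GB available, CPU alone would fit three CRDs (10/3) but memory only fits two (10/4), so the estimate is the binding minimum of 2.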
Hi @mszacillo, I'm going to take two weeks off and might be slow to respond during this time. I believe this feature is extremely important, and I'll be focusing on it once I get back. Given that this feature involves both controllers and schedulers, it's not easy to come up with an ideal solution in a short time. By the way, I guess that with the help of the default resource interpreter for FlinkDeployment (thanks for your contribution, by the way), this is probably not a blocker for you - am I right? Do you think the Application Failover feature has a higher priority than this?
Hi @RainbowMango, thanks for the heads up and enjoy your time off!
Yes, this is currently not a blocker for us. We can get by with the existing maxReplica estimation while we determine a solution for multiple podTemplate support, and we are instead focusing on the failover feature enhancements. For our MVP using Karmada we need two things:

After we've completed the above tickets, our order of priority will be publishing the implementation for the later steps of the failover history proposal, and then working on the multiple pod template support.
What type of PR is this?
/kind design
What this PR does / why we need it:
Described in the document.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Proposal doc for CRD scheduling improvements. Posting proposal following discussion in community meeting.
Does this PR introduce a user-facing change?: