[Proposal] Support for cluster-level resource propagation pause and resume capabilities #5118
Conversation
/cc @CharlesQQ @a7i @chaunceyjiang @whitewindmills |
@XiShanYongYe-Chang: GitHub didn't allow me to request PR reviews from the following users: CharlesQQ. Note that only karmada-io members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Codecov Report: All modified and coverable lines are covered by tests ✅
@@ Coverage Diff @@
## master #5118 +/- ##
=======================================
Coverage 28.23% 28.23%
=======================================
Files 632 632
Lines 43712 43712
=======================================
+ Hits 12343 12344 +1
+ Misses 30467 30466 -1
Partials 902 902
@XiShanYongYe-Chang |
/assign |
[APPROVALNOTIFIER] This PR is NOT APPROVED. |
Hi @CharlesQQ @chaunceyjiang @whitewindmills @RainbowMango , it's ready to review again. |
Hi! Thanks for your proposal!
In Solution One, the SchedulePause of ResourceBinding is propagated through PropagationPolicy. If I create a PropagationPolicy with a PauseStrategy set to nil by default, is it feasible to only update the SchedulePause of ResourceBinding to true via a webhook when the karmada detector creates the ResourceBinding? If this is feasible, does "propagation" mean that the associated ResourceBinding/Work is only updated when the PropagationPolicy updates the PauseStrategy field?
I also have a suggestion: can this issue be considered as a user story? In brief, it involves pausing the ResourceBinding at the time of creation through a webhook, and then using a custom controller to determine when certain ResourceBindings can be scheduled. |
+1 |
For solution 1, the source of user control is the PropagationPolicy. If it is modified through a webhook, it will become inconsistent with the declaration in the PropagationPolicy. I think solution 2 might be better for your scenario. |
I understand that this is a specific implementation, not a use case. In fact, the two actions you point out can be done by controlling the pause of the RB. Karmada is not aware of when you pause and resume. That's why I pointed out that solution 2 is more suitable for your use case. |
When the user sets pause at the pp or cr level, other components such as a webhook may also set pause/resume at the rb or work level. There will be a conflict here. We should explain this limitation. :) |
Yes, if you need to set it up via a webhook, the user needs to be aware of the restrictions. |
In fact, what I am concerned about is whether the new work object has been updated by the latest OverridePolicy after propagation is resumed. This must be ensured. |
Good question! What you describe should be the expected behavior: when you change the op, the work is updated, and when the work is resumed, it is updated with the latest spec. |
I would prefer to introduce this functionality in PropagationPolicy, as it is more convenient and straightforward for users than introducing another API. Also proposing the API:

// PropagationSpec represents the desired behavior of PropagationPolicy.
type PropagationSpec struct {
	// Suspension declares the policy for suspending different aspects of propagation.
	// nil means no suspension, no default values.
	// +optional
	Suspension *Suspension `json:"suspension,omitempty"`
}

// Suspension defines the policy for suspending different aspects of propagation.
type Suspension struct {
	// Scheduling controls whether scheduling should be suspended.
	// +optional
	// Note: Postpone this until there is a solid use case.
	// Scheduling *bool `json:"scheduling,omitempty"`

	// Dispatching controls whether dispatching should be suspended.
	// nil means not suspended, no default value, only accepts 'true'.
	// Note: true means stop propagating to all clusters. Cannot co-exist
	// with DispatchingOnClusters, which is used to suspend particular clusters.
	// +optional
	Dispatching *bool `json:"dispatching,omitempty"`

	// DispatchingOnClusters declares a list of clusters to which the dispatching
	// should be suspended.
	// Note: Cannot co-exist with Dispatching, which is used to suspend all.
	// +optional
	DispatchingOnClusters *SuspendClusters `json:"dispatchingOnClusters,omitempty"`
}

// SuspendClusters represents a group of clusters that should be suspended from propagating.
// Note: No plan to introduce a label selector or field selector to select clusters yet, as it
// would make the system unpredictable.
type SuspendClusters struct {
	// ClusterNames is the list of clusters to be selected.
	// +optional
	ClusterNames []string `json:"clusterNames,omitempty"`
}

Note: ResourceBinding and Work also need similar field(s). |
Another question here: after scheduling or dispatching is paused, will there be any problems with resource occupation calculation? For example, will resources be pre-occupied after a work is paused? If resources are occupied and multiple works are resumed at the same time, will resource calculation errors be caused? |
For paused scheduling, resources will not be pre-occupied. For paused dispatching, I think they will. |
/retest |
Hi guys, at the moment I'm inclined towards Solution One, as it is more straightforward and convenient for users, without the need to learn new APIs. If we introduce a new API to describe the pause strategy for resources, it might seem a bit limited. Later on, if we provide a Rollout API, we can then focus on adding new APIs. What do you think? @CharlesQQ @chaunceyjiang @whitewindmills @RainbowMango @a7i @Monokaix @Vacant2333 |
I vote for solution one and sent the draft API on #5118 (comment). |
What type of PR is this?
/kind design
/kind documentation
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #1567, #4421, #4688
Special notes for your reviewer:
Other related issues: #4937
Does this PR introduce a user-facing change?: