-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Track usage in capacity status #57
Track usage in capacity status #57
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
05ef630
to
fe181b6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
testing it locally, so far looks good, for now just one nit
fe181b6
to
bbd8503
Compare
bbd8503
to
b091b97
Compare
/hold /assign @ahg-g @ArangoGutierrez |
dbefd6f
to
27c3569
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments.
api/v1alpha1/capacity_types.go
Outdated
|
||
// runningWorkloads is the number of workloads currently assigned to this | ||
// capacity. | ||
RunningWorkloads int32 `json:"runningWorkloads,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running is a little confuse. Actually the workload is just assigned.
RunningWorkloads --> AssignedWorkloads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes... not sure what I was thinking.
pkg/capacity/capacity.go
Outdated
// Usage reports the used resources and number of workloads assigned to the | ||
// capacity. | ||
func (c *Cache) Usage(capObj *kueue.Capacity) (map[corev1.ResourceName]map[string]kueue.Usage, int, error) { | ||
c.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can choose to use RWLock here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it makes sense now that there are more than one reader
if err := k8sClient.Get(ctx, types.NamespacedName{Name: "prod-capacity"}, &capObj); err != nil { | ||
return false | ||
} | ||
return capObj.Status.RunningWorkloads == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also check the usage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do as part of #68
q.Add(requestForWorkloadCapacity(oldW)) | ||
} | ||
newW := e.ObjectNew.(*kueue.QueuedWorkload) | ||
if newW.Spec.AssignedCapacity != "" && newW.Spec.AssignedCapacity != oldW.Spec.AssignedCapacity { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious about the circumstances under which this Spec.AssignedCapacity
would change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't happen... but I tried to make the code resilient to that.
I guess one case could be reassignment after preemption, if the user decides to change the queue for the workload, or the queue changes capacity 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tested locally and works, just a few comments
} | ||
|
||
// SetupWithManager sets up the controller with the Manager. | ||
func (r *CapacityReconciler) SetupWithManager(mgr ctrl.Manager) error { | ||
return ctrl.NewControllerManagedBy(mgr). | ||
For(&kueue.Capacity{}). | ||
Watches(&source.Kind{Type: &kueue.QueuedWorkload{}}, &assignedWorkloadHandler{}). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Watches(&source.Kind{Type: &kueue.QueuedWorkload{}}, &assignedWorkloadHandler{}). | |
Watches(&source.Kind{Type: &kueue.QueuedWorkload{}}, &assignedWorkloadHandler{}). | |
WithLogger(r.log). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried that before, but the manager adds extra names... so it looks like this:
capacity-reconciler.controller.capacity
Better just have controller.capacity
to distinguish from the event handler logs.
Maybe I should rename the handler log to capacity-handler
?
log.Error(err, "Failed getting usage from capacity cache") | ||
// This is likely because the capacity was recently removed and we didn't | ||
// process that event yet. | ||
return ctrl.Result{}, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe in this scenario we want to requeue?
// Requeue tells the Controller to requeue the reconcile key. Defaults to false.
Requeue bool
to trigger a new try to reconcile the state of the capacity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because we should receive the event, which will put the Capacity in the workqueue.
Although it doesn't really matter, there is nothing to do if the capacity is deleted.
Looks good to me if you want to squash, i can lgtm if there are no pending comments from others |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
- Add new status fields - Reconcile when there are QueuedWorkload updates Change-Id: I44b96a286f9871a76e5657f1dbb1b4b31738c2f4
05282f1
to
a83f769
Compare
rebased |
/lgtm |
/retest why do we keep getting this failure? |
/hold cancel |
What type of PR is this?
/kind feature
/kind api-change
What this PR does / why we need it:
This is important for the observability of kueue operations
Which issue(s) this PR fixes:
Fixes #7
Special notes for your reviewer:
I'm putting a minimal check in the integration test. I plan to add a more comprehensive test as part of #68