[release-v1.10] Clean up reserved from resources that have been scheduled #830
In a recent testing run, we noticed a scheduled `ConsumerGroup` [1] (see placements) that was still considered to have reserved replicas in a different pod [2]. That makes the scheduler think there is no space, while the autoscaler reports that we have enough space to hold every virtual replica.

[1]

```
$ k describe consumergroups -n ks-multi-ksvc-0 c9ee3490-5b4b-4d11-87af-8cb2219d9fe3
Name:         c9ee3490-5b4b-4d11-87af-8cb2219d9fe3
Namespace:    ks-multi-ksvc-0
...
Status:
  Conditions:
    Last Transition Time:  2023-09-06T19:58:27Z
    Reason:                Autoscaler is disabled
    Status:                True
    Type:                  Autoscaler
    Last Transition Time:  2023-09-06T21:41:13Z
    Status:                True
    Type:                  Consumers
    Last Transition Time:  2023-09-06T19:58:27Z
    Status:                True
    Type:                  ConsumersScheduled
    Last Transition Time:  2023-09-06T21:41:13Z
    Status:                True
    Type:                  Ready
  Observed Generation:     1
  Placements:
    Pod Name:      kafka-source-dispatcher-6
    Vreplicas:     4
    Pod Name:      kafka-source-dispatcher-7
    Vreplicas:     4
  Replicas:        8
  Subscriber Uri:  http://receiver5-2.ks-multi-ksvc-0.svc.cluster.local
Events:            <none>
```

[2]

```
"ks-multi-ksvc-0/c9ee3490-5b4b-4d11-87af-8cb2219d9fe3": {
  "kafka-source-dispatcher-3": 8
},
```

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
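To make the mismatch between [1] and [2] concrete, here is a minimal Python sketch of one way stale reservations could be dropped once a resource is fully scheduled. It is an illustration only, not the code in this PR: the function name `clean_up_reserved`, the plain-dict shapes, and the "fully scheduled" rule (placed vreplicas at least equal to desired replicas) are all assumptions.

```python
from typing import Dict

# Hypothetical shapes, not the scheduler's real types:
#   reserved:   resource key -> {pod name -> reserved vreplicas}
#   placements: resource key -> {pod name -> placed vreplicas}
#   replicas:   resource key -> desired vreplicas
PerPod = Dict[str, int]


def clean_up_reserved(
    reserved: Dict[str, PerPod],
    placements: Dict[str, PerPod],
    replicas: Dict[str, int],
) -> Dict[str, PerPod]:
    """Return a copy of `reserved` without entries for scheduled or unknown resources."""
    cleaned: Dict[str, PerPod] = {}
    for key, by_pod in reserved.items():
        if key not in replicas:
            # Assumption: a resource we no longer know about needs no reservation.
            continue
        placed = sum(placements.get(key, {}).values())
        if placed >= replicas[key]:
            # Fully scheduled: any leftover reservation is stale and would
            # make the scheduler undercount free space.
            continue
        # Still pending: keep the reservation untouched.
        cleaned[key] = dict(by_pod)
    return cleaned


if __name__ == "__main__":
    key = "ks-multi-ksvc-0/c9ee3490-5b4b-4d11-87af-8cb2219d9fe3"
    reserved = {key: {"kafka-source-dispatcher-3": 8}}
    placements = {key: {"kafka-source-dispatcher-6": 4, "kafka-source-dispatcher-7": 4}}
    replicas = {key: 8}
    print(clean_up_reserved(reserved, placements, replicas))
```

With the values from [1] and [2], the 8 vreplicas reserved on `kafka-source-dispatcher-3` are dropped because the 4 + 4 placements already cover the 8 desired replicas.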
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: openshift-cherrypick-robot, pierDipi

The full list of commands accepted by this bot can be found here. The pull request process is described here.
* Make SecretSpec field of consumers Auth omitempty (#780)

* Expose init offset and schedule metrics for ConsumerGroup reconciler (#790) (#791)

  Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Fix channel finalizer logic (knative-extensions#3295) (#795)

  Signed-off-by: Calum Murray <cmurray@redhat.com>
  Co-authored-by: Calum Murray <cmurray@redhat.com>
  Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* [release-v1.10] SRVKE-958: Cache init offsets results (#817)

  * Cache init offsets results

    When there is high load and multiple consumer group schedule calls, we get many `dial tcp 10.130.4.8:9092: i/o timeout` errors when trying to connect to Kafka. This leads to increased "time to readiness" for consumer groups.

    The downside of caching is that if the number of partitions increases while the result is cached, we won't initialize the offsets of the new partitions.

    Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

  * Add autoscaler leader log patch

    Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

  ---------

  Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
  Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Scheduler handle overcommitted pods (#820)

  Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
  Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Set consumer and consumergroups finalizers when creating them (#823)

  It is possible that a deleted consumer or consumergroup is reconciled and never finalized when it is deleted before the finalizer is set. This happens because the Knative generated reconciler uses patch (as opposed to update) for setting the finalizer, and patch doesn't have any optimistic concurrency controls.

  Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
  Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Clean up reserved from resources that have been scheduled (#830)

  In a recent testing run, we noticed a scheduled `ConsumerGroup` [1] (see placements) that was still considered to have reserved replicas in a different pod [2]. That makes the scheduler think there is no space, while the autoscaler reports that we have enough space to hold every virtual replica.

  [1]

  ```
  $ k describe consumergroups -n ks-multi-ksvc-0 c9ee3490-5b4b-4d11-87af-8cb2219d9fe3
  Name:         c9ee3490-5b4b-4d11-87af-8cb2219d9fe3
  Namespace:    ks-multi-ksvc-0
  ...
  Status:
    Conditions:
      Last Transition Time:  2023-09-06T19:58:27Z
      Reason:                Autoscaler is disabled
      Status:                True
      Type:                  Autoscaler
      Last Transition Time:  2023-09-06T21:41:13Z
      Status:                True
      Type:                  Consumers
      Last Transition Time:  2023-09-06T19:58:27Z
      Status:                True
      Type:                  ConsumersScheduled
      Last Transition Time:  2023-09-06T21:41:13Z
      Status:                True
      Type:                  Ready
    Observed Generation:     1
    Placements:
      Pod Name:      kafka-source-dispatcher-6
      Vreplicas:     4
      Pod Name:      kafka-source-dispatcher-7
      Vreplicas:     4
    Replicas:        8
    Subscriber Uri:  http://receiver5-2.ks-multi-ksvc-0.svc.cluster.local
  Events:            <none>
  ```

  [2]

  ```
  "ks-multi-ksvc-0/c9ee3490-5b4b-4d11-87af-8cb2219d9fe3": {
    "kafka-source-dispatcher-3": 8
  },
  ```

  Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
  Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Ignore unknown fields in data plane contract (knative-extensions#3335) (#828)

  Signed-off-by: Calum Murray <cmurray@redhat.com>

---------

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Signed-off-by: Calum Murray <cmurray@redhat.com>
Co-authored-by: Martin Gencur <mgencur@redhat.com>
Co-authored-by: Matthias Wessendorf <mwessend@redhat.com>
Co-authored-by: Calum Murray <cmurray@redhat.com>
Co-authored-by: OpenShift Cherrypick Robot <openshift-cherrypick-robot@redhat.com>
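The caching trade-off mentioned in the #817 entry above can be illustrated with a small time-based cache. This is a hedged Python sketch, not the cache used in that change: the class `InitOffsetsCache`, its `ttl_seconds` parameter, and the idea that the wrapped function returns the number of partitions it initialized are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class InitOffsetsCache:
    """Hypothetical TTL cache around an expensive "init offsets" call."""

    def __init__(
        self,
        ttl_seconds: float,
        init_offsets: Callable[[str], int],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._init_offsets = init_offsets  # assumed to return the partition count it initialized
        self._clock = clock
        self._entries: Dict[str, Tuple[float, int]] = {}

    def get(self, topic: str) -> int:
        now = self._clock()
        entry = self._entries.get(topic)
        if entry is not None and now - entry[0] < self._ttl:
            # Cache hit: no connection to Kafka, but partitions added since
            # the entry was stored are not initialized.
            return entry[1]
        # Cache miss or expired entry: call through and remember the result.
        value = self._init_offsets(topic)
        self._entries[topic] = (now, value)
        return value
```

A shorter TTL narrows the window in which newly added partitions are missed, at the cost of more connections to Kafka under load.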
This is an automated cherry-pick of #818
/assign pierDipi