Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release-v1.10] Clean up reserved from resources that have been scheduled #830

Conversation

openshift-cherrypick-robot

This is an automated cherry-pick of #818

/assign pierDipi

In a recent testing run, we've noticed we have have a scheduled
`ConsumerGroup` [1] (see placements) being considered having
reserved replicas in a different pod [2].

That makes the scheduler think that there is no space but the
autoscaler says we have enough space to hold every virtual
replica.

[1]
```
$ k describe consumergroups -n ks-multi-ksvc-0 c9ee3490-5b4b-4d11-87af-8cb2219d9fe3
Name:         c9ee3490-5b4b-4d11-87af-8cb2219d9fe3
Namespace:    ks-multi-ksvc-0
...
Status:
  Conditions:
    Last Transition Time:  2023-09-06T19:58:27Z
    Reason:                Autoscaler is disabled
    Status:                True
    Type:                  Autoscaler
    Last Transition Time:  2023-09-06T21:41:13Z
    Status:                True
    Type:                  Consumers
    Last Transition Time:  2023-09-06T19:58:27Z
    Status:                True
    Type:                  ConsumersScheduled
    Last Transition Time:  2023-09-06T21:41:13Z
    Status:                True
    Type:                  Ready
  Observed Generation:     1
  Placements:
    Pod Name:      kafka-source-dispatcher-6
    Vreplicas:     4
    Pod Name:      kafka-source-dispatcher-7
    Vreplicas:     4
  Replicas:        8
  Subscriber Uri:  http://receiver5-2.ks-multi-ksvc-0.svc.cluster.local
Events:            <none>
```

[2]
```
    "ks-multi-ksvc-0/c9ee3490-5b4b-4d11-87af-8cb2219d9fe3": {
      "kafka-source-dispatcher-3": 8
    },
```

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Copy link
Member

@pierDipi pierDipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm label Sep 14, 2023
@openshift-ci
Copy link

openshift-ci bot commented Sep 14, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: openshift-cherrypick-robot, pierDipi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot openshift-merge-robot merged commit d72d1a2 into openshift-knative:release-v1.10 Sep 14, 2023
openshift-merge-robot pushed a commit that referenced this pull request Sep 20, 2023
* Make SecretSpec field of consumers Auth omitempty (#780)

* Expose init offset and schedule metrics for ConsumerGroup reconciler (#790) (#791)

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Fix channel finalizer logic (knative-extensions#3295) (#795)

Signed-off-by: Calum Murray <cmurray@redhat.com>
Co-authored-by: Calum Murray <cmurray@redhat.com>
Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* [release-v1.10] SRVKE-958: Cache init offsets results (#817)

* Cache init offsets results

When there is high load and multiple consumer group schedule calls,
we get many `dial tcp 10.130.4.8:9092: i/o timeout` errors when
trying to connect to Kafka.
This leads to increased "time to readiness" for consumer groups.

The downside of caching is that, in the case, partitions
increase while the result is cached we won't initialize
the offsets of the new partitions.

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Add autoscaler leader log patch

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

---------

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Scheduler handle overcommitted pods (#820)

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Set consumer and consumergroups finalizers when creating them (#823)

It is possible that a delete consumer or consumergroup might
be reconciled and never finalized when it is deleted before
the finalizer is set.
This happens because the Knative generated reconciler uses
patch (as opposed to using update) for setting the finalizer
and patch doesn't have any optimistic concurrency controls.

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Clean up reserved from resources that have been scheduled (#830)

In a recent testing run, we've noticed we have have a scheduled
`ConsumerGroup` [1] (see placements) being considered having
reserved replicas in a different pod [2].

That makes the scheduler think that there is no space but the
autoscaler says we have enough space to hold every virtual
replica.

[1]
```
$ k describe consumergroups -n ks-multi-ksvc-0 c9ee3490-5b4b-4d11-87af-8cb2219d9fe3
Name:         c9ee3490-5b4b-4d11-87af-8cb2219d9fe3
Namespace:    ks-multi-ksvc-0
...
Status:
  Conditions:
    Last Transition Time:  2023-09-06T19:58:27Z
    Reason:                Autoscaler is disabled
    Status:                True
    Type:                  Autoscaler
    Last Transition Time:  2023-09-06T21:41:13Z
    Status:                True
    Type:                  Consumers
    Last Transition Time:  2023-09-06T19:58:27Z
    Status:                True
    Type:                  ConsumersScheduled
    Last Transition Time:  2023-09-06T21:41:13Z
    Status:                True
    Type:                  Ready
  Observed Generation:     1
  Placements:
    Pod Name:      kafka-source-dispatcher-6
    Vreplicas:     4
    Pod Name:      kafka-source-dispatcher-7
    Vreplicas:     4
  Replicas:        8
  Subscriber Uri:  http://receiver5-2.ks-multi-ksvc-0.svc.cluster.local
Events:            <none>
```

[2]
```
    "ks-multi-ksvc-0/c9ee3490-5b4b-4d11-87af-8cb2219d9fe3": {
      "kafka-source-dispatcher-3": 8
    },
```

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Ignore unknown fields in data plane contract (knative-extensions#3335) (#828)

Signed-off-by: Calum Murray <cmurray@redhat.com>

---------

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Signed-off-by: Calum Murray <cmurray@redhat.com>
Co-authored-by: Martin Gencur <mgencur@redhat.com>
Co-authored-by: Matthias Wessendorf <mwessend@redhat.com>
Co-authored-by: Calum Murray <cmurray@redhat.com>
Co-authored-by: OpenShift Cherrypick Robot <openshift-cherrypick-robot@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants