Fix slow consumergroup reconciliation under load #3293

Merged

Conversation

pierDipi
Member

In [1], we return an error. However, that method is called by the consumer group reconciler whenever a consumer is not ready, which is a normal state right after the consumers have been created. We shouldn't return an error in that case, because it causes the consumer group to be requeued and reconciled again with an exponentially increasing delay, which slows down time to ready.

This is especially evident when scaling up under high load (that is, when the dispatcher pod is slow to become ready).

[1] https://github.com/knative-sandbox/eventing-kafka-broker/blob/5cda5463aa2fa060179674fe7b3237abb836ee06/control-plane/pkg/apis/internals/kafka/eventing/v1alpha1/consumer_group_lifecycle.go#L57-L65

Fixes #3046

Proposed Changes

  • Fix slow consumergroup reconciliation under load

Release Note

Fix slow `ConsumerGroup` reconciliation under load

Docs

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
@pierDipi
Member Author

/cc @matzew

@knative-prow knative-prow bot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Aug 21, 2023
@codecov

codecov bot commented Aug 21, 2023

Codecov Report

Merging #3293 (f82acff) into main (c402ae9) will increase coverage by 0.02%.
Report is 1 commits behind head on main.
The diff coverage is 0.00%.

@@             Coverage Diff              @@
##               main    #3293      +/-   ##
============================================
+ Coverage     61.78%   61.80%   +0.02%     
- Complexity      764      766       +2     
============================================
  Files           181      181              
  Lines         12220    12223       +3     
  Branches        266      266              
============================================
+ Hits           7550     7555       +5     
  Misses         4081     4081              
+ Partials        589      587       -2     
Flag Coverage Δ
java-unittests 71.67% <ø> (+0.16%) ⬆️

Files Changed Coverage Δ
...afka/eventing/v1alpha1/consumer_group_lifecycle.go 0.00% <0.00%> (ø)

... and 2 files with indirect coverage changes

@pierDipi
Member Author

/test channel-integration-tests-ssl

// It is "normal" to have non-ready consumers, and we will get notified when their status change,
// so we don't need to return an error here which causes the object to be queued with an
// exponentially increasing delay.
return nil
Contributor

thanks for the explicit comment.

It seems reasonable

Contributor

@matzew matzew left a comment

/lgtm
/approve

@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Aug 21, 2023
@knative-prow

knative-prow bot commented Aug 21, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: matzew, pierDipi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pierDipi
Member Author

/retest-required

@matzew
Contributor

matzew commented Aug 21, 2023

/test channel-integration-tests-ssl

@pierDipi
Member Author

/test channel-integration-tests-ssl

@matzew
Contributor

matzew commented Aug 21, 2023

/test channel-integration-tests-sasl-ssl

@knative-prow knative-prow bot merged commit 887cad0 into knative-extensions:main Aug 21, 2023
33 of 36 checks passed
@pierDipi pierDipi deleted the KN-3046_slow-consumer-group-rec branch August 22, 2023 10:12
@pierDipi
Member Author

/cherry-pick release-1.11

@pierDipi
Member Author

/cherry-pick release-1.10

@knative-prow-robot
Contributor

@pierDipi: new pull request created: #3298

In response to this:

/cherry-pick release-1.11

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow-robot
Contributor

@pierDipi: new pull request created: #3299

In response to this:

/cherry-pick release-1.10

This pull request was closed.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/control-plane lgtm Indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Slow consumergroup reconciliation under load
3 participants