Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bad offset management #597

Merged

Conversation

slinkydeveloper
Copy link
Contributor

Signed-off-by: Francesco Guardiani francescoguard@gmail.com

Fixes #380

Proposed Changes

  • 🐛 Now offset management is correct

Release Note

:bug: Now offset management is correct

Docs

Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
@google-cla google-cla bot added the cla: yes Indicates the PR's author has signed the CLA. label Feb 4, 2021
@knative-prow-robot knative-prow-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 4, 2021
@codecov
Copy link

codecov bot commented Feb 4, 2021

Codecov Report

Merging #597 (e770db8) into master (482f1c3) will decrease coverage by 8.23%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #597      +/-   ##
============================================
- Coverage     71.85%   63.62%   -8.24%     
============================================
  Files            72       28      -44     
  Lines          2587     1207    -1380     
  Branches        121        0     -121     
============================================
- Hits           1859      768    -1091     
+ Misses          584      364     -220     
+ Partials        144       75      -69     
Flag Coverage Δ Complexity Δ
java-unittests ? ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ Complexity Δ
control-plane/pkg/core/config/utils.go 72.41% <100.00%> (+0.98%) 0.00 <0.00> (ø)
control-plane/pkg/reconciler/broker/broker.go 78.71% <100.00%> (ø) 0.00 <0.00> (ø)
control-plane/pkg/reconciler/sink/kafka_sink.go 75.42% <100.00%> (ø) 0.00 <0.00> (ø)
control-plane/pkg/reconciler/trigger/trigger.go 71.66% <100.00%> (-0.24%) 0.00 <0.00> (ø)
...fka/broker/core/security/PlaintextCredentials.java
...a/broker/core/tracing/HeadersPropagatorSetter.java
...a/broker/core/security/KubernetesAuthProvider.java
.../kafka/broker/dispatcher/ConsumerRecordSender.java
...ka/broker/core/security/KubernetesCredentials.java
...nting/kafka/broker/core/tracing/TracingConfig.java
... and 29 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 482f1c3...c8da4e4. Read the comment docs.

@slinkydeveloper
Copy link
Contributor Author

/retest

1 similar comment
@slinkydeveloper
Copy link
Contributor Author

/retest

Comment on lines 235 to 236
// If you don't want to lose hours in debugging, please don't remove this FQCNs :)
final Map<io.vertx.kafka.client.common.TopicPartition, io.vertx.kafka.client.consumer.OffsetAndMetadata>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LOL 🤣

@slinkydeveloper slinkydeveloper mentioned this pull request Feb 8, 2021
7 tasks
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
consumer.commit(Map.of(
key,
new OffsetAndMetadata(shouldAck + 1, ""))
).onSuccess(ignored -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

onSuccess on a new line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The closed paren is right there 😄

// If the commit failed, there are 2 situations:
// * Nobody tried to commit again, so let's just restore the state
// * Somebody committed with an offset greater than this one, so we just discard the error
if (!(this.lastAckedMap.get(key) > shouldAck)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!(this.lastAckedMap.get(key) > shouldAck)) {
if (this.lastAckedMap.get(key) <= shouldAck) {

Copy link
Contributor Author

@slinkydeveloper slinkydeveloper Feb 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to keep it this way, since it's consistent with the above comment

Comment on lines 125 to 126
long lastAckedBeforeThisOne = this.lastAckedMap.put(key, shouldAck);
SortedSet<Long> messagesImGoingToAck = this.pendingAcksMap.remove(key);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be updated on success or on failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is discarded in case of success, copy pasted in case of failure (if we actually need to restore the state)

Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
if (!(this.lastAckedPerPartition.get(topicPartition) > toAck)) {
this.lastAckedPerPartition.put(topicPartition, lastAckedBeforeThisOne);
this.pendingAcksPerPartition.compute(topicPartition, (k, actual) -> {
if (actual != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this possible since we call mutateStateAndCheckAck above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, because somebody could have asked meanwhile to add a new commit and that state could have been populated again

Comment on lines 155 to 164
if (!(this.lastAckedPerPartition.get(topicPartition) > toAck)) {
this.lastAckedPerPartition.put(topicPartition, lastAckedBeforeThisOne);
this.pendingAcksPerPartition.compute(topicPartition, (k, actual) -> {
if (actual != null) {
actual.addAll(messagesImGoingToAck);
return actual;
}
return messagesImGoingToAck;
});
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I comment out these lines, tests are green.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if we set the state on success, there is no need to rollback.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be safer since the state could have changed since lastAckedBeforeThisOne was calculated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I comment out these lines, tests are green.

How could it be? Lemme check 🤔

Copy link
Contributor Author

@slinkydeveloper slinkydeveloper Feb 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes of course it goes green, but if the system crashes after trying to commit, it will restart from the previous ack. Do we care? Otherwise I'll just remove that statement, it just makes the code more complicated

Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
Copy link
Member

@pierDipi pierDipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Thank you!

/kind bug

@knative-prow-robot knative-prow-robot added the kind/bug Categorizes issue or PR as related to a bug. label Feb 8, 2021
@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 8, 2021
@knative-prow-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pierDipi, slinkydeveloper

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [pierDipi,slinkydeveloper]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@pierDipi
Copy link
Member

pierDipi commented Feb 8, 2021

Java tests failed with Connection reset in GH.

@knative-prow-robot knative-prow-robot merged commit f584a4e into knative-extensions:master Feb 8, 2021
pierDipi pushed a commit to pierDipi/eventing-kafka-broker that referenced this pull request Feb 27, 2021
* Fix knative-extensions#380

Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>

* Fix

Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>

* Comments

Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>

* Comments

Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>

* Removed complicated code which was practically useless

Signed-off-by: Francesco Guardiani <francescoguard@gmail.com>
matzew pushed a commit to matzew/eventing-kafka-broker that referenced this pull request Feb 10, 2023
…ns#597)

* Periodically resync consumer groups to downscale statefulset

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Autoscaler scale down delay

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Scale machineset for rekt tests

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Increase is ready timeout

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

* Skip namespace cleanup so that we can more must gather info

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>

---------

Signed-off-by: Pierangelo Di Pilato <pierdipi@redhat.com>
Co-authored-by: Pierangelo Di Pilato <pierdipi@redhat.com>
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/data-plane cla: yes Indicates the PR's author has signed the CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate offset management
3 participants