
Producer should keep sent record batches intact when retrying (at least for EOS) #1070

Closed
buyology opened this issue Mar 21, 2018 · 2 comments
Labels
enhancement, producer, stale

@buyology

While working on the TransactionalProducer/EOS, the one remaining hurdle is that the producer currently splits batches when retrying:

https://github.com/Shopify/sarama/blob/5fd60c2be0d4eaf97c800319c0e3602b9a13c452/async_producer.go#L878-L882

This implies that the broker is no longer guaranteed to see the exact same batches on retry.
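
Concretely, the effect is something like the following (a minimal sketch with invented names, not the actual sarama code): the failed batch is dissolved into individual messages, and the next flush regroups whatever is pending at that moment, so batch boundaries can shift between attempts.

```go
package main

import "fmt"

type message struct{ seq int }

// retryIndividually models the linked retry path: the failed batch is
// dissolved and each message re-enters the pending queue on its own
// (retries are re-queued ahead of newly submitted messages).
func retryIndividually(failed, pending []message) []message {
	return append(failed, pending...)
}

// flush regroups whatever is pending into fixed-size batches, so the
// boundaries of the retried batch are not preserved.
func flush(pending []message, maxBatch int) (batches [][]message) {
	for len(pending) > 0 {
		n := maxBatch
		if n > len(pending) {
			n = len(pending)
		}
		batches = append(batches, pending[:n])
		pending = pending[n:]
	}
	return batches
}

func main() {
	failed := []message{{0}, {1}, {2}} // first attempt sent seqs 0-2 as one batch
	pending := []message{{3}}          // a new message arrived in the meantime
	pending = retryIndividually(failed, pending)
	fmt.Println(flush(pending, 2)) // [[{0} {1}] [{2} {3}]]: seq 2 now shares a batch with seq 3
}
```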

The broker, however, expects exactly that, as illustrated by its checks for the following errors (paraphrased in a Go sketch after the links):

  • DUPLICATE_SEQUENCE_NUMBER:

https://github.com/apache/kafka/blob/5cc162f30eca28f699a622671669fefba56fb447/core/src/main/scala/kafka/log/ProducerStateManager.scala#L138

  • OUT_OF_ORDER_SEQUENCE_NUMBER:

https://github.com/apache/kafka/blob/5cc162f30eca28f699a622671669fefba56fb447/core/src/main/scala/kafka/log/ProducerStateManager.scala#L225
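
A Go paraphrase of those checks (an illustration only; the real code also handles producer epochs, a cache of recent batches, and Int32 sequence wraparound):

```go
package broker

import "errors"

// Placeholder errors mirroring the Kafka error codes referenced above.
var (
	ErrDuplicateSequenceNumber  = errors.New("DUPLICATE_SEQUENCE_NUMBER")
	ErrOutOfOrderSequenceNumber = errors.New("OUT_OF_ORDER_SEQUENCE_NUMBER")
)

// batchRange is the sequence span of one record batch from a producer ID.
type batchRange struct{ firstSeq, lastSeq int32 }

// checkSequence captures the broker's expectation: a retried batch must
// either exactly match the batch it already persisted (a harmless duplicate)
// or continue the sequence contiguously; the shifted boundaries of a
// re-split batch fail both tests, and the write is rejected.
func checkSequence(lastPersisted, incoming batchRange) error {
	if incoming == lastPersisted {
		return ErrDuplicateSequenceNumber // intact retry of the last batch
	}
	if incoming.firstSeq != lastPersisted.lastSeq+1 {
		return ErrOutOfOrderSequenceNumber // gap or overlap from re-batching
	}
	return nil
}
```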

@eapache: any thoughts/preferences on how best to overcome this?

(The workaround while developing the TP, chosen to change as little of the producer logic as possible, was to add a field carrying the retryable messages to ProducerMessage, essentially making it a union of an individual message and a retry batch, and to special-case that treatment for EOS; a sketch of that shape follows below. But maybe that's too much of a workaround to be a proper patch?)
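
The shape of that workaround might be roughly this (hypothetical fields, not the actual sarama API):

```go
package producer

// ProducerMessage sketches the union described above: normally a single
// message, but when retryBatch is non-nil it stands in for an entire batch
// that must be resent with its boundaries intact (EOS-only special case).
type ProducerMessage struct {
	Topic string
	Key   []byte
	Value []byte

	// retryBatch is the hypothetical extra field: the messages of a failed
	// batch, carried together so the retry cannot be re-split or re-grouped.
	retryBatch []*ProducerMessage
}

// isRetryBatch reports whether this message is really a batch placeholder.
func (m *ProducerMessage) isRetryBatch() bool { return m.retryBatch != nil }
```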

@eapache (Contributor) commented Apr 6, 2018

🤔 this is really awkward, because the current producer architecture was designed entirely around the single message as the unit of retriability.

Given that the producer already guarantees ordering, though, you could do something like...

  • assign each ProducerMessage an ID for the batch
  • in the partitionProducer and brokerProducer, track those so that batches are not produced until fully present, and so that messages from one batch don't sneak into previous (or new) batches

Dunno how messy that gets.
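
Something along those lines might look like this (all names invented, just to give the idea a shape):

```go
package producer

// msg stands in for sarama's ProducerMessage, extended with the suggested
// batch bookkeeping: an ID shared by the whole batch and the batch's size.
type msg struct {
	batchID   int64
	batchSize int
	value     []byte
}

// gate buffers messages per batch ID and only releases a batch once every
// member has arrived, so a flush can never mix partial batches.
type gate struct {
	buffered map[int64][]*msg
}

func newGate() *gate { return &gate{buffered: make(map[int64][]*msg)} }

// add buffers m and returns the complete batch once its last member arrives,
// or nil while the batch is still partial.
func (g *gate) add(m *msg) []*msg {
	g.buffered[m.batchID] = append(g.buffered[m.batchID], m)
	if len(g.buffered[m.batchID]) == m.batchSize {
		complete := g.buffered[m.batchID]
		delete(g.buffered, m.batchID)
		return complete
	}
	return nil
}
```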

@ghost commented Feb 21, 2020

Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur.
Please check if the master branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.

@ghost added the stale label on Feb 21, 2020
@ghost closed this as completed on Mar 23, 2020