
Producer should keep sent record batches intact when retrying (at least for EOS) #1070

Closed
buyology opened this issue Mar 21, 2018 · 2 comments
Labels
enhancement, producer, stale

@buyology

While working on the TransactionalProducer/EOS, the one remaining hurdle is that the producer currently splits batches when retrying:

https://github.com/Shopify/sarama/blob/5fd60c2be0d4eaf97c800319c0e3602b9a13c452/async_producer.go#L878-L882

This implies that the broker is no longer guaranteed to see the exact same batches on retry.
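
Concretely, the effect is something like the following (a minimal sketch with invented names, not the actual sarama code): the failed batch is dissolved into individual messages, and the next flush regroups whatever is pending at that moment, so batch boundaries can shift between attempts.

```go
package main

import "fmt"

type message struct{ seq int }

// retryIndividually models the linked retry path: the failed batch is
// dissolved and each message re-enters the pending queue on its own
// (retries are re-queued ahead of newly submitted messages).
func retryIndividually(failed, pending []message) []message {
	return append(failed, pending...)
}

// flush regroups whatever is pending into fixed-size batches, so the
// boundaries of the retried batch are not preserved.
func flush(pending []message, maxBatch int) (batches [][]message) {
	for len(pending) > 0 {
		n := maxBatch
		if n > len(pending) {
			n = len(pending)
		}
		batches = append(batches, pending[:n])
		pending = pending[n:]
	}
	return batches
}

func main() {
	failed := []message{{0}, {1}, {2}} // first attempt sent seqs 0-2 as one batch
	pending := []message{{3}}          // a new message arrived in the meantime
	pending = retryIndividually(failed, pending)
	fmt.Println(flush(pending, 2)) // [[{0} {1}] [{2} {3}]]: seq 2 now shares a batch with seq 3
}
```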

The broker, however, expects exactly that, as illustrated by its checks for the following errors (paraphrased in a Go sketch after the links):

  • DUPLICATE_SEQUENCE_NUMBER:

https://github.com/apache/kafka/blob/5cc162f30eca28f699a622671669fefba56fb447/core/src/main/scala/kafka/log/ProducerStateManager.scala#L138

  • OUT_OF_ORDER_SEQUENCE_NUMBER:

https://github.com/apache/kafka/blob/5cc162f30eca28f699a622671669fefba56fb447/core/src/main/scala/kafka/log/ProducerStateManager.scala#L225
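
A Go paraphrase of those checks (an illustration only; the real code also handles producer epochs, a cache of recent batches, and Int32 sequence wraparound):

```go
package broker

import "errors"

// Placeholder errors mirroring the Kafka error codes referenced above.
var (
	ErrDuplicateSequenceNumber  = errors.New("DUPLICATE_SEQUENCE_NUMBER")
	ErrOutOfOrderSequenceNumber = errors.New("OUT_OF_ORDER_SEQUENCE_NUMBER")
)

// batchRange is the sequence span of one record batch from a producer ID.
type batchRange struct{ firstSeq, lastSeq int32 }

// checkSequence captures the broker's expectation: a retried batch must
// either exactly match the batch it already persisted (a harmless duplicate)
// or continue the sequence contiguously; the shifted boundaries of a
// re-split batch fail both tests, and the write is rejected.
func checkSequence(lastPersisted, incoming batchRange) error {
	if incoming == lastPersisted {
		return ErrDuplicateSequenceNumber // intact retry of the last batch
	}
	if incoming.firstSeq != lastPersisted.lastSeq+1 {
		return ErrOutOfOrderSequenceNumber // gap or overlap from re-batching
	}
	return nil
}
```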

@eapache: any thoughts/preferences on how best to overcome this?

(The workaround while developing the TP, chosen to change as little of the producer logic as possible, was to add a field carrying the retryable messages to ProducerMessage, essentially making it a union of an individual message and a retry batch, and to special-case that treatment for EOS; a sketch of that shape follows below. But maybe that's too much of a workaround to be a proper patch?)
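
The shape of that workaround might be roughly this (hypothetical fields, not the actual sarama API):

```go
package producer

// ProducerMessage sketches the union described above: normally a single
// message, but when retryBatch is non-nil it stands in for an entire batch
// that must be resent with its boundaries intact (EOS-only special case).
type ProducerMessage struct {
	Topic string
	Key   []byte
	Value []byte

	// retryBatch is the hypothetical extra field: the messages of a failed
	// batch, carried together so the retry cannot be re-split or re-grouped.
	retryBatch []*ProducerMessage
}

// isRetryBatch reports whether this message is really a batch placeholder.
func (m *ProducerMessage) isRetryBatch() bool { return m.retryBatch != nil }
```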

@eapache (Contributor) commented Apr 6, 2018

🤔 this is really awkward, because the current producer architecture was designed entirely around the single message as the unit of retriability.

Given that the producer already guarantees ordering, though, you could do something like...

  • assign each ProducerMessage an ID for the batch
  • in the partitionProducer and brokerProducer, track those so that batches are not produced until fully present, and so that messages from one batch don't sneak into previous (or new) batches

Dunno how messy that gets.
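
Something along those lines might look like this (all names invented, just to give the idea a shape):

```go
package producer

// msg stands in for sarama's ProducerMessage, extended with the suggested
// batch bookkeeping: an ID shared by the whole batch and the batch's size.
type msg struct {
	batchID   int64
	batchSize int
	value     []byte
}

// gate buffers messages per batch ID and only releases a batch once every
// member has arrived, so a flush can never mix partial batches.
type gate struct {
	buffered map[int64][]*msg
}

func newGate() *gate { return &gate{buffered: make(map[int64][]*msg)} }

// add buffers m and returns the complete batch once its last member arrives,
// or nil while the batch is still partial.
func (g *gate) add(m *msg) []*msg {
	g.buffered[m.batchID] = append(g.buffered[m.batchID], m)
	if len(g.buffered[m.batchID]) == m.batchSize {
		complete := g.buffered[m.batchID]
		delete(g.buffered, m.batchID)
		return complete
	}
	return nil
}
```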

@ghost commented Feb 21, 2020

Thank you for taking the time to raise this issue. However, it has not had any activity on it in the past 90 days and will be closed in 30 days if no updates occur.
Please check if the master branch has already resolved the issue since it was raised. If you believe the issue is still valid and you would like input from the maintainers then please comment to ask for it to be reviewed.

@ghost added the stale label on Feb 21, 2020
@ghost closed this as completed on Mar 23, 2020