Sarama Async Producer Encounters 'Out of Order' Error: what are the reasons? #2803
Comments
With the setup described in this issue, we have also recently encountered some instances of the "The broker received an out of order sequence number" error. These occurrences are very rare, but we are wondering whether they indicate an issue with how messages are being pushed, leading to them being ordered incorrectly.
So this appears to be an ordering issue / race condition in the idempotent producer, between new batches being produced and batches being retried (see lines 1144 to 1148 at commit f21c512).
This shouldn't occur with
Thank you, @dnwe. Is there currently someone addressing this issue? If not, we're willing to assist and contribute to a solution. Could you provide some guidance on where we might start or what to look into?
I was able to reproduce with a simple async producer that sets:
    config.Net.MaxOpenRequests = 1
    config.Producer.Idempotent = true
In my case, the trigger that causes the
I don't see the same problem if I switch to using the sync producer in a loop (keeping the same configuration). I suspect this is because my test program blocks until Kafka acks each message, which effectively prevents more than one request from being in flight at any time.
This should be fixed by #2943, if someone can review it.
Hi, @nevillus. Do you know whether this error recovers automatically? We plan to upgrade Sarama, but if the error recovers on its own and doesn't cause actual message reordering, perhaps the impact is small? Thanks a lot!
Description
We are encountering an error (once every few weeks) while using the async producer in our Kafka setup. The error message encountered is as follows:
This error seems to originate from the following line in the Sarama library:
produce_set.go#L89
The occurrence of this error is sporadic, and we are struggling to understand the underlying cause or identify any corrective measures. It appears that, occasionally, messages are being added to the batch in an incorrect order.
We are seeking insights or suggestions on what might be triggering this error. Our investigations have considered network issues as a potential cause; however, we have not found any corresponding logs or indicators to substantiate this theory when the error occurs.
Versions
Configuration
Logs
We are facing the error detailed at the following location:
produce_set.go#L89
Additional Context
All messages are dispatched using an asynchronous producer, configured with a high retry count to ensure message delivery even in the event of transient Kafka broker failures. Despite this, we observe that occasionally a message fails to be added to the batch, rendering it ineligible for any retry mechanism in Sarama.