Dead lettering more than two times results in a crashed queue #216
Comments
Thanks, I think I've found the reason, here: https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_dead_letter.erl#L148 We assume that the |
Can you please provide a script that reproduces the issue?
|
I can confirm this issue, which is also happening for us after upgrading from RabbitMQ Our logic is similar to that described by @riyad, where messages are published through a dead-letter exchange potentially multiple times. I reinstalled our broker yesterday, and all was running fine until this afternoon, when errors were reported in RabbitMQ logs, after which CPU immediately started increasing fairly steadily, until the Erlang Peace,
|
We understand what the issue is but not what causes it. Can someone post a small sample that reproduces the issue? We will turn it into a test case. |
Sorry, I don't have a sample I'm able to post easily at present. But I can add to the above that it does indeed seem to be Peace, tiredpixel |
@tiredpixel thanks. A more specific question: do your apps set the |
@michaelklishin, our apps don't set or use the |
@tiredpixel thanks. We do have tests that result in a message being dead-lettered multiple times. I have a couple of ideas. |
You can provoke the crash of the queue with the Python script found in https://gist.github.com/riyad/7439eb545dabf287bcc9. |
@riyad perfect, thank you! |
@michaelklishin, understood; thank you for looking into it. :) Peace, tiredpixel |
I can reproduce the issue. Looking into it. |
FYI: I fixed a small issue with the demo script. It was missing a see: https://gist.github.com/riyad/7439eb545dabf287bcc9/revisions |
Cheers. |
@riyad I have a fix, does this output confirm correct execution? https://gist.github.com/michaelklishin/97355747f6b85dfe3aae — the exception is gone. |
Basically there should be an additional x-death entry for every time the message is deadlettered. |
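As a rough illustration of the expectation stated above, here is a toy model in Python. It is not the broker's implementation: the `dead_letter` helper and the `toq.genN` queue names are hypothetical, and each entry carries only a `queue` and a `reason` field. Each dead-lettering event adds one entry to the `x-death` header, newest first.

```python
from typing import Any, Dict, List

Headers = Dict[str, Any]


def dead_letter(headers: Headers, queue: str, reason: str) -> Headers:
    """Return a copy of `headers` with one more x-death entry (toy model)."""
    new_headers = dict(headers)
    # Copy the existing history so the caller's headers are not mutated.
    history: List[Dict[str, str]] = list(headers.get("x-death", []))
    # Record the newest event at the front of the list.
    history.insert(0, {"queue": queue, "reason": reason})
    new_headers["x-death"] = history
    return new_headers


headers: Headers = {}
# Simulate a message expiring out of three hypothetical timeout queues.
for generation in range(3):
    headers = dead_letter(headers, f"toq.gen{generation}", "expired")

print(len(headers["x-death"]))  # one entry per dead-lettering event
```

In this model, three dead-letterings through three distinct queues leave three entries, which is the kind of output the demo script is expected to show.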
I see
after 3 runs, so it must be working as expected. |
@michaelklishin To be sure it would be nice if you could provide the output of the script with an empty queue. |
|
Looks good :) |
So the culprit seems to be the fact that the script modifies headers (see the |
Hmm ... so you're saying Puka is to blame? Am I correct that the issue is that header fields were encoded with the "wrong" type when republishing (even when they're superficially the same)? |
Not Puka as such, but I could only trigger this with Puka. It serializes integers differently from the Java client (e.g. the JVM distinguishes between integers and longs, and many dynamically typed languages use automatic promotion on overflow). |
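To make the "superficially the same, different type" point concrete, here is a minimal Python sketch of how one integer value can end up as different typed fields on the wire. The `encode_int_field` helper is hypothetical, and the type tags (`b`, `s`, `I`, `l` for signed 8, 16, 32 and 64 bit integers) are my understanding of the field-table encoding RabbitMQ uses. It shows only the idea, not any client's actual serializer.

```python
import struct

# Assumed field-table type tags for signed integers of increasing width.
INT_FORMATS = {"b": ">b", "s": ">h", "I": ">i", "l": ">q"}


def encode_int_field(value: int, type_tag: str) -> bytes:
    """Encode an integer as a one-byte type tag followed by big-endian bytes."""
    return type_tag.encode("ascii") + struct.pack(INT_FORMATS[type_tag], value)


as_long_int = encode_int_field(1, "I")   # 32-bit representation
as_long_long = encode_int_field(1, "l")  # 64-bit representation

print(as_long_int)
print(as_long_long)
print(as_long_int == as_long_long)
```

Both fields carry the value 1, but a receiver that pattern-matches on one specific type tag will not recognize the other, which is consistent with the crash only showing up with some clients.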
Be less assertive about x-death value types, fixes #216
OK, a decent fix is in #221. We'll release |
Thanks a lot for fixing this so quickly. 😄 |
Saw the fix. I think you can still check if the client sent one of the
|
@michaelklishin, oh, such a speedy patch—very much obliged to you! :) I'll look forward to 3.5.4 RC. :) |
@videlalvaro that would require extracting validators into |
I just want to prevent data corruption/future crashes if the user sends
|
If the user modifies |
Reset to default?
|
hi, =ERROR REPORT==== 14-Mar-2016::20:04:20 ===
** Reason for termination == Can you please suggest how I can delete the queue? I am not able to delete this queue from the Admin Console. |
Restarting the node should make it possible to delete the queue before it hits the same exception again. The right thing to do is to upgrade.
|
Hello. I'm using v3.6.5 (also tested from 3.5.6) and retry queues (task => retry1 (dl/ttl) => task => retry2 (dl/ttl) => task => retry3 (dl/ttl) => dl).
Is there any way to make it work? |
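For readers unfamiliar with this retry topology, here is a sketch of the queue arguments such a chain typically relies on. The argument keys (`x-message-ttl`, `x-dead-letter-exchange`, `x-dead-letter-routing-key`) are standard RabbitMQ queue arguments. The helper functions, the `tasks` exchange, the `task` routing key and the delays are assumptions made up for illustration and are not taken from the setup described above.

```python
from typing import Dict, List, Tuple, Union

Arguments = Dict[str, Union[str, int]]


def retry_queue_arguments(task_exchange: str, task_routing_key: str,
                          ttl_ms: int) -> Arguments:
    """Arguments for a retry queue that holds messages for `ttl_ms`
    and then dead-letters them back towards the task queue."""
    return {
        "x-message-ttl": ttl_ms,
        "x-dead-letter-exchange": task_exchange,
        "x-dead-letter-routing-key": task_routing_key,
    }


def build_retry_chain(delays_ms: List[int]) -> List[Tuple[str, Arguments]]:
    """Build (queue name, arguments) pairs for retry1, retry2, ..."""
    return [
        (f"retry{index}", retry_queue_arguments("tasks", "task", delay))
        for index, delay in enumerate(delays_ms, start=1)
    ]


# Hypothetical delays of 1 s, 5 s and 30 s for the three retry stages.
chain = build_retry_chain([1000, 5000, 30000])
for name, arguments in chain:
    print(name, arguments)
```

Each dictionary would be passed as the arguments of the corresponding queue declaration in whichever client library is in use.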
@pdoreau please post a code snippet that reproduces it to rabbitmq-users, our public mailing list. |
@michaelklishin I posted a message on the user list but it doesn't appear.
Could that be the cause? |
They were in the moderation queue, I approved them. |
The |
Here's the rabbitmq-users thread and so far it looks like the client or one of the libraries on top unintentionally modifies the |
We have a similar setup to #161 where we publish messages into (unique) timeout queues that drop the messages back into their original queues for retrying.
Our setup started to lose messages after we updated to RabbitMQ 3.5.3. Looking at the logs we found this:
We were able to reproduce this every time a message was about to drop out of the timeout queue (toq) for the third time (hence the "gen2" part of the queue name in the logs).
Full log (names of queues and exchanges redacted):