Replies: 17 comments 26 replies
-
Hello, and thanks for using RabbitMQ. As I'm sure you found via a search, this is not an error that has ever been reported (https://www.google.com/search?q=%22message_not_understood%22+rabbitmq), so it must be unique to your environment. If you can provide actual reproduction steps, that would be very helpful; the steps you provided are insufficient. It would also help to have your full log files instead of a few lines. If you can provide them, ATTACH them to this discussion in compressed form. Your code should handle channel closures, obviously.
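As a minimal illustration of why an unhandled channel closure can take down a Node.js process: Node's EventEmitter throws when an "error" event is emitted and no "error" listener is registered. Client libraries built on EventEmitter (amqplib, for example) report a broker-initiated channel closure through "error" and "close" events on the channel object, and the connection object behaves similarly. The sketch below assumes that model; FakeChannel and simulateBrokerClose are hypothetical stand-ins so it runs without a broker, and the error text is illustrative, not any library's actual message.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for a client channel object built on EventEmitter.
class FakeChannel extends EventEmitter {
  // Simulates the broker closing the channel with an error.
  simulateBrokerClose(): void {
    this.emit("error", new Error("illustrative: channel closed by broker (541 INTERNAL_ERROR)"));
    this.emit("close");
  }
}

// No "error" listener: emit("error") throws. In a real app that throw happens
// inside a socket callback, so nothing catches it and the process exits.
function closeWithoutHandler(): boolean {
  const ch = new FakeChannel();
  try {
    ch.simulateBrokerClose();
    return false;
  } catch {
    return true; // the unhandled "error" event was thrown
  }
}

// With listeners attached, the closure is just an event the app can log and
// react to, e.g. by reopening the channel.
function closeWithHandler(): string[] {
  const events: string[] = [];
  const ch = new FakeChannel();
  ch.on("error", (err: Error) => events.push(`error: ${err.message}`));
  ch.on("close", () => events.push("close"));
  ch.simulateBrokerClose();
  return events;
}

console.log(closeWithoutHandler()); // true
console.log(closeWithHandler());
```

In a real application the "close" handler is the natural place to reopen the channel (or fall back to reconnecting) rather than exit.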
-
@GitJadhav no, this error cannot be caused by a "corrupt frame", because the (Erlang) process that runs into it is the process that takes a frame data structure, serializes it, and writes it to the connection socket. Somehow, in your case, it receives a different message that it does not know how to format because it is not a frame (…).

In general, client connections can fail for all kinds of reasons, and your applications must be ready to handle that. That's why connection recovery is a feature some clients support and other client libraries usually document (some authors think it should not be a client library feature because it won't always work perfectly for every app). This is particularly true for consumers.

You cannot claim that "due to this exception our system crashes". If your system cannot handle a connection or channel closure, it will eventually crash for a broad range of other reasons, be it a loss of network connectivity on the host that runs the app or a genuine channel exception. It's only a matter of time.
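Connection recovery usually comes down to retrying the connection with a growing delay and then re-creating channels and consumers. The sketch below covers only the retry part, using exponential backoff. connectWithBackoff, BackoffOptions, and the connect callback are hypothetical names; connect stands in for whatever call your client library uses to open a connection, and the wait function is injectable so the example runs without actually sleeping.

```typescript
// Hypothetical placeholder for the client library call that opens a connection.
type Connect<T> = () => Promise<T>;

interface BackoffOptions {
  initialDelayMs: number; // wait before the first retry
  maxDelayMs: number;     // upper bound for the wait between attempts
  maxAttempts: number;    // total attempts before giving up
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls `connect` until it succeeds, doubling the wait after each failure
// (capped at maxDelayMs). Rethrows the last error after maxAttempts failures.
async function connectWithBackoff<T>(
  connect: Connect<T>,
  opts: BackoffOptions,
  wait: (ms: number) => Promise<void> = sleep,
): Promise<T> {
  let delay = opts.initialDelayMs;
  for (let attempt = 1; ; attempt++) {
    try {
      return await connect();
    } catch (err) {
      if (attempt >= opts.maxAttempts) throw err;
      await wait(delay);
      delay = Math.min(delay * 2, opts.maxDelayMs);
    }
  }
}

// Usage: a simulated connection that fails twice, then succeeds.
// The injected `wait` records the delays instead of sleeping.
let attempts = 0;
const delays: number[] = [];
connectWithBackoff(
  async () => {
    attempts++;
    if (attempts < 3) throw new Error("simulated: connection refused");
    return "connected";
  },
  { initialDelayMs: 100, maxDelayMs: 1000, maxAttempts: 5 },
  async (ms) => { delays.push(ms); },
).then((result) => console.log(result, delays)); // connected [ 100, 200 ]
```

Capping the delay keeps recovery responsive after a long outage, and the attempt limit lets the app fail loudly instead of retrying forever. After a successful reconnect, the app still has to reopen its channels and re-register its consumers.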
-
Alternatively, the frame writer could log a warning and keep going instead of immediately terminating like it does today. I'm sure some members of the core team would not agree with this solution. That won't change the fact that your applications must handle connection failure.
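To make the trade-off between terminating and "log a warning and keep going" concrete, here is a small TypeScript analogy. RabbitMQ's actual writer is an Erlang process and this is not its code; FrameWriter, its message shapes, and the strict flag are hypothetical. The stray object in the usage lines stands in for an unexpected {Ref, ok} message like the one in the logs.

```typescript
// Hypothetical message shapes, for illustration only.
interface SendFrame { kind: "send_frame"; channel: number; payload: string }
interface Flush { kind: "flush" }
type KnownMessage = SendFrame | Flush;

// Type guard: is this one of the messages the writer knows how to handle?
function isKnown(msg: unknown): msg is KnownMessage {
  if (typeof msg !== "object" || msg === null) return false;
  const kind = (msg as { kind?: unknown }).kind;
  return kind === "send_frame" || kind === "flush";
}

class FrameWriter {
  readonly written: string[] = [];  // frames that would be written to the socket
  readonly warnings: string[] = []; // unknown messages that were logged and dropped
  private pending: string[] = [];   // serialized frames not yet flushed
  private readonly strict: boolean; // true: terminate on unknown messages

  constructor(strict: boolean) {
    this.strict = strict;
  }

  handle(msg: unknown): void {
    if (!isKnown(msg)) {
      if (this.strict) {
        // Terminate on anything unexpected (the behaviour described above).
        throw new Error(`message_not_understood: ${JSON.stringify(msg)}`);
      }
      // Lenient alternative: record a warning and keep the connection alive.
      this.warnings.push(`ignoring unknown message: ${JSON.stringify(msg)}`);
      return;
    }
    switch (msg.kind) {
      case "send_frame":
        this.pending.push(`ch${msg.channel}:${msg.payload}`);
        break;
      case "flush":
        this.written.push(...this.pending);
        this.pending = [];
        break;
    }
  }
}

const writer = new FrameWriter(false);
writer.handle({ kind: "send_frame", channel: 1, payload: "basic.deliver" });
writer.handle({ ref: "#Ref<0.1.2.3>", result: "ok" }); // stray, non-frame message
writer.handle({ kind: "flush" });
console.log(writer.written); // [ 'ch1:basic.deliver' ]
console.log(writer.warnings.length); // 1
```

The lenient writer keeps serving the connection, but the stray message is only logged, so whatever sent it still needs to be tracked down separately.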
-
Please find the RabbitMQ logs with the {writer, message_not_understood} exception.
-
@lukebakken @michaelklishin I really appreciate your time in reviewing this situation and sharing some insights. We are deliberately shutting down on our side, as we treat the 541 (INTERNAL_ERROR) as a red flag.
-
Just want to add that we've seen this strange … Only info that I have: …
In RabbitMQ I only found one place in …
-
@lukebakken Sharing some insights on the failure episodes: we have seen this failure 25-30 times between 10/7 and 11/22.
Our RabbitMQ specs: …
PATTERN ANALYSIS
As we were seeing busy_dist_port errors, I have requested buffer tuning (setting maximum values) to confirm whether this provides enough internal buffering to handle big payloads (>2 MB).
3. Trigger source: …
5. We do see an Erlang crash log captured in the RabbitMQ log folder. Will that give us any leads?
-
I have opened the following issue: … Please read the questions I have asked above and supply additional information in issue #9991. Thank you!
-
@lukebakken Responding to your queries:
QUESTION …
-
@michaelklishin @lukebakken
RABBITMQ LOGS …
As we are able to replicate this issue only in the PROD environment, and we were still seeing errors in the logs, …
RESULTS …
-
Referring to this update: "The goal at this point is to avoid connection termination. Those who would like to investigate these messages further now have a place to add extra logging or tracing."
Which RabbitMQ and Erlang versions have these extra debug logging or tracing capabilities?
Current RabbitMQ/Erlang version: …
As this RabbitMQ version (3.11.7) is outdated, …
-
@michaelklishin @lukebakken
-
Nothing prevents us from shipping a 3.11 release in 2024 if we choose to.
On Tue, 5 Dec 2023 at 13:08, GitJadhav wrote:
@michaelklishin @lukebakken
The following fix was merged into 3.11.27 (see "rabbit_writer: ignore unknown messages (backport #9994) (backport #9996)" by mergify[bot], Pull Request #10005 · rabbitmq/rabbitmq-server) with the goal to ‘no longer kill the connection’, i.e. not terminate it.
I also see that community support for 3.11 ends Dec 31, 2023.
Any ETA on when we can use the RabbitMQ version with the above fix? Will 3.11.27 be released before Dec 31?
-
@michaelklishin
-
@michaelklishin I have a follow-up question: we are planning to upgrade our dependency stack to RabbitMQ 3.12.11, considering the 3.11.x EOL timelines.
-
FYI, the latest comments from @gomoripeti refer to another customer's observation (link). @gomoripeti says that the stray {Ref, ok} message comes from Erlang/OTP (the inet driver or similar) and that it might be fixed in a version newer than 25.3. With that in mind, I have the following clarification: … I would appreciate it if you could provide us a path forward.
-
@michaelklishin @lukebakken
-
Describe the bug
The following RabbitMQ error is causing our Node.js application to crash and has led to the crash of our platform: RabbitMQ aborts the channel, and the Node.js application is not designed to handle this exception.
The reason for pulling the plug is that “writer,message_not_understood” is an internal message, and there could be several causes for it, including a corrupted frame.
Technically, whenever RabbitMQ aborts a channel (i.e. shuts the channel down) and our Node socket library receives that message, Node crashes, which brings down our platform.
Reproduction steps
1. Create a 3-node RabbitMQ cluster (3.11.7).
2. Run with RabbitMQ defaults (memory 40%, buffering 128 MB).
3. Our platform crashed due to the following RabbitMQ error:
...
2023-10-25 05:59:13.610193-07:00 [error] <0.1010.0> {writer,message_not_understood,{#Ref<0.4068250614.1356857350.73626>,ok}}
2023-10-25 05:59:13.610367-07:00 [warning] <0.1010.0> Non-AMQP exit reason '{writer,message_not_understood,
2023-10-25 05:59:13.610367-07:00 [warning] <0.1010.0> {#Ref<0.4068250614.1356857350.73626>,ok}}'
Expected behavior
The RabbitMQ error should clarify the root cause of the symptom.
Additional context
When our Node socket library receives this RabbitMQ message ("writer,message_not_understood"), Node crashes, which leads to a crash of our product platform.
We are trying to understand this symptom and would appreciate feedback from the community if anybody has seen this same error.