Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dead lettering more than two times results in a crashed queue #216

Closed
riyad opened this issue Jul 7, 2015 · 39 comments
Closed

Dead lettering more than two times results in a crashed queue #216

riyad opened this issue Jul 7, 2015 · 39 comments
Assignees
Milestone

Comments

@riyad
Copy link

riyad commented Jul 7, 2015

We have a similar setup to #161 where we publish messages into (unique) timeout queues that drop the messages back into their original queues for retrying.

Our setup started to loose messages after we updated to RabbitMQ 3.5.3. Looking at the logs we found this:

=ERROR REPORT==== 23-Jun-2015::11:47:19 ===
** Generic server <0.678.0> terminating
** Last message in was {drop_expired,1}
** When Server state == 
[...]
** Reason for termination ==
** {{case_clause,{value,{<<"count">>,signedint,1}}},
    [{rabbit_dead_letter,x_death_event_key,3,[]},
     {rabbit_dead_letter,ensure_xdeath_event_count,2,[]},
     {rabbit_dead_letter,ensure_xdeath_event_count,2,[]},
     {rabbit_dead_letter,'-group_by_queue_and_reason/1-fun-0-',3,[]},
     {lists,foldl,3,[{file,"lists.erl"},{line,1197}]},
     {rabbit_dead_letter,group_by_queue_and_reason,1,[]},
     {rabbit_dead_letter,update_x_death_header,2,[]},
     {rabbit_dead_letter,'-make_msg/5-fun-2-',8,[]}]}

=ERROR REPORT==== 23-Jun-2015::11:47:19 ===
Restarting crashed queue 'toq-gen2-REDACTED_QUEUE_NAME-96aba311-624f-4dc0-802b-70138917816c' in vhost '/'.

We were able to reproduce this everytime a message was to drop out of the timeout queue (toq) for the third time (hence the "gen2" part of the queue name in the logs).

Full log (names of queues and exchanges redacted):

=ERROR REPORT==== 23-Jun-2015::11:47:19 ===
** Generic server <0.678.0> terminating
** Last message in was {drop_expired,1}
** When Server state == {q,
                         {amqqueue,
                          {resource,<<"/">>,queue,
                           <<"toq-gen2-REDACTED_QUEUE_NAME-96aba311-624f-4dc0-802b-70138917816c">>},
                          false,false,none,
                          [{<<"x-message-ttl">>,signedint,9000},
                           {<<"x-dead-letter-exchange">>,longstr,
                            <<"REDACTED_EXCHANGE.dlx">>},
                           {<<"x-expires">>,signedint,10000},
                           {<<"x-dead-letter-routing-key">>,longstr,
                            <<"REDACTED_QUEUE_NAME">>}],
                          <0.678.0>,[],[],[],undefined,[],[],live},
                         none,false,rabbit_priority_queue,
                         {passthrough,rabbit_variable_queue,
                          {vqstate,
                           {0,{[],[]}},
                           {0,{[],[]}},
                           {delta,undefined,0,undefined},
                           {0,{[],[]}},
                           {1,
                            {[],
                             [{msg_status,0,
                               <<156,89,98,190,170,144,189,166,217,107,135,227,
                                 40,191,240,59>>,
                               {basic_message,
                                {resource,<<"/">>,exchange,<<>>},
                                [<<"toq-gen2-REDACTED_QUEUE_NAME-96aba311-624f-4dc0-802b-70138917816c">>],
                                {content,60,
                                 {'P_basic',undefined,undefined,
                                  [{<<"x-message-id">>,longstr,
                                    <<"96aba311-624f-4dc0-802b-70138917816c">>},
                                   {<<"x-puka-delivery-tag">>,signedint,4},
                                   {<<"x-death">>,array,
                                    [{table,
                                      [{<<"count">>,signedint,1},
                                       {<<"exchange">>,longstr,<<>>},
                                       {<<"time">>,timestamp,1435052830},
                                       {<<"queue">>,longstr,
                                        <<"toq-gen1-REDACTED_QUEUE_NAME-96aba311-624f-4dc0-802b-70138917816c">>},
                                       {<<"reason">>,longstr,<<"expired">>},
                                       {<<"routing-keys">>,array,
                                        [{longstr,
                                          <<"toq-gen1-REDACTED_QUEUE_NAME-96aba311-624f-4dc0-802b-70138917816c">>}]}]},
                                     {table,
                                      [{<<"count">>,signedint,1},
                                       {<<"exchange">>,longstr,<<>>},
                                       {<<"routing-keys">>,array,
                                        [{longstr,
                                          <<"toq-gen0-REDACTED_QUEUE_NAME-96aba311-624f-4dc0-802b-70138917816c">>}]},
                                       {<<"queue">>,longstr,
                                        <<"toq-gen0-REDACTED_QUEUE_NAME-96aba311-624f-4dc0-802b-70138917816c">>},
                                       {<<"reason">>,longstr,<<"expired">>},
                                       {<<"time">>,timestamp,1435052826}]}]},
                                   {<<"x-origin-queue">>,longstr,
                                    <<"REDACTED_QUEUE_NAME">>}],
                                  undefined,undefined,undefined,undefined,
                                  undefined,undefined,undefined,undefined,
                                  undefined,undefined,undefined},
                                 <<32,0,0,0,3,242,12,120,45,109,101,115,115,97,
                                   103,101,45,105,100,83,0,0,0,36,57,54,97,98,
                                   97,51,49,49,45,54,50,52,102,45,52,100,99,48,
                                   45,56,48,50,98,45,55,48,49,51,56,57,49,55,
                                   56,49,54,99,19,120,45,112,117,107,97,45,100,
                                   101,108,105,118,101,114,121,45,116,97,103,
                                   73,0,0,0,4,7,120,45,100,101,97,116,104,65,0,
                                   0,3,26,70,0,0,1,136,5,99,111,117,110,116,73,
                                   0,0,0,1,8,101,120,99,104,97,110,103,101,83,
                                   0,0,0,0,4,116,105,109,101,84,0,0,0,0,85,137,
                                   43,30,5,113,117,101,117,101,83,0,0,0,150,
                                   116,111,113,45,103,101,110,49,45,95,116,101,
                                   115,116,95,112,97,121,109,101,110,116,95,
                                   114,101,102,117,110,100,95,111,110,95,102,
                                   105,110,97,108,95,100,101,97,116,104,46,112,
                                   121,58,58,116,101,115,116,95,112,117,98,108,
                                   105,115,104,95,111,110,95,102,105,110,97,
                                   108,95,100,101,97,116,104,95,48,48,52,57,48,
                                   101,54,53,45,49,57,98,49,45,52,55,52,53,45,
                                   98,56,49,100,45,99,53,57,100,49,52,52,56,53,
                                   57,54,101,45,57,54,97,98,97,51,49,49,45,54,
                                   50,52,102,45,52,100,99,48,45,56,48,50,98,45,
                                   55,48,49,51,56,57,49,55,56,49,54,99,6,114,
                                   101,97,115,111,110,83,0,0,0,7,101,120,112,
                                   105,114,101,100,12,114,111,117,116,105,110,
                                   103,45,107,101,121,115,65,0,0,0,155,83,0,0,
                                   0,150,116,111,113,45,103,101,110,49,45,95,
                                   116,101,115,116,95,112,97,121,109,101,110,
                                   116,95,114,101,102,117,110,100,95,111,110,
                                   95,102,105,110,97,108,95,100,101,97,116,104,
                                   46,112,121,58,58,116,101,115,116,95,112,117,
                                   98,108,105,115,104,95,111,110,95,102,105,
                                   110,97,108,95,100,101,97,116,104,95,48,48,
                                   52,57,48,101,54,53,45,49,57,98,49,45,52,55,
                                   52,53,45,98,56,49,100,45,99,53,57,100,49,52,
                                   52,56,53,57,54,101,45,57,54,97,98,97,51,49,
                                   49,45,54,50,52,102,45,52,100,99,48,45,56,48,
                                   50,98,45,55,48,49,51,56,57,49,55,56,49,54,
                                   99,70,0,0,1,136,5,99,111,117,110,116,73,0,0,
                                   0,1,8,101,120,99,104,97,110,103,101,83,0,0,
                                   0,0,12,114,111,117,116,105,110,103,45,107,
                                   101,121,115,65,0,0,0,155,83,0,0,0,150,116,
                                   111,113,45,103,101,110,48,45,95,116,101,115,
                                   116,95,112,97,121,109,101,110,116,95,114,
                                   101,102,117,110,100,95,111,110,95,102,105,
                                   110,97,108,95,100,101,97,116,104,46,112,121,
                                   58,58,116,101,115,116,95,112,117,98,108,105,
                                   115,104,95,111,110,95,102,105,110,97,108,95,
                                   100,101,97,116,104,95,48,48,52,57,48,101,54,
                                   53,45,49,57,98,49,45,52,55,52,53,45,98,56,
                                   49,100,45,99,53,57,100,49,52,52,56,53,57,54,
                                   101,45,57,54,97,98,97,51,49,49,45,54,50,52,
                                   102,45,52,100,99,48,45,56,48,50,98,45,55,48,
                                   49,51,56,57,49,55,56,49,54,99,5,113,117,101,
                                   117,101,83,0,0,0,150,116,111,113,45,103,101,
                                   110,48,45,95,116,101,115,116,95,112,97,121,
                                   109,101,110,116,95,114,101,102,117,110,100,
                                   95,111,110,95,102,105,110,97,108,95,100,101,
                                   97,116,104,46,112,121,58,58,116,101,115,116,
                                   95,112,117,98,108,105,115,104,95,111,110,95,
                                   102,105,110,97,108,95,100,101,97,116,104,95,
                                   48,48,52,57,48,101,54,53,45,49,57,98,49,45,
                                   52,55,52,53,45,98,56,49,100,45,99,53,57,100,
                                   49,52,52,56,53,57,54,101,45,57,54,97,98,97,
                                   51,49,49,45,54,50,52,102,45,52,100,99,48,45,
                                   56,48,50,98,45,55,48,49,51,56,57,49,55,56,
                                   49,54,99,6,114,101,97,115,111,110,83,0,0,0,
                                   7,101,120,112,105,114,101,100,4,116,105,109,
                                   101,84,0,0,0,0,85,137,43,26,14,120,45,111,
                                   114,105,103,105,110,45,113,117,101,117,101,
                                   83,0,0,0,104,95,116,101,115,116,95,112,97,
                                   121,109,101,110,116,95,114,101,102,117,110,
                                   100,95,111,110,95,102,105,110,97,108,95,100,
                                   101,97,116,104,46,112,121,58,58,116,101,115,
                                   116,95,112,117,98,108,105,115,104,95,111,
                                   110,95,102,105,110,97,108,95,100,101,97,116,
                                   104,95,48,48,52,57,48,101,54,53,45,49,57,98,
                                   49,45,52,55,52,53,45,98,56,49,100,45,99,53,
                                   57,100,49,52,52,56,53,57,54,101>>,
                                 rabbit_framing_amqp_0_9_1,
                                 [<<"REDACTED_MESSAGE_BODY">>]},
                                <<156,89,98,190,170,144,189,166,217,107,135,
                                  227,40,191,240,59>>,
                                false},
                               false,false,false,false,queue_index,
                               {message_properties,1435052839418192,false,
                                55}}]}},
                           1,
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           {qistate,
                            "/var/lib/rabbitmq/mnesia/REDACTED_NODE_NAME/queues/BB5UQYHDZNVEPMDAZY48D2OBL",
                            {{dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             []},
                            undefined,0,65536,
                            #Fun<rabbit_variable_queue.2.117761292>,
                            #Fun<rabbit_variable_queue.3.48316793>,
                            {0,nil},
                            {0,nil}},
                           {undefined,
                            {client_msstate,msg_store_transient,
                             <<185,208,160,140,133,214,105,177,135,249,236,77,
                               148,196,41,229>>,
                             {dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             {state,286792,
                              "/var/lib/rabbitmq/mnesia/REDACTED_NODE_NAME/msg_store_transient"},
                             rabbit_msg_store_ets_index,
                             "/var/lib/rabbitmq/mnesia/REDACTED_NODE_NAME/msg_store_transient",
                             <0.393.0>,290889,282695,294986,299083}},
                           false,0,1,55,0,0,0,infinity,1,0,0,55,0,0,
                           {rates,0.1294337302079736,0.0,0.0,0.0,
                            {1435,52831,419020}},
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           0,0,0,0}},
                         {state,
                          {queue,[],[],0},
                          {inactive,1435052830418220,9010,1.0}},
                         10000,undefined,undefined,
                         {erlang,#Ref<0.0.0.3748>},
                         {state,fine,5000,undefined},
                         {0,nil},
                         9000,
                         {erlang,#Ref<0.0.0.3756>},
                         1435052839418192,
                         {state,
                          {dict,1,16,16,8,80,48,
                           {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],
                             [[<0.639.0>|#Ref<0.0.0.3755>]],
                             [],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
                          delegate},
                         {resource,<<"/">>,exchange,<<"REDACTED_EXCHANGE.dlx">>},
                         <<"REDACTED_QUEUE_NAME">>,
                         undefined,undefined,1,running}
** Reason for termination ==
** {{case_clause,{value,{<<"count">>,signedint,1}}},
    [{rabbit_dead_letter,x_death_event_key,3,[]},
     {rabbit_dead_letter,ensure_xdeath_event_count,2,[]},
     {rabbit_dead_letter,ensure_xdeath_event_count,2,[]},
     {rabbit_dead_letter,'-group_by_queue_and_reason/1-fun-0-',3,[]},
     {lists,foldl,3,[{file,"lists.erl"},{line,1197}]},
     {rabbit_dead_letter,group_by_queue_and_reason,1,[]},
     {rabbit_dead_letter,update_x_death_header,2,[]},
     {rabbit_dead_letter,'-make_msg/5-fun-2-',8,[]}]}

=ERROR REPORT==== 23-Jun-2015::11:47:19 ===
Restarting crashed queue 'toq-gen2-REDACTED_QUEUE_NAME-96aba311-624f-4dc0-802b-70138917816c' in vhost '/'.
@videlalvaro
Copy link
Contributor

Thanks, I think I've found the reason, here:

https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_dead_letter.erl#L148
https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_dead_letter.erl#L156

We assume that the <<"count">> field is long therefore the case clause failure.

@michaelklishin
Copy link
Member

Can you please provide a script that reproduces the issue?

On 7/7/2015, at 15:24, Alvaro Videla notifications@github.com wrote:

Thanks, I think I've found the reason, here:

https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_dead_letter.erl#L148
https://github.com/rabbitmq/rabbitmq-server/blob/master/src/rabbit_dead_letter.erl#L156

We assume that the <<"count">> field is long therefore the case clause failure.


Reply to this email directly or view it on GitHub.

@tiredpixel
Copy link

I can confirm this issue, which is also happening for us after upgrading from RabbitMQ 3.5.0 to 3.5.3 on Ubuntu Server 14.04 LTS, running in AWS EC2. Initially, I was running dist Erlang R16B03, but now I'm running Erlang R18. Interaction with RabbitMQ is using Ruby, through library Bunny 1.7.0, (with library amq-protocol 1.9.2).

Our logic is similar to that described by @riyad, where messages are published through a dead-letter exchange potentially multiple times. I reinstalled our broker yesterday, and all was running fine until this afternoon, when errors were reported in RabbitMQ logs, after which CPU immediately started increasing fairly steadily, until the Erlang beam.smp process reported almost maxing out the CPUs. Messages being published or consumed are currently very low, around 1 message per 5 minutes. Number of messages in all queues is low, under 20 total. We're using durable queues and persistent messages. Memory is low, far below the watermark.

Peace,
tiredpixel


Status of node 'rabbit@i-x.example.com' ...
[{pid,596},
 {running_applications,
     [{rabbitmq_management,"RabbitMQ Management Console","3.5.3"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.5.3"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.5.3"},
      {rabbit,"RabbitMQ","3.5.3"},
      {webmachine,"webmachine","1.10.3-rmq3.5.3-gite9359c7"},
      {mochiweb,"MochiMedia Web Server","2.7.0-rmq3.5.3-git680dba8"},
      {amqp_client,"RabbitMQ AMQP Client","3.5.3"},
      {xmerl,"XML parser","1.3.8"},
      {mnesia,"MNESIA  CXC 138 12","4.13"},
      {inets,"INETS  CXC 138 49","6.0"},
      {os_mon,"CPO  CXC 138 46","2.4"},
      {sasl,"SASL  CXC 138 11","2.5"},
      {stdlib,"ERTS  CXC 138 10","2.5"},
      {kernel,"ERTS  CXC 138 10","4.0"}]},
 {os,{unix,linux}},
 {erlang_version,
     "Erlang/OTP 18 [erts-7.0] [source] [64-bit] [smp:2:2] [async-threads:30] [kernel-poll:true]\n"},
 {memory,
     [{total,383717416},
      {connection_readers,1325456},
      {connection_writers,7694952},
      {connection_channels,125132600},
      {connection_other,85150160},
      {queue_procs,19523888},
      {queue_slave_procs,0},
      {plugins,20670368},
      {other_proc,7088664},
      {mnesia,1589408},
      {mgmt_db,34545264},
      {msg_index,450864},
      {other_ets,3566880},
      {binary,44084160},
      {code,20075250},
      {atom,785313},
      {other_system,12034189}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,6416846028},
 {disk_free_limit,50000000},
 {disk_free,1036529143808},
 {file_descriptors,
     [{total_limit,16284},
      {total_used,133},
      {sockets_limit,14653},
      {sockets_used,99}]},
 {processes,[{limit,1048576},{used,26879}]},
 {run_queue,1193},
 {uptime,87068}]

Cluster status of node 'rabbit@i-x.example.com' ...
[{nodes,[{disc,['rabbit@i-x.example.com']}]},
 {running_nodes,['rabbit@i-x.example.com']},
 {cluster_name,<<"rabbit@i-x.example.com">>},
 {partitions,[]}]

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
** Generic server <0.755.0> terminating
** Last message in was {drop_expired,1}
** When Server state == {q,
                         {amqqueue,
                          {resource,<<"x_messenger">>,queue,
                           <<"ha.x_messenger.app_x_queue_error">>},
                          true,false,none,
                          [{<<"x-dead-letter-exchange">>,longstr,
                            <<"ha.x_messenger.app_x_queue">>},
                           {<<"x-message-ttl">>,signedint,300000}],
                          <0.755.0>,[],[],[],undefined,[],[],live},
                         none,false,rabbit_priority_queue,
                         {passthrough,rabbit_variable_queue,
                          {vqstate,
                           {0,{[],[]}},
                           {0,{[],[]}},
                           {delta,undefined,0,undefined},
                           {0,{[],[]}},
                           {2,
                            {[{msg_status,16,
                               <<145,236,191,185,197,127,102,104,95,63,130,119,
                                 35,220,135,98>>,
                               {basic_message,
                                {resource,<<"x_messenger">>,exchange,
                                 <<"ha.x_messenger.app_x_queue_error">>},
                                [<<"ha.x_messenger.app_x_queue">>],
                                {content,60,
                                 {'P_basic',<<"application/octet-stream">>,
                                  undefined,
                                  [{<<"x-death">>,array,
                                    [{table,
                                      [{<<"count">>,signedint,1},
                                       {<<"reason">>,longstr,<<"expired">>},
                                       {<<"queue">>,longstr,
                                        <<"ha.x_messenger.app_x_queue_error">>},
                                       {<<"time">>,timestamp,1436273833},
                                       {<<"exchange">>,longstr,
                                        <<"ha.x_messenger.app_x_queue_error">>},
                                       {<<"routing-keys">>,array,
                                        [{longstr,
                                          <<"ha.x_messenger.app_x_queue">>}]}]}]}],
                                  2,0,undefined,undefined,undefined,
                                  <<"79d42774-1346-4bc9-a08d-3ba45c61a79a">>,
                                  undefined,undefined,undefined,undefined,
                                  undefined},
                                 <<184,128,24,97,112,112,108,105,99,97,116,105,
                                   111,110,47,111,99,116,101,116,45,115,116,
                                   114,101,97,109,0,0,1,25,7,120,45,100,101,97,
                                   116,104,65,0,0,1,12,70,0,0,1,7,5,99,111,117,
                                   110,116,73,0,0,0,1,6,114,101,97,115,111,110,
                                   83,0,0,0,7,101,120,112,105,114,101,100,5,
                                   113,117,101,117,101,83,0,0,0,59,104,97,46,
                                   100,99,95,109,101,115,115,101,110,103,101,
                                   114,95,112,114,111,100,46,118,49,46,116,114,
                                   97,110,115,102,111,114,109,101,114,95,100,
                                                                      99,95,97,114,99,104,105,118,101,115,95,112,
                                   97,114,116,115,95,101,114,114,111,114,4,116,
                                   105,109,101,84,0,0,0,0,85,155,204,169,8,101,
                                   120,99,104,97,110,103,101,83,0,0,0,59,104,
                                   97,46,100,99,95,109,101,115,115,101,110,103,
                                   101,114,95,112,114,111,100,46,118,49,46,116,
                                   114,97,110,115,102,111,114,109,101,114,95,
                                   100,99,95,97,114,99,104,105,118,101,115,95,
                                   112,97,114,116,115,95,101,114,114,111,114,
                                   12,114,111,117,116,105,110,103,45,107,101,
                                   121,115,65,0,0,0,58,83,0,0,0,53,104,97,46,
                                   100,99,95,109,101,115,115,101,110,103,101,
                                   114,95,112,114,111,100,46,118,49,46,116,114,
                                   97,110,115,102,111,114,109,101,114,95,100,
                                   99,95,97,114,99,104,105,118,101,115,95,112,
                                   97,114,116,115,2,0,36,55,57,100,52,50,55,55,
                                   52,45,49,51,52,54,45,52,98,99,57,45,97,48,
                                   56,100,45,51,98,97,52,53,99,54,49,97,55,57,
                                   97>>,
                                 rabbit_framing_amqp_0_9_1,
                                 [<<133,162,116,114,171,99,114,97,119,108,95,
                                    115,116,101,112,115,166,116,114,95,115,
                                    101,116,131,166,112,114,101,102,105,120,
                                    169,99,114,97,119,108,46,51,54,54,174,
                                    116,114,97,110,115,102,111,114,109,101,
                                    100,95,97,116,183,50,48,49,53,45,48,55,
                                    45,48,55,32,48,50,58,53,56,58,51,48,32,
                                    85,84,67,172,116,114,97,110,115,102,111,
                                    114,109,95,105,100,210,138,95,31,211,166,
                                    105,110,112,117,116,115,129,168,99,114,
                                    97,119,108,95,105,100,205,1,110,164,112,
                                    97,114,116,132,164,105,100,95,102,184,53,
                                    53,57,97,98,48,56,57,54,57,50,100,51,53,
                                    51,51,97,49,101,56,48,52,48,48,164,105,
                                    100,95,116,184,53,53,57,97,98,49,101,51,
                                    54,57,50,100,51,53,51,51,99,100,56,52,48,
                                    54,48,48,164,105,100,95,111,205,35,40,
                                    164,105,100,95,105,10,172,99,97,108,108,
                                    98,97,99,107,95,107,101,121,131,168,112,
                                    105,101,99,101,95,105,100,205,15,183,177,
                                    116,114,97,110,115,102,111,114,109,97,
                                    116,105,111,110,95,105,100,205,2,12,181,
                                    116,114,97,110,115,102,111,114,109,97,
                                    116,105,111,110,95,115,101,116,95,105,
                                    100,205,1,6>>]},
                                <<145,236,191,185,197,127,102,104,95,63,130,
                                  119,35,220,135,98>>,
                                true},
                               true,false,false,true,queue_index,
                               {message_properties,1436279640057485,true,
                                270}}],
                             [{msg_status,15,
                               <<168,84,203,115,201,232,154,241,51,33,92,99,
                                 248,170,24,198>>,
                               {basic_message,
                                {resource,<<"x_messenger">>,exchange,
                                                                 <<"ha.x_messenger.app_x_queue_error">>},
                                [<<"ha.x_messenger.app_x_queue">>],
                                {content,60,
                                 {'P_basic',<<"application/octet-stream">>,
                                  undefined,
                                  [{<<"x-death">>,array,
                                    [{table,
                                      [{<<"count">>,signedint,1},
                                       {<<"reason">>,longstr,<<"expired">>},
                                       {<<"queue">>,longstr,
                                        <<"ha.x_messenger.app_x_queue_error">>},
                                       {<<"time">>,timestamp,1436238931},
                                       {<<"exchange">>,longstr,
                                        <<"ha.x_messenger.app_x_queue_error">>},
                                       {<<"routing-keys">>,array,
                                        [{longstr,
                                          <<"ha.x_messenger.app_x_queue">>}]}]}]}],
                                  2,0,undefined,undefined,undefined,
                                  <<"788e03ca-672a-4c46-8133-6a2fc2c4f4f4">>,
                                  undefined,undefined,undefined,undefined,
                                  undefined},
                                 <<184,128,24,97,112,112,108,105,99,97,116,105,
                                   111,110,47,111,99,116,101,116,45,115,116,
                                   114,101,97,109,0,0,1,25,7,120,45,100,101,97,
                                   116,104,65,0,0,1,12,70,0,0,1,7,5,99,111,117,
                                   110,116,73,0,0,0,1,6,114,101,97,115,111,110,
                                   83,0,0,0,7,101,120,112,105,114,101,100,5,
                                   113,117,101,117,101,83,0,0,0,59,104,97,46,
                                   100,99,95,109,101,115,115,101,110,103,101,
                                   114,95,112,114,111,100,46,118,49,46,116,114,
                                   97,110,115,102,111,114,109,101,114,95,100,
                                   99,95,97,114,99,104,105,118,101,115,95,112,
                                   97,114,116,115,95,101,114,114,111,114,4,116,
                                   105,109,101,84,0,0,0,0,85,155,68,83,8,101,
                                   120,99,104,97,110,103,101,83,0,0,0,59,104,
                                   97,46,100,99,95,109,101,115,115,101,110,103,
                                   101,114,95,112,114,111,100,46,118,49,46,116,
                                   114,97,110,115,102,111,114,109,101,114,95,
                                   100,99,95,97,114,99,104,105,118,101,115,95,
                                   112,97,114,116,115,95,101,114,114,111,114,
                                   12,114,111,117,116,105,110,103,45,107,101,
                                   121,115,65,0,0,0,58,83,0,0,0,53,104,97,46,
                                   100,99,95,109,101,115,115,101,110,103,101,
                                   114,95,112,114,111,100,46,118,49,46,116,114,
                                   97,110,115,102,111,114,109,101,114,95,100,
                                   99,95,97,114,99,104,105,118,101,115,95,112,
                                   97,114,116,115,2,0,36,55,56,56,101,48,51,99,
                                   97,45,54,55,50,97,45,52,99,52,54,45,56,49,
                                   51,51,45,54,97,50,102,99,50,99,52,102,52,
                                   102,52>>,
                                 rabbit_framing_amqp_0_9_1,
                                 [<<133,162,116,114,171,99,114,97,119,108,95,
                                    115,116,101,112,115,166,116,114,95,115,
                                    101,116,131,166,112,114,101,102,105,120,
                                    169,99,114,97,119,108,46,51,54,54,174,
                                    116,114,97,110,115,102,111,114,109,101,
                                                                        100,95,97,116,183,50,48,49,53,45,48,55,
                                    45,48,55,32,48,50,58,53,56,58,51,48,32,
                                    85,84,67,172,116,114,97,110,115,102,111,
                                    114,109,95,105,100,210,138,95,31,211,166,
                                    105,110,112,117,116,115,129,168,99,114,
                                    97,119,108,95,105,100,205,1,110,164,112,
                                    97,114,116,132,164,105,100,95,102,184,53,
                                    53,57,97,98,53,102,53,54,57,50,100,51,53,
                                    51,53,48,49,54,53,48,55,48,48,164,105,
                                    100,95,116,184,53,53,57,97,98,55,51,57,
                                    54,57,50,100,51,57,54,50,49,98,55,48,48,
                                    54,48,48,164,105,100,95,111,205,50,200,
                                    164,105,100,95,105,14,172,99,97,108,108,
                                    98,97,99,107,95,107,101,121,131,168,112,
                                    105,101,99,101,95,105,100,205,15,187,177,
                                    116,114,97,110,115,102,111,114,109,97,
                                    116,105,111,110,95,105,100,205,2,12,181,
                                    116,114,97,110,115,102,111,114,109,97,
                                    116,105,111,110,95,115,101,116,95,105,
                                    100,205,1,6>>]},
                                <<168,84,203,115,201,232,154,241,51,33,92,99,
                                  248,170,24,198>>,
                                true},
                               true,false,false,true,queue_index,
                               {message_properties,1436279449181213,true,
                                270}}]}},
                           17,
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           {qistate,
                            "/var/lib/rabbitmq/mnesia/rabbit@i-x.example.com/queues/69RXCRA0YQFLNESM4YE1XHF0G",
                            {{dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             [{segment,0,
                               "/var/lib/rabbitmq/mnesia/rabbit@i-x.example.com/queues/69RXCRA0YQFLNESM4YE1XHF0G/0.idx",
                               {array,16384,0,undefined,100000},
                               2}]},
                            #Ref<0.0.1572866.70861>,0,65536,
                            #Fun<rabbit_variable_queue.2.117761292>,
                            #Fun<rabbit_variable_queue.3.48316793>,
                            {0,nil},
                            {0,nil}},
                           {{client_msstate,msg_store_persistent,
                             <<13,24,107,90,238,140,22,42,193,193,247,117,63,
                               25,47,49>>,
                             {dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             {state,225357,
                              "/var/lib/rabbitmq/mnesia/rabbit@i-x.example.com/msg_store_persistent"},
                                                           rabbit_msg_store_ets_index,
                             "/var/lib/rabbitmq/mnesia/rabbit@i-x.example.com/msg_store_persistent",
                             <0.240.0>,229454,221260,233551,237648},
                            {client_msstate,msg_store_transient,
                             <<101,74,79,173,74,7,174,242,180,223,118,182,139,
                               198,89,135>>,
                             {dict,0,16,16,8,80,48,
                              {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                               []},
                              {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                []}}},
                             {state,204872,
                              "/var/lib/rabbitmq/mnesia/rabbit@i-x.example.com/msg_store_transient"},
                             rabbit_msg_store_ets_index,
                             "/var/lib/rabbitmq/mnesia/rabbit@i-x.example.com/msg_store_transient",
                             <0.233.0>,208969,200775,213066,217163}},
                           true,0,2,540,0,2,540,infinity,2,0,0,540,0,0,
                           {rates,0.005228732197132713,
                            1.7370371540135685e-303,1.7370371540135685e-303,
                            1.7370371540135685e-303,
                            {1436,279341,778502}},
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           {0,nil},
                           0,0,0,17}},
                         {state,
                          {queue,[],[],0},
                          {inactive,1436238631386029,41942114405,1.0}},
                         undefined,undefined,undefined,undefined,
                         {state,fine,5000,undefined},
                         {0,nil},
                         300000,
                         {erlang,#Ref<0.0.3932161.80107>},
                         1436279449181213,
                         {state,
                          {dict,2,16,16,8,80,48,
                           {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
                           {{[],[],[],[],
                             [[<0.3455.6>|#Ref<0.0.3932161.80106>]],
                             [],[],[],[],[],[],[],
                             [[<0.3479.6>|#Ref<0.0.3932161.86474>]],
                             [],[],[]}}},
                          delegate},
                         {resource,<<"x_messenger">>,exchange,
                          <<"ha.x_messenger.app_x_queue">>},
                         undefined,undefined,undefined,1,running}
** Reason for termination ==
** {{case_clause,{value,{<<"count">>,signedint,1}}},
    [{rabbit_dead_letter,x_death_event_key,3,[]},
     {rabbit_dead_letter,increment_xdeath_event_count,1,[]},
     {rabbit_dead_letter,update_x_death_header,2,[]},
     {rabbit_dead_letter,'-make_msg/5-fun-2-',8,[]},
     {rabbit_basic,map_headers,2,[]},
     {rabbit_dead_letter,make_msg,5,[]},
     {rabbit_dead_letter,publish,5,[]},
          {rabbit_amqqueue_process,'-dead_letter_msgs/4-fun-0-',7,[]}]}

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
Restarting crashed queue 'ha.x_messenger.app_x_queue_error' in vhost 'x_messenger'.

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
** Generic server <0.7229.6> terminating
** Last message in was {'$gen_cast',init}
** When Server state == {q,{amqqueue,
                               {resource,<<"x_messenger">>,queue,
                                   <<"ha.x_messenger.app_x_queue_error">>},
                               true,false,none,
                               [{<<"x-dead-letter-exchange">>,longstr,
                                 <<"ha.x_messenger.app_x_queue">>},
                                {<<"x-message-ttl">>,signedint,300000}],
                               <0.7229.6>,[],[],[],undefined,[],[],crashed},
                           none,false,undefined,undefined,
                           {state,
                               {queue,[],[],0},
                               {active,1436279449185300,1.0}},
                           undefined,undefined,undefined,undefined,
                           {state,fine,5000,undefined},
                           {0,nil},
                           undefined,undefined,undefined,
                           {state,
                               {dict,0,16,16,8,80,48,
                                   {[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                    [],[]},
                                   {{[],[],[],[],[],[],[],[],[],[],[],[],[],
                                     [],[],[]}}},
                               delegate},
                           undefined,undefined,undefined,undefined,0,running}
** Reason for termination ==
** {{case_clause,{value,{<<"count">>,signedint,1}}},
    [{rabbit_dead_letter,x_death_event_key,3,[]},
     {rabbit_dead_letter,increment_xdeath_event_count,1,[]},
     {rabbit_dead_letter,update_x_death_header,2,[]},
     {rabbit_dead_letter,'-make_msg/5-fun-2-',8,[]},
     {rabbit_basic,map_headers,2,[]},
     {rabbit_dead_letter,make_msg,5,[]},
     {rabbit_dead_letter,publish,5,[]},
     {rabbit_amqqueue_process,'-dead_letter_msgs/4-fun-0-',7,[]}]}

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
Restarting crashed queue 'ha.x_messenger.app_x_queue_error' in vhost 'x_messenger'.

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
** Generic server <0.7231.6> terminating
** Last message in was {'$gen_cast',init}
** When Server state == {q,{amqqueue,
                               {resource,<<"x_messenger">>,queue,
                                   <<"ha.x_messenger.app_x_queue_error">>},
                               true,false,none,
                               [{<<"x-dead-letter-exchange">>,longstr,
                                 <<"ha.x_messenger.app_x_queue">>},
                                {<<"x-message-ttl">>,signedint,300000}],
                                                               <0.7231.6>,[],[],[],undefined,[],[],live},
                           none,false,undefined,undefined,
                           {state,
                               {queue,[],[],0},
                               {active,1436279449191151,1.0}},
                           undefined,undefined,undefined,undefined,
                           {state,fine,5000,undefined},
                           {0,nil},
                           undefined,undefined,undefined,
                           {state,
                               {dict,0,16,16,8,80,48,
                                   {[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                    [],[]},
                                   {{[],[],[],[],[],[],[],[],[],[],[],[],[],
                                     [],[],[]}}},
                               delegate},
                           undefined,undefined,undefined,undefined,0,running}
** Reason for termination ==
** {{case_clause,{value,{<<"count">>,signedint,1}}},
    [{rabbit_dead_letter,x_death_event_key,3,[]},
     {rabbit_dead_letter,increment_xdeath_event_count,1,[]},
     {rabbit_dead_letter,update_x_death_header,2,[]},
     {rabbit_dead_letter,'-make_msg/5-fun-2-',8,[]},
     {rabbit_basic,map_headers,2,[]},
     {rabbit_dead_letter,make_msg,5,[]},
     {rabbit_dead_letter,publish,5,[]},
     {rabbit_amqqueue_process,'-dead_letter_msgs/4-fun-0-',7,[]}]}

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
Restarting crashed queue 'ha.x_messenger.app_x_queue_error' in vhost 'x_messenger'.

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
** Generic server <0.7233.6> terminating
** Last message in was {'$gen_cast',init}
** When Server state == {q,{amqqueue,
                               {resource,<<"x_messenger">>,queue,
                                   <<"ha.x_messenger.app_x_queue_error">>},
                               true,false,none,
                               [{<<"x-dead-letter-exchange">>,longstr,
                                 <<"ha.x_messenger.app_x_queue">>},
                                {<<"x-message-ttl">>,signedint,300000}],
                               <0.7233.6>,[],[],[],undefined,[],[],live},
                           none,false,undefined,undefined,
                           {state,
                               {queue,[],[],0},
                               {active,1436279449196370,1.0}},
                           undefined,undefined,undefined,undefined,
                           {state,fine,5000,undefined},
                           {0,nil},
                           undefined,undefined,undefined,
                           {state,
                               {dict,0,16,16,8,80,48,
                                   {[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                    [],[]},
                                   {{[],[],[],[],[],[],[],[],[],[],[],[],[],
                                     [],[],[]}}},
                                                                    delegate},
                           undefined,undefined,undefined,undefined,0,running}
** Reason for termination ==
** {{case_clause,{value,{<<"count">>,signedint,1}}},
    [{rabbit_dead_letter,x_death_event_key,3,[]},
     {rabbit_dead_letter,increment_xdeath_event_count,1,[]},
     {rabbit_dead_letter,update_x_death_header,2,[]},
     {rabbit_dead_letter,'-make_msg/5-fun-2-',8,[]},
     {rabbit_basic,map_headers,2,[]},
     {rabbit_dead_letter,make_msg,5,[]},
     {rabbit_dead_letter,publish,5,[]},
     {rabbit_amqqueue_process,'-dead_letter_msgs/4-fun-0-',7,[]}]}

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
Restarting crashed queue 'ha.x_messenger.app_x_queue_error' in vhost 'x_messenger'.

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
** Generic server <0.7235.6> terminating
** Last message in was {'$gen_cast',init}
** When Server state == {q,{amqqueue,
                               {resource,<<"x_messenger">>,queue,
                                   <<"ha.x_messenger.app_x_queue_error">>},
                               true,false,none,
                               [{<<"x-dead-letter-exchange">>,longstr,
                                 <<"ha.x_messenger.app_x_queue">>},
                                {<<"x-message-ttl">>,signedint,300000}],
                               <0.7235.6>,[],[],[],undefined,[],[],live},
                           none,false,undefined,undefined,
                           {state,
                               {queue,[],[],0},
                               {active,1436279449201433,1.0}},
                           undefined,undefined,undefined,undefined,
                           {state,fine,5000,undefined},
                           {0,nil},
                           undefined,undefined,undefined,
                           {state,
                               {dict,0,16,16,8,80,48,
                                   {[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                    [],[]},
                                   {{[],[],[],[],[],[],[],[],[],[],[],[],[],
                                     [],[],[]}}},
                               delegate},
                           undefined,undefined,undefined,undefined,0,running}
** Reason for termination ==
** {{case_clause,{value,{<<"count">>,signedint,1}}},
    [{rabbit_dead_letter,x_death_event_key,3,[]},
     {rabbit_dead_letter,increment_xdeath_event_count,1,[]},
     {rabbit_dead_letter,update_x_death_header,2,[]},
     {rabbit_dead_letter,'-make_msg/5-fun-2-',8,[]},
     {rabbit_basic,map_headers,2,[]},
     {rabbit_dead_letter,make_msg,5,[]},
     {rabbit_dead_letter,publish,5,[]},
     {rabbit_amqqueue_process,'-dead_letter_msgs/4-fun-0-',7,[]}]}

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
Restarting crashed queue 'ha.x_messenger.app_x_queue_error' in vhost 'x_messenger'.

=ERROR REPORT==== 7-Jul-2015::14:30:49 ===
** Generic server <0.7237.6> terminating
** Last message in was {'$gen_cast',init}
** When Server state == {q,{amqqueue,
                               {resource,<<"x_messenger">>,queue,
                                   <<"ha.x_messenger.app_x_queue_error">>},
                               true,false,none,
                               [{<<"x-dead-letter-exchange">>,longstr,
                                 <<"ha.x_messenger.app_x_queue">>},
                                {<<"x-message-ttl">>,signedint,300000}],
                               <0.7237.6>,[],[],[],undefined,[],[],live},
                           none,false,undefined,undefined,
                           {state,
                               {queue,[],[],0},
                               {active,1436279449206805,1.0}},
                           undefined,undefined,undefined,undefined,
                           {state,fine,5000,undefined},
                           {0,nil},
                           undefined,undefined,undefined,
                           {state,
                               {dict,0,16,16,8,80,48,
                                   {[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                    [],[]},
                                   {{[],[],[],[],[],[],[],[],[],[],[],[],[],
                                     [],[],[]}}},
                               delegate},
                           undefined,undefined,undefined,undefined,0,running}
** Reason for termination ==
** {{case_clause,{value,{<<"count">>,signedint,1}}},
    [{rabbit_dead_letter,x_death_event_key,3,[]},
     {rabbit_dead_letter,increment_xdeath_event_count,1,[]},
     {rabbit_dead_letter,update_x_death_header,2,[]},
     {rabbit_dead_letter,'-make_msg/5-fun-2-',8,[]},
     {rabbit_basic,map_headers,2,[]},
     {rabbit_dead_letter,make_msg,5,[]},
     {rabbit_dead_letter,publish,5,[]},
     {rabbit_amqqueue_process,'-dead_letter_msgs/4-fun-0-',7,[]}]}

@michaelklishin
Copy link
Member

We understand what the issue is but not what causes it. Can someone post a small sample that reproduces the issue? We will turn it into a test case.

@tiredpixel
Copy link

Sorry, I don't have a sample I'm able to post easily at present. But I can add to the above that it does indeed seem to be rabbit_dead_letter.erl increment_xdeath_event_count case x_death_event_key(Info, <<"count">>, long) of line; when I patched that to disable the attempt to read and increment <<"count">> (which we're not currently using), instead following the undefined branch of the case, the broker freed up and CPU usage went right back down with existing messages being processed. My apologies for not being able to post a test case code sample.

Peace, tiredpixel

@michaelklishin
Copy link
Member

@tiredpixel thanks. A more specific question: do your apps set the count header? Or is it only ever touched by RabbitMQ on dead-lettering?

@tiredpixel
Copy link

@michaelklishin, our apps don't set or use the count header at all. It's only ever touched by RabbitMQ on dead-lettering. I noted that <<"count">> seems to be signedint at that point, but I wasn't able to spot why it's such rather than a long. As far as I got (before patching RabbitMQ as above) was that this seems to be for messages which have gone through the dead-letter exchange a few times, by which time <<"count">> appears to be already set—I suspect following update_x_death_header Info1 [{table, M}] matches branch being followed, if that helps.

@michaelklishin
Copy link
Member

@tiredpixel thanks. We do have tests that result in a message being dead-lettered multiple times. I have a couple of ideas. 3.5.4 won't be released without a fix for this.

@riyad
Copy link
Author

riyad commented Jul 7, 2015

You can provoke the crash of the queue with the Python script found in https://gist.github.com/riyad/7439eb545dabf287bcc9.
It requires you to install the Puka library and update the AMQP_URL in the script to point to your RabbitMQ.

@michaelklishin
Copy link
Member

@riyad perfect, thank you!

@tiredpixel
Copy link

@michaelklishin, understood; thank you for looking into it. :)
@riyad, thank you for contributing a test case to this investigation. :)

Peace, tiredpixel

@michaelklishin
Copy link
Member

I can reproduce the issue. Looking into it.

@riyad
Copy link
Author

riyad commented Jul 8, 2015

FYI: I fixed a small issue with the demo script. It was missing a wait() when publishing the first message. 😅

see: https://gist.github.com/riyad/7439eb545dabf287bcc9/revisions

@michaelklishin
Copy link
Member

Cheers.

@michaelklishin
Copy link
Member

@riyad I have a fix, does this output confirm correct execution? https://gist.github.com/michaelklishin/97355747f6b85dfe3aae — the exception is gone.

@riyad
Copy link
Author

riyad commented Jul 8, 2015

Basically there should be an additional x-death entry for every time the message is deadlettered.
If there is a message in the queue at the end of the script with all the x-deaths everything should be OK.
The important part is, the message must not be lost.

@michaelklishin
Copy link
Member

I see

rabbitmqctl list_queues
Listing queues ...
test.issue_216_crashing_queue   3

after 3 runs, so it must be working as expected.

@riyad
Copy link
Author

riyad commented Jul 8, 2015

@michaelklishin To be sure it would be nice if you could provide the output of the script with an empty queue.

@michaelklishin
Copy link
Member

λ ~/Development/RabbitMQ/debug/ rabbitmqctl list_queues
Listing queues ...
test.issue_216_crashing_queue   0
λ ~/Development/RabbitMQ/debug/ python server-216.py
connecting ...
setting up ...
publishing ...
fetched message: {'body': 'foo', 'exchange': 'amq.topic', 'promise_number': 7, 'routing_key': 'test.issue_216', 'message_count': 0, 'headers': {'x-puka-delivery-tag': 1}, 'redelivered': False, 'delivery_tag': 1}
declaring timeout queue toq-gen0-test.issue_216_crashing_queue
publishing to toq-gen0-test.issue_216_crashing_queue: {'body': 'foo', 'exchange': 'amq.topic', 'promise_number': 7, 'routing_key': 'test.issue_216', 'message_count': 0, 'headers': {'x-puka-delivery-tag': 1}, 'redelivered': False, 'delivery_tag': 1}
fetched message: {'cluster_id': '', 'empty': True}
fetched message: {'body': 'foo', 'exchange': 'amq.topic', 'promise_number': 11, 'routing_key': 'test.issue_216_crashing_queue', 'message_count': 0, 'headers': {'x-puka-delivery-tag': 2, 'x-death': [{'count': 1L, 'exchange': '', 'time': datetime.datetime(2015, 7, 8, 10, 53, 56), 'queue': 'toq-gen0-test.issue_216_crashing_queue', 'reason': 'expired', 'routing-keys': ['toq-gen0-test.issue_216_crashing_queue']}]}, 'redelivered': False, 'delivery_tag': 2}
declaring timeout queue toq-gen1-test.issue_216_crashing_queue
publishing to toq-gen1-test.issue_216_crashing_queue: {'body': 'foo', 'exchange': 'amq.topic', 'promise_number': 11, 'routing_key': 'test.issue_216_crashing_queue', 'message_count': 0, 'headers': {'x-puka-delivery-tag': 2, 'x-death': [{'count': 1L, 'exchange': '', 'time': datetime.datetime(2015, 7, 8, 10, 53, 56), 'queue': 'toq-gen0-test.issue_216_crashing_queue', 'reason': 'expired', 'routing-keys': ['toq-gen0-test.issue_216_crashing_queue']}]}, 'redelivered': False, 'delivery_tag': 2}
fetched message: {'cluster_id': '', 'empty': True}
fetched message: {'body': 'foo', 'exchange': 'amq.topic', 'promise_number': 15, 'routing_key': 'test.issue_216_crashing_queue', 'message_count': 0, 'headers': {'x-puka-delivery-tag': 3, 'x-death': [{'count': 1L, 'exchange': '', 'time': datetime.datetime(2015, 7, 8, 10, 53, 56), 'queue': 'toq-gen1-test.issue_216_crashing_queue', 'reason': 'expired', 'routing-keys': ['toq-gen1-test.issue_216_crashing_queue']}, {'count': 1, 'exchange': '', 'routing-keys': ['toq-gen0-test.issue_216_crashing_queue'], 'queue': 'toq-gen0-test.issue_216_crashing_queue', 'reason': 'expired', 'time': datetime.datetime(2015, 7, 8, 10, 53, 56)}]}, 'redelivered': False, 'delivery_tag': 3}
declaring timeout queue toq-gen2-test.issue_216_crashing_queue
publishing to toq-gen2-test.issue_216_crashing_queue: {'body': 'foo', 'exchange': 'amq.topic', 'promise_number': 15, 'routing_key': 'test.issue_216_crashing_queue', 'message_count': 0, 'headers': {'x-puka-delivery-tag': 3, 'x-death': [{'count': 1L, 'exchange': '', 'time': datetime.datetime(2015, 7, 8, 10, 53, 56), 'queue': 'toq-gen1-test.issue_216_crashing_queue', 'reason': 'expired', 'routing-keys': ['toq-gen1-test.issue_216_crashing_queue']}, {'count': 1, 'exchange': '', 'routing-keys': ['toq-gen0-test.issue_216_crashing_queue'], 'queue': 'toq-gen0-test.issue_216_crashing_queue', 'reason': 'expired', 'time': datetime.datetime(2015, 7, 8, 10, 53, 56)}]}, 'redelivered': False, 'delivery_tag': 3}
λ ~/Development/RabbitMQ/debug/ rabbitmqctl list_queues
Listing queues ...
test.issue_216_crashing_queue   1

@riyad
Copy link
Author

riyad commented Jul 8, 2015

Looks good :)

@michaelklishin
Copy link
Member

So the culprit seems to be the fact that the script modifies headers (see the filter_headers function). That's why our existing test suite didn't catch this and this is client-specific.

@riyad
Copy link
Author

riyad commented Jul 8, 2015

Hmm ... so you're saying Puka is to blame? Am I correct that the issue is that header fields were encoded with the "wrong" type when republishing (even when they're superficially the same)?

@michaelklishin
Copy link
Member

Not Puka but I could only trigger this with Puka, it serializes integers differently from the Java client (e.g. JVM distinguishes between integers and longs, and many dynamically-typed languages use automatic promotion on overflow).

michaelklishin added a commit that referenced this issue Jul 8, 2015
Be less assertive about x-death value types, fixes #216
@michaelklishin
Copy link
Member

OK, a decent fix is in #221. We'll release 3.5.4 RC very soon and if new edge cases are found, will add them to Bunny's test suite (one of the client test suites we use).

@riyad
Copy link
Author

riyad commented Jul 8, 2015

Thank a lot for fixing this so quickly. 😄

@videlalvaro
Copy link
Contributor

Saw the fix. I think you can still check if the client sent one of the
AMQP's "integer types". See here for an example
https://github.com/rabbitmq/rabbitmq-delayed-message-exchange/blob/master/src/rabbit_delayed_message_utils.erl#L39
we do the same when we handle queue arguments inside rabbit_amqqueue I
think.
On Wed, Jul 8, 2015 at 1:53 PM Michael Klishin notifications@github.com
wrote:

Closed #216 #216 via
508d76d
508d76d
.


Reply to this email directly or view it on GitHub
#216 (comment).

@tiredpixel
Copy link

@michaelklishin, oh, such a speedy patch—very much obliged to you! :) I'll look forward to 3.5.4 RC. :)

@michaelklishin
Copy link
Member

@videlalvaro that would require extracting validators into rabbit_misc or similar. We are under pressure to release 3.5.4 and the clusterer, so we'll do what you suggest for 3.5.5.

@videlalvaro
Copy link
Contributor

I'll just want to prevent data corruption/future crashes if the user sends
some stuff that's not even an int anymore. In that case if we do some math
with the count header, then we will have another crash.
On Wed, Jul 8, 2015 at 10:21 PM Michael Klishin notifications@github.com
wrote:

@videlalvaro https://github.com/videlalvaro that would require
extracting validators into rabbit_misc or similar. We are under pressure
to release 3.5.4 and the clusterer, so we'll do what you suggest for 3.5.5
.


Reply to this email directly or view it on GitHub
#216 (comment)
.

@michaelklishin
Copy link
Member

If the user modifies x-death events then it counts as a user error. I'm not sure what we can do if "count" is something other than a numerical value. Ignore it entirely?

@videlalvaro
Copy link
Contributor

Reset to default?
On Wed, Jul 8, 2015 at 10:48 PM Michael Klishin notifications@github.com
wrote:

If the user modifies x-death events then it counts as a user error. I'm
not sure what we can do if "count" is something other than a numerical
value. Ignore it entirely?


Reply to this email directly or view it on GitHub
#216 (comment)
.

@seetharamireddy540
Copy link

hi,
I am using Rabbit MQ 3.5.4 and experiencing the same issues ?

=ERROR REPORT==== 14-Mar-2016::20:04:20 ===
** Generic server <0.1073.0> terminating
** Last message in was {'$gen_cast',init}


                       undefined,undefined,undefined,undefined,0,running}

** Reason for termination ==
** {{case_clause,{<<"x-death">>,longstr,
<<"[{reason=expired,

Can you pls suggest me how i can delete the queue ? I am not able to delete this queue from Admin Console.

@michaelklishin
Copy link
Member

restarting the node should make it possible to delete the queue before it hits the same exception again. The right thing to do is to upgrade.

On 15 mar 2016, at 3:41, Ram notifications@github.com wrote:

hi,
I am using Rabbit MQ 3.5.4 and experiencing the same issues ?

=ERROR REPORT==== 14-Mar-2016::20:04:20 ===
** Generic server <0.1073.0> terminating
** Last message in was {'$gen_cast',init}

                   undefined,undefined,undefined,undefined,0,running}

** Reason for termination ==
** {{case_clause,{<<"x-death">>,longstr,
<<"[{reason=expired,

Can you pls suggest me how i can delete the queue ? I am not able to delete this queue from Admin Console.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub:
#216 (comment)

@pdoreau
Copy link

pdoreau commented Nov 14, 2016

Hello. I'm using v3.6.5 (also tested from 3.5.6) and retry queues (task => retry1 (dl/ttl) => task => retry2 (dl/ttl) => task => retry3 (dl/ttl) => dl).
Dead lettering the 2nd time causes this error :

 =ERROR REPORT==== 14-Nov-2016::11:09:53 ===
 ** Generic server <0.932.0> terminating
 ** Last message in was {drop_expired,1}
...
=CRASH REPORT==== 14-Nov-2016::11:09:53 ===
   crasher:
     initial call: rabbit_prequeue:init/1
     pid: <0.932.0>
     registered_name: []
     exception exit: {function_clause,
                      [{rabbit_dead_letter,
                        '-queue_and_reason_matcher/2-fun-1-',
                        [{array,
                          [{array,[{longstr,<<"l">>},{longstr,<<"1">>}]},
                           {array,[{longstr,<<"S">>},{longstr,<<"expired">>}]},
                           {array,
                            [{longstr,<<"S">>},
                             {longstr,
                              <<"mytask_retry_1">>}]},
                           {array,
                            [{longstr,<<"T">>},{longstr,<<"1479121758">>}]},
                           {array,[{longstr,<<"S">>},{longstr,<<"retry">>}]},
                           {array,
                            [{longstr,<<"A">>},
                             {array,
                              [{longstr,
                                <<"mytask_retry_1">>}]}]}]}],
                        [{file,"src/rabbit_dead_letter.erl"},{line,178}]},
                       {lists,partition,4,[{file,"lists.erl"},{line,1302}]},
                       {rabbit_dead_letter,update_x_death_header,2,
                        [{file,"src/rabbit_dead_letter.erl"},{line,127}]},
                       {rabbit_dead_letter,'-make_msg/5-fun-2-',8,
                        [{file,"src/rabbit_dead_letter.erl"},{line,65}]},
                       {rabbit_basic,map_headers,2,
                        [{file,"src/rabbit_basic.erl"},{line,258}]},
                       {rabbit_dead_letter,make_msg,5,
                        [{file,"src/rabbit_dead_letter.erl"},{line,68}]},
                       {rabbit_dead_letter,publish,5,
                        [{file,"src/rabbit_dead_letter.erl"},{line,34}]},
                       {rabbit_amqqueue_process,'-dead_letter_msgs/4-fun-0-',
                        7,
                        [{file,"src/rabbit_amqqueue_process.erl"},
                         {line,848}]}]}
       in function  gen_server2:terminate/3 (src/gen_server2.erl, line 1143)
     ancestors: [<0.931.0>,rabbit_amqqueue_sup_sup,rabbit_sup,<0.140.0>]
     messages: []
     links: [<0.931.0>]
     dictionary: [{{credit_to,<0.1089.0>},49},
                   {{xtype_to_module,direct},rabbit_exchange_type_direct},
                   {process_name,
                       {rabbit_amqqueue_process,
                           {resource,<<"myvhost">>,queue,
                               <<"mytask_retry_2">>}}},
                   {rand_seed,
                       {#{max => 288230376151711743,
                          next => #Fun<rand.8.41921595>,
                          type => exsplus,
                          uniform => #Fun<rand.9.41921595>,
                          uniform_n => #Fun<rand.10.41921595>},
                        [239153173037508270|193202472509615284]}},
                   {guid,{{2457094499,2660964459,2278303215,3311440446},1}},
                   {credit_flow_default_credit,{200,50}}]
     trap_exit: true
     status: running
     heap_size: 2586
     stack_size: 27
     reductions: 4190
   neighbours:
...

Any way to make it work ?

@michaelklishin
Copy link
Member

@pdoreau please post an code snippet that reproduces it to rabbitmq-users, our public mailing list

@pdoreau
Copy link

pdoreau commented Nov 14, 2016

@michaelklishin I posted a message on the user list but it doestn't appear.
May be not so usefull to post a snippet as I rely on several layers (swarrot, amqplib).
However, I noticed nested arrays are given to Amqplib to publish the retry2 message:

...
["x-death"]=>
      array(2) {
        [0]=>
        string(1) "A"
        [1]=>
        array(1) {
          [0]=>
          array(6) {
            ["count"]=>
            array(2) {
              [0]=>
              string(1) "l"
              [1]=>
              string(1) "1"
            }
            ["reason"]=>
            array(2) {
              [0]=>
              string(1) "S"
              [1]=>
              string(7) "expired"
            }
            ["queue"]=>
            array(2) {
              [0]=>
              string(1) "S"
              [1]=>
              string(36) "mytask_retry_1"
            }
            ["time"]=>
            array(2) {
              [0]=>
              string(1) "T"
              [1]=>
              string(10) "1479142401"
            }
            ["exchange"]=>
            array(2) {
              [0]=>
              string(1) "S"
              [1]=>
              string(5) "retry"
            }
            ["routing-keys"]=>
            array(2) {
              [0]=>
              string(1) "A"
              [1]=>
              array(1) {
                [0]=>
                string(36) "mytask_retry_1"
              }
            }
          }
        }
      }
...

Can that be the cause ?

@michaelklishin
Copy link
Member

They were in the moderation queue, I approved them.

@michaelklishin
Copy link
Member

The x-death value is an array of objects (in JSON terms), so we need to compare what your app is sending. It is certainly possible for apps to confuse RabbitMQ by sending invalid modified headers, we have seen this several times before.

@michaelklishin
Copy link
Member

Here's the rabbitmq-users thread and so far it looks like the client or one of the libraries on top unintentionally modifies the x-death header which is meant to be modified only by RabbitMQ itself. In any case, this is not the same issue.

@rabbitmq rabbitmq locked and limited conversation to collaborators Nov 14, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants