Unbound memory consumption on stats node #302

djcrumple · 2016-11-28T17:48:53Z

OS: Windows
Version: 3.6.5

I have a 3-node cluster and always run out of memory on the stats node if I enable stats collection. I've tried reducing the retention policies as far as possible, but stats still uses all the memory on the node. During testing, the non-stats nodes use about 1 GB of memory. The stats node uses more than 6 GB of memory before I kill the test.

Here are the most recent settings I've tried. If the only retention policy is 60 seconds long, should I expect the stats node to level off memory usage after 60 seconds?

{collect_statistics, coarse}   <-- Even though I set this to coarse, it always changes to fine. 
{collect_statistics_interval, 5000}
{rates_mode, basic}
{sample_retention_policies,
    [{global,   [{60, 5}]},
     {basic,    []},
     {detailed, []}]}

Here is the status of the node 15 minutes after starting the test. During the test, I create a fixed number of queues and consumers. Memory usage continues to increase long after all queues/consumers have been created. Producers are using direct reply-to queues. Does that impact stats?

Status of node rabbit@RMQTest01 ...

[{pid,4008},
 {running_applications,
     [{rabbitmq_tracing,"RabbitMQ message logging / tracing","3.6.5"},
      {rabbitmq_management,"RabbitMQ Management Console","3.6.5"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.6.5"},
      {rabbit,"RabbitMQ","3.6.5"},
      {mnesia,"MNESIA  CXC 138 12","4.14.1"},
      {amqp_client,"RabbitMQ AMQP Client","3.6.5"},
      {rabbit_common,[],"3.6.5"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.6.5"},
      {webmachine,"webmachine","1.10.3"},
      {mochiweb,"MochiMedia Web Server","2.13.1"},
      {xmerl,"XML parser","1.3.12"},
      {os_mon,"CPO  CXC 138 46","2.4.1"},
      {compiler,"ERTS  CXC 138 10","7.0.2"},
      {ssl,"Erlang/OTP SSL application","8.0.2"},
      {public_key,"Public key infrastructure","1.2"},
      {crypto,"CRYPTO","3.7.1"},
      {ranch,"Socket acceptor pool for TCP protocols.","1.2.1"},
      {inets,"INETS  CXC 138 49","6.3.3"},
      {asn1,"The Erlang ASN1 compiler version 4.0.4","4.0.4"},
      {syntax_tools,"Syntax tools","2.1"},
      {sasl,"SASL  CXC 138 11","3.0.1"},
      {stdlib,"ERTS  CXC 138 10","3.1"},
      {kernel,"ERTS  CXC 138 10","5.1"}]},
 {os,{win32,nt}},
 {erlang_version,
     "Erlang/OTP 19 [erts-8.1] [64-bit] [smp:4:4] [async-threads:64]\n"},
 {memory,
     [{total,6391440880},
      {connection_readers,113192864},
      {connection_writers,64656520},
      {connection_channels,516457952},
      {connection_other,203202816},
      {queue_procs,237182648},
      {queue_slave_procs,0},
      {plugins,59473568},
      {other_proc,50449040},
      {mnesia,18830944},
      {mgmt_db,4317939224},
      {msg_index,3071120},
      {other_ets,373931992},
      {binary,341174960},
      {code,24923912},
      {atom,1033401},
      {other_system,65919919}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"},{amqp,5672,"0.0.0.0"}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,6870704128},
 {disk_free_limit,50000000},
 {disk_free,120430325760},
 {file_descriptors,
     [{total_limit,8092},
      {total_used,4042},
      {sockets_limit,7280},
      {sockets_used,4040}]},
 {processes,[{limit,1048576},{used,122000}]},
 {run_queue,1},
 {uptime,1300},
 {kernel,{net_ticktime,60}}]

michaelklishin · 2016-11-28T17:52:20Z

Please post questions to rabbitmq-users or Stack Overflow. RabbitMQ uses GitHub issues for specific actionable items engineers can work on, not questions. Thank you.

michaelklishin · 2016-11-28T17:57:11Z

This is the most commonly raised complaint/question this year. You will find dozens of threads about this in the rabbitmq-users archives. A few things worth pointing out:

This will not be addressed because "Make statistics collection and aggregation distributed across all cluster nodes" (#236) is nearly merged and will be in 3.6.7. For multi-node clusters, this is the only solution that can work in the medium term.
collect_statistics has had no effect for several feature versions. The plugin will log a warning if it is used. If you want to disable rates, set rates_mode to none. This is documented.
As discussed many times on rabbitmq-users and now mentioned in the docs, the things that affect stats DB load most are the number of stats-emitting entities (see the docs) and the collection interval. You have it set to 5 seconds, which is useful in development environments but rarely needed when there is no human constantly using the UI. Set it to 30 or 60 seconds; that will reduce the load on the management plugin by up to 6 or 12 times (compared to 5 seconds).
It is safe to restart the stats DB since all of its contents are entirely transient. How to do this is documented and has been mentioned dozens if not hundreds of times on rabbitmq-users.
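
As a rough illustration of the two settings suggested above, a classic-format rabbitmq.config might look like the sketch below. The specific values (a 30-second interval, rates disabled) are assumptions chosen for the example, not settings taken from this thread's cluster, and the fragment omits any other configuration the nodes may need. Going from 5000 ms to 30000 ms means each stats-emitting entity reports 6 times less often, which is where the "up to 6 times" figure comes from.

```erlang
%% rabbitmq.config (classic Erlang-term format).
%% Illustrative values only; adjust to your own environment.
[
  {rabbit, [
    %% Emit stats every 30 s instead of every 5 s (value is in milliseconds).
    {collect_statistics_interval, 30000}
  ]},
  {rabbitmq_management, [
    %% Disable message rates entirely; basic and detailed are the other modes.
    {rates_mode, none}
  ]}
].
```

Note that collect_statistics_interval belongs to the rabbit application while rates_mode belongs to rabbitmq_management, so they go in separate sections of the file.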

michaelklishin · 2016-11-28T18:02:06Z

Direct reply-to has no effect whatsoever on the stats DB. The fact that apps no longer declare any queues or exchanges or bindings doesn't mean that there are no more stats-emitting entities being created: connections and channels are the primary contributors and those can be leaked, for example. That's why rabbit.channel_max and features such as rabbitmq/rabbitmq-server#500 were introduced.

michaelklishin closed this as completed Nov 28, 2016

michaelklishin added the mailing list material label Nov 28, 2016
