This repository has been archived by the owner on Nov 17, 2020. It is now read-only.

Unbound memory consumption on stats node #302

Closed
djcrumple opened this issue Nov 28, 2016 · 3 comments

djcrumple commented Nov 28, 2016

OS: Windows
Version: 3.6.5

I have a 3-node cluster and always run out of memory on the stats node when stats collection is enabled. I've tried reducing the retention policies as far as possible, but stats still use all the memory on the node. During testing, the non-stats nodes use about 1 GB of memory. The stats node uses more than 6 GB of memory before I kill the test.

Here are the most recent settings I've tried. If the only retention policy is 60 seconds long, should I expect memory usage on the stats node to level off after 60 seconds?

{collect_statistics, coarse}   % Even though I set this to coarse, it always changes to fine.
{collect_statistics_interval, 5000}
{rates_mode, basic}
{sample_retention_policies,
    [{global,   [{60, 5}]},
     {basic,    []},
     {detailed, []}]}

Here is the status of the node 15 minutes after starting the test. During the test, I create a fixed number of queues and consumers. Memory usage continues to increase long after all queues/consumers have been created. Producers are using direct reply-to queues. Does that impact stats?

Status of node rabbit@RMQTest01 ...

[{pid,4008},
 {running_applications,
     [{rabbitmq_tracing,"RabbitMQ message logging / tracing","3.6.5"},
      {rabbitmq_management,"RabbitMQ Management Console","3.6.5"},
      {rabbitmq_management_agent,"RabbitMQ Management Agent","3.6.5"},
      {rabbit,"RabbitMQ","3.6.5"},
      {mnesia,"MNESIA  CXC 138 12","4.14.1"},
      {amqp_client,"RabbitMQ AMQP Client","3.6.5"},
      {rabbit_common,[],"3.6.5"},
      {rabbitmq_web_dispatch,"RabbitMQ Web Dispatcher","3.6.5"},
      {webmachine,"webmachine","1.10.3"},
      {mochiweb,"MochiMedia Web Server","2.13.1"},
      {xmerl,"XML parser","1.3.12"},
      {os_mon,"CPO  CXC 138 46","2.4.1"},
      {compiler,"ERTS  CXC 138 10","7.0.2"},
      {ssl,"Erlang/OTP SSL application","8.0.2"},
      {public_key,"Public key infrastructure","1.2"},
      {crypto,"CRYPTO","3.7.1"},
      {ranch,"Socket acceptor pool for TCP protocols.","1.2.1"},
      {inets,"INETS  CXC 138 49","6.3.3"},
      {asn1,"The Erlang ASN1 compiler version 4.0.4","4.0.4"},
      {syntax_tools,"Syntax tools","2.1"},
      {sasl,"SASL  CXC 138 11","3.0.1"},
      {stdlib,"ERTS  CXC 138 10","3.1"},
      {kernel,"ERTS  CXC 138 10","5.1"}]},
 {os,{win32,nt}},
 {erlang_version,
     "Erlang/OTP 19 [erts-8.1] [64-bit] [smp:4:4] [async-threads:64]\n"},
 {memory,
     [{total,6391440880},
      {connection_readers,113192864},
      {connection_writers,64656520},
      {connection_channels,516457952},
      {connection_other,203202816},
      {queue_procs,237182648},
      {queue_slave_procs,0},
      {plugins,59473568},
      {other_proc,50449040},
      {mnesia,18830944},
      {mgmt_db,4317939224},
      {msg_index,3071120},
      {other_ets,373931992},
      {binary,341174960},
      {code,24923912},
      {atom,1033401},
      {other_system,65919919}]},
 {alarms,[]},
 {listeners,[{clustering,25672,"::"},{amqp,5672,"::"},{amqp,5672,"0.0.0.0"}]},
 {vm_memory_high_watermark,0.4},
 {vm_memory_limit,6870704128},
 {disk_free_limit,50000000},
 {disk_free,120430325760},
 {file_descriptors,
     [{total_limit,8092},
      {total_used,4042},
      {sockets_limit,7280},
      {sockets_used,4040}]},
 {processes,[{limit,1048576},{used,122000}]},
 {run_queue,1},
 {uptime,1300},
 {kernel,{net_ticktime,60}}]

@michaelklishin
Member

Please post questions to rabbitmq-users or Stack Overflow. RabbitMQ uses GitHub issues for specific actionable items engineers can work on, not questions. Thank you.

Member

michaelklishin commented Nov 28, 2016

This is the most commonly raised complaint/question this year. You will find dozens of threads about this in the rabbitmq-users archives. A few things worth pointing out:

  1. This will not be addressed because #236 ("Make statistics collection and aggregation distributed across all cluster nodes") is nearly merged and will be in 3.6.7. For multi-node clusters, this is the only solution that can work in the medium term.
  2. collect_statistics hasn't been effective for several feature versions. The plugin will log a warning if it is used. If you want to disable rates, set rates_mode to none. This is documented.
  3. As discussed many times on rabbitmq-users and now mentioned in the docs, the things that affect stats DB load most are the number of stats-emitting entities (see the docs) and the collection interval. You have it set to 5 seconds, which is useful in development environments but rarely needed when there is no human constantly using the UI. Setting it to 30 or 60 seconds will reduce the load on the management plugin by up to 6 or 12 times, respectively (compared to 5 seconds).
  4. It is safe to restart the stats DB since all of its contents are entirely transient. How to do this is documented and has been mentioned dozens if not hundreds of times on rabbitmq-users.
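
For illustration, points 2 and 3 could be expressed in a classic-format `rabbitmq.config` roughly as follows. This is a minimal sketch, not the reporter's actual file: it assumes `collect_statistics_interval` lives under the `rabbit` application and `rates_mode` under `rabbitmq_management`, and the 30000 ms value is just one of the two intervals suggested above. Setting `rates_mode` to `none` only makes sense if rates are not needed at all.

```erlang
%% Sketch of a rabbitmq.config fragment (classic Erlang-term format).
%% The values are illustrative assumptions, not recommended defaults.
[
  {rabbit, [
    %% Emit stats every 30 seconds instead of every 5 (value in milliseconds).
    {collect_statistics_interval, 30000}
  ]},
  {rabbitmq_management, [
    %% Disable rate calculation entirely; use basic to keep coarse rates.
    {rates_mode, none}
  ]}
].
```

The node has to be restarted for changes in this file to take effect.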

Member

michaelklishin commented Nov 28, 2016

Direct reply-to has no effect whatsoever on the stats DB. The fact that apps no longer declare any queues or exchanges or bindings doesn't mean that there are no more stats-emitting entities being created: connections and channels are the primary contributors and those can be leaked, for example. That's why rabbit.channel_max and features such as rabbitmq/rabbitmq-server#500 were introduced.
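
As a rough sketch of the `rabbit.channel_max` setting mentioned above, a classic-format `rabbitmq.config` fragment might look like this. The limit of 128 is a hypothetical value chosen for illustration; a suitable number depends on how many channels each application connection legitimately needs.

```erlang
%% Sketch only: cap the number of channels negotiated per connection,
%% so that a client leaking channels cannot create stats-emitting
%% entities without bound. 128 is an assumed, illustrative limit.
[
  {rabbit, [
    {channel_max, 128}
  ]}
].
```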
