3.6.2 M4: stats DB RAM use grows over the course of a few hours #185
The processing of the stats causes a massive amount of reductions, mainly from the two functions called here: https://github.com/rabbitmq/rabbitmq-management/blob/master/src/rabbit_mgmt_event_collector_utils.erl#L276, both as recursive functions and with the previous (3.6.0) implementation as list comprehensions. The new event collectors introduced in #41 were no longer set to high priority, so scheduling took place very often and caused the memory build-up. With the priority set back to high, memory use stays stable. I also reduced the buffer size for channel and queue stats before we start dropping them, since having three collectors can now lead to a much larger combined message queue.
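For context, here is a minimal Erlang sketch of what raising a collector process's scheduling priority looks like; the module name, exports, and state below are illustrative placeholders, not the actual rabbit_mgmt event collector code:

```erlang
%% Illustrative sketch only: a gen_server that raises its own priority so
%% bursts of stats events are drained promptly instead of piling up in its
%% mailbox. Module and state shape are hypothetical.
-module(example_event_collector).
-behaviour(gen_server).

-export([start_link/0, init/1, handle_cast/2, handle_call/3]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    %% Ask the runtime to schedule this process ahead of normal-priority
    %% work, so its message queue stays short under load.
    process_flag(priority, high),
    {ok, #{}}.

handle_cast({event, _Event}, State) ->
    %% Stats aggregation for the event would happen here.
    {noreply, State};
handle_cast(_Other, State) ->
    {noreply, State}.

handle_call(_Msg, _From, State) ->
    {reply, ok, State}.
```

Keeping the collector at high priority trades a little scheduler fairness for a bounded mailbox, which is what the fix described above relies on.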
I also have the same problem when I run short-connection pressure tests.
Please post questions to rabbitmq-users or Stack Overflow. RabbitMQ uses GitHub issues for specific actionable items engineers can work on, not questions. Thank you.
This issue has been fundamentally addressed in #236 (and shipped in 3.6.7). The guide on Memory Usage has been significantly expanded since April 2016. Please upgrade to at least 3.6.15 and use the tools described in the guide to collect relevant data.
Moved from rabbitmq/rabbitmq-server#761:
I have encountered an issue where a two node PCF RabbitMQ (3.6.1.904, Erlang 18.1) cluster in AWS under load is having its queues blocked due to high memory usage on the node hosting the stats. The screenshot below illustrates this.
**Configuration**
There are 2 servers which have all queues mirrored and the following modification to the collection of stats applied:
[
  {rabbit, [
    {collect_statistics, none},
    {collect_statistics_interval, 60000}
  ]},
  {rabbitmq_management, [
    {rates_mode, none}
  ]}
].
I used PerfTest to place the cluster under load using the following: ./runjava.sh -Xms2G com.rabbitmq.examples.PerfTest -h amqp://user:password@192.168.0.10/vhost_name -r 1 -R 1000 -x 10 -y 500
The intention of the above was to weigh in favor of the consumers to promote throughput of messages.
**Issue Description**
After about an hour of running under this load, the memory on the node hosting the stats DB steadily increases until it hits the 6 GB high memory watermark and starts to block the producers. This is an issue because while the node with the stats DB is using over 6 GB of memory, the other node still has plenty of headroom and is only consuming 3 GB.
The producers remain blocked until the stats DB processes the stats (my assumption); then you see it free the memory, and publishing continues as expected.
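For reference, the threshold mentioned above is controlled by RabbitMQ's vm_memory_high_watermark setting; a minimal, illustrative classic-format config fragment is shown below (the values are assumptions for illustration, not the reporter's actual settings):

```erlang
%% Illustrative sketch only: classic-format rabbitmq.config fragment.
%% vm_memory_high_watermark is the real setting name; the values below
%% are examples and not taken from the cluster described in this issue.
[
  {rabbit, [
    %% Relative form: start blocking publishers once the node uses
    %% 40% of available RAM (the default ratio).
    {vm_memory_high_watermark, 0.4}

    %% Absolute form (alternative):
    %% {vm_memory_high_watermark, {absolute, "6GiB"}}
  ]}
].
```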