
3.6.2 M4: stats DB RAM use grows over the course of a few hours #185

Closed
michaelklishin opened this issue Apr 20, 2016 · 5 comments

@michaelklishin
Member

Moved from rabbitmq/rabbitmq-server#761:

I have encountered an issue where a two-node PCF RabbitMQ (3.6.1.904, Erlang 18.1) cluster in AWS under load has its queues blocked due to high memory usage on the node hosting the stats DB. The screenshot below illustrates this.

(screenshot: memory usage on the stats node, 2016-04-19)

Configuration

There are 2 servers with all queues mirrored and the following modifications to stats collection applied:

[
  {rabbit, [{collect_statistics, none},
            {collect_statistics_interval, 60000}]},
  {rabbitmq_management, [{rates_mode, none}]}
].
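As a minimal check (a sketch, not part of the report), the effective values can be confirmed at runtime from rabbitmqctl eval or an Erlang remote shell; the application and key names are the ones used in the config above:

%% each call returns {ok, Value} if the key is set, otherwise undefined
application:get_env(rabbit, collect_statistics).
application:get_env(rabbit, collect_statistics_interval).
application:get_env(rabbitmq_management, rates_mode).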

I used PerfTest to place the cluster under load using the following: ./runjava.sh -Xms2G com.rabbitmq.examples.PerfTest -h amqp://user:password@192.168.0.10/vhost_name -r 1 -R 1000 -x 10 -y 500

The intention of the above was to weight in favor of the consumers to promote throughput of messages.

Issue Description

After about an hour of running under this load, memory use on the node hosting the stats DB steadily increases until it hits the 6 GB high memory watermark and starts to block the producers. This is an issue because while the node with the stats DB is using over 6 GB of memory, the other node still has plenty of headroom and is only consuming 3 GB.

The producers remain blocked until the stats DB (my assumption) processes the accumulated stats; the memory is then freed and publishing continues as expected.
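One rough way to see which processes hold the memory on the stats node is to sort live processes by memory from a remote shell (a best-effort sketch; recon:proc_count(memory, 10) does the same if the recon library is available):

%% top 10 processes by memory; processes that exit mid-scan are skipped
lists:sublist(
    lists:reverse(
        lists:keysort(2,
            [{P, M} || P <- erlang:processes(),
                       {memory, M} <- [erlang:process_info(P, memory)]])),
    10).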

@dcorbacho
Contributor

I can reproduce it from the stable branch. The problem is not in the data store but in the event collectors.

The message queues of the event collectors appear empty when queried with erlang:process_info, yet the memory consumption of rabbit_mgmt_channel_stats_collector alone reaches 2.4 GB. Forcing garbage collection does not reclaim it.
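For reference, the inspection described above amounts to something like the following from a remote shell (a sketch; it assumes the collector is registered under the name mentioned above):

Pid = whereis(rabbit_mgmt_channel_stats_collector),
erlang:process_info(Pid, [memory, message_queue_len, heap_size, total_heap_size]),
%% force a GC on the collector; it did not reclaim the memory in this case
erlang:garbage_collect(Pid).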

I crashed the node by requesting the process state with sys:get_state; the call seems to have reached the process at a moment when a large burst of events was in its queue and the VM could not allocate enough memory. Investigation ongoing.
(screenshots: process memory inspection, 2016-04-21)

@dcorbacho
Contributor

Processing the stats causes a massive number of reductions, mainly from the two functions called here: https://github.com/rabbitmq/rabbitmq-management/blob/master/src/rabbit_mgmt_event_collector_utils.erl#L276, whether they are implemented as recursive functions or as list comprehensions (the previous, 3.6.0, implementation).
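A simple way to observe the reduction cost of a collector is to sample its reduction counter twice (an illustrative sketch, not part of the fix):

Pid = whereis(rabbit_mgmt_channel_stats_collector),
{reductions, R1} = erlang:process_info(Pid, reductions),
timer:sleep(1000),
{reductions, R2} = erlang:process_info(Pid, reductions),
%% reductions performed by the collector over roughly one second
R2 - R1.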

The new event collectors introduced in #41 were no longer set to high priority, so scheduling switches happened very often and caused the memory build-up. With the priority set back to high, memory usage stays stable.
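The kind of change described here looks roughly like the following in a gen_server-based collector (a fragment sketched under that assumption, not the actual diff):

init(Args) ->
    %% run the collector at high scheduling priority so it keeps up with
    %% the event emitters instead of accumulating a backlog
    process_flag(priority, high),
    {ok, Args}.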

I also reduced the buffer size for channel and queue stats before they start being dropped, since having three collectors can potentially result in a much larger combined message queue.
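The drop policy being described is roughly: once the collector's backlog exceeds a configured limit, incoming fine-grained stats are dropped rather than processed. A sketch (names are illustrative, not the actual implementation; handle_event/1 is a hypothetical handler):

maybe_handle_event(Event, MaxBacklog) ->
    {message_queue_len, Len} =
        erlang:process_info(self(), message_queue_len),
    case Len > MaxBacklog of
        true  -> dropped;              %% shed load under pressure
        false -> handle_event(Event)   %% hypothetical handler
    end.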

@y123456yz

y123456yz commented Sep 6, 2016

I have the same problem: when I run a short-lived connection stress test, some nodes exit.

@michaelklishin
Member Author

Please post questions to rabbitmq-users or Stack Overflow. RabbitMQ uses GitHub issues for specific actionable items engineers can work on, not questions. Thank you.

@rabbitmq locked and limited conversation to collaborators Sep 6, 2016
@michaelklishin
Member Author

This issue has been fundamentally addressed in #236 (and shipped in 3.6.7). The guide on Memory Usage has been significantly expanded since April 2016. Please upgrade to at least 3.6.15 and use the tools described in the guide to collect relevant data.
