Configurable toggle or retention of per-channel stats #216
Comments
Sure, sorry for the wall of text... Yeah, least useful, and I believe the cause of the issue.
Running with memory analysis showed that the memory leak was caused by a leak in ETS.
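For reference, a quick way to see which ETS tables are growing is to list them sorted by memory from a remote Erlang shell on the node (or, with care, via `rabbitmqctl eval`). This is a generic diagnostic sketch, not part of RabbitMQ's tooling:

```erlang
%% Ten largest ETS tables on the node, by memory.
%% Sizes are in machine words; multiply by erlang:system_info(wordsize) for bytes.
Tables = [{ets:info(T, memory), T} || T <- ets:all()],
Sized  = [X || {M, _} = X <- Tables, is_integer(M)],
lists:sublist(lists:reverse(lists:sort(Sized)), 10).
```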
huh -- do you think it'd leak on lists/strings? Might be an easy fix. Memory leak aside, it still seems like a reasonable thing to be able to turn off channel stats. For brokers/clusters with lots of channel churn, I see value in being able to turn them off. I've been poking through the code to see how big a change that would be.
@gmr I think the point is that we leak ETS rows. If that is fixed, channel churn should (in theory) no longer cause any issues. We are not closing this issue, though; it makes some sense to me either way.
Yeah, I got that :)
@gmr It leaks into the binaries memory of the VM, e.g.
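For context, the per-category breakdown (binaries, ETS, and so on) can be read straight from the VM with standard Erlang calls, nothing RabbitMQ-specific:

```erlang
%% VM-wide memory breakdown, in bytes. `binary` covers off-heap
%% reference-counted binaries; `ets` covers all ETS tables.
erlang:memory().
erlang:memory(binary).
erlang:memory(ets).
```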
Oh sorry, I mistook what you meant there.
@gmr we did some research on what it would take to provide more fine-grained configuration for channel stats. Unfortunately, this would affect queue and exchange rates, too. We'll see if there's a reasonably easy way to untangle them. One easy-to-implement idea I'm toying with is emitting the rates stats on a different schedule, e.g. once every 30 seconds by default. We've also identified a line that causes memory growth to spike. Still investigating.
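For reference, one existing knob in this area is the statistics emission interval. A rabbitmq.config (Erlang terms) sketch of what a slower schedule could look like; whether the eventual change reuses this setting or adds a new one is still open:

```erlang
%% Sketch: emit statistics events every 30 seconds instead of the
%% default 5 seconds (the value is in milliseconds).
[
  {rabbit, [
    {collect_statistics_interval, 30000}
  ]}
].
```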
Yes, that's what is supposed to happen, and we believe that after the leak fixed in #214 there are no other rows to clean up. We are observing unexpected GC and memory allocation behaviour, though.
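A quick way to check whether that lingering memory is merely uncollected rather than truly retained is to force a full GC across all processes and compare the binary figure before and after. This is a generic Erlang diagnostic sketch; it briefly pauses processes, so it is best run on a test node:

```erlang
%% Force a garbage collection on every process, then re-check the
%% binary memory figure; if it drops sharply, the memory was garbage
%% rather than a true leak.
Before = erlang:memory(binary),
[erlang:garbage_collect(P) || P <- erlang:processes()],
After = erlang:memory(binary),
{Before, After}.
```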
After more research, I'm afraid we cannot easily disable just channel stats, because what is emitted as channel stats is later broken down into channel, exchange, and queue stats. Untangling these will not be backwards compatible. We have a few more ideas, and there's still hope we can make the channel stats collector more memory efficient. Will file separate issues as we have some confirmation that some of those options are viable. Thanks for sharing this idea.
In an attempt to figure out the core bottleneck we hit with the management plugin's rates, I created a stress-testing tool. I believe the issue I am looking into is related to #214.
The tool at https://github.com/gmr/mgmt-stress opens N connections; then, every 100 ms, it opens one or more channels and publishes a random number of messages.
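The per-connection loop looks roughly like this minimal Erlang sketch using the amqp_client library (this is an illustration, not the actual tool's code; the hostname, queue name, and message counts are placeholders):

```erlang
%% Churn sketch: one connection, and every 100 ms a fresh channel is
%% opened, used for a few publishes, and closed again.
-module(churn_sketch).
-include_lib("amqp_client/include/amqp_client.hrl").
-export([run/1]).

run(Iterations) ->
    {ok, Conn} = amqp_connection:start(#amqp_params_network{host = "localhost"}),
    churn(Conn, Iterations),
    amqp_connection:close(Conn).

churn(_Conn, 0) ->
    ok;
churn(Conn, N) ->
    {ok, Ch} = amqp_connection:open_channel(Conn),
    Publish = #'basic.publish'{exchange = <<>>, routing_key = <<"stress">>},
    [amqp_channel:cast(Ch, Publish, #amqp_msg{payload = <<"x">>})
     || _ <- lists:seq(1, rand:uniform(10))],
    amqp_channel:close(Ch),
    timer:sleep(100),
    churn(Conn, N - 1).
```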
With the default settings of the app, if the management plugin's rates mode is not enabled, the broker can easily handle the channel turnover and message velocity.
When rates mode is set to basic, memory use quickly outgrows the Docker container. If I change the channel count to 1, the app instead uses a single channel per connection, opened on startup, rather than creating a new channel every 100 ms, publishing on it, and closing it.
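For reference, the rates mode being toggled here is the management plugin's rates_mode setting; a rabbitmq.config sketch of the values involved (assuming "not enabled" above corresponds to none):

```erlang
%% rates_mode controls how much rate data the management plugin keeps:
%% none disables rate calculations, basic is the default, and detailed
%% keeps per-object message rates.
[
  {rabbitmq_management, [
    {rates_mode, none}  %% other values: basic | detailed
  ]}
].
```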
In testing, I also noticed that even after the churned channels go away, the memory that was used for them is not reclaimed.
I was wondering if a quick solution for this would be the ability to completely bypass channel stats while keeping all of the others? Another option might be to destroy stats when channels go away. In my environment, I care about consumer channel info and don't care about channel info for publishers. We have many more publishers than consumers, and as a generalization, our publishers are transient while our consumers are long-lived and keep the same channel.
I'm sure there are other strategies that could be employed for tuning channel stats in the management interface. I generally find the rate information very useful, and I don't like the idea of turning off all rate data just because of data we don't really need.