Skip to content
This repository has been archived by the owner on Nov 17, 2020. It is now read-only.

Configurable toggle or retention of per-channel stats #216

Closed
gmr opened this issue Jun 6, 2016 · 12 comments
Closed

Configurable toggle or retention of per-channel stats #216

gmr opened this issue Jun 6, 2016 · 12 comments

Comments

@gmr
Copy link
Contributor

gmr commented Jun 6, 2016

In an attempt to figure out what the core issue is with bottlenecks for us in the management plugin rates I created a stress testing tool. I believe the issue I am looking into is related to #214.

The tool @ https://github.com/gmr/mgmt-stress opens up N connections, then every 100 MS, it opens one or more channels and publishes a random number of messages.

With the default settings of the app, if the management plugin rates mode is not enabled, it can easy handle the channel turnover and message velocity.

When rates mode is set to basic, the memory quickly outgrows the docker container. If I change the channel count to 1, the app instead uses a single channel per connection that is opened on startup instead of creating a new channel every 100ms, publishing it, and closing it.

In testing, I also noticed that even after the channels go away, the memory that was used for the channels that churn does not go away.

I was wondering if a quick solution for this would be to be able to completely bypass channel stats but keep all of the others? Another option might be to destroy stats when channels go away. I know in my environment, I care about consumer channel info and don't care about channel info for publishers. We have many more publishers than consumers, and as a generalization, our publishers are transient while consumers are long-lived with the same channel.

I'm sure there are other strategies that could be employed with regard to tuning channel stats in the management interface. I find the rate information very useful as a generalization and don't like the idea of turning off all rates data because of data we don't really need.

@michaelklishin
Copy link
Member

@gmr in other words, you find channel rate stats to be least useful? This can be done and maybe even disabled by default but we'd like to see what difference resolving #214 and #215 would make.

@gmr
Copy link
Contributor Author

gmr commented Jun 6, 2016

Sure, sorry for the wall of text... Yeah, least useful and I believe the cause of the issue.

@hairyhum
Copy link
Contributor

hairyhum commented Jun 6, 2016

Running with memory analysis showed that memory leak was caused by leak in ETS. Because old_stats_fine_index table wasn't cleared after channel closure it did overflow erlang VM binaries someway.
I'm not sure why, but it looks like ets can consume binaries memory while storing something like [{<0.28662.14>, {fine,{<0.28662.14>, {resource,<<"/">>,exchange,<<"amq.topic">>}}}}]

@gmr
Copy link
Contributor Author

gmr commented Jun 6, 2016

huh -- Do you think it'd leak on lists/strings? Might be an easy fix. Memory leak aside, it still seems like a reasonable thing to turn of channel stats. For brokers/clusters with lots of channel churn, I see value in being able to turn them off. I've been poking through the code to see how big of a change that would be.

@michaelklishin
Copy link
Member

@gmr I think the point is that we leak ETS rows. If that is fixed, channel churn should (in theory) no longer cause any issues.

We are not closing this issue and it makes some sense to me either way.

@gmr
Copy link
Contributor Author

gmr commented Jun 6, 2016

Yeah, I got that :)

@hairyhum
Copy link
Contributor

hairyhum commented Jun 6, 2016

@gmr It leaks to binaries memory of VM e.g. erlang:memory(binary).. I don't know why. ETS should have it's own memory, but ETS table memory was mush smaller than binaires memory taken.

@gmr
Copy link
Contributor Author

gmr commented Jun 6, 2016

Oh sorry, mistook what you meant there.

@michaelklishin michaelklishin changed the title Feature Request: Configurable toggle or retention of per-channel stats Configurable toggle or retention of per-channel stats Jun 6, 2016
@michaelklishin
Copy link
Member

@gmr we did some research on what it would take to provide more fine grained configuration for channel stats. Unfortunately, this would affect queue and exchange rates, too. We'll see if there's a reasonably easy way to untangle them. One easy to implement idea I'm toying with is emitting the rates stats on a different schedule, e.g. once every 30 seconds by default.

We've also identified a line that causes memory growth to spike. Still investigating.

@michaelklishin
Copy link
Member

Another option might be to destroy stats when channels go away

Yes, that's what is supposed to happen and we believe after a leak fixed in #214 there are no other rows to lean up. We are observing unexpected GC and memory allocation behaviour, though.

@michaelklishin
Copy link
Member

After more research I'm afraid we cannot easily disable just channel stats because what is emitted as channel stats is later broken down into channel, exchange, and queue stats. Untangling these will not be backwards compatible with 3.6.2 and we cannot break internal protocol every point release.

We have a few more ideas and there's still hope we can make channel stats collector more memory efficient. Will file separate issues as we have some confirmation that some of those options are viable, thanks for sharing this idea.

@michaelklishin
Copy link
Member

@gmr I believe we've discovered an elephant in this room: #221.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants