Configurable toggle or retention of per-channel stats #216
Comments
Sure, sorry for the wall of text... Yeah, least useful, and I believe the cause of the issue.
Running with memory analysis showed that the memory leak was caused by a leak in ETS.
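For reference, a quick way to see which ETS tables are growing is to list them sorted by memory from a remote Erlang shell on the node (or, with care, via `rabbitmqctl eval`). This is a generic diagnostic sketch, not part of RabbitMQ's tooling:

```erlang
%% Ten largest ETS tables on the node, by memory.
%% Sizes are in machine words; multiply by erlang:system_info(wordsize) for bytes.
Tables = [{ets:info(T, memory), T} || T <- ets:all()],
Sized  = [X || {M, _} = X <- Tables, is_integer(M)],
lists:sublist(lists:reverse(lists:sort(Sized)), 10).
```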
huh -- do you think it'd leak on lists/strings? Might be an easy fix. Memory leak aside, it still seems like a reasonable thing to be able to turn off channel stats. For brokers/clusters with lots of channel churn, I see value in being able to turn them off. I've been poking through the code to see how big a change that would be.
@gmr I think the point is that we leak ETS rows. If that is fixed, channel churn should (in theory) no longer cause any issues. We are not closing this issue, though; it makes some sense to me either way.
Yeah, I got that :)
@gmr It leaks into the binaries memory of the VM, e.g.
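For context, the per-category breakdown (binaries, ETS, and so on) can be read straight from the VM with standard Erlang calls, nothing RabbitMQ-specific:

```erlang
%% VM-wide memory breakdown, in bytes. `binary` covers off-heap
%% reference-counted binaries; `ets` covers all ETS tables.
erlang:memory().
erlang:memory(binary).
erlang:memory(ets).
```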
Oh sorry, I mistook what you meant there.
@gmr we did some research on what it would take to provide more fine-grained configuration for channel stats. Unfortunately, this would affect queue and exchange rates, too. We'll see if there's a reasonably easy way to untangle them. One easy-to-implement idea I'm toying with is emitting the rates stats on a different schedule, e.g. once every 30 seconds by default. We've also identified a line that causes memory growth to spike. Still investigating.
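For reference, one existing knob in this area is the statistics emission interval. A rabbitmq.config (Erlang terms) sketch of what a slower schedule could look like; whether the eventual change reuses this setting or adds a new one is still open:

```erlang
%% Sketch: emit statistics events every 30 seconds instead of the
%% default 5 seconds (the value is in milliseconds).
[
  {rabbit, [
    {collect_statistics_interval, 30000}
  ]}
].
```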
Yes, that's what is supposed to happen, and we believe that after the leak fixed in #214 there are no other rows to clean up. We are observing unexpected GC and memory allocation behaviour, though.
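A quick way to check whether that lingering memory is merely uncollected rather than truly retained is to force a full GC across all processes and compare the binary figure before and after. This is a generic Erlang diagnostic sketch; it briefly pauses processes, so it is best run on a test node:

```erlang
%% Force a garbage collection on every process, then re-check the
%% binary memory figure; if it drops sharply, the memory was garbage
%% rather than a true leak.
Before = erlang:memory(binary),
[erlang:garbage_collect(P) || P <- erlang:processes()],
After = erlang:memory(binary),
{Before, After}.
```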
After more research, I'm afraid we cannot easily disable just channel stats, because what is emitted as channel stats is later broken down into channel, exchange, and queue stats. Untangling these will not be backwards compatible. We have a few more ideas, and there's still hope we can make the channel stats collector more memory efficient. Will file separate issues as we have some confirmation that some of those options are viable. Thanks for sharing this idea.
In an attempt to figure out the core bottleneck we hit with the management plugin's rates, I created a stress-testing tool. I believe the issue I am looking into is related to #214.
The tool at https://github.com/gmr/mgmt-stress opens N connections; then, every 100 ms, it opens one or more channels and publishes a random number of messages.
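The per-connection loop looks roughly like this minimal Erlang sketch using the amqp_client library (this is an illustration, not the actual tool's code; the hostname, queue name, and message counts are placeholders):

```erlang
%% Churn sketch: one connection, and every 100 ms a fresh channel is
%% opened, used for a few publishes, and closed again.
-module(churn_sketch).
-include_lib("amqp_client/include/amqp_client.hrl").
-export([run/1]).

run(Iterations) ->
    {ok, Conn} = amqp_connection:start(#amqp_params_network{host = "localhost"}),
    churn(Conn, Iterations),
    amqp_connection:close(Conn).

churn(_Conn, 0) ->
    ok;
churn(Conn, N) ->
    {ok, Ch} = amqp_connection:open_channel(Conn),
    Publish = #'basic.publish'{exchange = <<>>, routing_key = <<"stress">>},
    [amqp_channel:cast(Ch, Publish, #amqp_msg{payload = <<"x">>})
     || _ <- lists:seq(1, rand:uniform(10))],
    amqp_channel:close(Ch),
    timer:sleep(100),
    churn(Conn, N - 1).
```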
With the default settings of the app, if the management plugin's rates mode is not enabled, the broker can easily handle the channel turnover and message velocity.
When rates mode is set to basic, memory use quickly outgrows the Docker container. If I change the channel count to 1, the app instead uses a single channel per connection, opened on startup, rather than creating a new channel every 100 ms, publishing on it, and closing it.
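For reference, the rates mode being toggled here is the management plugin's rates_mode setting; a rabbitmq.config sketch of the values involved (assuming "not enabled" above corresponds to none):

```erlang
%% rates_mode controls how much rate data the management plugin keeps:
%% none disables rate calculations, basic is the default, and detailed
%% keeps per-object message rates.
[
  {rabbitmq_management, [
    {rates_mode, none}  %% other values: basic | detailed
  ]}
].
```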
In testing, I also noticed that even after the churned channels go away, the memory that was used for them is not reclaimed.
I was wondering if a quick solution for this would be the ability to completely bypass channel stats while keeping all of the others? Another option might be to destroy stats when channels go away. In my environment, I care about consumer channel info and don't care about channel info for publishers. We have many more publishers than consumers, and as a generalization, our publishers are transient while our consumers are long-lived and keep the same channel.
I'm sure there are other strategies that could be employed for tuning channel stats in the management interface. I generally find the rate information very useful, and I don't like the idea of turning off all rate data just because of data we don't really need.