This repository has been archived by the owner on Nov 17, 2020. It is now read-only.

Make statistics collection and aggregation distributed across all cluster nodes #236

Closed
michaelklishin opened this issue Jun 24, 2016 · 13 comments


@michaelklishin
Member

michaelklishin commented Jun 24, 2016

Follow-up to #41. The problem definition is the same, but the solution there doesn't work for some workloads, because a single node collecting and aggregating all stats can only go so far.

So this issue is about making the collector distributed (stats are stored on every cluster node) for 3.6.x.
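To illustrate the general idea, here is a minimal Python sketch of per-node collection with query-time aggregation. It is not the management plugin's implementation, which is written in Erlang. `NodeStatsStore` and `aggregate_cluster` are hypothetical names, and the sketch assumes stats are simple cumulative counters that can be summed across nodes.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class NodeStatsStore:
    """Hypothetical per-node store: each node keeps stats only for what it hosts."""
    node: str
    # Cumulative counters keyed by (object_type, object_name).
    counters: dict = field(default_factory=dict)

    def record(self, obj_type: str, name: str, count: int) -> None:
        key = (obj_type, name)
        self.counters[key] = self.counters.get(key, 0) + count

    def snapshot(self, obj_type: str) -> Counter:
        # This node's local view for one object type only.
        return Counter({name: n for (t, name), n in self.counters.items() if t == obj_type})


def aggregate_cluster(stores: list, obj_type: str) -> Counter:
    """Fan out to every node and merge the partial results at query time."""
    total = Counter()
    for store in stores:
        total.update(store.snapshot(obj_type))  # Counter.update adds counts
    return total


# Example: two nodes each record part of the traffic for the same exchange.
node_a = NodeStatsStore("rabbit@a")
node_b = NodeStatsStore("rabbit@b")
node_a.record("exchange", "orders", 10)
node_b.record("exchange", "orders", 5)
node_b.record("queue", "q1", 3)
print(aggregate_cluster([node_a, node_b], "exchange"))
```

The point of the design is that no single node has to ingest every stats event; the cost of merging moves to read time, when a query asks for a cluster-wide view.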

@michaelklishin michaelklishin added this to the 3.7.0 milestone Jun 24, 2016
@michaelklishin michaelklishin changed the title Make statistics collection and aggregation distributed Make statistics collection and aggregation distributed across all cluster nodes Jun 24, 2016
@sega-yarkin

Would it make sense to add some parameters to choose which statistics are collected? For example, if I need statistics for exchanges and queues but not for channels and connections, disabling the latter would save some system resources.

@michaelklishin
Member Author

Some have asked for this, and this may be a good chance to make it possible. Currently exchange and queue stats are emitted by channels, so you cannot disable one without the others.

@michaelklishin
Member Author

@sega-yarkin let's not turn this issue into a support case. Please take this to the mailing list. Thanks.

@michaelklishin
Member Author

We are considering whether we should target 3.6.x with this. This issue easily accounts for over half of our support load right now.

Introducing another breaking management plugin version in 3.6.x isn't cool at all, however.

@noahhaon

@michaelklishin What would the breaking changes be?

@michaelklishin
Member Author

@noahhaon all nodes in a mixed cluster will have to run the new plugin, same as with 3.6.2. No breaking HTTP API changes are planned.

@noahhaon

Do users often upgrade rabbitmq-server without upgrading the plugins? I assume they are all packaged together now? I suppose the issue would be for rolling upgrades, but as long as those failures are handled gracefully ...

As you mentioned, this is a real pain point for large RMQ clusters and appears to be causing support issues for Pivotal. We would certainly love to see this feature, and it seems like it would be worth getting into 3.6.x, despite some pain during a rolling upgrade.

@michaelklishin
Member Author

It's not about plugins being out of sync with the server but rather about clusters that mix patch versions. But I definitely see your point.


MK

Staff Software Engineer, Pivotal/RabbitMQ

@noahhaon

Gotcha. Well, maybe an out-of-cycle release of the plugin that would run on 3.6.x clusters? Then at least you're not violating the principle of least surprise and breaking versioning semantics by bundling it with a 3.6.x release.

Not sure which is worse from a maintenance perspective, but I'd imagine many users with large clusters (including us) would be quite happy to install the plugin separately if it included this feature.

@michaelklishin
Member Author

@noahhaon we are leaning towards shipping it in 3.6.5 or so. Most users would rather upgrade all nodes to 3.6.5 than continue fighting the issues with the existing collector.

@michaelklishin
Member Author

A couple of updates:

  • So far we intend to ship this in a 3.6.x release
  • We will switch to Cowboy at the same time (3.7.0 already uses Cowboy) to reduce the delta between branches. The only user-facing change is that some HTTP API responses will return status code 204 instead of 201; virtually no client libraries or users should be affected.
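Client code that compares the status against 201 exactly is the kind that could notice this change. A minimal, hypothetical Python sketch of a more tolerant check, assuming the client already has the status code and raw response body in hand (`is_success` and `parse_body` are illustrative names, not part of any RabbitMQ client library):

```python
import json


def is_success(status: int) -> bool:
    # Accept the whole 2xx range instead of matching 201 exactly, so a switch
    # from 201 Created to 204 No Content does not break the client.
    return 200 <= status < 300


def parse_body(status: int, body: bytes):
    """Hypothetical helper: raise on errors, return None for empty (e.g. 204) bodies."""
    if not is_success(status):
        raise RuntimeError(f"HTTP API request failed with status {status}")
    # A 204 response has no body, so there is nothing to decode.
    if status == 204 or not body:
        return None
    return json.loads(body)
```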

michaelklishin added a commit to rabbitmq/hop that referenced this issue Nov 24, 2016
This reconfigures mgmt plugin to work better as of rabbitmq/rabbitmq-management#236.
We do the same in other HTTP API clients.
michaelklishin added a commit to ruby-amqp/rabbitmq_http_api_client that referenced this issue Nov 24, 2016
michaelklishin added a commit to ruby-amqp/rabbitmq_http_api_client that referenced this issue Nov 24, 2016
acogoluegnes added a commit to rabbitmq/hop that referenced this issue Nov 25, 2016
@michaelklishin
Member Author

It has been merged and will be dogfooded in stable before cutting a milestone release for the community.


8 participants
@michaelklishin @essen @noahhaon @dcorbacho @kjnilsson @sega-yarkin and others