Make statistics collection and aggregation distributed across all cluster nodes #236

michaelklishin · 2016-06-24T11:58:00Z

Follow-up to #41. The problem definition is the same, and the solution there doesn't work for some workloads because a single node collecting and aggregating all stats can only go so far.

So this issue is about making the collector distributed (stats are stored on every cluster node) for 3.6.x.
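To make "stats are stored on every cluster node" concrete, here is a minimal Python sketch of query-time aggregation, assuming each node keeps counters only for the objects (queues, exchanges, channels) it hosts or sees locally, and a query merges those per-node views by summing. The data layout, object keys and the `aggregate` function are hypothetical illustrations, not the management plugin's actual Erlang implementation.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical node-local view: object key -> counter name -> value,
# e.g. {"exchange:amq.direct": {"publish_in": 3}}.
NodeStats = Dict[str, Dict[str, int]]


def aggregate(per_node: Iterable[NodeStats]) -> Dict[str, Dict[str, int]]:
    """Merge node-local stats at query time by summing counters per object.

    Counter.update() adds values for keys already present, so an exchange that
    received publishes from channels on two different nodes ends up with the
    total across both nodes.
    """
    merged: Dict[str, Counter] = {}
    for node_stats in per_node:
        for obj, counters in node_stats.items():
            merged.setdefault(obj, Counter()).update(counters)
    return {obj: dict(c) for obj, c in merged.items()}


if __name__ == "__main__":
    node_a = {"exchange:amq.direct": {"publish_in": 3}}
    node_b = {"exchange:amq.direct": {"publish_in": 4}, "queue:orders": {"messages": 10}}
    # Exchange counters are summed across nodes; the queue only exists on node_b.
    print(aggregate([node_a, node_b]))
```

The point of the design is that no single node has to ingest every event: each node does its own bookkeeping, and the cost of combining results is paid only when someone actually queries. Note that summing only makes sense for counters; a gauge such as a queue's message count is reported by the one node hosting that queue.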


sega-yarkin · 2016-06-24T15:24:08Z

Would it make sense to add some parameters to choose which statistics to collect? For example, if I need statistics for exchanges and queues but not for channels and connections, that would save some system resources.

michaelklishin · 2016-06-24T15:51:46Z

Some have asked for this. This may be a good chance to make that possible. Currently exchange and queue stats are emitted by channels so you cannot disable one without the others.

michaelklishin · 2016-06-24T16:09:00Z

@sega-yarkin let's not turn this issue into a support case. Please take this to the mailing list. Thanks.

michaelklishin · 2016-06-30T14:30:05Z

We are considering whether we should try to target 3.6.x with this. This is easily over half of our support load right now.

Introducing another breaking management plugin version in 3.6.x isn't cool at all, however.

noahhaon · 2016-06-30T15:13:01Z

@michaelklishin What would the breaking changes be?

michaelklishin · 2016-06-30T16:02:24Z

@noahhaon mixed clusters will have to all run the new plugin, same as with 3.6.2. No breaking HTTP API changes planned.

noahhaon · 2016-06-30T16:56:05Z

Do users often upgrade rabbitmq-server without upgrading the plugins? I assume they are all packaged together now? I suppose the issue would be for rolling upgrades, but as long as those failures are handled gracefully ...

As you mentioned, this is a real pain point for large RMQ clusters and appears to be causing support issues for Pivotal. We would certainly love to see this feature, and it seems like it would be worth getting into 3.6.x, despite some pain during a rolling upgrade.

michaelklishin · 2016-06-30T17:51:55Z

It's not about plugins being out of sync with the server but rather mixed patch version clusters. But I definitely see your point.


MK

Staff Software Engineer, Pivotal/RabbitMQ

noahhaon · 2016-06-30T18:05:48Z

Gotcha - well, maybe an out-of-cycle release of the plugin which would run on 3.6.x clusters? Then at least you're not violating least-surprise and breaking versioning semantics by bundling it with a 3.6.x release.

Not sure which is worse from a maintenance perspective, but I'd imagine many users with large clusters (including us) would be quite happy to install the plugin separately if it included this feature.

michaelklishin · 2016-07-05T18:51:11Z

@noahhaon we are leaning towards shipping it in 3.6.5 or so. Most users would rather upgrade all nodes to 3.6.5 than continue fighting the issues with the existing collector.

michaelklishin · 2016-09-01T12:12:54Z

A couple of updates:

- So far we intend to ship this in a 3.6.x release.
- We will switch to Cowboy at the same time (3.7.0 already uses Cowboy) to reduce the delta between branches. The only user-facing change is the HTTP API response code changing from 201 to 204 in some cases; virtually no client libraries or users should be affected.
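A client is unaffected by the 201 to 204 change as long as it does not pin one exact success code. Here is a minimal Python sketch of that defensive check, assuming the client only needs to know whether a create/update request succeeded; `is_success` and `check_put_response` are hypothetical helpers, not part of any RabbitMQ client library.

```python
def is_success(status: int) -> bool:
    """Treat any 2xx status as success instead of pinning one exact code.

    A client that only accepted 201 Created would break if the server
    answered 204 No Content; a 2xx range check accepts both.
    """
    return 200 <= status < 300


def check_put_response(status: int) -> None:
    """Raise if a PUT (create/update) request did not succeed."""
    if not is_success(status):
        raise RuntimeError(f"PUT failed with HTTP {status}")
```

Both 201 and 204 signal success for a PUT, so a range check keeps working across plugin versions; only clients that compare against a literal 201 would need changing.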


michaelklishin · 2016-12-01T11:05:01Z

It has been merged and will be dogfooded in stable before cutting a milestone release for the community.

michaelklishin added bug effort-high usability enhancement labels Jun 24, 2016

michaelklishin added this to the 3.7.0 milestone Jun 24, 2016

michaelklishin assigned michaelklishin and dcorbacho Jun 24, 2016

michaelklishin changed the title ~~Make statistics collection and aggregation distributed~~ Make statistics collection and aggregation distributed across all cluster nodes Jun 24, 2016

michaelklishin mentioned this issue Jul 5, 2016

Parallelise statistics DB collector #41

Closed

michaelklishin mentioned this issue Jul 12, 2016

Use of atoms for index names leads to exhaustion #245

Closed

This was referenced Jul 19, 2016

'unknown' consumers in queues #254

Closed

Designated stats node(s) #194

Closed

michaelklishin mentioned this issue Aug 2, 2016

Numbers are lagging charts in queue stats display on overview page #268

Closed

michaelklishin modified the milestones: 3.6.x, 3.7.0 Sep 1, 2016

michaelklishin assigned essen and kjnilsson Sep 1, 2016

dcorbacho mentioned this issue Sep 1, 2016

Wait for stats to be published in new management plugin michaelklishin/rabbit-hole#84

Merged

essen mentioned this issue Sep 6, 2016

Backport switch to Cowboy from master rabbitmq/rabbitmq-web-dispatch#18

Merged

michaelklishin added a commit to rabbitmq/hop that referenced this issue Nov 24, 2016

Adapt to the (more) asynchronous event processing nature in rabbitmq/rabbitmq-management#236

b88a9ea

michaelklishin added a commit to rabbitmq/hop that referenced this issue Nov 24, 2016

Add a before_build script

0933b05

This reconfigures mgmt plugin to work better as of rabbitmq/rabbitmq-management#236. We do the same in other HTTP API clients.

michaelklishin added a commit to ruby-amqp/rabbitmq_http_api_client that referenced this issue Nov 24, 2016

Adapt to the upcoming management plugin

fcbe79c

See rabbitmq/rabbitmq-management#236.

michaelklishin added a commit to ruby-amqp/rabbitmq_http_api_client that referenced this issue Nov 24, 2016

Introduce a before build script

343d3b9

Preparing for rabbitmq/rabbitmq-management#236 to land.

acogoluegnes added a commit to rabbitmq/hop that referenced this issue Nov 25, 2016

Increase waiting time in tests

965f3ce

Adapt to async event processing in rabbitmq/rabbitmq-management#236
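With asynchronous event processing, stats for a just-created queue or connection may not be visible through the HTTP API immediately, which is why the client test suites above were adapted (e.g. by increasing waiting time). A minimal Python sketch of the usual pattern, polling until data appears or a deadline passes, is below. `wait_for` is a hypothetical helper, and the `fetch` callable stands in for whatever HTTP API request a test makes; the actual changes in hop, rabbit-hole and rabbitmq_http_api_client are not reproduced here.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(fetch: Callable[[], Optional[T]], timeout: float = 10.0, interval: float = 0.5) -> T:
    """Poll `fetch` until it returns a non-None value or `timeout` seconds elapse.

    `fetch` is a hypothetical callable, e.g. one that requests a queue from the
    HTTP API and returns its message_stats, or None if they are not there yet.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = fetch()
        if value is not None:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"stats did not appear within {timeout} seconds")
        time.sleep(interval)
```

Polling with a deadline is more robust than a single fixed sleep: tests finish as soon as stats are published and fail with a clear error instead of a confusing assertion when they never are.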

This was referenced Nov 28, 2016

Unbound memory consumption on stats node #302

Closed

JSON returned by /api/nodes contains a duplicate send_bytes key #305

Closed

dcorbacho mentioned this issue Dec 1, 2016

Use agent supervisor to restart DB michaelklishin/rabbit-hole#86

Merged

michaelklishin closed this as completed Dec 1, 2016

This was referenced Dec 2, 2016

channels unknown after restarting statistics database #306

Closed

Node health check API return 500 if health check fails. #307

Closed

michaelklishin added a commit to ruby-amqp/bunny that referenced this issue Dec 6, 2016

Adapt to new management plugin (rabbitmq/rabbitmq-management#236)

e395169

michaelklishin mentioned this issue Dec 21, 2016

[rfc] support message_stats -> * -> samples michaelklishin/rabbit-hole#88

Closed

michaelklishin mentioned this issue Jan 13, 2017

Remove dependency on mochiweb in master rabbitmq/rabbitmq-auth-backend-http#19

Closed

michaelklishin mentioned this issue Jan 22, 2017

RabbitMQ fail with HTTP 500: Internal Server Error #332

Closed

michaelklishin mentioned this issue Feb 1, 2017

Convenience method for fetching statistics DB node ruby-amqp/rabbitmq_http_api_client#2

Closed

michaelklishin mentioned this issue Mar 7, 2017

External port crash in rabbit_mgmt_external_stats:init/1 can affect other processes rabbitmq/rabbitmq-management-agent#12

Closed

This was referenced Mar 14, 2017

statistics_db_node inaccurate after failing to cluster #359

Closed

Convenience method for fetching statistics DB node michaelklishin/rabbit-hole#53

Closed

mattbennett mentioned this issue Sep 7, 2017

Fix flakey tests nameko/nameko#468

Merged

michaelklishin referenced this issue Sep 25, 2017

Re-introduce rabbit_mgmt_sup_sup for an easy restart

503a194

michaelklishin mentioned this issue Oct 12, 2017

connection process started with cowboy_protocol:start_link/4 at <0.18023.990> exit with reason rabbitmq/rabbitmq-java-client#315

Closed

michaelklishin mentioned this issue Mar 27, 2018

3.6.2 M4: stats DB RAM use grows over the course of a few hours #185

Closed
