"Histogram" statistics aggregator plugin #1662
hi @pauldix: check my implementation as middleware (filters). I tested it with different kinds of output plugins and it works fine, since it passes metrics on exactly as they came from the input plugin itself. I get your idea about routing, as implemented in Fluentd, where the user can match metrics based on tags and route them to different outputs, but I think that is a different problem to tackle, maybe in the future. Here is a sample (working example).
Whereas I think changing all the output plugins would require a lot of effort. I hope at some point we will reach a sophisticated routing mechanism like the one implemented in Fluentd. Besides, aggregation should happen in one place; it is a waste of space and CPU to implement and run it in multiple places. The big question: why haven't I used TOML? TOML is pretty limited when it comes to nested configuration. Data flow: input -> filter -> outputs (output1, output2, ...). Sample output:
Other filters may follow after we merge the pull request, like: ...
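For reference, a minimal Go sketch of what such a filter (middleware) interface could look like; the `Filter` and `PassThrough` names are hypothetical illustrations, not Telegraf's actual API:

```go
package main

import "fmt"

// Metric is a simplified stand-in for telegraf's metric type
// (tags and timestamps omitted for brevity).
type Metric struct {
    Name   string
    Fields map[string]float64
}

// Filter is a hypothetical middleware interface: it sits between
// input and output plugins, receiving every metric and emitting
// zero or more metrics downstream.
type Filter interface {
    Apply(in Metric) []Metric
}

// PassThrough forwards metrics unchanged, so outputs see metrics
// exactly as they came from the input plugin.
type PassThrough struct{}

func (PassThrough) Apply(in Metric) []Metric { return []Metric{in} }

func main() {
    var f Filter = PassThrough{}
    fmt.Println(f.Apply(Metric{Name: "cpu", Fields: map[string]float64{"usage": 42}}))
}
```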
A few of my thoughts on this:
This looks fairly similar to what @pauldix proposed, but differs in that it is a general type that can aggregate metrics from all inputs and send them on to all outputs.

```toml
[[histograms]]
  ## measurements to calculate histogram data for
  ## only one of measurement_include & measurement_exclude should be defined
  measurement_include = ["*"]
  # measurement_exclude = []

  ## fields to calculate histogram data for
  ## only one of field_include & field_exclude should be defined
  field_include = ["*"]
  # field_exclude = []

  ## If true, drop the original metric field(s), only sending the aggregates
  ## to output plugins.
  drop_original = false

  ## Histogram functions to calculate for each field
  functions = ["min", "max", "first", "last", "sum", "count", "stddev"]

  ## quantiles to collect for each metric field.
  quantiles = [0.50, 0.95, 0.99]

  [[histograms.tagdrop]]
    cpu = ["cpu0", "cpu1", "cpu2"]
```
Still one open question: how would we support metric "periods"? Do we want to support this within telegraf? A few problems can arise: ...
My preference is not to support periods, and instead only calculate running cumulative metrics. This is the way that statsd does it, for example, and I think it's fair to leave the calculation of metric periods up to queryable datastores (influxdb, prometheus, etc.)
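For illustration, a minimal Go sketch of that statsd-style running-cumulative approach: aggregates grow for the lifetime of the process and each flush simply reports the current values, so no period bookkeeping is needed (names here are hypothetical):

```go
package main

import "fmt"

// RunningStats keeps cumulative aggregates since process start,
// statsd-style: there is no period to reset, each flush just
// reports the current running values.
type RunningStats struct {
    Count    int64
    Sum      float64
    Min, Max float64
}

func (r *RunningStats) Add(v float64) {
    if r.Count == 0 || v < r.Min {
        r.Min = v
    }
    if r.Count == 0 || v > r.Max {
        r.Max = v
    }
    r.Count++
    r.Sum += v
}

func main() {
    var r RunningStats
    for _, v := range []float64{5, 3, 8} {
        r.Add(v)
    }
    fmt.Printf("count=%d sum=%.1f min=%.1f max=%.1f\n", r.Count, r.Sum, r.Min, r.Max)
}
```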
We will also need to consider how we are going to handle metrics that are already counters, for example in the ...

For this feature, we might need to first begin adding statsd-style types to metrics (i.e. counters, gauges, etc.). This won't be too hard for system metrics, but will be challenging for other plugins. From the outset we would likely need to simply assume that all metrics are gauges unless specified otherwise by the input plugin.
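A tiny Go sketch of the assume-gauge-by-default idea; the `ValueType` names are illustrative and do not mirror any actual Telegraf API:

```go
package main

import "fmt"

// ValueType is a statsd-style metric type tag, as proposed above.
type ValueType int

const (
    Untyped ValueType = iota
    Gauge
    Counter
)

// typeOf returns the metric's declared type, defaulting to Gauge:
// all metrics are assumed to be gauges unless the input plugin
// says otherwise.
func typeOf(declared ValueType) ValueType {
    if declared == Untyped {
        return Gauge
    }
    return declared
}

func main() {
    fmt.Println(typeOf(Untyped) == Gauge, typeOf(Counter) == Counter)
}
```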
@sparrc don't worry about memory consumption, since I'm using a streaming algorithm which always has a fixed memory footprint based on the number of buckets: https://www.vividcortex.com/blog/2013/07/08/streaming-approximate-histograms/. I can only support one period (every 1 minute, for example); multiple periods are tricky. I'm assuming that all the metrics are gauges at the source, but support for other metric types could be added as well. One of the problems I had with TOML is that the histogram should only have one instance, and I'm not sure how to represent this.

TOML version (not working): ...
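On the algorithm side, here is a simplified Go sketch of the fixed-memory streaming histogram idea from the linked VividCortex post (a Ben-Haim/Tom-Tov-style bin merge); this is an illustration, not the PR's actual code:

```go
package main

import (
    "fmt"
    "sort"
)

type bin struct {
    value float64 // bin centroid
    count float64
}

// Histogram is a simplified streaming histogram: memory is fixed at
// maxBins no matter how many points are added, which is the
// fixed-size property referenced above.
type Histogram struct {
    maxBins int
    bins    []bin
}

func (h *Histogram) Add(v float64) {
    h.bins = append(h.bins, bin{value: v, count: 1})
    sort.Slice(h.bins, func(i, j int) bool { return h.bins[i].value < h.bins[j].value })
    if len(h.bins) <= h.maxBins {
        return
    }
    // Merge the two adjacent bins whose centroids are closest.
    best := 0
    for i := 1; i < len(h.bins)-1; i++ {
        if h.bins[i+1].value-h.bins[i].value < h.bins[best+1].value-h.bins[best].value {
            best = i
        }
    }
    a, b := h.bins[best], h.bins[best+1]
    merged := bin{
        value: (a.value*a.count + b.value*b.count) / (a.count + b.count),
        count: a.count + b.count,
    }
    h.bins = append(h.bins[:best], append([]bin{merged}, h.bins[best+2:]...)...)
}

func main() {
    h := &Histogram{maxBins: 4}
    for _, v := range []float64{1, 2, 2, 3, 10, 11, 40} {
        h.Add(v)
    }
    fmt.Println(h.bins)
}
```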
I believe the correct form would be:

```toml
[[filter.histogram]]
  bucketsize = 20
  flush_interval = "30s"

  [[filter.histogram.rollup]]
    name = "interface_rollup"
    tag = "interface en*"
    functions = ["mean", 0.90]
    pass = true

  [[filter.histogram.rollup]]
    name = "bal2"
    tag = "tag2"
    functions = ["mean", 0.90, "sum"]
    pass = true
```
@sparrc I will change the config to TOML over the weekend; could you please review the code and let me know if there is anything extra to be done?
@pauldix do you agree with the above?
@alimousazy I would prefer it if your implementation looked like the one that I wrote earlier:

```toml
[[filters.histogram]]
  ## measurements to calculate histogram data for
  ## only one of measurement_include & measurement_exclude should be defined
  measurement_include = ["*"]
  # measurement_exclude = []

  ## fields to calculate histogram data for
  ## only one of field_include & field_exclude should be defined
  field_include = ["*"]
  # field_exclude = []

  ## If true, drop the original metric field(s), only sending the aggregates
  ## to output plugins.
  drop_original = false

  ## Histogram functions to calculate for each field
  functions = ["min", "max", "first", "last", "sum", "count", "stddev"]

  ## quantiles to collect for each metric field.
  quantiles = [0.50, 0.95, 0.99]

  [[filters.histogram.tagdrop]]
    cpu = ["cpu0", "cpu1", "cpu2"]
```

The reason I say this is because you can define multiple histograms; note that it should also support ...
You can also add support for a ... And the last thing is that we still haven't answered what the timestamp should be set to: the end of the period? The middle? I'd like to see what other similar products do for this.
@sparrc Given that this is going to require a fairly detailed config, could we not make that configurable? Per http://stackoverflow.com/questions/23847332/rrdtool-outputs-wrong-numbers, that is what rrdtool does (the -t argument). Manually inspecting some of our ganglia .rrd files (i.e. rrdtool dump /path/hostname/load_one.rrd) shows that ganglia snaps to the "end" timestamp of the period for downsampled data, so (to us) that seems like a sensible default.
Sure, a config option would work. I agree that ...
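For illustration, a one-function Go sketch of snapping a metric's timestamp to the end of its period, the ganglia default discussed above (a hypothetical helper, not Telegraf code):

```go
package main

import (
    "fmt"
    "time"
)

// periodEnd snaps a timestamp to the end of the aggregation period
// that contains it.
func periodEnd(t time.Time, period time.Duration) time.Time {
    return t.Truncate(period).Add(period)
}

func main() {
    t := time.Date(2016, 8, 22, 10, 7, 13, 0, time.UTC)
    fmt.Println(periodEnd(t, time.Minute)) // 2016-08-22 10:08:00 +0000 UTC
}
```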
@sparrc just to answer your question: ...
I would say the main performance hits are: (1) you need to keep a histogram object per rollup, and (2) you need to check every metric against every rollup. Either way, you need to do both of these things, so for me it's OK to define multiple histograms. The number of channels that a metric passes through shouldn't have a very large impact, as channels are quite lightweight data structures. Timers also do not need to block metrics: the metrics should pass through uninhibited, just matched and their values recorded.
I'm not sure I understand your reasoning here; timers are not expensive, so it should be fine to have it based on seconds.
I don't think that the "flush" interval should be applied to histograms; it's more like a histogram "period".
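To illustrate why the pass-through is cheap, here is a small Go sketch of a filter goroutine that checks each metric against every rollup, records matches, and forwards the metric without blocking it (all names are hypothetical):

```go
package main

import "fmt"

type Metric struct {
    Name  string
    Value float64
}

// rollup holds per-rollup state; one exists per configured rollup,
// which is performance cost (1) noted above.
type rollup struct {
    match func(Metric) bool
    sum   float64
    count int
}

func main() {
    rollups := []*rollup{
        {match: func(m Metric) bool { return m.Name == "cpu" }},
        {match: func(m Metric) bool { return true }},
    }

    in := make(chan Metric)  // lightweight channels in the pipeline
    out := make(chan Metric)

    // The filter checks every metric against every rollup, which is
    // cost (2), records the value, and passes the metric through.
    go func() {
        for m := range in {
            for _, r := range rollups {
                if r.match(m) {
                    r.sum += m.Value
                    r.count++
                }
            }
            out <- m
        }
        close(out)
    }()

    go func() {
        in <- Metric{"cpu", 1.5}
        in <- Metric{"mem", 3.0}
        close(in)
    }()

    for m := range out {
        fmt.Println("passed through:", m)
    }
    fmt.Println("cpu rollup count:", rollups[0].count)
}
```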
See here for a discussion of filter plugins (a prerequisite for supporting histograms): #1726
Would this support globbing and derivatives? I've got a looking-glass project on my plate where I'll be querying roughly 1600-1800 devices via SNMP, pulling the octet counters and graphing a time-based derivative to show interface bandwidth for each device. However, a request also came in asking to retain a daily average of said bandwidth for two years to glean trend lines (again, for each separate device) :( . Could this potentially cover that requirement?
@taishan69 You could have InfluxDB or Kapacitor run a continuous query: https://docs.influxdata.com/influxdb/v1.2/guides/downsampling_and_retention/
I don't think I can use Kapacitor with Elasticsearch as a backend, or did I misread the documentation?
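For reference, the derivative @taishan69 describes is a simple rate calculation over successive counter samples; a minimal Go sketch follows (the `rate` helper is hypothetical, and unsigned subtraction handles counter wrap):

```go
package main

import (
    "fmt"
    "time"
)

// rate computes a time-based derivative between two octet-counter
// samples, i.e. bytes per second. The thread above suggests doing
// this downstream (InfluxDB CQs or Kapacitor) rather than in
// telegraf itself.
func rate(prev, cur uint64, dt time.Duration) float64 {
    delta := cur - prev // unsigned subtraction handles 64-bit wraparound
    return float64(delta) / dt.Seconds()
}

func main() {
    // two ifHCInOctets samples taken 30s apart
    fmt.Printf("%.1f bytes/sec\n", rate(1000000, 1150000, 30*time.Second))
}
```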
Feature Request
I'm attempting to bring the discussion from #380 and #1364 into one place so we can talk about the design and implementation and move things forward. This has been requested pretty frequently, so I think it's worth looking into.
Proposal
Give users the ability to pre-aggregate data in Telegraf. The most common use case for this is to calculate aggregates or downsamples that would get stored in other retention policies. I assume that users will want to aggregate everything from an input plugin, a single measurement, or all measurements matching a pattern.
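To make the pattern matching concrete, a small Go sketch of measurement matching with shell-style globs; `path.Match` here is just one possible mechanism, not necessarily what Telegraf would use:

```go
package main

import (
    "fmt"
    "path"
)

// matches reports whether a measurement name matches any of the
// configured include patterns, e.g. ["*"] or ["net_*"].
func matches(name string, include []string) bool {
    for _, pat := range include {
        if ok, _ := path.Match(pat, name); ok {
            return true
        }
    }
    return false
}

func main() {
    fmt.Println(matches("cpu", []string{"*"}))      // true
    fmt.Println(matches("disk", []string{"net_*"})) // false
}
```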
This might make sense to be implemented as a new type called `middleware` rather than as an `input` or `output` plugin. However, it would need to be able to map between input and output plugins if we wanted to do something like that. Having that mapping would be tricky because we'd need a method for identifying each input and output, which currently doesn't exist.

Alternately, it could just be implemented as part of the `influxdb` output plugin. This would probably keep things simpler in the short term. Doing it as middleware could be tricky because each output plugin has different options, and you may want to set different options for different aggregations. So we'll go with the updated InfluxDB output plugin for our example.
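As a rough illustration of that output-plugin variant, here is a minimal Go sketch where aggregation happens inside the output's write path, sidestepping the input/output mapping problem; all names are hypothetical:

```go
package main

import "fmt"

type Metric struct {
    Name  string
    Value float64
}

// InfluxDBOutput is a hypothetical output plugin that aggregates
// before writing, per the "part of the influxdb output" option above.
type InfluxDBOutput struct {
    sums map[string]float64
}

// Write aggregates the batch; a real plugin would then send both the
// raw and aggregated points, but here we just print the aggregates.
func (o *InfluxDBOutput) Write(batch []Metric) {
    for _, m := range batch {
        o.sums[m.Name] += m.Value
    }
    for name, sum := range o.sums {
        fmt.Printf("%s_sum=%v\n", name, sum)
    }
}

func main() {
    o := &InfluxDBOutput{sums: map[string]float64{}}
    o.Write([]Metric{{"cpu", 1.5}, {"cpu", 2.5}, {"mem", 3}})
}
```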
Desired behavior:
If we implement it as something that could be added as an InfluxDB output plugin, you might have the following.
Use case
Gives users the ability to calculate downsamples and aggregates in a decentralized way, amortizing the cost of computing the aggregates across their Telegraf infrastructure rather than concentrating it in the database.