-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CloudWatch Metric Input Plugin #935
Conversation
|
||
func (c *CloudWatch) SampleConfig() string { | ||
return ` | ||
[[inputs.cloudwatch]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need the [[inputs.cloudwatch]] line, it will be auto-added
overall this looks great, thank you @joshhardy |
@joshhardy how did you configure Telegraf to monitor CloudWatch ? Do you have only one Telegraf to pull data ? If yes, this is a singke point of failure, isn't it ? |
@titilambert - we are just getting started with this (thus, open to suggestions), but right now we are running a single container of Telegraf on our Docker cluster specifically for monitoring CloudWatch. Failures in the container or loss of the host will cause the container to be spun up elsewhere in the cluster within a few seconds. Given the |
@joshhardy, @sparrc: I wrote a Cloudwatch input plugin yesterday as well, and submitted a PR before I noticed that you had just submitted one. We should unify them. It looks like the main difference is that my plugin only requires the user to configure the AWS region and CloudWatch namespace; it uses ListMetrics to discover the metrics and dimensions. I think that is valuable, wdyt? |
seems to me that both approaches have their advantages. I like that @ljosa's converts all strings to snake_case, which makes the metrics fit into the influxdb style a bit better. In this PR I like that the plugin has the ability to select Perhaps there could simply be an option: |
so I would say, keep the following from @ljosa:
keep the following from @joshhardy:
what do you guys think of that as a unified plugin? |
@sparrc @ljosa - sounds good to me. IMO, the best way to handle it would be to make more of the config options here optional.
That way, users can be prescriptive on the aspects they want to scope and be wide open on the stuff they don't care about filtering down. The other thing #936 allows is multiple namespaces in the same input... To me, it makes sense to define multiple Also, what is the recommended way for @ljosa and I to collaborate on implementing these changes? (apologies if this is a stupid question - I'm a bit new to the process) |
One possibility for the two of you collaborating would be for one person to give the other access to their fork, and then you can both work off the same branch (and don't do any force pushes). Another would be one person works in master of the fork while the other rebases those changes into their branch. |
@sparrc @ljosa - please review updates. I also updated the recorded Measurements and Tags to use more standard snake casing and updated the format a bit... I'm pleased with the output now, thanks for the input! There were some complications with pulling very large numbers of metrics - exhausting user connection limit when fetching with unbridled parallelism, taking forever with no parallelism, handling ListMetrics results across multiple pages, etc... |
thanks @joshhardy, I'll take a look tomorrow |
this looks good, but can you remove the azer/snakecase dependency? I've put the SnakeCase function into telegraf/internal (https://github.com/influxdata/telegraf/blob/master/internal/internal.go#L128) so you don't need to add an external dependency. |
## Metrics to Pull (optional) | ||
## Defaults to all Metrics in Namespace if nothing is provided | ||
## Refreshes Namespace available metrics every 1h | ||
[[inputs.cloudwatch.metrics]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we comment these by default since they're optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, updating.
@ljosa what do you think? |
|
||
``` | ||
$ ./telegraf -config telegraf.conf -input-filter cloudwatch -test | ||
> cloudwatch_aws_elb,load_balancer_name=p-motd,region=us-east-1,unit=seconds,latency_average=0.0012776487302826093,latency_maximum=0.07484889030456543,latency_minimum=0.0006072521209716797,latency_sample_count=13493,latency_sum=17.239314317703247 1459474260000000000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line should have a space somewhere? from this it looks like everything is a tag
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch - there is a space between unit=seconds
and latency_average
, but I somehow messed it up in the copy+paste to the README. Fixing.
@sparrc - I updated as per your feedback and removed the dependency on |
@joshhardy it looks like you may have merged your fork. I'd prefer if contributors didn't do this because it creates a messy PR and git commit history (you can see that you now have 34 commits included here). Could you re-apply your changes as a single commit rebased off of HEAD? |
I meant to look at this today but got tied up, and won't have a chance to look at it in detail until Monday. But based on @joshhardy's comment it sounds great, and I don't see a reason to wait to merge it. |
@joshhardy looks like there was a problem with CircleCI's output, here is the error message it's getting from the unit tests:
|
@sparrc - apologies, I missed that behavior change for the test and did that wonky fork merge... I will fix both of those with a single commit rebased from HEAD as you suggested ASAP. |
no rush, I'll be cutting a 0.12.0 release soon and this won't be making it in either way :) |
Rebased commit of previously reviewed branch. Added cloudwatch client Mock and more rich unit tests.
@sparrc - please take a look, I rebased into a single commit. I also added some CloudWatch mocks, so the UTs cover much more than before.
|
I'll kick a rebuild |
great, thanks @joshhardy! |
We want to use the TICK stack for monitoring and alerting across our system.
We are very happy with the breadth of inputs for telegraf, but found that we would be unable to monitor and alert on statistics from AWS Elastic Load Balancers - a critical components for us.
This plugin allows us to pull Metrics Statistics from CloudWatch for our ELBs.
Although we built this plugin to monitor ELBs, the plugin call pull any CloudWatch Metric.