Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker: optionally add labels as tags #2425

Merged

Conversation

ssorathia
Copy link
Contributor

@ssorathia ssorathia commented Feb 17, 2017

The default in the docker input is to add all labels as tags. This created a couple problems for us:

  1. Using OpenTSDB as the backend, only 8 tags are allowed. If you combine this with AWS ECS, you run out of tags very quickly.
  2. Not all of the labels are useful, but some are.

This addresses the ability to both disable label's being added as tags (default is to add them) and optionally only select specific labels to add.

Required for all PRs:

  • CHANGELOG.md updated (we recommend not updating this until the PR has been approved by a maintainer)
  • Sign CLA (if not already signed)
  • README.md updated (if adding a new plugin)

@phemmer
Copy link
Contributor

phemmer commented Feb 18, 2017

Why not use taginclude?

@ssorathia
Copy link
Contributor Author

So here was my thinking on this. In this case, the docker plugin will create different metrics for things like cpu, network, disk, etc, and all of these could have different tags. In order for me to use taginclude, and still use all of the 'default' tags, I would need to include all of those different tags in the taginclude line. In addition, if this plugin ever changed and added new metrics, with new tags, one would have to keep track of any new additions (which may not always be easily evident). This makes using taginclude have a higher 'admin' cost, if you will.

IMO, the labels that are added to the container, and consequently as tags, are really user defined, and one may wish to include or not include those separately from the default tags that the plugin would provide. This patch makes it so that we can still keep all the default tags that go with the metrics, but not include (or optionally include) the labels that we may add to containers (for example a cost allocation label) that are not useful in this context.

I should also mention this is why tagexclude doesn't work well, since labels can be somewhat arbitrarily named and added.

@sparrc
Copy link
Contributor

sparrc commented Feb 21, 2017

@ssorathia but the addlabels option does the exact same thing as taginclude, doesn't it?

@ssorathia
Copy link
Contributor Author

ssorathia commented Feb 21, 2017

@sparrc Yes and no. Yes it does, but it is limited to docker labels. Taginclude applies to all of the tags that come from the plugin, whereas addlabels only affects adding (or not adding) the labels that are associated with the docker container thereby leaving all of the 'default' tags associated with the plugin intact.

@sparrc
Copy link
Contributor

sparrc commented Feb 21, 2017

I would prefer not to add the addlabels option, I know it's slightly less convenient but I think you can make do with taginclude and/or tagexclude

if you can remove addlabels I'll review the PR.

@ssorathia
Copy link
Contributor Author

@sparrc I understand how you feel, though it is a shame because it has already bitten me once, and that's why I've gone this way. I'll just maintain my fork or look for a better way as taginclude and tagexclude for this particular purpose is challenging for me. Thanks for the consideration.

@ssorathia ssorathia closed this Feb 21, 2017
@ssorathia
Copy link
Contributor Author

@sparrc So I'm going to reopen this for a hopeful reconsideration. I just realized there is another Feature request that, if I were to use the taginclude and not pay attention, I would not get the benefit of that, IMO, useful feature: #2407

I realize it seems like it's just a convenience thing, but when the entire application, with all of it's plugins, are compiled and added together into a single binary, these types of convenience things can make a big difference in terms of adoption and upgrading.

I would ask you to reconsider adding this option.

@ssorathia ssorathia reopened this Feb 22, 2017
@sparrc
Copy link
Contributor

sparrc commented Feb 22, 2017

but when the entire application, with all of it's plugins, are compiled and added together into a single binary, these types of convenience things can make a big difference in terms of adoption and upgrading.

this is exactly why I'm hesitant to add it. If we added a feature for every minor inconvenience we'd have a feature bloat problem. I'd prefer if the configuration remained simple and only add necessary features.

@ssorathia
Copy link
Contributor Author

@sparrc one could view it that way, and another way to view is that by not choosing to incorporate some convenience features, it could sway people away from utilizing the product because it because even more difficult to maintain. I agree you don't want to add every feature like that because it would add bloat, but I don't see that being the case here.

First off, there are very few plugins that have the capability to allow what amounts to arbitrary tags to a metric. Most plugins have a defined set of tags that are going to be added to a metric, but in this case, because labels come from the container, and are arbitrary in nature, the user has the ability to add an arbitrary number of tags to the metrics this plugin provides.

This is further complicated by the fact that this isn't even documented anywhere. The only way I found it is because I'm using opentsdb as the backend which has a limit to the number of tags that can be added. But even without this limitation, one could add 20-30 tags to any metric and it would happily send it along, which could cause other issues down the line.

So let's look at tagexclude. I hope we can agree that tagexclude would not work here because one cannot know the entire universe of arbitrary labels that could be added to a docker container. That makes tagexclude an unviable option.

So your suggestion is taginclude. taginclude requires me to know what are all the tags that the default plugin would produce. One would think that you could go to the documentation to find this, however, the documentation is wrong. It actually doesn't mention some of the tags that are added to the metrics. That means the only way to know is to actually go through the code or do a significant amount of debugging (which is what I had to do to figure this out). In addition, this would have to be done for every new release to ensure that you are getting the new features. I feel that is a harsh thing to ask people to do.

To me, this is not a convenience thing, but rather a usability item. I go back to the fact that there are not very many plugins that have the capability to add an arbitrary number of tags like this plugin does. This is a unique situation, and therefore I feel that it requires a unique way to address. I don't think it makes sense that the only way users would be able to properly limit arbitrary tags, but allow the default tags, would be to have them read through the code and figure out what are all the tags that the plugin provides.

I understand the concept of restricting bloat, but in this case, one could say that by adding a little bit of bloat here, we could be significantly restricting the bloat on the backend, especially when the bloat here really boils down to a couple of extra if statements.

@sparrc
Copy link
Contributor

sparrc commented Feb 23, 2017

This is further complicated by the fact that this isn't even documented anywhere.

then why haven't you added/fixed the README in your PR?

I go back to the fact that there are not very many plugins that have the capability to add an arbitrary number of tags like this plugin does.

I understand, but in your case where are the tags coming from? why do you have 20-30 docker labels on your containers? who is adding the labels to the containers in the first place? And if you don't seem to have any use of those labels, why are you adding them?

Unless I'm misunderstanding something, what I'm hearing is "we add pointless labels to our docker containers, and now we want a special way for telegraf to exclude them". Do you have control over the docker labels that get added? And if not, why not?

@ssorathia
Copy link
Contributor Author

@sparrc

then why haven't you added/fixed the README in your PR?

Because I didn't go through it extensively to figure it all out. I just found the items that were there and realized what the issue was. But, I should have added it, but that only affects what I have done, what happens when the next person adds something and it doesn't get documented? The point still stands that documentation is not always updated appropriately, and in this case, it's clear that it's not, so it can't be completely relied upon. I wouldn't mind going back in and adding them though.

I understand, but in your case where are the tags coming from?

The tags are coming from the labels on the container. The code is currently written to include ALL labels as tags.

why do you have 20-30 docker labels on your containers?

I don't necessarily have 20-30, but we do have quite a few. However, a container could have 20-30 because there isn't a constraint to the number of labels someone can put on a container.

who is adding the labels to the containers in the first place?

Whoever is creating the containers can add a label. You can add them as you create a container through docker-compose or just a docker run command or in our case, through an ECS dockerrun file. Nothing prevents a user from adding a label.

And if you don't seem to have any use of those labels, why are you adding them?

They are absolutely useful, just not in the context of TSD data. For example, you could have a label for internal_account_code so that all containers get allocated to a particular internal team. You may have a label for documentation link, or a label for a pager duty escalation, or the unit that this container belongs to, there are tons of different labels that you could have that would have no real relevance to TSD data, but do have relevance outside of performance metrics.

Unless I'm misunderstanding something, what I'm hearing is "we add pointless labels to our docker containers, and now we want a special way for telegraf to exclude them". Do you have control over the docker labels that get added? And if not, why not?

This is an absolutely inaccurate statement. The labels are extremely valuable as I have stated above, however, they are simply not valid when it comes to TSD data. There is no interest in filtering TSD data by documentation links, or account_codes. Labels can be used to describe a million things that would have no value when it comes to performance data, so why should they be included? In a large organization, there are labels for everything, and while there is some control over that, there is also flexibility to allow people to add labels so that they can use them in other meaningful ways, as I have identified above.

@sparrc
Copy link
Contributor

sparrc commented Feb 23, 2017

OK, fair enough, we'll try to review the PR in time for release 1.3

@sparrc sparrc added this to the 1.3.0 milestone Feb 23, 2017
@ssorathia
Copy link
Contributor Author

Awesome, thanks for the consideration!

@ssorathia ssorathia force-pushed the docker-optionally-add-labels branch from 7d2d471 to 641d48f Compare February 25, 2017 18:43
Copy link
Contributor

@danielnelson danielnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having addlabels and label_names options, we should have options that work like taginclude/tagexclude. I suggest naming them docker_label_include and docker_label_exclude. They should support glob filters and generally behave like taginclude/tagexclude. The filter package can be used to make this easy: https://github.com/influxdata/telegraf/blob/master/filter/filter.go

@ssorathia
Copy link
Contributor Author

@danielnelson I can for sure change that, but was somewhat keeping to the way it was done for container_names. With that said, I have a couple questions:

  1. What should be the order of operations? Should includes be processed before excludes or the other way?

  2. I assume that to exclude all labels the configuration would look like this: docker_label_exclude = [ "*" ] any concerns with resource utilization if there are a lot of labels on a container?

@danielnelson
Copy link
Contributor

  1. If a label is both included and excluded, then it should be excluded. https://github.com/influxdata/telegraf/blob/master/internal/models/filter.go#L211
  2. That is the correct way to exclude all. I don't think performance will be a problem but if it ends up that way we can special case exclude all.

@ssorathia
Copy link
Contributor Author

@danielnelson I made the changes you requested. Keeping with how taginclude/exclude work, I made it so that if neither include nor exclude is given, then the label will be added. Let me know if that is not right.

@danielnelson
Copy link
Contributor

@ssorathia Thanks, I'll try to take a look at it later this week.

- docker_data
- unit=bytes
- engine_host
- docker_metadata
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one also has engine_host

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct. I'll add it.

@@ -134,6 +148,15 @@ func (d *Docker) Gather(acc telegraf.Accumulator) error {
d.client = c
}

// Create label filters
if len(d.LabelInclude) != 0 {
d.LabelFilter.labelInclude, _ = filter.Compile(d.LabelInclude)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an error occurs here, return it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, not sure why I did that!

}

if len(d.LabelExclude) != 0 {
d.LabelFilter.labelExclude, _ = filter.Compile(d.LabelExclude)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an error occurs here, return it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

@@ -604,6 +631,7 @@ func init() {
return &Docker{
PerDevice: true,
Timeout: internal.Duration{Duration: time.Second * 5},
//AddLabels: true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bad shakeel!

@@ -244,13 +244,91 @@ func testStats() *types.StatsJSON {
return stats
}

func TestDockerGatherLabels(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split this up into multiple tests, or into table driven tests. You can use func (a *Accumulator) TagValue(measurement string, key string) string to test specific tags.

Add some basic tests for globs, for if include and exclude are set to the same value, exclude with include being [].

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure that makes sense. I'll do that.

labels and the default is true.
*/

d.LabelExclude = append(d.LabelExclude, "*")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets remove this and update the assertions in the test. Since this is an end to end test it should capture the full behavior, and we should have had labels here to begin with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. That makes sense, just didn't want to poke around in it if it was meant to stay the same. I'll update.

## Note that an empty array for both will include all labels as tags
docker_label_include = []
docker_label_exclude = []

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this extra blank line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do

@@ -134,6 +148,15 @@ func (d *Docker) Gather(acc telegraf.Accumulator) error {
d.client = c
}

// Create label filters
if len(d.LabelInclude) != 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this compile on every gather when using the new filters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I completely understand the question. But yes, every time Gather runs, it would compile this filter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should only have it compile once, you could use a init function guarded with a bool.

@ssorathia
Copy link
Contributor Author

@danielnelson I changed up what you had asked for, with one exception. The tests, I created them as a table driven test, but I used func (a *Accumulator) HasTag(measurement string, key string) bool instead of the TagValue function.

I had to fetch upstream master for those functions because they weren't in my branch, and now I have a ton of extra commits because I'm not too smart sometimes! If it's a problem let me know and I'll close it and issue a new PR.

the tests are failing due to a race condition, but it looks to be in the socket_listener plugin (and since I pulled all that in, it's now failing.)

@danielnelson
Copy link
Contributor

@ssorathia Preferably you can rebase this pull request to only contain the new commits, but another PR would also be acceptable.

@ssorathia ssorathia force-pushed the docker-optionally-add-labels branch from d5b3819 to 369c8c4 Compare March 27, 2017 23:42
@ssorathia
Copy link
Contributor Author

@danielnelson Ok, it should be a clean history now. With that said, the test is still failing, but it looks like it's on the race testing for riemann

Copy link
Contributor

@danielnelson danielnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very close now, sorry about the random test failures we have a couple race conditions in the integration tests.

@@ -134,6 +148,15 @@ func (d *Docker) Gather(acc telegraf.Accumulator) error {
d.client = c
}

// Create label filters
if len(d.LabelInclude) != 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should only have it compile once, you could use a init function guarded with a bool.

var expectedErrors []string
for _, label := range tt.expected {
if !acc.HasTag("docker_container_cpu", label) {
expectedErrors = append(expectedErrors, fmt.Sprintf("%s ", label))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here you should just do the assertions, instead of gathering up errors.

var notexpectedErrors []string
for _, label := range tt.notexpected {
if acc.HasTag("docker_container_cpu", label) {
notexpectedErrors = append(notexpectedErrors, fmt.Sprintf("%s ", label))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here as above.

@danielnelson
Copy link
Contributor

Don't forget to sign the CLA and also go ahead and add to the changelog.

Shakeel Sorathia added 9 commits April 3, 2017 13:07
- removed some commented out debug code
- resolve conflicts
…agexclude

- updated tests
- updated readme to reflect new parameter names
- created a bool such that we only compile the filter on the first gather
- no longer collecting errors but rather displaying them as we go along
- Changelog updated
@ssorathia ssorathia force-pushed the docker-optionally-add-labels branch from b662241 to f3ee8aa Compare April 3, 2017 20:08
@ssorathia
Copy link
Contributor Author

@danielnelson I've made the changes you asked for. For the compiling of the filter, I made it so that it will only compile on the first gather, and from there on out, it will skip the compilation.

I've signed the CLA, as well as updated the changelog. Let me know if you see anything further.

@danielnelson danielnelson merged commit 35e4390 into influxdata:master Apr 3, 2017
rsingh2411 added a commit to rsingh2411/telegraf that referenced this pull request Apr 3, 2017
calerogers pushed a commit to calerogers/telegraf that referenced this pull request Apr 5, 2017
vlamug pushed a commit to vlamug/telegraf that referenced this pull request May 30, 2017
maxunt pushed a commit that referenced this pull request Jun 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants