[WIP] Etcd integration for configuration #651

titilambert · 2016-02-05T05:57:48Z

Rebased PR #465

Hello!
I just started an example of what etcd integration with Telegraf could look like.

Here is an example:
1. Create a myconf.conf file, which will be stored in etcd, with the following content:

[tags]
  dc = "us-east-1"

[agent]
  interval = "10s"
  round_interval = true
  flush_interval = "10s"
  flush_jitter = "0s"
  debug = false
  hostname = ""


[[outputs.influxdb]]
  urls = ["http://localhost:8086"]
  database = "telegraf"
  precision = "s"


[[inputs.cpu]]
  percpu = true
  totalcpu = true
  drop = ["cpu_time*"]

2. Send this file to etcd under the label mylabel:

./telegraf  -etcd http://127.0.0.1:2379 -etcdwritelabel mylabel -etcdwriteconfig myconf.conf

3. Check that the data was really written to etcd:

./etcdctl get /telegraf/labels/mylabel

4. Now any Telegraf agent can load this config using the label mylabel:

./telegraf  -etcd http://127.0.0.1:2379 -etcdreadlabels mylabel
Config read with label mylabel
2015/12/27 20:58:34 Database creation failed: Get http://localhost:8086/query?db=&q=CREATE+DATABASE+IF+NOT+EXISTS+telegraf: dial tcp 127.0.0.1:8086: getsockopt: connection refused
2015/12/27 20:58:34 Starting Telegraf (version v0.2.4-16-ga0bb7db)
2015/12/27 20:58:34 Loaded outputs: influxdb
2015/12/27 20:58:34 Loaded plugins: cpu
2015/12/27 20:58:34 Tags enabled: dc=us-east-1 host=osselait
2015/12/27 20:58:34 Agent Config: Interval:{10s}, Debug:false, Hostname:"osselait", Flush Interval:{10s}

Notes:

DO NOT forget to change your etcd server URL
Tested with etcd 2.2.2

Agent config reading order:

1. /telegraf/main key
2. /telegraf/hosts/HOSTNAME key
3. /telegraf/labels/LABEL1 key
4. /telegraf/labels/LABEL2 key
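
The reading order above can be sketched as a small helper that lists the etcd keys an agent would consult. This is only an illustration in Python, not the PR's Go code: the function name `config_keys` and its parameters are hypothetical, and the root defaults to "/telegraf" as mentioned in this thread. How the values under these keys are merged is not specified here, so the sketch stops at producing the ordered key list.

```python
def config_keys(hostname, labels, root="/telegraf"):
    """Return etcd keys in the order described in the thread:
    main first, then the host-specific key, then each label in the order given.

    All names here are assumptions made for illustration.
    """
    prefix = root.rstrip("/")
    keys = [f"{prefix}/main", f"{prefix}/hosts/{hostname}"]
    keys.extend(f"{prefix}/labels/{label}" for label in labels)
    return keys


if __name__ == "__main__":
    # Hostname taken from the log output above; labels from the later example.
    for key in config_keys("osselait", ["influx", "network"]):
        print(key)
```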

Features:

Main config file can be loaded in etcd
Each agent automatically tries to find its own key in etcd (/telegraf/hosts/HOSTNAME)
Labels can be set in config files, so the labels themselves can come from etcd
An etcd config watcher that reloads Telegraf when a change is detected in etcd
You can write the configuration of ALL your Telegraf agents in a folder, then send it to etcd

Missing:

~~Add an option to select the root folder name in etcd (default "/telegraf")~~ DONE
~~Handle multiple etcd servers~~ DONE
~~Handle update/set/delete in etcd~~ DONE
~~Documentation~~ DONE

titilambert · 2016-02-05T06:16:47Z

New features:

Agent specific config

Each agent starts by reading the value of the key /telegraf/hosts/HOSTNAME.conf in etcd to get its default config. So you can now set your labels in this file and just start your agent like this:

./telegraf -etcd http://127.0.0.1:2379

Configuration folder

You can now set your configuration in a folder like this:

testdata/
├── hosts
│   └── localhost.conf
├── labels
│   ├── influx.conf
│   ├── network2.conf
│   └── network.conf
└── main.conf

(An example is available here: https://github.com/titilambert/telegraf/tree/etcd/internal/etcd/testdata/test1)

Then you can send your configuration folder to etcd:

./telegraf -etcd http://127.0.0.1:2379  -etcdwriteconfigdir testdata/

Then you can start all your telegraf agents like this:

./telegraf  -etcd http://127.0.0.1:2379 -etcdreadlabels=influx,network
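
To make the folder-to-etcd mapping concrete, here is a rough Python sketch of how a configuration folder like the one above could be turned into key/value pairs. It is not the PR's implementation: `folder_to_keys` is a hypothetical helper, and whether the `.conf` suffix is kept in the key name is an assumption (this thread shows both /telegraf/hosts/HOSTNAME and /telegraf/hosts/HOSTNAME.conf), so it is exposed as a flag.

```python
from pathlib import Path


def folder_to_keys(config_dir, root="/telegraf", keep_suffix=False):
    """Map every *.conf file under config_dir to a (hypothetical) etcd key.

    Returns a dict of key -> file content. Nothing is sent to etcd here;
    the result is what a writer would then push to the store.
    """
    base = Path(config_dir)
    prefix = root.rstrip("/")
    mapping = {}
    for path in sorted(base.rglob("*.conf")):
        rel = path.relative_to(base)
        if not keep_suffix:
            rel = rel.with_suffix("")  # hosts/localhost.conf -> hosts/localhost
        mapping[f"{prefix}/{rel.as_posix()}"] = path.read_text()
    return mapping
```

With the testdata/ tree shown above and the default flag, this would yield keys such as /telegraf/main, /telegraf/hosts/localhost and /telegraf/labels/influx.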

titilambert · 2016-02-13T06:21:55Z

@sparrc rebased ! (I don't give up ;) )

sparrc · 2016-02-13T17:49:01Z

great! Sorry I haven't had time to get a full review on this one, I've been slammed by some other features

sparrc · 2016-02-18T18:27:45Z

internal/config/config.go

@@ -19,6 +19,7 @@ import (
 	"github.com/influxdata/telegraf/plugins/serializers"

 	"github.com/influxdata/config"
+	"github.com/naoina/toml"


Can you remove the naoina/toml dependency here? I believe there is a function in influxdata/config that you can use to load and parse the toml file

titilambert · 2016-02-20T04:37:02Z

@sparrc changes done! I added 2 tests to get more coverage.
And rebased :)

titilambert · 2016-02-20T05:07:55Z

@sparrc What do you think about these features?

BTW, it's still missing:

~~Add an option to select the root folder name in etcd (default "/telegraf")~~ DONE
~~Handle multiple etcd servers~~ DONE
~~Handle update/set/delete in etcd~~
~~Documentation~~ DONE

titilambert · 2016-02-28T01:50:58Z

Added option to select the root folder name in etcd (default "/telegraf")

titilambert · 2016-03-01T19:29:37Z

Rebased with the new toml lib

titilambert · 2016-03-03T01:10:59Z

Rebased !
@sparrc What about creating a new command, telegrafctl?
Instead of:

./telegraf -etcd http://127.0.0.1:2379  -etcdwriteconfigdir testdata/

We would use

./telegrafctl -etcd http://127.0.0.1:2379  -etcdwriteconfigdir testdata/

sparrc · 2016-03-03T09:55:30Z

why do you want to do that? is there a precedent?

titilambert · 2016-03-03T15:21:44Z

It's just to separate the daemon binary from the utility binary. I don't know if it's a good idea; it simply copies what etcd/kubernetes/... do.

titilambert · 2016-03-16T20:06:51Z

@sparrc could you just confirm that this PR is in scope of Telegraf ? :)

sparrc · 2016-03-16T20:16:33Z

yes, it is :)

titilambert · 2016-03-16T20:16:47Z

@sparrc cool :)

titilambert · 2016-04-02T04:55:55Z

@sparrc Rebased !
I also added a parameter to erase config in etcd.
I think everything is here. You can begin reviewing it; I'm waiting for your feedback :)
Thanks !

balboah · 2016-04-06T08:42:59Z

Why not just use https://github.com/kelseyhightower/confd together with a kill HUP signal?

titilambert · 2016-04-06T12:08:46Z

@balboah This does add a single point of failure, doesn't it?
And this is not really cool when you're using Telegraf inside a container...

sparrc · 2016-04-06T21:42:27Z

@titilambert As I've looked through this PR more, and looked into etcd and configuration management options, I feel like this is going to get messy.

Yes, it's nice that we could have etcd directly baked into telegraf in some ways. But it's also complicated and it ignores all the other options that there are out there for achieving this (consul, redis, vault, etc)

As @balboah suggests, this seems more like a configuration management issue, so why not use configuration management tools to solve it? Why should telegraf become a configuration management tool on top of all the other things it does?

sparrc · 2016-04-06T21:47:24Z

ps: do you have any examples of a project similar to telegraf that integrates directly with etcd? Or are there any libraries that we could use to generically get conf files? (like a library version of confd?)

balboah · 2016-04-11T07:00:53Z

@titilambert not sure what you mean by single point of failure. In my case confd runs inside the container, monitors the etcd cluster for changes, and updates the config file + sends the kill signal when something changes.
As long as telegraf is good at handling that config file reloading (like re-connecting if the influxdb hosts change) all is good imo
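
The workflow described in this comment (rewrite the config file, then signal the agent) can be sketched roughly as follows. This is a Python illustration of the pattern, not confd's actual code: `write_atomically` and `update_and_reload` are hypothetical names, the sketch assumes a POSIX system, and it assumes the agent reloads its config on SIGHUP as the comment suggests. The `kill` parameter exists only so the function can be exercised without signalling a real process.

```python
import os
import signal
import tempfile


def write_atomically(path, content):
    """Write to a temp file in the same directory, then rename it over the
    target so a reader never sees a half-written config file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(content)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def update_and_reload(path, new_content, pid, kill=os.kill):
    """Rewrite the config only if it changed, then send SIGHUP to pid.

    Returns True if a reload was requested, False if nothing changed.
    """
    try:
        with open(path) as current:
            if current.read() == new_content:
                return False
    except FileNotFoundError:
        pass  # first run: no config file on disk yet
    write_atomically(path, new_content)
    kill(pid, signal.SIGHUP)
    return True
```

Skipping the signal when the content is unchanged avoids needless reloads if the watcher fires on an update that leaves the rendered file identical.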

titilambert · 2016-04-11T13:43:58Z

@balboah Single point of failure: what happens if confd crashes? We don't have any fallback for confd. I can't see how running confd inside a container solves this issue. Docker can still restart confd, but then the single point of failure moves to Docker. The only single point of failure should be Telegraf.

@sparrc I agree with you, this PR limits Telegraf to etcd. But, imo, confd seems to be a single point of failure.

> do you have any examples of a project similar to telegraf that integrates directly with etcd?

Yes: Kubernetes, fleet, locksmith, vulcand, calico, flannel, ...

> Or are there any libraries that we could use to generically get conf files? (like a library version of confd?)

Good question!!!

sparrc · 2016-04-11T16:30:10Z

@titilambert but what if your etcd server goes down? This could be a problem if etcd is integrated directly into telegraf. If you decouple the two services (telegraf and config management) then Telegraf is completely unaffected by any status or change in etcd.

titilambert · 2016-04-12T15:04:21Z

@sparrc etcd can run as a cluster (with at least 3 nodes) which means etcd isn't a single point of failure.
With confd I just can't see how we can eliminate this issue (because you need to run several confd daemons on the same machine)
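
As a rough illustration of why a multi-node cluster removes the single endpoint as a point of failure, a reader can simply try each endpoint in turn. The Python sketch below is an assumption-laden example, not the PR's code: `read_from_cluster` and the injected `fetch(endpoint, key)` callable are hypothetical, and a real etcd client library may well handle endpoint failover internally.

```python
def read_from_cluster(endpoints, key, fetch):
    """Try each endpoint in order and return the first successful value.

    fetch(endpoint, key) is a caller-supplied function (hypothetical here)
    that returns the value or raises OSError on a connection problem.
    """
    errors = []
    for endpoint in endpoints:
        try:
            return fetch(endpoint, key)
        except OSError as exc:  # connection refused, timeout, ...
            errors.append(f"{endpoint}: {exc}")
    raise ConnectionError("all etcd endpoints failed: " + "; ".join(errors))
```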

balboah · 2016-04-13T11:45:15Z

sorry maybe I still don't understand, but I fail to see how integrating basically the same use case as confd into telegraf solves the availability issue differently? The code would be pretty much the same, the number of processes would be the same, how is the "single point of failure" different?
To clarify: confd talks with etcd nodes and writes a config file. Telegraf would do the same, or update configuration in its memory. Sure confd process could die because of bugs, but so could telegraf?

Also confd already supports toml templating that you're introducing, has plugins for different statements and supports more sources than etcd.

titilambert · 2016-04-13T13:31:10Z

@balboah With this PR, Telegraf never needs a config file. It loads the configuration directly from etcd into memory. This eliminates the write-config step.
Of course, Telegraf can crash in both cases, but you have one step fewer without confd.

@sparrc I think you're right! Maybe we need to use https://github.com/spf13/viper ? We can use it to read config only from remote sources, or from both remote sources and local files. What would be the best choice?

myontop · 2016-04-16T10:25:15Z

When do you plan to release a version with etcd integration?

johnrengelman · 2016-05-04T14:10:00Z

I'll chime in my 2 cents - I don't think telegraf should go down the road of supporting config backends directly. Once etcd is added, then there will be requests for consul and zookeeper, etc.
It becomes a maintenance nightmare.
A better approach is to have best practices for how to integrate with these services externally.
As for the SPOF argument, by using something like confd to write out a config file and have telegraf simply load that file, you are actually reducing the failure modes. In this scenario, if etcd or confd fails, then there is still a configuration file for telegraf to load which will allow it to start up.

If integrated directly and etcd is down, then telegraf can't run because the configuration is coming from there.

It also protects telegraf from any changes in the APIs of these tools. You don't want to have to release a new version of telegraf due to a compatibility issue with etcd.
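
The failure-mode argument in this comment can be illustrated with a small Python sketch: whenever the remote store answers, the result is written to a local file, and when it does not, the last file on disk is still there to load. The names `load_config`, `fetch_remote` and `cache_path` are hypothetical and chosen only for this example; it is not how Telegraf or confd is implemented.

```python
def load_config(fetch_remote, cache_path):
    """Return (content, source), where source is "remote" or "cache".

    fetch_remote is a caller-supplied function (hypothetical) that returns
    the config text or raises if the remote store is unavailable.
    """
    try:
        content = fetch_remote()
    except Exception:
        # Remote store unreachable: fall back to the last file written to
        # disk. If no file exists yet, the FileNotFoundError propagates.
        with open(cache_path) as cached:
            return cached.read(), "cache"
    # Remote store reachable: refresh the local copy for the next outage.
    with open(cache_path, "w") as cached:
        cached.write(content)
    return content, "remote"
```

The design point is the one made above: with a file on disk, an outage of the config store delays updates but does not prevent the agent from starting.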

titilambert · 2016-05-04T19:57:50Z

@johnrengelman what do you think about https://github.com/spf13/viper ?
I understand the use case:

> In this scenario, if etcd or confd fails, then there is still a configuration file for telegraf to load which will allow it to start up.

but I'm not sure about using confd+telegraf in Docker environments...
For example, in a Kubernetes environment, this means adding a new container inside the Telegraf pod... this doubles your number of containers, just for the configuration. And you would rather use your resources/containers for your own applications.

gunnaraasen · 2016-05-05T07:23:56Z

Not sure if this has been mentioned before. Docker's libkv is another potential option for supporting multiple distributed config stores.

sparrc · 2016-05-05T13:24:53Z

that library looks fantastic, thanks @gunnaraasen

titilambert · 2016-05-05T13:56:03Z

@sparrc what do you think? Should I rewrite the PR with https://github.com/docker/libkv or https://github.com/spf13/viper ?

sparrc · 2016-05-10T20:43:28Z

closing this because I prefer to have this conversation in #272

titilambert force-pushed the etcd branch 2 times, most recently from 192aa91 to 9d7acb2 Compare February 5, 2016 06:03

sparrc mentioned this pull request Feb 10, 2016

Add configuration management service #674

Closed

titilambert force-pushed the etcd branch from 9d7acb2 to 382c401 Compare February 13, 2016 06:21

sparrc added the Needs Review label Feb 15, 2016

sparrc reviewed Feb 18, 2016

titilambert force-pushed the etcd branch 2 times, most recently from d32fab5 to 1d3f26d Compare February 20, 2016 04:36

titilambert force-pushed the etcd branch 2 times, most recently from 7849b11 to c05fba3 Compare February 20, 2016 04:58

titilambert force-pushed the etcd branch 2 times, most recently from e4f4f63 to 39818d6 Compare February 28, 2016 01:50

titilambert force-pushed the etcd branch from 39818d6 to e80365b Compare March 1, 2016 19:20

titilambert force-pushed the etcd branch from e80365b to afd0726 Compare March 3, 2016 01:09

This was referenced Mar 4, 2016

Config with etcd #193

Closed

Add integration for service discovery & kv config stores (dynamic config) #272

Open

titilambert force-pushed the etcd branch from afd0726 to 00755ad Compare March 7, 2016 19:56

titilambert mentioned this pull request Mar 17, 2016

Ability to use Telegraf as a library #867

Closed

titilambert force-pushed the etcd branch 2 times, most recently from fd743d8 to 47963a7 Compare April 2, 2016 04:30

[WIP] Etcd integration for configuration

c4d5fda

titilambert force-pushed the etcd branch from 47963a7 to c4d5fda Compare April 2, 2016 04:41

titilambert mentioned this pull request Apr 6, 2016

Dynamically set global tags #926

Closed

sparrc mentioned this pull request Apr 16, 2016

Feature request - remote configuration #1042

Closed

sparrc closed this May 10, 2016

3fr61n mentioned this pull request Aug 26, 2016

Refactor user input files to support more input format in the configuration files Juniper/open-nti#82

Open
