New Tcp Forwarder Output plugin. #1526
This plugin will send all metrics through TCP in the chosen format, this can be
use by example with tcp listener input plugin
please put a configuration example here
you should put a sync.Mutex on the plugin struct, lock it while gathering and closing so that they can't be called at the same time.
@sparrc any more work needed on this PR?
I still need to do a final review and test it. There is currently a large backlog of PRs and this may take a while, so please be patient.
sync.Mutex

Server string
Timeout string
change the type of Timeout to internal.Duration, then you don't need to parse it, the toml parsing will deal with verifying that it's a valid duration.
in the config file it will still look like timeout = "5s"
Hi, I'm testing this plugin while looking for ways to "proxy" telegraf on firewalled servers (I'm testing it alongside other solutions, such as haproxy and the http service input plugin from #1407). I'm using this config:
I set reconnect to false because, according to the help text, it adds a bit of overhead.
When I set up a tcp_input / tcp_output pair, two things happen.
First, I cannot start the "tcp_output_ed" telegraf unless the "tcp_input_ed" one is already running: telegraf exits without gathering any metrics. It would be great if telegraf started collecting data and added it to the buffer even when no TCP endpoint is listening on the other side. I can work around that, though, and it is not my main concern.
My problem is that if I stop the "telegraf proxy", the proxied telegraf with the tcp_output starts to accumulate data in its buffer as expected, but when the proxy with the tcp_input starts again, the connection never seems to be retried and the proxied telegraf keeps logging
Is the connection ever retried? Sorry, I tried to look at the code, but golang is Greek to me.
UPDATE: Tried to switch to
and it dies
For the first issue: when the plugin starts, a connect function is called to make sure it can connect; this is default telegraf behavior. @sparrc Should we change that to follow the suggestion, i.e. gather metrics before an endpoint is reachable by not propagating the first connection error? @theist Can you share the whole configuration file for both sides?
t.Lock()
defer t.Unlock()

var bp []string
rather than making a slice of strings and then joining it, you should just create the string of metrics directly as you loop over the metrics.
the idea here was to leave a possibility to create/add a Join function in the serialiser interface (see line 143)
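The two approaches discussed in this thread can be compared with a small sketch. The `metric` type and the `serialize` helper are hypothetical placeholders for whatever the configured serializer returns, and the sketch assumes each serialized metric ends up as one newline-terminated line.

```go
package main

import (
	"fmt"
	"strings"
)

// metric is a hypothetical, simplified metric used only in this example.
type metric struct {
	name  string
	value float64
}

// serialize is a placeholder for a serializer call.
func serialize(m metric) string {
	return fmt.Sprintf("%s value=%g", m.name, m.value)
}

// buildWithJoin collects one string per metric and joins them afterwards.
func buildWithJoin(metrics []metric) string {
	var bp []string
	for _, m := range metrics {
		bp = append(bp, serialize(m)+"\n")
	}
	return strings.Join(bp, "")
}

// buildDirect appends each serialized metric to a single buffer while
// looping, which avoids the intermediate slice.
func buildDirect(metrics []metric) string {
	var sb strings.Builder
	for _, m := range metrics {
		sb.WriteString(serialize(m))
		sb.WriteByte('\n')
	}
	return sb.String()
}

func main() {
	metrics := []metric{{"cpu", 1}, {"mem", 0.5}}
	fmt.Print(buildDirect(metrics))
	fmt.Println(buildDirect(metrics) == buildWithJoin(metrics))
}
```

Both functions produce the same payload; the direct version just skips one allocation step, while the slice version leaves room for a serializer-specific join as mentioned in the reply.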
return err
}
// Prepare data
t.Lock()
why are you locking this?
To make sure that nothing will change the connection while trying to Write.
@tuier Hi, sorry for the noise :) The behaviour I expect does not necessarily represent the views of the telegraf dev team; I'm only a sysadmin doing some research for my TICK monitoring stack and a tech blog post. I placed my findings and the config for the two telegrafs in this gist. The telegraf I'm using is a local build compiled from master with this PR merged, in a clean official golang docker container for 64-bit Linux. All tests were run on 4.4.0-34-generic #53-Ubuntu SMP x86_64 GNU/Linux, on localhost, launching the same telegraf in the foreground with different config files in different terminals. Hope it helps.
@tuier Can you explain the current behavior of the output plugin? What does it do if it can't connect? I generally agree with @theist that this plugin should be able to handle connection failures/restarts, whether that be in the middle of the run or at startup. I'm a little confused by why the plugin would need to "gather" metrics, since this is an output plugin. If the connection fails then it should return an error, and telegraf's accumulator will handle buffering the points until they can successfully be written.
@theist your issues should be fixed now. @sparrc to be more precise: when telegraf starts, it calls the Connect function of each output plugin. If the plugin returns an error, telegraf tries a second time (after 15 seconds), and if it returns an error again, telegraf will "fatal" and therefore stop. (https://github.com/influxdata/telegraf/blob/master/agent/agent.go#L61-L70 , https://github.com/influxdata/telegraf/blob/master/cmd/telegraf/telegraf.go#L240-L243 ) Because of these two behaviors, telegraf will stop if the TCP endpoint is not up before this plugin runs. @sparrc would you prefer to ignore that error in the first Connect call? In that case, why would that function return an error at all, if the "fix" is to ignore it?
Is there any traction on this?
Every change requested has been made; we are just waiting for more review or a green light!
@tuier can you clean up this PR? Something must have gone wrong with a merge conflict; there shouldn't be 117 commits included in this PR. I would like to try to have this merged for 1.2.
This can be used to forward metrics to a centralised endpoint (telegraf or not). Sample use case: consider a pool of servers with telegraf and tcp_forwarder enabled, which send data to another server running tcp_listener. This improves security, since the credentials for the "real" output are not on every server in the pool, and it makes rotating credentials much easier.
56d6a20 to e791b0a
closes influxdata#1516 closes influxdata#1711 closes influxdata#1721 closes influxdata#1526