Make sure the client can't talk to server[1-3]. We ran ip route add x via 127.0.0.1 to null-route it, but you could also use a firewall or just point the client at IPs that are not running Kafka.
What we expect:
Kafka output fails and tries to reconnect 100 times
I can still see the CPU input plugin sending data to stdout
Once Kafka manages to connect then I see the data there as well
What actually happens:
Kafka tries to connect a couple of times
CPU input plugin data is never passed to the stdout
After Kafka fails, Telegraf exits with an error
2. If the Kafka sasl_password is wrong and SASL auth is enabled
This is trivial to reproduce: just change the sasl_password in a working config.
What we expect:
Kafka output fails and tries to reconnect X times
Everything else works fine
What actually happens:
Telegraf immediately fails to start with this error (process exits):
[root@x ~]# /usr/local/telegraf/bin/telegraf -config /etc/telegraf/telegraf.conf --config-directory /etc/telegraf/conf.d
2021-09-16T11:23:51Z I! Starting Telegraf build-50
...
2021-09-16T11:23:51Z E! [agent] Failed to connect to [outputs.kafka], retrying in 15s, error was 'kafka server: SASL Authentication failed.'
There are a few retry mechanisms built into sarama (the library that Telegraf uses for Kafka support). I did a quick test in #9786 to see whether they affect connection retries like the ones described in this issue. I configured Telegraf to connect to localhost on a port that isn't listening. In this case the config.Producer.Retry and config.Admin.Retry settings don't seem to affect retries.
We will need to spend some more time understanding how sarama intends to handle connection failures and retries. If there is no provision for retrying connection failures in the library, we may need the plugin to detect failures and retry them.
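To make the "plugin detects failures and retries them" idea concrete, here is a rough sketch of what such a wrapper could look like. It is not Telegraf or sarama code: retryConnect, errUnreachable, and the simulated connect function are hypothetical names invented for illustration, and a real implementation would have to decide which errors are worth retrying (a wrong sasl_password probably is not).

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnreachable stands in for whatever error the Kafka client returns
// when no broker can be reached (hypothetical, for illustration only).
var errUnreachable = errors.New("broker unreachable")

// retryConnect calls connect up to maxAttempts times (at least once),
// sleeping for wait between failed attempts. It returns nil on the first
// success, or the last error wrapped with the attempt count.
func retryConnect(maxAttempts int, wait time.Duration, connect func() error) error {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if err = connect(); err == nil {
			return nil
		}
		// Do not sleep after the final failed attempt.
		if attempt < maxAttempts {
			time.Sleep(wait)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, err)
}

func main() {
	calls := 0
	// Simulated connect: fails twice, then succeeds.
	connect := func() error {
		calls++
		if calls < 3 {
			return errUnreachable
		}
		return nil
	}
	err := retryConnect(100, 10*time.Millisecond, connect)
	fmt.Println("attempts:", calls, "error:", err)
}
```

For the behaviour expected in this issue, a loop like this would presumably have to run in the background so that the other inputs and outputs keep working while Kafka is down, rather than blocking startup.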
There are 2 failures that we see in Telegraf 1.20-rc0 just in the Kafka plugin, despite #9051 that was supposed to fix this plugin:
1. If the Kafka backends are just down
Use this config to test: