inputs.phpfpm exit if unix socket doesn't exist #14261

Closed
antitbone opened this issue Nov 8, 2023 · 2 comments · Fixed by #14852
Labels
feature request: Requests for new plugin and for new features to existing plugins
help wanted: Request for community participation, code, contribution
size/m: 2-4 day effort

Comments

@antitbone

Relevant telegraf.conf

[[inputs.phpfpm]]
  urls = ["/usr/local/php-7.4/var/run/7.4-default.sock:fpm_status","/not_exist"]
  fieldpass = [
    "active_processes",
    "idle_processes",
    "total_processes",
    "max_active_processes",
    "accepted_conn",
    "listen_queue",
    "max_listen_queue",
    "slow_requests"
  ]

Logs from Telegraf

2023-11-08T14:39:41Z E! [inputs.phpfpm] Error in plugin: socket doesn't exist "/not_exist"

System info

Telegraf 1.23.4, CentOS Linux release 7.4

Docker

No response

Steps to reproduce

  1. Start from a configuration with a single unix socket:
[[inputs.phpfpm]]
  urls = ["/usr/local/php-7.4/var/run/7.4-default.sock:fpm_status"]

The metrics are collected correctly:

$ telegraf --test /etc/telegraf/telegraf.conf | grep phpfpm
2023-11-08T14:39:56Z I! Using config file: /etc/telegraf/telegraf.conf
> phpfpm,host=tutu.com,pool=7.4-default,url=/usr/local/php-7.4/var/run/7.4-default.sock:fpm_status accepted_conn=622065i,active_processes=4i,idle_processes=32i,listen_queue=0i,max_active_processes=35i,max_listen_queue=0i,slow_requests=0i,total_processes=36i 1699454396000000000
  2. Add a non-existent unix socket to the socket list of the inputs.phpfpm plugin:
- urls = ["/usr/local/php-7.4/var/run/7.4-default.sock:fpm_status"]
+ urls = ["/usr/local/php-7.4/var/run/7.4-default.sock:fpm_status","/not_exist"]
  3. The plugin reports an error for the non-existent socket but no longer reports metrics for the functional socket:
2023-11-08T14:39:41Z E! [inputs.phpfpm] Error in plugin: socket doesn't exist "/not_exist"
$ telegraf --test /etc/telegraf/telegraf.conf | grep phpfpm
> nothing

Expected behavior

The phpfpm plugin should collect metrics from the functional sockets even if other sockets are inaccessible.

Actual behavior

The phpfpm plugin seems to exit at the first non-existent socket. Only the first missing socket in the list is reported:

urls = ["/usr/local/php-7.4/var/run/7.4-default.sock:fpm_status","/not_exist","/not_exist2"]

> 2023-11-08T14:55:37Z E! [inputs.phpfpm] Error in plugin: socket doesn't exist "/not_exist"
urls = ["/usr/local/php-7.4/var/run/7.4-default.sock:fpm_status","/not_exist2","/not_exist"]

> 2023-11-08T14:55:13Z E! [inputs.phpfpm] Error in plugin: socket doesn't exist "/not_exist2"

If the file exists but is not a socket, or does not have the correct permissions, metrics from the accessible sockets are still collected:

2023-11-08T14:58:35Z E! [inputs.phpfpm] Error in plugin: dial unix /not_exist: connect: connection refused

2023-11-08T00:51:00Z E! [inputs.phpfpm] Error in plugin: dial unix /not_exist: connect: permission denied

A non-existent socket should not completely stop collection from the accessible sockets.

Additional info

No response

@antitbone antitbone added the bug unexpected problem or unintended behavior label Nov 8, 2023
@powersj
Contributor

powersj commented Nov 8, 2023

the phpfpm plugin seems to exit at the first non-existent socket

That is correct. Inside the Gather function, any error during the call to expandUrls causes Gather to return early. You are currently hitting the "socket doesn't exist" error in globUnixSocket, which causes Gather to exit.
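To illustrate why one missing socket hides the metrics of every other socket, here is a minimal sketch of a fail-fast expansion step. It is not the plugin's actual code: `expandFailFast` and the `exists` callback are hypothetical names, and the socket paths are made up for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// expandFailFast mimics a fail-fast expansion step (hypothetical helper, not
// the plugin's real function): the first address whose socket is missing
// aborts the whole call, so no address is returned at all.
// The exists callback stands in for a filesystem check.
func expandFailFast(addrs []string, exists func(path string) bool) ([]string, error) {
	expanded := make([]string, 0, len(addrs))
	for _, addr := range addrs {
		// Drop the optional ":status" suffix to get the socket path.
		path, _, _ := strings.Cut(addr, ":")
		if !exists(path) {
			return nil, fmt.Errorf("socket doesn't exist %q", path)
		}
		expanded = append(expanded, addr)
	}
	return expanded, nil
}

func main() {
	// Only one socket "exists" in this simulated filesystem.
	present := map[string]bool{"/run/php/default.sock": true}
	exists := func(p string) bool { return present[p] }

	addrs := []string{"/run/php/default.sock:fpm_status", "/not_exist", "/not_exist2"}
	got, err := expandFailFast(addrs, exists)
	fmt.Println(len(got), err)
}
```

Because the function returns on the first failure, the caller never sees the working socket, and only the first missing path in list order appears in the error. That matches the behavior described in the report.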

In general, if Telegraf cannot connect to an external service, whether that is a cloud service, socket, IP address, hostname, etc., then we fail. To some users this may indicate a typo, incorrect authentication, or a real network issue. Rather than acting as if everything is OK and giving the user a false sense that metrics are being collected correctly, we error.

We have a number of feature requests open to allow Telegraf to continue on connection failures like this. I am happy to see a PR that changes the logic, with opt-in via a configuration option.
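As a rough sketch of what such an opt-in could look like, the function below keeps the fail-fast default and only continues past missing sockets when a flag is set. The `skipMissing` flag, the `expand` function, and the `exists` callback are all hypothetical; this is not the option or code from any actual PR.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// expand resolves socket addresses. With skipMissing set (a hypothetical
// opt-in flag), missing sockets are collected as errors and the remaining
// addresses are still returned; otherwise the first missing socket aborts.
func expand(addrs []string, exists func(string) bool, skipMissing bool) ([]string, error) {
	var usable []string
	var errs []error
	for _, addr := range addrs {
		// Drop the optional ":status" suffix to get the socket path.
		path, _, _ := strings.Cut(addr, ":")
		if exists(path) {
			usable = append(usable, addr)
			continue
		}
		err := fmt.Errorf("socket doesn't exist %q", path)
		if !skipMissing {
			return nil, err // default: fail fast, as today
		}
		errs = append(errs, err)
	}
	// errors.Join returns nil when no errors were collected.
	return usable, errors.Join(errs...)
}

func main() {
	exists := func(p string) bool { return p == "/run/php/default.sock" }
	addrs := []string{"/run/php/default.sock:fpm_status", "/not_exist"}

	usable, err := expand(addrs, exists, true)
	fmt.Println(usable, err)
}
```

Returning both the usable addresses and the joined error lets the caller still log every missing socket, so a typo in the configuration stays visible even when collection continues.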

@powersj powersj added feature request Requests for new plugin and for new features to existing plugins size/m 2-4 day effort help wanted Request for community participation, code, contribution and removed bug unexpected problem or unintended behavior labels Nov 8, 2023
@srebhan
Member

srebhan commented Feb 20, 2024

@antitbone can you please test the binary in PR #14852? It will be available as soon as CircleCI has finished all tests. Let me know if this fixes the issue!
