Telegraf unable to start without existing log file #7905

Closed
jmcarless opened this issue Jul 27, 2020 · 6 comments · Fixed by #7909

@jmcarless

Relevant telegraf.conf:

[agent]
flush_interval = "10s"
flush_jitter = "5s"
interval = "10s"
metric_buffer_limit = 20000
round_interval = true

System info:

Telegraf version 1.15.1, Amazon Linux 2018.03

Steps to reproduce:

  1. Install Telegraf version 1.15.1.
  2. Make sure the /var/log/telegraf directory exists but contains no files.
  3. Start Telegraf.

Expected behavior:

Telegraf starts, and writes logs to /var/log/telegraf/telegraf.log

Actual behavior:

sudo service telegraf start
Starting the process telegraf [ OK ]
sh: /var/log/telegraf/telegraf.log: Permission denied

Telegraf does not start

Additional info:

When we roll back to version 1.14.5, Telegraf starts correctly and logs to /var/log/telegraf/telegraf.log.
Alternatively, if I manually create a blank log file named /var/log/telegraf/telegraf.log and chown it to telegraf, then version 1.15.1 will start correctly.
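
The manual workaround above can be sketched as a small shell helper. This is only an illustration: the function name `ensure_log_file` is hypothetical, and the demonstration runs against a scratch directory with the current user's IDs so that it needs no root access.

```shell
#!/bin/sh
# Hypothetical helper (not part of Telegraf): create an empty telegraf.log
# in DIR and hand it to OWNER:GROUP.
# Usage: ensure_log_file DIR OWNER GROUP
ensure_log_file() {
    dir=$1
    owner=$2
    group=$3
    mkdir -p "$dir" || return 1
    touch "$dir/telegraf.log" || return 1
    chown "$owner:$group" "$dir/telegraf.log"
}

# Demonstration on a scratch directory, using the current numeric user and
# group IDs so the chown succeeds without root.
demo_dir=$(mktemp -d)
ensure_log_file "$demo_dir" "$(id -u)" "$(id -g)"
ls -l "$demo_dir/telegraf.log"
```

On an affected host the equivalent call, run as root and assuming the `telegraf` user and group exist, would be `ensure_log_file /var/log/telegraf telegraf telegraf`.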

This unexpectedly broke metrics reporting for us today on new hosts.

@ssoroka
Contributor

ssoroka commented Jul 27, 2020

This seems like a permission problem. Telegraf needs write permission on the /var/log/telegraf folder. Make sure that the user running the telegraf process, or its group, has write access to this folder.
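
One way to check this is a quick write-access test. The sketch below is an assumption about how such a check could look, not something taken from Telegraf; `can_write_dir` is a hypothetical name, and it tests access for whichever user runs it.

```shell
#!/bin/sh
# Hypothetical helper: report whether the current user can create files in DIR.
# Creating a file needs both write and search (execute) permission on DIR.
can_write_dir() {
    dir=$1
    if [ -d "$dir" ] && [ -w "$dir" ] && [ -x "$dir" ]; then
        echo "writable: $dir"
    else
        echo "not writable: $dir"
        return 1
    fi
}

# Demonstration on a scratch directory owned by the current user.
demo_dir=$(mktemp -d)
can_write_dir "$demo_dir"
```

To test as the service account rather than your own user, the same `[ -w ... ]` check can be run through `sudo -u telegraf`, assuming that account exists on the host.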

@ssoroka ssoroka closed this as completed Jul 27, 2020
@jmcarless
Author

@ssoroka I rolled back Telegraf to 1.14.5 on the same host, with no permission changes or any other changes to the host besides the yum downgrade, and the process started just fine.
That is the only reason I filed an issue here with Telegraf.

@jmcarless
Author

Did Telegraf 1.15.1 add some new requirement for creating this log file? I'm working around things on my end, but I would like to understand why a new Telegraf version was responsible for breaking things.

@jarretlavallee

We are unable to start the telegraf service on fresh installations on EL nodes as well. This looks like an RPM packaging change.

In the 1.14.5 RPM, /var/log/telegraf is owned by telegraf:telegraf:

# rpm -qvl telegraf
-rw-r--r--    1 root    root                      131 Jun 30 19:20 /etc/logrotate.d/telegraf
drwxr-xr-x    2 root    root                        0 Jun 30 19:21 /etc/telegraf
-rw-r--r--    1 root    root                   235766 Jun 30 19:20 /etc/telegraf/telegraf.conf
drwxr-xr-x    2 root    root                        0 Jun 30 19:21 /etc/telegraf/telegraf.d
-rwxr-xr-x    1 root    root                 69213184 Jun 30 19:20 /usr/bin/telegraf
drwxr-xr-x    2 root    root                        0 Jun 30 19:21 /usr/lib/.build-id
drwxr-xr-x    2 root    root                        0 Jun 30 19:21 /usr/lib/.build-id/87
lrwxrwxrwx    1 root    root                       28 Jun 30 19:21 /usr/lib/.build-id/87/58b4a9009b5001278739cd097e59d24c18f23e -> ../../../../usr/bin/telegraf
-rw-r--r--    1 root    root                     5803 Jun 30 19:20 /usr/lib/telegraf/scripts/init.sh
-rw-r--r--    1 root    root                      492 Jun 30 19:20 /usr/lib/telegraf/scripts/telegraf.service
drwxr-xr-x    2 telegraf telegraf                    0 Jun 30 19:21 /var/log/telegraf

In the 1.15.1 RPM, it is owned by root:root:

# rpm -qvl telegraf
-rw-r--r--    1 root    root                      131 Jul 22 22:21 /etc/logrotate.d/telegraf
-rw-r--r--    1 root    root                   250761 Jul 22 22:21 /etc/telegraf/telegraf.conf
drwxr-xr-x    2 root    root                        0 Jul 22 22:21 /etc/telegraf/telegraf.d
-rwxr-xr-x    1 root    root                 69730912 Jul 22 22:21 /usr/bin/telegraf
drwxr-xr-x    2 root    root                        0 Jul 22 22:21 /usr/lib/.build-id
drwxr-xr-x    2 root    root                        0 Jul 22 22:21 /usr/lib/.build-id/3c
lrwxrwxrwx    1 root    root                       28 Jul 22 22:21 /usr/lib/.build-id/3c/1b944565dc487f5646d216f361977b5c6bb4c0 -> ../../../../usr/bin/telegraf
-rwxr-xr-x    1 root    root                     5803 Jul 22 22:21 /usr/lib/telegraf/scripts/init.sh
-rw-r--r--    1 root    root                      492 Jul 22 22:21 /usr/lib/telegraf/scripts/telegraf.service
drwxr-xr-x    2 root    root                        0 Jul 22 22:21 /var/log/telegraf

It looks like the Debian packages are working because there is a chown in the post-install script: https://github.com/influxdata/telegraf/blob/master/scripts/deb/post-install.sh#L52. This chown is not present in the RPM post-install script.
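
For illustration, an ownership fix of this kind could look roughly like the sketch below. It is not copied from the deb or RPM scripts; `fix_log_dir_owner` is a hypothetical name, and the demonstration uses a scratch directory and the current user's IDs so it runs without root.

```shell
#!/bin/sh
# Hypothetical helper: make sure DIR exists and belongs to OWNER:GROUP,
# including anything already inside it.
# Usage: fix_log_dir_owner DIR OWNER GROUP
fix_log_dir_owner() {
    dir=$1
    owner=$2
    group=$3
    mkdir -p "$dir" || return 1
    chown -R "$owner:$group" "$dir"
}

# Demonstration on a scratch path, with the current numeric IDs standing in
# for the telegraf user and group.
demo_dir="$(mktemp -d)/telegraf"
fix_log_dir_owner "$demo_dir" "$(id -u)" "$(id -g)"
ls -ld "$demo_dir"
```

On an affected EL host, running `fix_log_dir_owner /var/log/telegraf telegraf telegraf` as root would restore the ownership shown in the 1.14.5 listing, assuming the `telegraf` user and group already exist.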

@ssoroka
Contributor

ssoroka commented Jul 28, 2020

Thanks for the follow-up! Reopening.

@ssoroka
Contributor

ssoroka commented Jul 28, 2020

This should be resolved. You might have to wait for the nightly RPM build to test it. I reproduced the problem locally in a VM and the change seems to resolve it, so I think this should work.
