Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Telegraf tail & logparser should handle deleted & newly created files #1829

Closed
akan1979 opened this issue Sep 29, 2016 · 24 comments
Closed

Telegraf tail & logparser should handle deleted & newly created files #1829

akan1979 opened this issue Sep 29, 2016 · 24 comments
Labels
feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin
Milestone

Comments

@akan1979
Copy link

Bug report

Relevant telegraf.conf:

[[inputs.logparser]]

files to tail.

files = ["/u0/log/jetty/*.request.log"]

Read file from beginning.

from_beginning = false

System info:

Telegraf 1.0.0
Ubuntu 14.04.4
Jetty

Steps to reproduce:

  1. Jetty is configured to rotate the access log to a new file in the format YYYY_MM_DD.request.log daily
  2. When a new access log file is created, Telegraf does not pick up the file and thus no further metrics is collected from the access log.

Expected behavior:

Telegraf logparser picks up new file on the same pattern defined in the telegraf.conf

Actual behavior:

Telegraf does not pick up new rotated file automatically

Additional info:

Here's an fuser output of the log directory. The 7692 pid is Telegraf. It was started on 9/28.
/u0/log/jetty/2016_09_23.request.log: 7692
/u0/log/jetty/2016_09_24.request.log: 7692
/u0/log/jetty/2016_09_25.request.log: 7692
/u0/log/jetty/2016_09_26.request.log: 7692
/u0/log/jetty/2016_09_27.request.log: 1081 7692
/u0/log/jetty/2016_09_28.request.log: 1081 7692
/u0/log/jetty/2016_09_29.request.log: 1081 8769

The 2016_09_29.request.log was created by the 8769 process with is jetty. notice that 7692 does not hold a handle of that file.

Interestingly the pid 1081 belongs to logstash-forwarder, which is able to pick up the newly created file automatically.

@DmitryKiselev
Copy link

+1 on this one.

@sparrc sparrc added the bug unexpected problem or unintended behavior label Oct 6, 2016
@sparrc sparrc added this to the 1.1.0 milestone Oct 6, 2016
@dgnorton
Copy link
Contributor

It's working as intended. Jetty should be configured so that it always writes to the same log file and logrotate should be used to do the rotation.

@akan1979
Copy link
Author

akan1979 commented Oct 20, 2016

@dgnorton Unfortunately this is not something that is out of the box standard to jetty.

Please refer to here:
http://wiki.eclipse.org/Jetty/Howto/Configure_Request_Logs

Either way, logparser should detect new files. There is not only 1 way to accomplish 1 task.

@dgnorton
Copy link
Contributor

@akan1979 what happens if retainDays is set to 2 billion (> 5M years) and yyyy_mm_dd is removed from the file name?

@njhartwell
Copy link

njhartwell commented Nov 9, 2016

This seems to be a problem when a file is (re)moved as well, e.g. given this config:

[[inputs.logparser]]
  files = ["/tmp/telegraf_input.log"]
  from_beginning=false
  [inputs.logparser.grok]
    patterns=["%{INT:measurement}"]

I see this behavior:

echo 1 > /tmp/telegraf_input.log
# start telegraf
echo 2 >> /tmp/telegraf_input.log
# telegraf picks up 2
rm /tmp/telegraf_input.log && touch /tmp/telegraf_input.log
# telegraf 1.0: nothing reported, telegraf 1.0.1 reports
# "Waiting for /tmp/telegraf_input.log to appear..."
echo 3 >> /tmp/telegraf_input.log
# Nothing happens :(

On the plus side, truncation detection seems to work, e.g.

echo 1 > /tmp/telegraf_input.log
echo 2 > /tmp/telegraf_input.log
# produce
# Re-opening truncated file /tmp/telegraf_input.log ...
# Successfully reopened truncated /tmp/telegraf_input.log

... but I suspect more than a few logging tools (e.g. logback) don't support rolling by truncation.

@sparrc sparrc modified the milestones: 1.3.0, 1.1.0 Nov 9, 2016
@sparrc
Copy link
Contributor

sparrc commented Nov 9, 2016

Internally telegraf has to spinup a thread for each file that it needs to tail. Currently it only does that once at startup, which is why deleted & created files are not handled.

I think it's worthwhile to handle these two cases in the tailer & logparser so I'm going to reopen this issue and put it into the 1.3 milestone.

@sparrc sparrc reopened this Nov 9, 2016
@sparrc sparrc changed the title Telegraf inputs.logparser does not tail newly created log file after the daemon started Telegraf tail & logparser should handle deleted & newly created files Nov 9, 2016
@sparrc sparrc added feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin and removed bug unexpected problem or unintended behavior labels Nov 9, 2016
@njhartwell
Copy link

Sounds good. FWIW, it looks like this lib solves addresses these problems. I haven't dug into the source yet, but it has a lot of ⭐️s so must be good :)

@sparrc
Copy link
Contributor

sparrc commented Nov 9, 2016

@njhartwell what lib are you talking about?

@njhartwell
Copy link

njhartwell commented Nov 9, 2016

Sorry, forgot to paste it https://github.com/hpcloud/tail

@sparrc
Copy link
Contributor

sparrc commented Nov 9, 2016

we are already using that library, it doesn't solve all of this

@mottati
Copy link

mottati commented Jan 30, 2017

Is this still slated to go into the next release of Telegraf? (1.3)

Other than restarting the Telegraf service after each log rotation, are there any other workarounds you can suggest for this?

We are trying to replace Logstash (which handles this case) with Telegraf, we can not proceed until this issue is resolved.

@sparrc
Copy link
Contributor

sparrc commented Jan 30, 2017

yes, still planned

nope, there is no other known workaround

@mottati I believe #2141 should solve this. Are you having problems with newly created files or only logrotated files?

@mottati
Copy link

mottati commented Jan 30, 2017

Our particular use case is we have Telegraf pointing to a test machine where we periodically reinstall our software with newer versions. The deployment is fairly simplistic, we shutdown our servers, rename the old deployment directory to backup.directory (which includes the log file we are monitoring), install the new software, and bring our services back up.

We can add a restart of the Telegraf service to this process until 1.3 is released.

Interestingly enough when I went to look into this this morning, I noticed that Telegraf was still holding on to a deleted file, our software had been reinstalled twice since I had last looked so the file was renamed into the backup directory in the first round, and deleted in the second.

sudo lsof -p 43888
COMMAND    PID     USER   FD   TYPE     DEVICE  SIZE/OFF       NODE NAME
telegraf 43888 telegraf  cwd    DIR        8,1     16384          2 /
telegraf 43888 telegraf  rtd    DIR        8,1     16384          2 /
telegraf 43888 telegraf  txt    REG        8,1  32548479     543967 /usr/bin/telegraf
telegraf 43888 telegraf  mem    REG        8,1   1921096     442382 /lib64/libc-2.12.so
telegraf 43888 telegraf  mem    REG        8,1    142640     442406 /lib64/libpthread-2.12.so
telegraf 43888 telegraf  mem    REG        8,1    154520     442375 /lib64/ld-2.12.so
telegraf 43888 telegraf    0r   CHR        1,3       0t0       4160 /dev/null
telegraf 43888 telegraf    1w   CHR        1,3       0t0       4160 /dev/null
telegraf 43888 telegraf    2w   REG      253,0     43741   28721156 /var/log/telegraf/telegraf.log
telegraf 43888 telegraf    3r   CHR        1,9       0t0       4165 /dev/urandom
telegraf 43888 telegraf    4r   REG      253,0 183324583   63406516 /data/solo.backup/soloots/logs/tasks.log (deleted)
telegraf 43888 telegraf    5u   REG        0,9         0       4158 [eventpoll]
telegraf 43888 telegraf    7r  IPv4 2235795701       0t0        TCP localhost.localdomain:45245->localhost.localdomain:sun-as-jpda (ESTABLISHED)
telegraf 43888 telegraf    8r   DIR       0,10         0          1 inotify
telegraf 43888 telegraf    9u   REG        0,9         0       4158 [eventpoll]
telegraf 43888 telegraf   10r  FIFO        0,8       0t0 2166411407 pipe
telegraf 43888 telegraf   11w  FIFO        0,8       0t0 2166411407 pipe
telegraf 43888 telegraf   12u  IPv4 2226014191       0t0        TCP localhost.localdomain:55572->localhost.localdomain:d-s-n (ESTABLISHED)

Note the deleted tasks.log file that Telegraf has open above.

Than could be another issue you may want to investigate.

@kevinhatry
Copy link

👍

The tail plugin Readme explicitly states that it should behave like a tail -F meaning that :

it will follow the name of the given file, so that it will be compatible with log-rotated files, and that it will retry on inaccessible files.

but this is not the case as it keeps open the renamed/deleted file.
So it looks more a bug than an enhancement.

@sparrc
Copy link
Contributor

sparrc commented Feb 16, 2017

this was closed by #2141, it will be available in 1.3

@sparrc sparrc closed this as completed Feb 16, 2017
@mottati
Copy link

mottati commented Mar 10, 2017

I tried testing this today using the same technique that was listed at the beginning of this issue. I used the latest nightly build of Telegraf. Here is what I found:

telegraf version:

telegraf --version
Telegraf vdev~n201703100819 (git: master 13f314a5076a47b208aeef5bdab12470837e26c1)

Configuration File

[agent]
   flush_interval=2

[[inputs.logparser]]
  files = ["/tmp/telegraf/input.log"]
  from_beginning=true
  [inputs.logparser.grok]
    patterns=["%{INT:measurement}"]

[[outputs.file]]
  files = ["stdout"]

Test Script

#!/bin/bash
# Start with 1 line of data in the input.log
echo 0 > input.log
rm nohup.out
nohup telegraf --config config --debug 2>/dev/null  &
pid=$!

for i in {1..5}
do
    echo "Writing line $i"
    echo $i >> input.log
    sleep 10
    rm input.log
done
kill -9 $pid
cat nohup.out

Test Execution

$ ./test
Writing line 1
Writing line 2
Writing line 3
Writing line 4
Writing line 5
logparser_grok,host=node2087.svc.devpg.pdx.wd measurement="0" 1489182278368364318
logparser_grok,host=node2087.svc.devpg.pdx.wd measurement="1" 1489182278368407112

Conclusion

Assuming that this PR is actually in the build I used, it does not appear that this fix accomplishes what was requested e.g. discovery of newly created files with the same name of the original file.

We are currently using LogStash to parse our log that has this behavior. This is the last remaining issue we have that will allow us to use a pure TICK stack.

@danielnelson
Copy link
Contributor

I can duplicate your report @mottati. It seems that recreated files are not detected, but truncated files are. There is also a bug where if the file is truncated with a file of the same size it is not detected.

@danielnelson danielnelson reopened this Mar 10, 2017
@danielnelson
Copy link
Contributor

influxdata/tail#3

@danielnelson
Copy link
Contributor

I merged a fix for the problem into our tail fork.

@geekdave
Copy link

@danielnelson Thanks! Do you know when this fix will make it into a released version of Telegraf?

@danielnelson
Copy link
Contributor

The fix for this was in the 1.3.0 release, though I have heard that there are remaining issues. #2847

@hd-COO7
Copy link

hd-COO7 commented May 28, 2020

This still seems like an issue for [inputs.tail] for Telegraf 1.14.2. Once the file has been deleted and recreated with same name tail doesn't work anymore.

@gassanpavel
Copy link

This still seems like an issue for [inputs.tail] for Telegraf 1.14.2. Once the file has been deleted and recreated with same name tail doesn't work anymore.

Yes, got this bug at telegraf 1.17.0 with [inputs.tail] plugin on ubuntu 20.04
I run telegraf and tail -F --lines=0 /var/log/voodoo/voodoo_metric.log in separate window/
I wrote some loglines to my logfile, and both - telegraf and tail see it and telegraf parse it correctly.
Then, I remove /var/log/voodoo/voodoo_metric.log and touch it, and tail say:
tail: '/var/log/voodoo/voodoo_metric.log' has become inaccessible: No such file or directory tail: '/var/log/voodoo/voodoo_metric.log' has appeared; following new file server_iops{operation="data_read"} 976 1594660047713
but telegraf still dont see new logfile(

@mr-manuel
Copy link

mr-manuel commented Jan 5, 2021

This still seems like an issue for [inputs.tail] for Telegraf 1.14.2. Once the file has been deleted and recreated with same name tail doesn't work anymore.

Yes, got this bug at telegraf 1.17.0 with [inputs.tail] plugin on ubuntu 20.04
I run telegraf and tail -F --lines=0 /var/log/voodoo/voodoo_metric.log in separate window/
I wrote some loglines to my logfile, and both - telegraf and tail see it and telegraf parse it correctly.
Then, I remove /var/log/voodoo/voodoo_metric.log and touch it, and tail say:
tail: '/var/log/voodoo/voodoo_metric.log' has become inaccessible: No such file or directory tail: '/var/log/voodoo/voodoo_metric.log' has appeared; following new file server_iops{operation="data_read"} 976 1594660047713
but telegraf still dont see new logfile(

I have the same issue when using Telegraf (1.17.0) [inputs.tail] and Grok Pattern with Linux (Debian Buster latest version) build-in logrotation that rotates the logfile. After the log file gets rotatet Telegraf don't log the new data and output this error:

2021-01-05T23:00:00Z E! [inputs.tail] Tailing "/var/log/mylogfile.log": Unable to open file /var/log/mylogfile.log: open /var/log/mylogfile.log: permission denied
2021-01-05T23:01:01Z E! [inputs.tail] Stopping tail on "/var/log/mylogfile.log": Unable to open file /var/log/mylogfile.log: open /var/log/mylogfile.log: permission denied

After restarting the telegraf service all works fine again until the next log rotation. As workaround I restart the telegraf service after the log rotation, but thats not a clean solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin
Projects
None yet
Development

No branches or pull requests