We've had quite a few instances where LSF hasn't been able to ship logs to logstash. This is usually due to firewall errors, network interruptions or misconfiguration of LSF/logstash. Most recently this was because the certificate I created had an expiry of 1 month.
While it's possible to monitor the logstash end (did I receive any events from this host in the last 30 minutes?), it'd be much nicer to be able to get LSF's side of the story, too.
I recommend that this work in a similar way to "show slave status" in mysql: a human & machine readable state file which shows:
- the datetime of the most recently read log line
- the datetime of the most recently successfully shipped log line
- the number of log lines in the send queue
A nagios or scout plugin could then easily error if the time between those two points is too large.
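To make the idea concrete, here is a rough sketch of what such a plugin could look like, assuming LSF wrote its state as a small JSON file. Nothing here exists in logstash-forwarder today: the key names (`last_read`, `last_shipped`, `queue_length`), the ISO 8601 timestamp format and the thresholds are all made up for illustration. Only the exit codes follow the usual Nagios plugin convention.

```python
import json
from datetime import datetime, timedelta

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_state(state, max_lag=timedelta(minutes=30), max_queue=1000):
    """Return (exit_code, message) for a parsed state dict.

    The keys used here are hypothetical; they mirror the three values
    proposed above, not anything LSF currently writes.
    """
    try:
        last_read = datetime.fromisoformat(state["last_read"])
        last_shipped = datetime.fromisoformat(state["last_shipped"])
        queued = int(state["queue_length"])
        # Gap between the newest line read and the newest line shipped.
        lag = last_read - last_shipped
    except (KeyError, TypeError, ValueError) as exc:
        return UNKNOWN, f"UNKNOWN: unreadable state ({exc!r})"

    if lag > max_lag:
        return CRITICAL, f"CRITICAL: shipping is {lag} behind reading, {queued} lines queued"
    if queued > max_queue:
        return WARNING, f"WARNING: {queued} lines queued (lag {lag})"
    return OK, f"OK: lag {lag}, {queued} lines queued"


def main(path):
    """Read the (hypothetical) state file at `path`, print a status line
    and return the exit code for the monitoring system."""
    try:
        with open(path) as handle:
            state = json.load(handle)
    except (OSError, ValueError) as exc:
        print(f"UNKNOWN: cannot read {path} ({exc!r})")
        return UNKNOWN
    code, message = check_state(state)
    print(message)
    return code
```

A wrapper script would call `sys.exit(main(sys.argv[1]))` with the path to the state file. Comparing the two timestamps against each other, rather than against the current time, means a quiet host with nothing new to read would not raise a false alarm.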
At our organisation, we're also keen to see logstash-forwarder gain more intelligent ways to be managed and monitored.
On Linux, I see it does output to syslog, but unless I'm using something like rsyslog to send that output to elasticsearch directly or to another syslog server, I'm not going to know from a 'central' location how my thousands of logstash-forwarders (agents) are doing.
Perhaps a heartbeat/health function or API could be added, just like there are plans for the main logstash service at elastic/logstash#2611