Systemd Log Watchdog

Staleness-based watchdog for systemd services. Staleness is assessed based on recency of log messages

Intro

A watchdog for any systemd service that allows detecting processes/services/units that have frozen. It restarts them based on staleness/oldness of log messages. This is strictly designed to work for Linux based systems, and has only been tested on Ubuntu. However, it should work for other Linux-based OSes too.

systemd unit-configuration does allow the following internally:

Restart, RestartSec: Restarts the service by assessing the "deadness" of the service
WatchDogSec: Restarts the service by assessing the "liveness" of the service

However, these configurations also come with their fair share of limitations. Restart relies on the processes exiting for systemd to act upon it. And the inbuilt WatchDog of systemd requires the service to actively send out signals (called sd_notify) for systemd to NOT kill it. Sometimes though, we just want a simple solution that can restart freezing processes which have STOPPED sending log messages too. Furthermore, systemd's inbuilt Watchdog only comes with systemd version > 240, which is not available in earlier OS versions like Ubuntu 18.

Installation

sudo -H pip3 install git+https://github.com/detecttechnologies/sdlogwatchdog.git@main

NOTE: As the program adds a systemd-service, the pip-installation has to be run with sudo as mentioned above. It is important that this is followed even when using conda/venv.

Usage

The syntax for usage is:

sudo systemctl start sdlogwatchdog@"service-to-be-monitored=stale-timeout".service

The stale-timeout is the maximum permissible time for which the service can be left alive without it being killed and restarted. It supports any format for time specification supported by python-dateutil. Additionally, if no units are passed (ref: examples below), then it takes the units as seconds
If you would like to run this log-freshness-based watchdog for multiple systemd units parallely, then you can do so
```
sudo systemctl start sdlogwatchdog@process1
sudo systemctl start sdlogwatchdog@"process2=7m 5s"
```

Examples:

Let us say you want to monitor a systemd-service called my-program1.service. If it doesn't throw any log messages for 25 seconds, you would like it to be restarted. Then, the command you need to run is
```
sudo systemctl start sdlogwatchdog@"my-program1=25".service
# OR 
sudo systemctl start sdlogwatchdog@"my-program1=25s".service
```
Suppose you want to monitor a unit called my-program2.service, with a staleness timeout of 1 day , 3 hours, and 10 minutes:
```
sudo systemctl start sdlogwatchdog@"my-program2=1d 3h 10m".service
```

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
sdlogwatchdog		sdlogwatchdog
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
sdlogwatchdog@.service		sdlogwatchdog@.service
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Systemd Log Watchdog

Intro

Installation

Usage

Examples:

About

Releases

Packages

Languages

License

detecttechnologies/sdlogwatchdog

Folders and files

Latest commit

History

Repository files navigation

Systemd Log Watchdog

Intro

Installation

Usage

Examples:

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages