Originally by Norman Gray, modified by Michael Hirsch.
sigwatch is a library of routines that provides simple signal watching for Fortran programs. This allows a minimal level of control over a running program from outside it, for example telling it to checkpoint itself on receipt of a signal.
- Version 1.0, 2011 February 2
- Version 1.1, 2021 August: works on Windows and uses Fortran 2003 iso_c_binding.
It is often useful to have some simple signal handling in larger Fortran programs, for example to handle the INT interrupt signal generated by ^C, and have a program shut itself down cleanly; or to handle one of the user signals USR1 or USR2, for example to have a program checkpoint itself, in case it crashes at some later stage. However, signal handling is tricky in Fortran, because the function registered as a signal handler is later called with its argument passed by value, whereas Fortran procedures by default receive their arguments by reference. This library provides functions to make it easier.
NOTE: This library was originally designed for Linux around 2011. It was subsequently modified to work on Windows, where it can capture the SIGINT generated by Ctrl-C. On macOS, SIGHUP is among the signals that can be captured. The library may need further improvements to work more generally on macOS, Windows, and other non-Linux platforms.
On Unix, there is a smallish set of signals which may be sent to a running process, most of which the process can either catch or ignore. For example, the INT signal is sent to a process by pressing the interrupt character (usually ^C), HUP is sent when the controlling terminal hangs up or logs out, and KILL can be sent either by hand or by the system when it is forcing processes to die. The default action of both the INT and HUP signals is to terminate the process. The KILL signal is one of those which cannot be caught or ignored, but always has its effect. There are also two signals, called USR1 and USR2, which have no predefined meaning and are provided for user convenience; their default action is also to terminate the process, so a program must install a handler before it can make use of them.
Each signal has a numeric value -- for example HUP is 1 and KILL is 9 -- and after finding a process's PID with the ps(1) command, you can send signals to it with the kill(1) command:
% kill -HUP <pid>
or
% kill -1 <pid>
Signals thus provide a limited mechanism for communicating with a running program. A useful pattern is to have the program watch for signal USR1, say, and check for it by calling the function getlastsignal at the end of each loop iteration. If this returns a non-zero response, you might make your program checkpoint itself -- save its state for later restart -- in case the program crashes or has to be stopped for some reason.
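The polling pattern can be sketched in plain C. This is only an illustration of the idea, not the sigwatch source: the names record_signal, take_pending and run_loop are hypothetical, SIGINT stands in for USR1 because it is part of standard C, and raise() stands in for a kill sent from another terminal.

```c
#include <signal.h>
#include <stdio.h>

/* Hypothetical names throughout; this is not the sigwatch implementation. */
static volatile sig_atomic_t pending = 0;   /* last signal seen, 0 if none */

/* The handler only records the signal number; real work happens in the loop. */
static void record_signal(int signum) { pending = signum; }

/* Return the pending signal number and clear it, so each signal is seen once. */
static int take_pending(void)
{
    int s = pending;
    pending = 0;
    return s;
}

/* Run `steps` iterations, polling once at the end of each one.
   `raise_at` is the iteration at which a signal is simulated (-1 for never).
   Returns the number of checkpoints that were requested. */
int run_loop(int steps, int raise_at)
{
    int checkpoints = 0;
    signal(SIGINT, record_signal);
    for (int i = 0; i < steps; i++) {
        /* ... one unit of real work would go here ... */
        if (i == raise_at)
            raise(SIGINT);              /* stands in for an external kill */
        if (take_pending() != 0) {
            printf("step %d: checkpoint requested\n", i);
            checkpoints++;
        }
    }
    return checkpoints;
}
```

Keeping the handler this small is deliberate: it only stores a number, and the decision to checkpoint is taken at a point in the loop where the program state is consistent.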
For more details about signals, see the man pages for signal(3) or signal(7), depending on your platform.
A program prepares to receive signals by calling one of the watchsignalname or watchsignal functions, and calls getlastsignal at any point to retrieve the last signal which was sent to the process.
The arguments to watchsignalname are signame, a character string containing the name of the signal to watch for, and response, an integer which will be returned by getlastsignal after the specified signal has been caught. The signal names which the function recognizes are those most likely to be useful, namely HUP, INT, USR1 and USR2.
If response is passed as -1, getlastsignal will instead return the signal number associated with this name. Note that, although both HUP and INT have generally fixed numbers, the numbers associated with signals USR1 and USR2 differ between Unix variants.
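To make the role of a response of -1 concrete, here is a minimal C sketch of a name lookup. The helpers signal_number_for and effective_response are hypothetical and are not claimed to match the library's internals; the #ifdef guards reflect that only INT is guaranteed by standard C.

```c
#include <signal.h>
#include <string.h>

/* Hypothetical helper: map a short signal name to this platform's number,
   or return -1 if the name is not recognized here. */
int signal_number_for(const char *name)
{
    if (strcmp(name, "INT") == 0) return SIGINT;
#ifdef SIGHUP
    if (strcmp(name, "HUP") == 0) return SIGHUP;
#endif
#ifdef SIGUSR1
    if (strcmp(name, "USR1") == 0) return SIGUSR1;
#endif
#ifdef SIGUSR2
    if (strcmp(name, "USR2") == 0) return SIGUSR2;
#endif
    return -1;
}

/* Hypothetical helper: the value a later query would report for this name.
   A response of -1 means "use the signal's own number". */
int effective_response(const char *name, int response)
{
    int signum = signal_number_for(name);
    if (signum < 0) return -1;                  /* unrecognized name */
    return (response == -1) ? signum : response;
}
```

Looking the number up through the platform's own macros, rather than hard-coding it, is what keeps USR1 and USR2 correct across Unix variants.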
If you need to catch another signal for some reason (make sure you understand the default behavior of the given signal first, however) you can give that signal as a number to the watchsignal function, and when that signal is later caught, the corresponding number is what will be returned by getlastsignal.
The getlastsignal function returns the response associated with the last signal which was caught, or zero if no signal has been caught so far, or since the last call to getlastsignal. That is, any caught signal is returned only once.
The installed signal handler does not re-throw the signal after it has caught it; this would defeat the purpose of this library for those signals, such as HUP and INT, for which the default action is to kill the process. Also, there is no way to tell if the signal was received by being re-thrown by another handler, installed after this one. If all of this matters to you, then this library cannot reasonably help you, and you have no hope but to learn to love the sigaction(2) manpage.
When installing the handler, these functions replace any previous signal handler. If that was a non-default one (for example, one put there by an MPI environment) this could potentially change the behavior of your program in an unhelpful fashion. To warn you of this, these functions return +1 in this case; this is a success return value, but also a warning that you should understand what that previous signal handler was doing there.
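One way such a warning could be produced is by inspecting the value returned by the standard C signal() call, which is the previously installed handler. The sketch below is an assumption about the mechanism, not the library's code; install_watch and on_signal are hypothetical names, and this version also treats a previously ignored signal (SIG_IGN) as non-default.

```c
#include <signal.h>

static volatile sig_atomic_t seen = 0;      /* hypothetical flag set by the handler */

static void on_signal(int signum) { seen = signum; }

/* Hypothetical installer following the return convention described above:
   0 on success, +1 if a non-default handler was replaced, -1 on error. */
int install_watch(int signum)
{
    void (*previous)(int) = signal(signum, on_signal);
    if (previous == SIG_ERR)
        return -1;                          /* the handler could not be installed */
    if (previous != SIG_DFL)
        return 1;                           /* something else was handling this signal */
    return 0;
}
```

Calling install_watch twice for the same signal returns 1 the second time, because the first call's handler is the one being replaced.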
The sigwatchversion function returns the version number of the library as an integer, formed as major_version * 1000 + minor_version, so that version 1.2, for example, would be returned as the integer 1002.
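The encoding is simple arithmetic, and a caller can undo it with integer division and remainder. The helper names below are hypothetical and only illustrate the formula.

```c
/* Encode major.minor as described in the text: major * 1000 + minor. */
int encode_version(int major, int minor)
{
    return major * 1000 + minor;
}

/* Recover the two parts from the encoded integer (assumes minor < 1000). */
int version_major(int encoded) { return encoded / 1000; }
int version_minor(int encoded) { return encoded % 1000; }
```

For example, encode_version(1, 2) gives 1002, and version_major(1002) and version_minor(1002) give back 1 and 2.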
Both watchsignalname and watchsignal return 0 if the signal watching was installed successfully, and -1 if there was an error. If there was a non-default signal handler already installed, it is replaced, but the routine returns 1 to warn you of this.
The function getlastsignal returns the response associated with the last signal caught, or zero if there has been no signal caught since the last time this function was invoked.
cmake -B build
cmake --build build
% build/demo & # start in the background ($! now has the PID)
[1] 15131
watchsignal 10: 0
watchsignal HUP: 0
% lastsig= 0
lastsig= 0
lastsig= 0
kill -HUP $! # send the HUP signal to the process
lastsig=99 # saw it!
% lastsig= 0