Workers/children are left behind if the main app crashes #14
Under certain conditions (for example, after a reboot) the statusengine processes come up without a parent PID / master process. See example here:
Okay, found a problem. It seems that if the MySQL connection cannot be established, all the worker processes have already been spawned, but the main process crashes with an ERROR.
Okay, the bug is not in the init scripts; it's the statusengine application leaving children/workers behind when the main process exits.
You can see here that statusengine is not running.
I'm shutting down the MySQL server and even deleting the socket file (yes, I know that's not nice).
When we now start the statusengine application, it crashes with an error saying the MySQL connection can't be established. (There may also be other cases where statusengine crashes on startup.)
After the main script has crashed, you can check your process list and find many orphaned statusengine processes without a parent PID (aka master) idling around.
Yes, that's right: if the parent process of statusengine crashes, the child processes are orphaned. You need to kill them manually :/ It looks like your MySQL credentials in https://github.com/nook24/statusengine/blob/master/cakephp/app/Config/database.php#L78-L86 are wrong.
If this is working, your user has the permissions for the database. Maybe you are using nagios instead of naemon as the database name... Please make sure that you are using the current database schema: https://github.com/nook24/statusengine/blob/master/cakephp/app/Plugin/Legacy/Config/Schema/legacy_schema_innodb.php
To debug Statusengine, you should run it in foreground mode using:
I guess you did this to get the error message you pasted above :) At the moment, Statusengine only silently catches the MySQL error "General error: 2006 MySQL server has gone away" (https://github.com/nook24/statusengine/blob/master/cakephp/app/Plugin/Legacy/Model/LegacyAppModel.php#L198-L203) so that it can reconnect if you restart your MySQL server. Please check that no orphaned child processes are running on your system when you fire up Statusengine.
In this case it was a cluster configuration where statusengine start/stop/status is controlled via Pacemaker, and we found a situation where orphaned statusengine children were left behind. Since the init script doesn't do any additional checks for orphaned children, and the children themselves don't care if the connection to their master is lost, I opened this issue. IMO the children should be killed before a hard exit (caught exception), or the children should kill themselves if the connection to their master process is lost. Just leaving them behind can create additional problems.
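The first option (the master kills its workers before a hard exit) could look roughly like the minimal Python sketch below. It is not Statusengine's code, which is PHP; `connect_to_database`, `spawn_workers`, `terminate_workers`, and `start` are hypothetical names, and the database call is faked so that it always fails, reproducing the startup crash from this issue.

```python
import os
import signal
import sys
import time


def worker_loop() -> None:
    # Stand-in for a real worker; it just idles until it is signalled.
    while True:
        time.sleep(1)


def connect_to_database() -> None:
    # Hypothetical stand-in for the MySQL connection. It always fails here
    # to reproduce the startup crash described in this issue.
    raise ConnectionError("MySQL connection can't be established")


def spawn_workers(count: int) -> list[int]:
    pids = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child: never fall back into the master's code path.
            try:
                worker_loop()
            finally:
                os._exit(0)
        pids.append(pid)
    return pids


def terminate_workers(pids: list[int]) -> None:
    # Ask every worker to stop, then reap it so no orphan or zombie remains.
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # already gone
    for pid in pids:
        try:
            os.waitpid(pid, 0)
        except ChildProcessError:
            pass  # already reaped


def start(worker_count: int = 3) -> int:
    pids = spawn_workers(worker_count)
    try:
        connect_to_database()
    except Exception as exc:
        # The "hard exit" path: clean up the workers before dying.
        print(f"fatal: {exc}", file=sys.stderr)
        terminate_workers(pids)
        return 1
    # A real master would supervise the workers here; the sketch just stops.
    terminate_workers(pids)
    return 0


if __name__ == "__main__":
    sys.exit(start())
```

The weakness of this approach is that it only helps when the master gets a chance to run its error handler; a `kill -9` or a segfault in the master still leaves the workers behind.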
That's true. Let me check if I can hack a quick fix for you; it doesn't sound hard to check whether the parent process is dead or not...
Another case would be during startup, when the scripts are started in the wrong order (statusengine before MySQL).
Child processes now check the PID of their parent process. If it changes or becomes 1, the parent process is dead and the child processes exit.
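The parent-PID check can be sketched in Python as follows (the actual patch is PHP in the `issue_14` branch and is not reproduced here). `parent_is_gone`, `worker_loop`, and `spawn_worker` are hypothetical names. As a design variation on the patch, the sketch compares against the master PID captured *before* `fork()` instead of testing for PID 1; that also covers systems where orphans are re-parented to a "subreaper" rather than to PID 1, and avoids a false positive when the master itself runs as PID 1 (for example in a container).

```python
import os
import time


def parent_is_gone(master_pid: int) -> bool:
    # When the master dies, the kernel re-parents this process (to PID 1
    # or to a subreaper), so getppid() no longer matches the master PID.
    return os.getppid() != master_pid


def worker_loop(master_pid: int, poll_interval: float = 1.0) -> None:
    # Returns as soon as the master is gone so the caller can exit.
    while not parent_is_gone(master_pid):
        # ... handle one unit of work here (omitted in this sketch) ...
        time.sleep(poll_interval)


def spawn_worker() -> int:
    # Capture the master PID before forking; reading it inside the child
    # would race with a master that dies right after fork().
    master_pid = os.getpid()
    pid = os.fork()
    if pid == 0:
        worker_loop(master_pid)
        os._exit(0)  # orphaned: exit instead of idling around
    return pid
```

Polling is simple and portable, at the cost of up to `poll_interval` seconds of delay. On Linux, `prctl(PR_SET_PDEATHSIG, ...)` is an alternative that has the kernel signal the worker when its parent dies, though Python's standard library does not expose it directly.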
I just patched Statusengine, and it doesn't look too bad for a first try:
Check that everything is running and happy:
Kill the parent process and see what happens:
/var/log/statusengine.log reports now:
Check the init script:
You can test this in your environment by checking out the branch https://github.com/nook24/statusengine/tree/issue_14. Unfortunately, my init script is not the best and probably needs to be ported to systemd...
Looks good so far. Now I just have to get the right start order so it doesn't crash on startup. By the way, the fact that the master process has died is not visible in the log file; you only see it if you start it manually. If you start it via the init script and the process dies because the database is unreachable, start-stop-daemon neither prints nor logs anything. If you don't handle the exception, I would suggest at least logging it so it can be reviewed when something happens.
I added the boot order to the documentation (https://statusengine.org/documentation.php#boot-order) and will check if I can add the errors to the log file...
Looks better, but now the error output on the command line is missing. Since it's a critical error, it should be printed to STDERR. Here's the log output:
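Sending a fatal startup error to both the log file and STDERR can be sketched with Python's standard `logging` module. This is only an illustration under assumptions: Statusengine itself is PHP and has its own `LogfileTask.php` logger; `build_logger`, `run`, and the logger name `statusengine-sketch` are hypothetical.

```python
import logging
import sys
from typing import Callable


def build_logger(logfile: str) -> logging.Logger:
    logger = logging.getLogger("statusengine-sketch")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    for old in list(logger.handlers):  # make repeated calls idempotent
        logger.removeHandler(old)
        old.close()
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    # Everything from INFO up goes to the log file for later review.
    file_handler = logging.FileHandler(logfile)
    file_handler.setFormatter(fmt)
    logger.addHandler(file_handler)

    # Only ERROR and CRITICAL also go to STDERR, so someone starting the
    # daemon by hand (or an init script capturing stderr) sees why it died.
    stderr_handler = logging.StreamHandler(sys.stderr)
    stderr_handler.setLevel(logging.ERROR)
    stderr_handler.setFormatter(fmt)
    logger.addHandler(stderr_handler)
    return logger


def run(logger: logging.Logger, connect: Callable[[], None]) -> int:
    try:
        connect()
    except Exception:
        # exc_info=True appends the traceback to both destinations.
        logger.critical("could not connect to the database, exiting", exc_info=True)
        return 1
    logger.info("connected to the database")
    return 0
```

Routing only ERROR and above to STDERR keeps interactive output quiet during normal operation while still surfacing the reason for a crash in both places.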
I'm wondering why you create a new log handler/resource. Can't you use your Console/Command/Task/LogfileTask.php logger? I'm opening a separate issue about logging, with other things in mind.
My plan is to remove this class... If you want, I can enable STDERR output as well.
Merged into the new version 1.6.0.
When using the provided start script, several worker processes are started and forked into the background without any real parent PID.
start-stop-daemon creates a PID file for the main PHP script, which then exits after all worker processes have been created.
When calling the init script with the status parameter, you get an error that statusengine is not running, because the process in the PID file created by start-stop-daemon has already ended, but the forked children still live on.
When you try to stop statusengine, it tells you that statusengine has already been stopped (the process in the PID file .... same as with status).
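To see why `status` and `stop` disagree with reality, here is a minimal Python sketch of a PID-file-based liveness check. It assumes the PID file holds only the master's PID; `pid_is_running` and `status_from_pidfile` are hypothetical helpers, not start-stop-daemon's implementation.

```python
import os


def pid_is_running(pid: int) -> bool:
    # Signal 0 performs error checking only: nothing is delivered, but the
    # call fails with ProcessLookupError if the PID does not exist.
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def status_from_pidfile(path: str) -> str:
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        return "not running (no usable PID file)"
    if pid <= 0:
        # 0 and negative values would address process groups in kill().
        return "not running (no usable PID file)"
    return "running" if pid_is_running(pid) else "not running (stale PID file)"
```

Once the process named in the PID file has exited, a check like this reports "not running (stale PID file)" even if forked workers are still alive, because nothing in the PID file points at them.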