Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Worker/Childs are left behind if main app crashes #14

Closed
dhoffend opened this issue Jan 12, 2016 · 15 comments
Closed

Worker/Childs are left behind if main app crashes #14

dhoffend opened this issue Jan 12, 2016 · 15 comments

Comments

@dhoffend
Copy link
Contributor

When using the provided start script, several worker process are being started and forked into background without any real parent pid.

start-stop-daemon creates a pid file of the main php script which then gets closed after all worker processes are being created.

When calling the init script with the status parameter you get an error that statusengine is not running cause the pid file created by start-stop-daemon has already ended but the forked childs still live on.

When you try to stop statusengine it tells you that statusengine has already been stopped (the process to the pid file .... same as status).

@dhoffend
Copy link
Contributor Author

Under certain conditions (for example reboot) the statusengine process comes up without parent pid / master process

See example here:

root      1520     1  0 17:11 ?        00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root      1521     1  0 17:11 ?        00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root      1522     1  0 17:11 ?        00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root      1523     1  0 17:11 ?        00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root      1524     1  0 17:11 ?        00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w

@dhoffend
Copy link
Contributor Author

Okay found a problem It seems if the mysql connection cannot be established, all the worker processes are already spawned but the main process crashes with ERROR

@dhoffend
Copy link
Contributor Author

Okay, the bug is not the init scripts, it's the statusegine application leaving childs/worker behind if the main process exits

@dhoffend dhoffend changed the title Init script status/stop unusable Worker/Childs are left behind if main app crashes Jan 12, 2016
@dhoffend
Copy link
Contributor Author

You can see here that statusengine is not running

# ps -ef | grep statusengine
root      4663  2063  0 17:56 pts/0    00:00:00 grep --color=auto statusengine

I'm shutting down the mysql-server and even delete the socket file (yes I know that's not nice).

# /etc/init.d/mysql stop
 * Stopping MySQL database server mysqld [ OK ]
# rm /var/run/mysqld/mysqld.sock

When we now start the statusengine application it will crash with an error that mysql connection can't be established. (Maybe there are also other cases where statusengine might crash on startup)

# /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w

Welcome to CakePHP v2.5.4 Console
---------------------------------------------------------------
App : app
Path: /opt/statusengine/cakephp/app/
---------------------------------------------------------------
Starting Statusengine version: 1.5.1...
THIS IS LEGACY MODE!
Error: Database connection "Mysql" is missing, or could not be created.
#0 /opt/statusengine/cakephp/lib/Cake/Model/Datasource/DboSource.php(260): Mysql->connect()
#1 /opt/statusengine/cakephp/lib/Cake/Model/ConnectionManager.php(105): DboSource->__construct(Array)
#2 /opt/statusengine/cakephp/lib/Cake/Model/Model.php(3524): ConnectionManager::getDataSource('legacy')
#3 /opt/statusengine/cakephp/lib/Cake/Model/Model.php(1136): Model->setDataSource('legacy')
#4 /opt/statusengine/cakephp/lib/Cake/Model/Model.php(3551): Model->setSource('instances')
#5 /opt/statusengine/cakephp/lib/Cake/Model/Model.php(1369): Model->getDataSource()
#6 /opt/statusengine/cakephp/lib/Cake/Model/Model.php(1553): Model->schema()
#7 /opt/statusengine/cakephp/app/Console/Command/StatusengineLegacyShell.php(2333): Model->create()
#8 /opt/statusengine/cakephp/app/Console/Command/StatusengineLegacyShell.php(2672): StatusengineLegacyShell->createInstance()
#9 /opt/statusengine/cakephp/app/Console/Command/StatusengineLegacyShell.php(210): StatusengineLegacyShell->forkWorker()
#10 /opt/statusengine/cakephp/lib/Cake/Console/Shell.php(440): StatusengineLegacyShell->main()
#11 /opt/statusengine/cakephp/lib/Cake/Console/ShellDispatcher.php(209): Shell->runCommand('-w', Array)
#12 /opt/statusengine/cakephp/lib/Cake/Console/ShellDispatcher.php(66): ShellDispatcher->dispatch()
#13 /opt/statusengine/cakephp/app/Console/cake.php(47): ShellDispatcher::run(Array)
#14 {main}

After the main script has crashed you can check your process list and find many orphanded statusengine processes without parent-pid (aka master) idling around.

# ps -ef | grep -v grep | grep statusengine
root      5013     1  0 17:58 pts/0    00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root      5014     1  0 17:58 pts/0    00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root      5015     1  0 17:58 pts/0    00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root      5016     1  0 17:58 pts/0    00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root      5017     1  0 17:58 pts/0    00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w

@nook24
Copy link
Owner

nook24 commented Jan 12, 2016

Yes thats right, if the parent process of statusengine crashs, the child processes are orphanded. You need to kill them manually :/

It looks like, that your MySQL credentials are wrong in https://github.com/nook24/statusengine/blob/master/cakephp/app/Config/database.php#L78-L86
Have you tried, if your are able to login with the given username and password?

mysql -u statusengine -p
Enter password:
mysql> use naemon
Database changed
mysql> select current_state from naemon_servicestatus LIMIT 0, 1;
+---------------+
| current_state |
+---------------+
|             3 |
+---------------+
1 row in set (0.00 sec)

mysql>

If this is working, your user has the permissions for the database. May be you are using nagios instead of naemon as database name...

Please make sure, that you are using the current database schema: https://github.com/nook24/statusengine/blob/master/cakephp/app/Plugin/Legacy/Config/Schema/legacy_schema_innodb.php
You can update/create your database for Statusengine by executing:

/opt/statusengine/cakephp/app/Console/cake schema update --plugin Legacy --file legacy_schema_innodb.php --connection legacy

To debug Statusengine, you should run it in foreground mode using:

/opt/statusengine/cakephp/app/Console/cake statusengine_legacy -w

I guess you did this to get the error message you pasted above :)

At the moment Statusengine only silently catches the MySQL error "General error: 2006 MySQL server has gone away" https://github.com/nook24/statusengine/blob/master/cakephp/app/Plugin/Legacy/Model/LegacyAppModel.php#L198-L203 to reconnect if you may be restart your MySQL server.
On every other MySQL error Statusengine will crash :(

Please check that there are no orphanded child processes are running on your system if you fire up Statusengine

@dhoffend
Copy link
Contributor Author

In this case it was a cluster configuration where statusengine start/stop/status is controlled via pacemaker. And we found a situation where orphaned statusengine childs were left behind.

Since the init-script doesn't do any additional checks for orphaned childs and the childs themself don't care if the master connection is lost I opened this issue.

IMO the childs should be killed before a hard exit (catched exception) or the childs should kill themself if the connection to it's master process is lost. Just leaving them behind can create additional problems.

@nook24
Copy link
Owner

nook24 commented Jan 12, 2016

Since the init-script doesn't do any additional checks for orphaned childs and the childs themself don't care if the master connection is lost I opened this issue.

Thats true, let me check if i can hack a quick fix for you, sounds not hard to check if the parent process is dead or not...

@dhoffend
Copy link
Contributor Author

Another case would be during startup where the scripts are started in the wrong way (statusengine before MySQL).

nook24 added a commit that referenced this issue Jan 12, 2016
Child processes now check the pid of their parent process. If this changes or is 1 the parent proces is dead and the child processes will exit
@nook24
Copy link
Owner

nook24 commented Jan 12, 2016

I patched Statusengine right now and it looks not to bad for the first try:

root@statusengine-demo:/opt/statusengine/cakephp/app# service statusengine start
Starting Statusengine

Check that everything is running and happy:

root@statusengine-demo:/opt/statusengine/cakephp/app# ps -eaf |grep status
root     10689 10656  0 18:37 pts/4    00:00:00 tailf /var/log/statusengine.log
root     10835 10707  0 18:37 pts/5    00:00:00 watch gearadmin --status
root     11266     1  7 18:40 ?        00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root     11267 11266  2 18:40 ?        00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root     11268 11266  1 18:40 ?        00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root     11269 11266  1 18:40 ?        00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root     11270 11266  1 18:40 ?        00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root     11271 11266  0 18:40 ?        00:00:00 /usr/bin/php5 -q /opt/statusengine/cakephp/app/Console/cake.php -working /opt/statusengine/cakephp/app statusengine_legacy -w
root     11279 10077  0 18:40 pts/1    00:00:00 grep --color=auto status

Kill the parent process and see what happens:

root@statusengine-demo:/opt/statusengine/cakephp/app# kill -9 11266; sleep 1; ps -eaf  |grep status
root     10689 10656  0 18:37 pts/4    00:00:00 tailf /var/log/statusengine.log
root     10835 10707  0 18:37 pts/5    00:00:00 watch gearadmin --status
root     11319 10077  0 18:40 pts/1    00:00:00 grep --color=auto status

/var/log/statusengine.log reports now:

[11270] My parent process is gone I guess I am orphaned and will exit now!
[11268] My parent process is gone I guess I am orphaned and will exit now!
[11271] My parent process is gone I guess I am orphaned and will exit now!
[11269] My parent process is gone I guess I am orphaned and will exit now!
[11267] My parent process is gone I guess I am orphaned and will exit now!

Check the init script:

root@statusengine-demo:/opt/statusengine/cakephp/app# service statusengine status
No /usr/bin/php5 found running; none killed.
Statusengine is not running

You can test this on your environment if you checkout the branch: https://github.com/nook24/statusengine/tree/issue_14

Unfortunately my init script is not the best and need to be ported to systemd i guess...
But pacemaker should launch all your processes, not SysVinit.

@nook24 nook24 added the testing label Jan 12, 2016
@dhoffend
Copy link
Contributor Author

Looks good so far. Now I just have to get the right starting order so it doesn't crash on startup.

Btw. The fact that the master process has died is not visible in the logfile. You only see this if you start it manually. If you start it via init script and the process dies due to unavailable database connectivity the start-stop-daemon doesn't say anything or writes anything to the log.

I would suggest if you don't catch the exception to do something with it that at least stuff gets logged so it can be reviewed if something happens.

@nook24
Copy link
Owner

nook24 commented Jan 13, 2016

I add the boot order to the documentation: https://statusengine.org/documentation.php#boot-order

I will check if i can add the errors to the logfile...

@nook24
Copy link
Owner

nook24 commented Jan 13, 2016

@dhoffend can you please checkout my last commit SHA: 1e7b44b ? I guess this is what you are looking for.

@dhoffend
Copy link
Contributor Author

Looks better, but now the error output on the commandline is missing. Since it's a critical error it should be printed to STDERR.

Here's the log output

exception 'MissingConnectionException' with message 'Database connection "Mysql" is missing, or could not be created.' in /opt/statusengine/cakephp/lib/Cake/Model/Datasource/Database/Mysql.php:191
Stack trace:
#0 /opt/statusengine/cakephp/lib/Cake/Model/Datasource/DboSource.php(260): Mysql->connect()
#1 /opt/statusengine/cakephp/lib/Cake/Model/ConnectionManager.php(105): DboSource->__construct(Array)
#2 /opt/statusengine/cakephp/lib/Cake/Model/Model.php(3524): ConnectionManager::getDataSource('legacy')
#3 /opt/statusengine/cakephp/lib/Cake/Model/Model.php(1136): Model->setDataSource('legacy')
#4 /opt/statusengine/cakephp/lib/Cake/Model/Model.php(3551): Model->setSource('instances')
#5 /opt/statusengine/cakephp/lib/Cake/Model/Model.php(1369): Model->getDataSource()
#6 /opt/statusengine/cakephp/lib/Cake/Model/Model.php(1553): Model->schema()
#7 /opt/statusengine/cakephp/app/Console/Command/StatusengineLegacyShell.php(2331): Model->create()
#8 /opt/statusengine/cakephp/app/Console/Command/StatusengineLegacyShell.php(2670): StatusengineLegacyShell->createInstance()
#9 /opt/statusengine/cakephp/app/Console/Command/StatusengineLegacyShell.php(208): StatusengineLegacyShell->forkWorker()
#10 /opt/statusengine/cakephp/lib/Cake/Console/Shell.php(440): StatusengineLegacyShell->main()
#11 /opt/statusengine/cakephp/lib/Cake/Console/ShellDispatcher.php(209): Shell->runCommand('-w', Array)
#12 /opt/statusengine/cakephp/lib/Cake/Console/ShellDispatcher.php(66): ShellDispatcher->dispatch()
#13 /opt/statusengine/cakephp/app/Console/cake.php(47): ShellDispatcher::run(Array)
#14 {main}[5114] My parent process is gone I guess I am orphaned and will exit now!
[5115] My parent process is gone I guess I am orphaned and will exit now!
[5116] My parent process is gone I guess I am orphaned and will exit now!
[5117] My parent process is gone I guess I am orphaned and will exit now!
[5118] My parent process is gone I guess I am orphaned and will exit now!

I'm wondering why do you try to create a new log handler/resource? Can't you use your Console/Command/Task/LogfileTask.php logger?

For logging stuff I'm opening a different issue with other things in mind.

@nook24
Copy link
Owner

nook24 commented Jan 14, 2016

I'm wondering why do you try to create a new log handler/resource? Can't you use your Console/Command/Task/LogfileTask.php logger?

My plan is to remove this class...

If you want i can enable stderr output as well

@nook24 nook24 mentioned this issue Feb 6, 2016
@nook24 nook24 added the resolved label Feb 6, 2016
@nook24
Copy link
Owner

nook24 commented Feb 6, 2016

Merged into new Version 1.6.0

@nook24 nook24 closed this as completed Feb 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants