Sambamba - missing supervision of thread error conditions / progress #228

nosepy · 2016-06-27T07:43:21Z

Hi,
in connection with issue #214, which has already been fixed, I see another problem with sambamba. In that case one thread was hanging because of a wrong version of lz4, and the parallel processing did not terminate because the parallel threads are not supervised for error conditions. The fix for #214 resolved the immediate problem but did not improve sambamba's general behavior when one or more threads run into problems.

I assume that sambamba should supervise its threads for error conditions and progress, and should terminate all threads immediately if a single thread reports an error or stops making progress, since a sambamba run can never succeed once one of its parts has crashed. More generally, I think appropriate error handling for parallelization means that a parallelized version of a feature should recognize when one of its parts is not working properly and terminate the entire operation immediately. This makes errors visible right away and avoids wasting processing resources.

Thank you for your time.

Cheers,

Johannes

pjotrp · 2016-06-27T12:51:39Z

Hi Johannes,

Sambamba has two strategies for parallel processing. For most functionality it uses a D threadpool which behaves correctly when threads fail. It is not supervised, but will behave well on error. See the information on Exceptions in http://dlang.org/phobos/std_parallelism.html#.TaskPool

For mpileup, however, we use a different strategy and run samtools processes. Because it is a simple strategy, these processes are not supervised and can misbehave. We had not anticipated this feature to be so successful! If you want to run mpileup correctly, it is probably best to use samtools directly and put in your own supervision system (right @lomereiter?)
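As a rough illustration of what such an external supervision layer could look like, here is a minimal Python sketch. It is not part of sambamba or samtools; the function name `run_fail_fast` and the stand-in commands are hypothetical. It starts several child processes and terminates the remaining ones as soon as any of them exits with a non-zero status.

```python
import subprocess
import sys
import time


def run_fail_fast(commands, poll_interval=0.1):
    """Run all commands in parallel; stop the rest as soon as one fails.

    Returns the exit codes in the same order as `commands`.
    (Hypothetical helper, not an API of sambamba or samtools.)
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        while True:
            codes = [p.poll() for p in procs]  # None means still running
            if any(c not in (None, 0) for c in codes):
                break  # at least one worker failed
            if all(c == 0 for c in codes):
                break  # every worker finished successfully
            time.sleep(poll_interval)
    finally:
        # Terminate whatever is still running, then reap all children.
        for p in procs:
            if p.poll() is None:
                p.terminate()
        for p in procs:
            p.wait()
    return [p.returncode for p in procs]


if __name__ == "__main__":
    # Stand-in workers: one long-running process and one that fails at once.
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    bad = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_fail_fast([slow, bad]))
```

In a real setup each command would be one samtools invocation for one region; the stand-ins here only exist so that the sketch runs anywhere.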

Say hi to Sepp, Oswaldo and Ulrich from me :).

lomereiter · 2016-06-27T16:35:59Z

Pjotr summarized it well, I'll just add my 2 cents.

The sad reality is that I don't have the time to consider and handle all possible error conditions. I'm also no longer an end user of the tool, so I rely on others' bug reports and try to add regression tests where it makes sense. Mpileup in particular is more of a fun experiment; I'm almost surprised that it works. (Multithreading brings quite a bit of headache already, and adding multiple processes to the mix complicates matters even more.)

nosepy · 2016-06-28T08:02:23Z

Thank you for the quick response. I understand your concerns related to the necessary effort and the diversity of error situations.

I thought of something like this: a single supervision process starts the parallelization process, which in turn starts its worker threads for parallel processing. The parallelization process keeps track of which threads are active in a form that the supervision process can access, e.g. through a shared memory segment. If the shell command executed in a worker returns a non-success return code, the active thread is marked as failed as a signal to the supervisor. To detect hanging threads, I think it could be sufficient for the supervising process to periodically check the accumulated CPU time of each active thread as an indication that the thread is still alive.

The main impact on the parallelization process is maintaining the list of active thread PIDs and marking unsuccessful threads based on the shell return code. The supervising process periodically checks for erroneous thread entries signalled by the parallelization process, checks thread progress based on thread CPU time, and terminates everything in case of an error or a hanging thread.

This approach should keep the supervising process simple, limit the impact on the parallelization process, and make the supervision independent of the parallelization model used.
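The proposal above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not an implementation that exists in sambamba: it supervises child processes directly instead of going through a shared memory segment, the names `cpu_seconds` and `supervise` are hypothetical, and the default CPU-time reader relies on the Linux `/proc/<pid>/stat` file.

```python
import os
import subprocess
import sys
import time


def cpu_seconds(pid):
    """User + system CPU seconds of a process (assumes Linux /proc)."""
    with open(f"/proc/{pid}/stat") as f:
        # The command name is in parentheses and may contain spaces,
        # so split after the last closing parenthesis.
        fields = f.read().rsplit(")", 1)[1].split()
    utime, stime = int(fields[11]), int(fields[12])  # stat fields 14 and 15
    return (utime + stime) / os.sysconf("SC_CLK_TCK")


def supervise(procs, stall_timeout=60.0, poll_interval=1.0,
              read_cpu=cpu_seconds):
    """Watch workers; return "ok", "failed" or "stalled".

    On "failed" (non-zero exit code) or "stalled" (no CPU time consumed
    for `stall_timeout` seconds) all remaining workers are terminated.
    """
    now = time.monotonic()
    last_cpu = {p.pid: -1.0 for p in procs}
    last_change = {p.pid: now for p in procs}
    verdict = "ok"
    while verdict == "ok":
        running = [p for p in procs if p.poll() is None]
        if any(p.returncode not in (None, 0) for p in procs):
            verdict = "failed"
        elif not running:
            break  # all workers exited with code 0
        else:
            now = time.monotonic()
            for p in running:
                try:
                    cpu = read_cpu(p.pid)
                except OSError:
                    continue  # e.g. process exited just before the read
                if cpu != last_cpu[p.pid]:
                    last_cpu[p.pid], last_change[p.pid] = cpu, now
                elif now - last_change[p.pid] > stall_timeout:
                    verdict = "stalled"
            if verdict == "ok":
                time.sleep(poll_interval)
    for p in procs:
        if p.poll() is None:
            p.terminate()
    for p in procs:
        p.wait()
    return verdict


if __name__ == "__main__":
    # Stand-in worker that sleeps and therefore consumes no CPU time.
    idle = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(5)"])
    print(supervise([idle], stall_timeout=2.0, poll_interval=0.2))
```

One caveat with CPU time as the progress signal: a worker that is legitimately blocked on I/O also consumes no CPU and would be flagged as stalled, so the timeout has to be chosen generously.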

pjotrp · 2016-06-28T09:36:15Z

Good idea. Feel free to add it!

pjotrp closed this as completed Feb 24, 2017
