Correction for hanging dump thread - #242 #247

Merged: 1 commit into biod:master on Sep 21, 2016

Conversation

@nosepy (Contributor) commented on Sep 21, 2016

On specific BAM sets, depending on load and thread scheduling, the output thread could hang on an empty queue after all chunks had been dumped successfully.

The correction uses the total number of chunks as the termination condition for the dump thread, instead of the queue content and an indication of running workers. To avoid a hanging queuing thread in case of exceptions, and to prevent unnecessary delay of program termination, the join calls for the workers and the dump thread were moved from the exit scope to the success branch at the end of the main function.
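For illustration, here is a minimal sketch of that termination scheme; the names (ChunkQueue, setTotal, chunkDumped) are hypothetical and the actual sambamba code differs:

```d
import core.sync.mutex : Mutex;

// Sketch only: the dump thread stops once the number of dumped chunks reaches
// the known total, rather than whenever it happens to observe an empty queue.
class ChunkQueue {
    private Mutex  mutex_;
    private size_t totalChunks_;   // set once the last chunk has been produced
    private size_t dumpedChunks_;  // advanced by the dump thread
    private bool   totalKnown_;

    this() { mutex_ = new Mutex(); }

    // Worker side: publish the total number of chunks.
    void setTotal(size_t total) {
        synchronized (mutex_) { totalChunks_ = total; totalKnown_ = true; }
    }

    // Dump-thread side: record one successfully dumped chunk.
    void chunkDumped() {
        synchronized (mutex_) { ++dumpedChunks_; }
    }

    // Termination condition for the dump thread.
    bool dumpFinished() {
        synchronized (mutex_) {
            return totalKnown_ && dumpedChunks_ == totalChunks_;
        }
    }
}
```

With a condition like this, an empty queue alone never ends the dump thread; it keeps draining until the chunk count says all work is done.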

@lomereiter merged commit 50102d4 into biod:master on Sep 21, 2016
@lomereiter (Contributor)

Many thanks for your debugging effort!

@sambrightman (Collaborator)

This still looks potentially race-y to me:

  • what stops the worker from grabbing the final chunk and increasing total_num_ just before dumpFinished is called again?
  • can the empty condition trigger early if the BAM reading is slower than the pileup? I think probably not, but I remember reading some code somewhere that suggested this could happen.

@nosepy (Contributor, Author) commented on Sep 28, 2016

Synchronization is not necessary for the access to curr_num in dumpFinished, because it is written only by the queue thread (under the queue lock) and parallel read/write access cannot occur within a single queue thread. The read access in the worker, in queueResult, happens within a queue lock.

Yes, there could be parallel access to total_num_ when the worker sets it while the queuing thread is checking it, which in rare cases could lead to reading an inconsistent value. I assumed that a 4-byte access to an integer is always atomic for both read and write, but it seems this is not always guaranteed, depending on the hardware architecture and the alignment of the integer. So some kind of synchronization or atomic access is needed. The write in the worker already happens within the mutex_ lock. The read access in dumpFinished could also be done under a mutex_ lock, but a CAS (compare-and-swap) based atomic read/write (cas in core/atomic.d) would likely be better because it reduces the coupling between the worker and the queue thread. @lomereiter What is your opinion?
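For illustration, a hedged sketch of such a cas-based alternative, using a stand-in module-level shared counter in place of the real total_num_ member (not the actual sambamba code):

```d
import core.atomic : atomicLoad, cas;

shared int totalNum = -1;   // -1 means "total not yet known"

// Worker side: publish the total exactly once. The single cas either installs
// the value or fails, so the queue thread can never read a torn or intermediate value.
void publishTotal(int total) {
    cas(&totalNum, -1, total);
}

// Queue-thread side (e.g. in dumpFinished): an atomic read instead of taking mutex_.
bool totalReached(int dumpedSoFar) {
    immutable total = atomicLoad(totalNum);
    return total >= 0 && dumpedSoFar == total;
}
```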

Related to the second topic:
A chunk-empty condition occurring before the BAM has been read completely would lead to premature termination of some or all worker threads. If that can happen, it could result in incomplete pileup processing and would definitely be a bug in Sambamba, independent of any queuing handling. (The entire file must be processed regardless of the parallelization approach and the relative speeds of the parallel units.) So if the chunk-empty condition can occur before the entire BAM has been partitioned into chunks, I think another issue should be opened.

@lomereiter (Contributor)

@nosepy I also wouldn't worry about 32-bit integer reads/writes on any of the common architectures. Variables are aligned anyway; unaligned access can happen only with manual heap allocation (and GC.malloc returns aligned blocks). Using atomicLoad/atomicStore would perhaps convey the intention better, but in practice it shouldn't make any difference.

I don't understand what it means for the empty condition to be triggered 'early', since it is triggered on every chunk arrival, and exit is determined by nextChunk returning null.
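As a hedged illustration of that exit rule (hypothetical names; nextChunk here stands for whatever call hands the dump thread its next chunk):

```d
import std.stdio : writeln;

struct Chunk { int id; }

// The loop ends only when nextChunk reports completion by returning null;
// observing an empty queue on its own never terminates the dump thread.
void dumpLoop(Chunk* delegate() nextChunk) {
    while (true) {
        auto chunk = nextChunk();   // in the real code this blocks until a chunk is ready
        if (chunk is null)
            break;                  // all chunks have been produced and dumped
        writeln("dumping chunk ", chunk.id);
    }
}

void main() {
    Chunk[3] chunks = [Chunk(0), Chunk(1), Chunk(2)];
    size_t next;
    dumpLoop(() => next < chunks.length ? &chunks[next++] : null);
}
```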

All in all this approach looks correct to me.

@sambrightman (Collaborator)

  1. Off-by-one error in my thinking on this one; agreed, it looks fine.
  2. I had a recollection that the ChunkRange was being filled in parallel by reader threads and could be .empty mid-run. You are right that if this happens, you have other problems anyway.
