Queue completed pileup chunk results, allowing out-of-sequence workers to continue #238
Conversation
```d
alias Tuple!(size_t, "num", char[], "data") Result;
alias Array!(Result) ResultQueue;
private BinaryHeap!(ResultQueue, "a > b") result_queue_;
```
`"a.num > b.num"` would be less confusing.
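For illustration, a minimal standalone sketch of how the comparator controls pop order in `std.container.BinaryHeap` (the `Result` alias is taken from the diff above; the rest of the surrounding code is not reproduced). With the reversed predicate, the default max-heap becomes a min-heap on `num`, so the lowest-numbered chunk is always at the front:

```d
import std.container.array : Array;
import std.container.binaryheap : BinaryHeap;
import std.stdio : writeln;
import std.typecons : Tuple;

alias Result = Tuple!(size_t, "num", char[], "data");

void main() {
    // "a.num > b.num" inverts the default ordering, turning the
    // max-heap into a min-heap: the smallest sequence number pops first.
    BinaryHeap!(Array!Result, "a.num > b.num") heap;
    heap.insert(Result(2, "third".dup));
    heap.insert(Result(0, "first".dup));
    heap.insert(Result(1, "second".dup));
    while (!heap.empty) {
        writeln(heap.front.num);  // 0, then 1, then 2
        heap.removeFront();
    }
}
```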
Thanks, looks good. What I worry about is relying on output to be fast enough. An alternative approach could be:
Yes, although this appears to be a more scalable setup, and writing immediately to disk doesn't seem to be a problem, in practice anyone with e.g. bgzip straight after could see problems (and indeed in my tests it appears that a slow output pipeline makes issue #237 behaviour worse). This has a trivial workaround though - reduce the thread count to the effective worker thread count from before (e.g. we had 12 with only 4 or 5 active, so reduce to 5). Maybe your idea of checking the queue size is still a good one though to prevent it hurting people by accident - and maybe emit a message to indicate that this likely means you need to optimise the output?
From interactive use point of view, the workaround is trivial indeed. However, the tool (though not
Queue completed pileup chunk results, allowing out-of-sequence workers to continue

We noticed that in general mpileup shows suboptimal thread utilisation (issue biod#237). Ordering of output results is currently the responsibility of worker threads, so many of them are waiting on the condition variable of ChunkDispatcher.

This patch closes biod#237 by:

* Queueing results in a min heap.
* Writing from queue to file via a separate thread.
* Allowing worker threads to process subsequent chunks immediately.
* Moderately increasing memory usage for short queues.

We currently see that the output thread cannot keep up with the workers, leading to large increases in memory and runtime due to queueing, even though processing time is decreased and thread utilisation is full. Using a lower thread count would of course alleviate this but the cause is not completely clear. This could be a separate issue.
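The queue-plus-writer design described in the commit message can be sketched as a toy program (purely illustrative; all names here are hypothetical and this is not the patch's code): workers insert chunks out of order, and a single writer thread emits them strictly in sequence by waiting until the next expected number reaches the heap front.

```d
import core.sync.condition : Condition;
import core.sync.mutex : Mutex;
import core.thread : Thread;
import std.container.array : Array;
import std.container.binaryheap : BinaryHeap;
import std.stdio : writeln;
import std.typecons : Tuple;

alias Result = Tuple!(size_t, "num", char[], "data");

__gshared Mutex mtx;
__gshared Condition cond;
__gshared BinaryHeap!(Array!Result, "a.num > b.num") heap;
__gshared size_t currNum = 0;
enum totalChunks = 4;

// Writer thread: drain the min-heap in strict sequence order.
void writerLoop() {
    for (;;) {
        synchronized (mtx) {
            // Sleep until the next in-order chunk is at the heap front.
            while (heap.empty || heap.front.num != currNum)
                cond.wait();
            writeln("writing chunk ", heap.front.num);
            heap.removeFront();
            ++currNum;
            if (currNum == totalChunks)
                return;
        }
    }
}

void main() {
    mtx = new Mutex;
    cond = new Condition(mtx);
    auto writer = new Thread(&writerLoop);
    writer.start();
    // Simulate workers finishing out of order; none of them has to
    // wait for earlier chunks to be written before moving on.
    foreach (num; [2UL, 0UL, 3UL, 1UL]) {
        synchronized (mtx) {
            heap.insert(Result(num, "chunk".dup));
            cond.notifyAll();
        }
    }
    writer.join();  // chunks are written as 0, 1, 2, 3
}
```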
Force-pushed from f758824 to f169ec7.
Style issues are addressed. I implemented the memory usage check slightly differently (it is checked indirectly via the queue length: simpler, and somewhat auto-scaling based on the selected buffer size and thread count). Existing users with slow output will see little difference in performance but will get log messages indicating that their output is too slow.
```d
// @@ -424,8 +428,12 @@ class ChunkDispatcher(ChunkRange) {
    void queueResult(size_t num, char[] data) {
        synchronized(queue_mutex_) {
            while(result_queue_.length > max_queue_length_) {
```
I gave it a test run, and it got stuck here. The condition should also have a clause `&& num != curr_num_`, otherwise the next chunk to output may arrive too late.
Allowing worker threads to offload completed chunks to an output thread raises the possibility that a slow output (slow device, piped into slow bgzip, etc.) can grow the queue without bound. For large input, memory usage then becomes extremely high. The effect can be eliminated by reducing the number of worker threads or improving the output speed (e.g. pbgzip), but it is better to log and cap excessive queueing than to let it grow indefinitely. This change caps the number of queued chunks at 2 * number of threads.
Force-pushed from f169ec7 to c45c0fd.
Good catch. I ran a test with max length 1 hard-coded and didn't see anything wrong at first; looking back at the logs, it looks like it did happen once. Fixed as you suggest, and changed the inequality to
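The fix the reviewer asked for could look roughly like this (a sketch extending the quoted diff; the field names come from the diff, while `queue_condition_` is a hypothetical condition variable on `queue_mutex_`, and the actual patch may differ):

```d
// Sketch of the fixed throttle: workers block while the queue is over
// the cap, EXCEPT the worker holding the next chunk to be written
// (num == curr_num_) -- stalling that worker would deadlock the
// pipeline, because the output thread is waiting for exactly that chunk.
void queueResult(size_t num, char[] data) {
    synchronized (queue_mutex_) {
        while (result_queue_.length > max_queue_length_ && num != curr_num_)
            queue_condition_.wait();  // hypothetical condition variable
        result_queue_.insert(Result(num, data));
        queue_condition_.notifyAll();  // wake the writer and throttled workers
    }
}
```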
Ok, things look good now. Thanks for the contribution!
Word of warning: whilst initially testing I was verifying every …

I was mistaken above; doesn't seem relevant to this PR. I do get a different pileup on every …
Are there criteria for getting a release for such a change?