external sort in markdup #123
It can be improved; it's just that nobody has run it on huge datasets yet. Picard implements an external sort to keep the memory footprint within limits, and the same should be done here.
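To make the idea concrete: an external merge sort keeps memory bounded by sorting chunks that fit in RAM, writing each sorted chunk to disk, and then streaming a single k-way merge over the chunk files. A minimal sketch of that pattern using GNU coreutils (file names and sort keys are hypothetical; sambamba would do the equivalent internally on read records rather than text lines):

```sh
# Split the input into fixed-size chunks, sort each chunk on its own
# (bounded memory), then do one streaming merge pass over the sorted chunks.
split -l 10000000 read_positions.txt chunk.
for f in chunk.??; do
    sort -k1,1 -k2,2n "$f" -o "$f.sorted"   # sort one chunk at a time
    rm "$f"
done
sort -m -k1,1 -k2,2n chunk.??.sorted > read_positions.sorted.txt
rm chunk.??.sorted
```

(GNU sort already spills to disk on its own for large inputs; the split and merge steps are spelled out here only to make the algorithm visible.)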
Thanks Artem,
That's a funny question :-) I don't know of such an engine. As it happens, I typed 'parallel' into Google Translate and tried different languages, and suddenly had this stroke of luck.
Aha, genius. I'm definitely going to try that next time I need a name for something.
Hi guys, I second the request for a more efficient markdup. I currently switch it off for anything above 50x whole-genome coverage because it runs out of memory. Thanks Artem!
Hi Artem, I just want to underline this. We routinely have >150x WGS samples, and the sysadmins don't like the resource usage anymore :) It does run with big overflow-list sizes and changed ulimits, but it stresses the systems. It would be great if you could find a way to optimize it.
Hi all,
In the latest commit in
Added to v0.6.0
Hi,
I'm (still) playing with duplicate marking on a large human BAM (about 100x coverage).
Using version sambamba_02_02_2015 without any parameters, I get the "too many open files" error when marking duplicates.
When I apply the "--hash-table-size 1000000" parameter, the run completes correctly, but the first step, "finding positions of the duplicate reads in the file", used 57 GB of RAM, which is too high for us. Is there anything you can recommend to lower the RAM usage while still getting a successfully created, duplicate-marked BAM?
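For reference, the workarounds mentioned earlier in the thread (raised ulimits plus a bigger hash table and overflow list) look roughly like the sketch below. The flag names are real sambamba markdup options, but the values, paths, and file names are illustrative rather than tuned recommendations, and they mainly address the open-file and overflow errors rather than peak RAM:

```sh
# Raise the per-process open-file limit for this shell session so markdup
# can keep many temporary files open without hitting "too many open files".
ulimit -n 4096

# Larger hash table and overflow list, with temporary files on a scratch disk.
sambamba markdup \
    --hash-table-size 1000000 \
    --overflow-list-size 600000 \
    --tmpdir /scratch/tmp \
    input.bam input.markdup.bam
```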