Releases: brentp/mosdepth
fix for d4
fix for bams/crams with *many* contigs
support cram v 3.1
v0.3.7
- support CRAM v3.1. This only updates the htslib that the binary is built with to v1.19.1 (#224, thanks @adthrasher for reporting and testing)
filter on fragment length
v0.3.6
- allow filtering on fragment length. Thanks @LudvigOlsen for implementing! (#214)
- fix bug where empty chromosomes are not reported as having 0 depth (#216)
The new optional arguments:
-l --min-frag-len <min-frag-len> minimum insert size. reads with a smaller insert size than this are ignored [default: -1]
-u --max-frag-len <max-frag-len> maximum insert size. reads with a larger insert size than this are ignored. [default: -1]
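As a rough illustration of how these two bounds behave, here is a small Python sketch of the filter as described above. It is not mosdepth's implementation: the function name `passes_fragment_filter` is hypothetical, and the sketch assumes that the default of -1 disables a bound and that the absolute value of the insert size is compared.

```python
def passes_fragment_filter(insert_size, min_frag_len=-1, max_frag_len=-1):
    """Return True if a read with this insert size would be kept.

    Assumptions (not confirmed by the release notes): a bound of -1
    means "no limit", and the sign of the insert size is ignored.
    """
    size = abs(insert_size)
    if min_frag_len >= 0 and size < min_frag_len:
        return False  # smaller than --min-frag-len: ignored
    if max_frag_len >= 0 and size > max_frag_len:
        return False  # larger than --max-frag-len: ignored
    return True


# Keep only fragments between 100 and 500 bases, inclusive.
insert_sizes = (50, 100, -300, 500, 800)
kept = [s for s in insert_sizes if passes_fragment_filter(s, 100, 500)]
print(kept)
```

On the command line this would correspond to something like `mosdepth -l 100 -u 500 <prefix> <BAM-or-CRAM>`, assuming the usual `mosdepth [options] <prefix> <BAM-or-CRAM>` invocation.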
summary min_depth on regions
v0.3.5
- fix bug with summary min for regions (#207 thanks to Xavier for supplying test-case)
v0.3.4
custom index location
v0.3.3
- allow specifying a custom index by passing '/path/to/bam##idx##/other-path/to/index.bai'
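A minimal sketch of composing and splitting such an argument, in case the paths are built programmatically. The helper names `with_custom_index` and `split_custom_index` are hypothetical and not part of mosdepth; only the `##idx##` delimiter comes from the note above.

```python
IDX_DELIM = "##idx##"  # delimiter taken from the release note above


def with_custom_index(alignment_path, index_path):
    """Join an alignment path and an index path into one argument."""
    return f"{alignment_path}{IDX_DELIM}{index_path}"


def split_custom_index(arg):
    """Split an argument back into (alignment_path, index_path or None)."""
    path, sep, index = arg.partition(IDX_DELIM)
    return path, (index if sep else None)


arg = with_custom_index("/path/to/bam", "/other-path/to/index.bai")
print(arg)  # /path/to/bam##idx##/other-path/to/index.bai
```

The resulting string would be passed to mosdepth in place of the plain BAM/CRAM path.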
readgroups fix
When using read-groups, an intermittent error would sometimes skip reads; this release fixes it.
Thanks @chrisamiller for reporting and providing a test-case.
bugfix
D4 support!
This release adds support for writing d4 files. See Aaron's poster here
d4 is awesome
d4 is a toolset and format written by Hao Hou from the Quinlan Lab.
mosdepth provides many options while calculating depth because it is slow to re-parse the per-base.bed.gz files. In many cases, it's faster to re-parse a cram file than to scan large regions from the per-base bed files. In addition, writing per-base.bed.gz has always been a bottleneck in mosdepth, even after it was optimized somewhat in the last release.
This release includes a static d4utils binary for Linux (below) that allows users to manipulate d4 files.
d4 is much faster to write:
Here are mosdepth run times on a smallish cram test-case:
- mosdepth without per-base: 5.9s
- mosdepth with per-base bed.gz: 24.8s
- mosdepth with per-base d4: 7.7s
Note that using d4 output greatly mitigates the cost of writing the per-base output.
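To make the comparison concrete, the overhead of each per-base format can be worked out from the three timings listed above. This is just arithmetic on those numbers (a sketch, not a new benchmark); the variable names are illustrative.

```python
# Run times in seconds, copied from the list above.
timings = {
    "no per-base": 5.9,
    "per-base bed.gz": 24.8,
    "per-base d4": 7.7,
}

baseline = timings["no per-base"]
bed_overhead = timings["per-base bed.gz"] - baseline  # extra cost of bed.gz
d4_overhead = timings["per-base d4"] - baseline       # extra cost of d4
speedup = timings["per-base bed.gz"] / timings["per-base d4"]

print(f"bed.gz adds {bed_overhead:.1f}s, d4 adds {d4_overhead:.1f}s")
print(f"total run time with d4 is {speedup:.1f}x shorter than with bed.gz")
```

On this test case, writing per-base bed.gz adds about 18.9s to the run, while d4 adds about 1.8s.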
With d4, mosdepth can write per-base output for a 23X CRAM in 2m15s.
d4 output is much more useful.
Once the d4 file is created, it is much faster to access. d4 includes command line utilities to view, get stats, and manipulate d4 files. These eventually will replace much of the functionality in mosdepth like quantize, histogram (dist.txt), regions.bed.gz, etc., since the operations are so fast.
why not bigwig
I made several pull requests to Devon Ryan's excellent BigWig library to improve speed and attempt to reduce memory usage: #41, #42, #43.
I also wrote a bigwig library for nim that uses libBigWig and used that to prototype bigwig output for mosdepth. However, bigwig output dramatically increased the memory usage in mosdepth such that it was not viable.
We will show in the coming manuscript (and see the poster) that d4 is much faster to create and use than bigwig and results in smaller file sizes.