Support SAM normalization #86
Codecov Report
@@            Coverage Diff             @@
##           master      #86      +/-   ##
==========================================
+ Coverage   82.63%   82.63%   +<.01%
==========================================
  Files          60       60
  Lines        4140     4146       +6
  Branches      432      434       +2
==========================================
+ Hits         3421     3426       +5
+ Misses        287      286       -1
- Partials      432      434       +2
Thank you for the patch!
I think we need some more changes.
The problems are:
- sam->bam or bam->sam cannot be converted by read-blocks/write-blocks. cljam.algo.sorter switches read/write functions according to the combination of input/output formats.
- bam normalization can let blocks pass through because read-blocks keeps ref-id without referring to the real RNAME, but we have to update RNAME for sam.
For example, the following input
@SQ SN:1 LN:45
@SQ SN:2 LN:40
r003 16 1 29 30 6H5M * 0 0 TAGGC *
r001 163 1 7 30 8M4I4M1D3M = 37 39 TTAGATAAAGAGGATACTG * XX:B:S,12561,2,20,112
r002 0 1 9 30 1S2I6M1P1I1P1I4M2I * 0 0 AAAAGATAAGGGATAAA *
r003 0 1 9 30 5H6M * 0 0 AGCTAA *
x3 0 2 6 30 9M4I13M * 0 0 ttataaaacAAATaattaagtctaca ??????????????????????????
r004 0 1 16 30 6M14N1I5M * 0 0 ATAGCTCTCAGC *
r001 83 1 37 30 9M = 7 -39 CAGCGCCAT *
x1 0 2 1 30 20M * 0 0 aggttttataaaacaaataa ????????????????????
x2 0 2 2 30 21M * 0 0 ggttttataaaacaaataatt ?????????????????????
x4 0 2 10 30 25M * 0 0 CaaaTaattaagtctacagagcaac ?????????????????????????
x6 0 2 14 30 23M * 0 0 Taattaagtctacagagcaacta ???????????????????????
x5 0 2 12 30 24M * 0 0 aaTaattaagtctacagagcaact ????????????????????????
gives this broken output sam with the current implementation: the @SQ names become chr1 and chr2, but the RNAME column of each record still holds 1 or 2.
@SQ SN:chr1 LN:45
@SQ SN:chr2 LN:40
r003 16 1 29 30 6H5M * 0 0 TAGGC *
r001 163 1 7 30 8M4I4M1D3M = 37 39 TTAGATAAAGAGGATACTG * XX:B:S,12561,2,20,112
r002 0 1 9 30 1S2I6M1P1I1P1I4M2I * 0 0 AAAAGATAAGGGATAAA *
r003 0 1 9 30 5H6M * 0 0 AGCTAA *
x3 0 2 6 30 9M4I13M * 0 0 ttataaaacAAATaattaagtctaca ??????????????????????????
r004 0 1 16 30 6M14N1I5M * 0 0 ATAGCTCTCAGC *
r001 83 1 37 30 9M = 7 -39 CAGCGCCAT *
x1 0 2 1 30 20M * 0 0 aggttttataaaacaaataa ????????????????????
x2 0 2 2 30 21M * 0 0 ggttttataaaacaaataatt ?????????????????????
x4 0 2 10 30 25M * 0 0 CaaaTaattaagtctacagagcaac ?????????????????????????
x6 0 2 14 30 23M * 0 0 Taattaagtctacagagcaacta ???????????????????????
x5 0 2 12 30 24M * 0 0 aaTaattaagtctacagagcaact ????????????????????????
We get an NPE if we specify bam for output.
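To make the expected result concrete, here is a minimal Python sketch of record-level normalization for SAM text. It is not cljam's implementation: `normalize_name` and `normalize_sam_line` are hypothetical names, and the rule "prefix bare names with chr" is only inferred from the example above (1 becomes chr1). It assumes tab-separated SAM lines, and it rewrites the @SQ SN tag together with the RNAME and RNEXT columns so that the header and the records stay consistent.

```python
def normalize_name(name: str) -> str:
    """Hypothetical rule inferred from the example: 1 -> chr1, chr1 -> chr1."""
    return name if name.startswith("chr") else "chr" + name


def normalize_sam_line(line: str) -> str:
    """Normalize reference names in one tab-separated SAM line."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] == "@SQ":
        # Header line: rewrite only the SN tag.
        return "\t".join(
            "SN:" + normalize_name(f[3:]) if f.startswith("SN:") else f
            for f in fields
        )
    if fields[0].startswith("@"):
        # Other header lines pass through unchanged.
        return "\t".join(fields)
    # Alignment line: RNAME is column 3 and RNEXT is column 7 (1-based).
    # "*" (unset) and "=" (same as RNAME) must be left as they are.
    for idx in (2, 6):
        if fields[idx] not in ("*", "="):
            fields[idx] = normalize_name(fields[idx])
    return "\t".join(fields)


header = normalize_sam_line("@SQ\tSN:1\tLN:45")
record = normalize_sam_line("r003\t16\t1\t29\t30\t6H5M\t*\t0\t0\tTAGGC\t*")
```

With this rule, `header` names chr1 and the third column of `record` is chr1 as well, which is the consistency the broken output above lacks.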
In conclusion, we'd better use read-alignments/write-alignments if either of the input/output formats is sam.
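The selection rule can be sketched as a small dispatch on the format pair. This is an illustration in Python, not the code in cljam.algo.sorter or in this patch; `choose_io` and its return values are hypothetical. It assumes only sam and bam are handled, and that the block path is acceptable when both sides are bam, as the second problem above suggests.

```python
def choose_io(input_format: str, output_format: str) -> str:
    """Return which read/write pair to use: "blocks" or "alignments"."""
    formats = {input_format.lower(), output_format.lower()}
    unknown = formats - {"sam", "bam"}
    if unknown:
        raise ValueError(f"unsupported format(s): {sorted(unknown)}")
    # Blocks can pass through only for bam -> bam, where ref-id is kept.
    # Any combination involving sam needs decoded alignments so that
    # RNAME can be rewritten.
    return "blocks" if formats == {"bam"} else "alignments"
```

Under these assumptions, `choose_io("bam", "bam")` selects the block path and every combination involving sam selects the alignment path.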
I overlooked that point. Thank you for the advice. I'll fix it.
6839cc1 to 6abf4b0
LGTM! Thank you 👍