The documentation is a little confusing about qlen and rlen.
qlen: no. query bases
rlen: no. reference bases
It took me some time to realize that qlen includes both matched bases and soft-clipped bases, while rlen includes the intron skip. So there is no parameter recording the number of matched bases, excluding the soft-clipped and intron-skip regions. Would it be possible to add another parameter for the "number of matched bases", say mlen?
Thanks!
It doesn't sound too hard to add an extra parameter based on the CIGAR "match" count, but that would be the number of bases with a known alignment coordinate rather than the number of strictly matching bases, as it's not possible to tell a SNP from a REF match using CIGAR alone. Note this would also treat "N" (ref skip) and "D" (deletion) as equivalent, as neither consumes sequence bases.
Is that sufficient?
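To illustrate the ambiguity, here is a minimal sketch in plain Python that works on a CIGAR string directly. The helper names (`parse_cigar`, `aligned_bases`, `exact_matches`) are hypothetical and are not part of samtools or htslib. It counts bases with a known alignment coordinate (M, = and X), and only reports a strict match count when the aligner used the extended = / X opcodes.

```python
import re

# One CIGAR operation: a length followed by an opcode.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '5S20M' into (length, op) pairs."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def aligned_bases(cigar):
    """Query bases with a known reference coordinate: M, = and X."""
    return sum(n for n, op in parse_cigar(cigar) if op in "M=X")


def exact_matches(cigar):
    """Bases known to equal the reference, or None if the CIGAR uses 'M'.

    'M' covers both matches and mismatches, so a strict match count
    cannot be recovered from the CIGAR alone in that case.
    """
    ops = parse_cigar(cigar)
    if any(op == "M" for _, op in ops):
        return None
    return sum(n for n, op in ops if op == "=")
```

With a spliced read such as `5S20M100N30M3S`, `aligned_bases` gives 50 but `exact_matches` gives `None`, because M does not separate matches from mismatches. With `10=1X9=`, the two counts are 20 and 19.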
However, note that's actually how it already works. It just calls the htslib functions bam_cigar2qlen and bam_cigar2rlen, which already do everything you ask for, except that qlen also includes soft-clips. So it may be better resolved by a function that returns the number of soft-clipped bases. That would then allow e.g. qlen - sclen.
Would that be preferable to a new term such as mlen, which could be ambiguous (see the CIGAR = vs CIGAR M opcodes)?
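As a rough sketch of the arithmetic, the following plain Python mirrors which CIGAR operations consume query and reference bases. The functions `qlen`, `rlen` and `sclen` here are illustrative stand-ins written for this example, not the htslib implementations or the samtools filter terms themselves.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

QUERY_OPS = "MIS=X"  # operations that consume query bases
REF_OPS = "MDN=X"    # operations that consume reference bases


def cigar_sum(cigar, ops):
    """Sum the lengths of all CIGAR operations whose opcode is in ops."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in ops)


def qlen(cigar):
    """Number of query bases, including soft-clips."""
    return cigar_sum(cigar, QUERY_OPS)


def rlen(cigar):
    """Number of reference bases spanned, including N skips and deletions."""
    return cigar_sum(cigar, REF_OPS)


def sclen(cigar):
    """Total soft-clipped bases over both ends."""
    return cigar_sum(cigar, "S")


cigar = "5S20M100N30M3S"
print(qlen(cigar), rlen(cigar), sclen(cigar), qlen(cigar) - sclen(cigar))
```

For `5S20M100N30M3S` this gives qlen 58, rlen 150 and sclen 8, so qlen - sclen is 50. Note that under these definitions inserted bases (I) still count towards qlen - sclen, since they consume query bases without being soft-clipped.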
This is the total length of soft-clips at both the left and right ends.
It may be combined with qlen (qlen-sclen) to obtain the number of
bases in the query sequence that have been aligned to the genome, i.e.
it provides a way to compare local-alignment vs global-alignment
length.
Fixes samtools#1436