Added AdamContext.referenceLengthFromCigar #263
All automated tests passed.
* reference genome (i.e. skipping clipping, padding, and insertion operators)
*
* @param cigar The CIGAR string whose reference length is to be measured
* @return A non-negative integer, the some of the MDNX= operators in the CIGAR string.
sum, not some ;)
Looks good, but I think this'd be a better fit in RichADAMRecord, or in a Util class.
Good call, I'll move it.
I fixed those two issues, Frank -- moved it to RichADAMRecord, and fixed the spelling mistake :-)
Can you rebase this? Also, do you still want to change the imports in ADAMContext?
All automated tests passed.
All automated tests passed.
@tdanford can you remove the changes to ADAMContext? Then, I'll merge.
The new referenceLengthFromCigar method (and its associated ADAMRecord method, 'referenceLength') computes the length of an aligned read along the reference sequence. This is useful because rec.start + rec.referenceLength is a logical "end" coordinate for testing whether an ADAMRecord overlaps a reference range.
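To illustrate the overlap test described above, here is a minimal sketch. `AlignedRead` and `Overlap.overlaps` are hypothetical names used only for this example, not ADAM's actual types, and the sketch assumes half-open intervals (the "end" coordinate is exclusive).

```scala
// Hypothetical stand-in for an aligned read; NOT ADAM's actual record type.
final case class AlignedRead(start: Long, referenceLength: Int) {
  // Exclusive end coordinate on the reference: start + referenceLength.
  def end: Long = start + referenceLength
}

object Overlap {
  // Standard half-open interval test of [read.start, read.end) against
  // [regionStart, regionEnd). Assumes referenceLength > 0 and
  // regionStart < regionEnd; empty intervals are not handled specially.
  def overlaps(read: AlignedRead, regionStart: Long, regionEnd: Long): Boolean =
    read.start < regionEnd && regionStart < read.end
}
```

With this convention, a read starting at 100 with reference length 50 covers positions 100 through 149, so it overlaps a region starting at 149 but not one starting at 150.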
@fnothaft Done. I don't think the Jenkins build has kicked off yet though...
Thanks @tdanford! I will wait for the build and merge if it is clean.
All automated tests passed.
…mCigar Added AdamContext.referenceLengthFromCigar
Merged! Thanks @tdanford!
Thanks, Frank!
Adds a new method which has some utility (for some of our code):
ADAMContext.referenceLengthFromCigar, which calculates the 'length along the reference sequence' of an alignment record from its Cigar string. This is useful because 'rec.start + referenceLengthFromCigar(rec.cigar)' is a natural 'end' value which can be used to calculate whether an ADAMRecord overlaps a given region along the reference.
From this, we can calculate overlap queries on ADAMRecords without having to go through the (somewhat cumbersome) ReferenceMapping framework.
Thoughts? Does this duplicate code that's already somewhere else in ADAM? Also, I'm not parsing the Cigar strings using Picard; instead I'm defining my own quick-and-dirty regex for them. I don't know how people feel about that.
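For readers who want a concrete picture of the regex approach, the following is a rough sketch, not the code from this PR. The object name `CigarSketch` and the exact regex are assumptions made for illustration; the set of reference-consuming operators (MDNX=) is taken from the doc comment under review.

```scala
object CigarSketch {
  // One CIGAR element: a run length followed by a single operator character.
  // This pattern is an assumption for illustration, not the PR's regex.
  private val Element = "([0-9]+)([MIDNSHPX=])".r

  // Operators that advance along the reference (MDNX=); clipping (S, H),
  // padding (P), and insertion (I) operators are skipped.
  private val ReferenceOps = Set('M', 'D', 'N', 'X', '=')

  // Sums the run lengths of the reference-consuming elements.
  // Text that does not match the pattern is silently ignored, so this
  // sketch does not validate malformed CIGAR strings.
  def referenceLengthFromCigar(cigar: String): Int =
    Element.findAllMatchIn(cigar)
      .filter(m => ReferenceOps.contains(m.group(2).head))
      .map(_.group(1).toInt)
      .sum
}
```

For example, `CigarSketch.referenceLengthFromCigar("5M2I3D10M")` counts 5 + 3 + 10 and skips the 2-base insertion. A Picard-based parser would reject malformed input that this sketch lets through, which is one trade-off of the quick regex approach.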