Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build up reference information during cigar processing #233

Merged
merged 1 commit into from
May 6, 2014

Conversation

arahuja
Copy link
Contributor

@arahuja arahuja commented Apr 28, 2014

This PR is to track more reference information during the cigar processing.

Currently, RichADAMRecord parses the Cigar string to extract the reference position (as a Long).
This:

  • Also captures the reference base, cigar element, and cigar offset (which will allow for us to not reparse the cigar string when building pileups)
  • Change reference position to be an instance of ReferencePosition (and changes the overlaps function to check other ReferencePosition ensuring a contigName check (resolving RichADAMRecord: referencePosition methods ignore contig #113))

Also, this removes ReferenceLocation in favor of ReferencePosition as they had duplicate functionality

Noted issues:

  • This has overlaps with Remove duplicate mdtag field #230 slightly
  • This still uses the current Cigar processing from samtools, there was some talk of removing this dependency
  • DecadentRead vs RichAdamRecord do we want to merge or distinguish these more?

@@ -180,6 +179,9 @@ case class ReferenceRegion(referenceName: String, start: Long, end: Long) extend
def overlaps(other: ReferenceRegion): Boolean =
referenceName == other.referenceName && end > other.start && start < other.end

def overlaps(other: ReferencePosition): Boolean =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different from the contains(other: ReferencePosition) method above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not - didn't see that one, will update, thanks!

@fnothaft
Copy link
Member

+1, looks good!

  • This still uses the current Cigar processing from samtools, there was some talk of removing this dependency

I've been speaking with the UCSC folks about this. I'm hoping to have a design document/writeup for this approach by the end of the week.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/306/

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/307/

@nealsid
Copy link

nealsid commented Apr 28, 2014

Sorry to be a commit history nag but can you move some of the PR description over to the git commit message?

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/308/

…processing; remove reference location in favor of reference position
@arahuja
Copy link
Contributor Author

arahuja commented May 6, 2014

Rebased after #230 merged in.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/317/

massie added a commit that referenced this pull request May 6, 2014
Build up reference information during cigar processing
@massie massie merged commit aca27a4 into bigdatagenomics:master May 6, 2014
@massie
Copy link
Member

massie commented May 6, 2014

Thanks, Arun!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants