Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup in org.bdgenomics.adam.converters package. #1062

Closed
wants to merge 1 commit into from

Conversation

fnothaft
Copy link
Member

@fnothaft fnothaft commented Jul 1, 2016

  • Remove unused GenotypesToVariantsConverter.
  • Made all classes package private to [converters] if they had no external references, or package private to [adam] if they were referenced elsewhere in ADAM.
  • Added documentation where necessary.
  • Added SupportedHeaderLines object. This contains code previously in VariantAnnotationConverter that was called in the ADAMVCFOutputFormat. This allows us to make VariantAnnotationConverter package private to converters and allows us to make the AttrKeys class and object private.
  • Additionally, pulled mergeAnnotations out to org.bdgenomics.adam.models.VariantContext, which is also needed to make VariantAnnotationConverter package private to converters. This also cleaned up some code in org.bdgenomics.adam.rdd.variation.VariationRDDFunctions.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1290/
Test PASSed.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1291/

Build result: FAILURE

[...truncated 24 lines...]Triggering ADAM-prb ? 2.6.0,2.10,1.3.1,centosTriggering ADAM-prb ? 2.3.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.6.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.11,1.3.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.3.1,centosTriggering ADAM-prb ? 2.3.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.3.0,2.10,1.6.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.6.1,centosTriggering ADAM-prb ? 2.3.0,2.10,1.3.1,centosADAM-prb ? 2.3.0,2.11,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.4.1,centos completed with result FAILUREADAM-prb ? 2.3.0,2.10,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.6.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.3.1,centos completed with result SUCCESSNotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@fnothaft
Copy link
Member Author

fnothaft commented Jul 1, 2016

Jenkins, retest this please.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1292/
Test PASSed.

@heuermh
Copy link
Member

heuermh commented Jul 1, 2016

Thanks, I will probably want this merged before I pull request on the VariantAnnotation stuff (which unfortunately has some name collisions with what is there already). Will review this afternoon.

def convert(vc: VariantContext, annotation: DatabaseVariantAnnotation): DatabaseVariantAnnotation = {
/**
* Remaps fields from an htsjdk variant context into a site annotation.
*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been searching for a better name for DatabaseVariantAnnotation, could VariantSiteAnnotation be it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would +1 that! DatabaseVariantAnnotation is totes clunky.

@heuermh
Copy link
Member

heuermh commented Jul 1, 2016

For #1044 I currently have a method

def createVariantAnnotations(variant: Variant, vc: VariantContext): VariantAnnotation = {
  ...

that pulls the ANN keyed INFO field attribute out of the variant context and populates an avro VariantAnnotation record (which contains an array of TranscriptEffect records).

This could be wrapped in a third convert method in VariantAnnotationConverter, however going back to bigdatagenomics/bdg-formats#41 and related discussion around bigdatagenomics/bdg-formats#67, is it really a good idea to have three different code paths around data associated with Variant? Do we need three different sets of schema records?

* Remove unused GenotypesToVariantsConverter.
* Made all classes package private to [converters] if they had no external
  references, or package private to [adam] if they were referenced elsewhere
  in ADAM.
* Added documentation where necessary.
* Added SupportedHeaderLines object. This contains code previously in
  VariantAnnotationConverter that was called in the ADAMVCFOutputFormat. This
  allows us to make VariantAnnotationConverter package private to converters
  and allows us to make the AttrKeys class and object private.
* Additionally, pulled `mergeAnnotations` out to
  org.bdgenomics.adam.models.VariantContext, which is also needed to make
  VariantAnnotationConverter package private to converters. This also cleaned up
  some code in org.bdgenomics.adam.rdd.variation.VariationRDDFunctions.
* Rename `BroadVariantContext` to `HtsjdkVariantContext`.
@fnothaft
Copy link
Member Author

fnothaft commented Jul 4, 2016

Addressed comment RE: HtsjdkVariantContext. That's a better name, given the repackaging in htsjdk. VariantContext used to live in org.broadinstitute.variant, IIRC.

@fnothaft
Copy link
Member Author

fnothaft commented Jul 4, 2016

Thanks for bringing the following up, @heuermh:

This could be wrapped in a third convert method in VariantAnnotationConverter, however going back to bigdatagenomics/bdg-formats#41 and related discussion around bigdatagenomics/bdg-formats#67, is it really a good idea to have three different code paths around data associated with Variant? Do we need three different sets of schema records?

I agree. I like having annotation and genotype separate, but I don't think we need two different annotation records (for 3 total paths). Perhaps we can sit down this next week and go through the paths with a fine toothed comb? I hadn't gone through this part of the codebase line-by-line in a while, and it was harrier than I remembered...

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1298/
Test PASSed.

@heuermh
Copy link
Member

heuermh commented Jul 7, 2016

LGTM

@heuermh
Copy link
Member

heuermh commented Jul 7, 2016

Merged commit 038680c

Thank you, @fnothaft!

@heuermh heuermh closed this Jul 7, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants