Implement PathSeq taxon hit scoring in Spark #3406

Merged
merged 2 commits into from
Aug 9, 2017

mwalker174
Contributor

Upgrades PathSeqScoreSpark to perform abundance score calculations on the executors rather than the driver. The driver-side calculation was crashing on inputs with many pathogen reads.
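
To illustrate the idea of scoring per partition and merging the partial results, here is a minimal plain-Java sketch. It does not use Spark and is not the code in this PR. The `Hit` class, the `scorePartition` and `combine` helpers, and the value of `SCORE_GENOME_LENGTH_UNITS` are all assumptions made for illustration. The per-hit formula follows the line quoted in the review below, with `numHits` assumed to be the number of hits for one read.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: per-partition scoring followed by an associative merge.
final class PartitionedScoringSketch {
    // Placeholder value; the real constant is not shown in this conversation.
    static final double SCORE_GENOME_LENGTH_UNITS = 1e6;

    // Hypothetical hit record: taxon ID, number of mates, and genome length.
    static final class Hit {
        final String taxId;
        final int numMates;
        final long genomeLength;

        Hit(String taxId, int numMates, long genomeLength) {
            this.taxId = taxId;
            this.numMates = numMates;
            this.genomeLength = genomeLength;
        }
    }

    // Work that can run independently on each partition: each inner list holds
    // the hits of one read, and each hit adds its score to its taxon's partial sum.
    static Map<String, Double> scorePartition(List<List<Hit>> readsInPartition) {
        Map<String, Double> partial = new HashMap<>();
        for (List<Hit> hits : readsInPartition) {
            int numHits = hits.size();
            for (Hit hit : hits) {
                double score = SCORE_GENOME_LENGTH_UNITS * hit.numMates
                        / (numHits * (double) hit.genomeLength);
                partial.merge(hit.taxId, score, Double::sum);
            }
        }
        return partial;
    }

    // Associative, commutative merge of two partial results, so only small
    // per-taxon maps (not the reads themselves) need to be brought together.
    static Map<String, Double> combine(Map<String, Double> a, Map<String, Double> b) {
        Map<String, Double> out = new HashMap<>(a);
        b.forEach((taxId, score) -> out.merge(taxId, score, Double::sum));
        return out;
    }
}
```

In a Spark job, a function like `scorePartition` would run on the executors and a merge like `combine` could be passed to a reduction, so the driver never holds the full set of reads.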

This also required some minor changes to the PSPathogenTaxonScore class so that it tracks abundance score contributions that come directly from hits to a taxon separately from those that come from the taxon's descendants.
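
A minimal sketch of that bookkeeping, assuming a record with two fields and a lineage list ordered from the hit taxon up to the root. `TaxonScoreSketch`, its field names, and `addHit` are hypothetical and are not the actual PSPathogenTaxonScore API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a per-taxon score record; not the actual PSPathogenTaxonScore.
final class TaxonScoreSketch {
    double selfScore = 0.0;        // contributions from hits directly to this taxon
    double descendantScore = 0.0;  // contributions from hits to taxa below this one

    double totalScore() {
        return selfScore + descendantScore;
    }

    // Adds one hit's score to the hit taxon and to each of its ancestors.
    // `lineage` is assumed to list the hit taxon first, then its ancestors up to the root.
    static void addHit(Map<String, TaxonScoreSketch> scores, List<String> lineage, double score) {
        for (int i = 0; i < lineage.size(); i++) {
            TaxonScoreSketch entry =
                    scores.computeIfAbsent(lineage.get(i), k -> new TaxonScoreSketch());
            if (i == 0) {
                entry.selfScore += score;        // direct hit
            } else {
                entry.descendantScore += score;  // inherited from a descendant
            }
        }
    }
}
```

Keeping the two sums separate means an ancestor's total can still be reported while its direct contribution remains recoverable.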

As a result, some of the test output no longer matched under bitwise, exact comparison. The tests now check for output equivalence instead: they parse the scores table, check that all the taxa are the same, and check that the scores are equal to within some defined epsilon.
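
A minimal sketch of such an equivalence check, assuming a simplified two-column, tab-separated table of taxon ID and score. The class name, the table layout, and the helper names are hypothetical; the real scores table and test utilities may differ.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an order-insensitive, tolerance-based table comparison.
final class ScoreTableEquivalenceSketch {
    // Parses lines of the assumed form "taxonId<TAB>score" into a map.
    static Map<String, Double> parse(String table) {
        Map<String, Double> out = new HashMap<>();
        for (String line : table.split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split("\t");
            out.put(fields[0], Double.parseDouble(fields[1]));
        }
        return out;
    }

    // True when both tables contain the same taxa and every score differs by at most eps.
    static boolean equivalent(Map<String, Double> expected, Map<String, Double> actual, double eps) {
        if (!expected.keySet().equals(actual.keySet())) {
            return false;
        }
        for (Map.Entry<String, Double> e : expected.entrySet()) {
            if (Math.abs(e.getValue() - actual.get(e.getKey())) > eps) {
                return false;
            }
        }
        return true;
    }
}
```

Comparing parsed values rather than raw text makes the check insensitive to row order and to the small floating-point differences that arise when sums are accumulated in a different order.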

@codecov-io

codecov-io commented Aug 3, 2017

Codecov Report

Merging #3406 into master will increase coverage by 0.093%.
The diff coverage is 89.474%.

@@               Coverage Diff               @@
##              master     #3406       +/-   ##
===============================================
+ Coverage     80.497%   80.589%   +0.093%     
- Complexity     17553     17668      +115     
===============================================
  Files           1175      1175               
  Lines          63487     63836      +349     
  Branches        9895      9963       +68     
===============================================
+ Hits           51105     51445      +340     
- Misses          8433      8434        +1     
- Partials        3949      3957        +8
| Impacted Files | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| ...nder/tools/spark/pathseq/PSPathogenTaxonScore.java | 88.235% <87.5%> (-11.765%) | 3 <2> (+1) | |
| ...itute/hellbender/tools/spark/pathseq/PSScorer.java | 88.77% <90.244%> (-0.825%) | 60 <20> (+6) | |
| ...tionbiasvariantfilter/OrientationBiasFilterer.java | 95.492% <0%> (-0.064%) | 80% <0%> (+24%) | |
| ...ools/spark/pathseq/PSFilterArgumentCollection.java | 80% <0%> (ø) | 3% <0%> (+1%) | ⬆️ |
| ...bender/tools/walkers/annotator/FragmentLength.java | 100% <0%> (ø) | 8% <0%> (+2%) | ⬆️ |
| ...e/hellbender/utils/variant/GATKVCFHeaderLines.java | 99.315% <0%> (+0.029%) | 7% <0%> (ø) | ⬇️ |
| ...itute/hellbender/tools/spark/pathseq/PSFilter.java | 93.567% <0%> (+0.95%) | 33% <0%> (ø) | ⬇️ |
| ...e/hellbender/engine/spark/SparkContextFactory.java | 71.296% <0%> (+1.296%) | 18% <0%> (+8%) | ⬆️ |
| ...bender/tools/walkers/annotator/OxoGReadCounts.java | 94.828% <0%> (+1.494%) | 28% <0%> (+11%) | ⬆️ |
| ...tools/walkers/genotyper/AlleleSubsettingUtils.java | 90.062% <0%> (+1.88%) | 55% <0%> (+13%) | ⬆️ |

... and 7 more

@tedsharpe
Contributor

On vacation--won't have a chance to look at this until next week. If Chris or others approve just go for it.

@cwhelan cwhelan self-assigned this Aug 9, 2017
Member

@cwhelan cwhelan left a comment

This looks fine to me.

final Double score = SCORE_GENOME_LENGTH_UNITS * hit.numMates / (numHits * tree.getLengthOf(taxId));
sum += score;
//Git list containing this node and its ancestors
Member

Git -> Get?

Contributor Author

Done

@mwalker174 mwalker174 merged commit 17147b3 into master Aug 9, 2017
@mwalker174 mwalker174 deleted the mw_score_spark branch August 9, 2017 19:59
4 participants