Implement PathSeq taxon hit scoring in Spark #3406

mwalker174 · 2017-08-03T20:28:11Z

Upgrades PathSeqScoreSpark to perform abundance score calculations on the executors rather than the driver. The tool was crashing on inputs with many pathogen reads.

This also required some minor changes to the PSPathogenTaxonScore class so that it tracks abundance score contributions that come directly from hits to a taxon separately from those that come from the taxon's descendants.
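For illustration, the self/descendant split could look roughly like the sketch below. This is not the actual PSPathogenTaxonScore code: the class name `TaxonScoreSketch`, its fields, the `addHit` helper, and the child-to-parent map are all hypothetical, and the per-hit score is passed in as a plain number.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the actual PSPathogenTaxonScore class.
class TaxonScoreSketch {
    double selfScore = 0.0;        // contributions from hits directly to this taxon
    double descendantScore = 0.0;  // contributions propagated up from descendant taxa

    double total() {
        return selfScore + descendantScore;
    }

    // Adds `score` to the hit taxon as a direct contribution and to each of its
    // ancestors as a descendant contribution. `parentOf` maps child -> parent and
    // is assumed to describe a tree; the root has no entry.
    static void addHit(Map<String, TaxonScoreSketch> scores,
                       Map<String, String> parentOf,
                       String taxId,
                       double score) {
        scores.computeIfAbsent(taxId, k -> new TaxonScoreSketch()).selfScore += score;
        String ancestor = parentOf.get(taxId);
        while (ancestor != null) {
            scores.computeIfAbsent(ancestor, k -> new TaxonScoreSketch()).descendantScore += score;
            ancestor = parentOf.get(ancestor);
        }
    }
}
```

Keeping the two sums apart means a taxon's total can still be reported as their sum, while the direct contribution stays available on its own.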

As a result, some of the test output no longer matched under exact, bitwise checks. The tests now check for output equivalence instead: they parse the scores table, check that all the taxa are the same, and check that the scores are equal to within a defined epsilon.
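A minimal sketch of that kind of equivalence check, assuming the scores table has already been parsed into a map from taxon ID to a single score. The class and method names here are hypothetical and are not the ones used in the PR's tests, and the real table may carry more than one score per taxon.

```java
import java.util.Map;

// Hypothetical sketch of an epsilon-based comparison of two parsed score tables.
class ScoreTableEquivalenceSketch {
    // Returns true when both tables contain exactly the same taxa and every
    // pair of scores differs by at most `epsilon`.
    static boolean equivalent(Map<String, Double> expected,
                              Map<String, Double> actual,
                              double epsilon) {
        if (!expected.keySet().equals(actual.keySet())) {
            return false;
        }
        for (Map.Entry<String, Double> entry : expected.entrySet()) {
            double diff = Math.abs(entry.getValue() - actual.get(entry.getKey()));
            // Written as !(diff <= epsilon) so that a NaN score fails the check.
            if (!(diff <= epsilon)) {
                return false;
            }
        }
        return true;
    }
}
```

A tolerance-based comparison like this accepts small floating-point differences, such as those caused by summing contributions in a different order, while still failing on a missing or extra taxon.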

codecov-io · 2017-08-03T21:08:52Z

Codecov Report

Merging #3406 into master will increase coverage by 0.093%.
The diff coverage is 89.474%.

@@               Coverage Diff               @@
##              master     #3406       +/-   ##
===============================================
+ Coverage     80.497%   80.589%   +0.093%     
- Complexity     17553     17668      +115     
===============================================
  Files           1175      1175               
  Lines          63487     63836      +349     
  Branches        9895      9963       +68     
===============================================
+ Hits           51105     51445      +340     
- Misses          8433      8434        +1     
- Partials        3949      3957        +8

Impacted Files	Coverage Δ	Complexity Δ
...nder/tools/spark/pathseq/PSPathogenTaxonScore.java	`88.235% <87.5%> (-11.765%)`	`3 <2> (+1)`
...itute/hellbender/tools/spark/pathseq/PSScorer.java	`88.77% <90.244%> (-0.825%)`	`60 <20> (+6)`
...tionbiasvariantfilter/OrientationBiasFilterer.java	`95.492% <0%> (-0.064%)`	`80% <0%> (+24%)`
...ools/spark/pathseq/PSFilterArgumentCollection.java	`80% <0%> (ø)`	`3% <0%> (+1%)`	⬆️
...bender/tools/walkers/annotator/FragmentLength.java	`100% <0%> (ø)`	`8% <0%> (+2%)`	⬆️
...e/hellbender/utils/variant/GATKVCFHeaderLines.java	`99.315% <0%> (+0.029%)`	`7% <0%> (ø)`	⬇️
...itute/hellbender/tools/spark/pathseq/PSFilter.java	`93.567% <0%> (+0.95%)`	`33% <0%> (ø)`	⬇️
...e/hellbender/engine/spark/SparkContextFactory.java	`71.296% <0%> (+1.296%)`	`18% <0%> (+8%)`	⬆️
...bender/tools/walkers/annotator/OxoGReadCounts.java	`94.828% <0%> (+1.494%)`	`28% <0%> (+11%)`	⬆️
...tools/walkers/genotyper/AlleleSubsettingUtils.java	`90.062% <0%> (+1.88%)`	`55% <0%> (+13%)`	⬆️
... and 7 more

tedsharpe · 2017-08-08T13:37:31Z

On vacation--won't have a chance to look at this until next week. If Chris or others approve just go for it.

cwhelan

This looks fine to me.

cwhelan · 2017-08-09T17:17:52Z

src/main/java/org/broadinstitute/hellbender/tools/spark/pathseq/PSScorer.java

                final Double score = SCORE_GENOME_LENGTH_UNITS * hit.numMates / (numHits * tree.getLengthOf(taxId));
-                sum += score;
+                //Git list containing this node and its ancestors


Git -> Get?

Implements taxon hit scoring in Spark, updates scoring tool tests

97db2ef

mwalker174 requested review from cwhelan and tedsharpe August 7, 2017 18:21

cwhelan self-assigned this Aug 9, 2017

cwhelan approved these changes Aug 9, 2017

Correct misspelling of 'Get'

5ffdb98

mwalker174 merged commit 17147b3 into master Aug 9, 2017

mwalker174 deleted the mw_score_spark branch August 9, 2017 19:59
