Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added GetSampleName as stopgap until we have named parameters #3538

Merged
merged 4 commits into from
Aug 31, 2017

Conversation

LeeTL1220
Copy link
Contributor

  • Added new tool called GetSampleName, which will get the sample name from a bam file header.
  • If there is not exactly one sample name in the bam file, this tool will fail, by design.
  • Altered the WDL to use this tool instead of a call to samtools in the M2 WDL.

…tool

Fixing the program group for GetSampleName

M2 WDL formatting changes

Making this tool experimental.

Making sure that failures happen appropriately in the WDL for CollectSequencingArtifactMetrics
@codecov-io
Copy link

codecov-io commented Aug 31, 2017

Codecov Report

Merging #3538 into master will increase coverage by 0.002%.
The diff coverage is 62.5%.

@@               Coverage Diff               @@
##              master     #3538       +/-   ##
===============================================
+ Coverage     80.086%   80.088%   +0.002%     
- Complexity     17766     17773        +7     
===============================================
  Files           1188      1189        +1     
  Lines          64416     64432       +16     
  Branches       10007     10010        +3     
===============================================
+ Hits           51588     51602       +14     
  Misses          8842      8842               
- Partials        3986      3988        +2
Impacted Files Coverage Δ Complexity Δ
...broadinstitute/hellbender/tools/GetSampleName.java 62.5% <62.5%> (ø) 6 <6> (?)
...er/tools/spark/sv/discovery/AlignmentInterval.java 91.525% <0%> (+0.847%) 25% <0%> (+1%) ⬆️
...oadinstitute/hellbender/utils/gcs/BucketUtils.java 76.316% <0%> (+1.974%) 38% <0%> (ø) ⬇️

@LeeTL1220
Copy link
Contributor Author

LeeTL1220 commented Aug 31, 2017

@davidbenjamin Just added an additional automated test thanks to @sooheelee , but please ignore commit message.

Copy link
Contributor

@davidbenjamin davidbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve provided that GATKTool is more appropriate than CommandLineProgram.

programGroup = QCProgramGroup.class
)
@BetaFeature
final public class GetSampleName extends GATKTool{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not make this a CommandLineProgram? BTW, if the answer is that emulating getHeaderForReads would be too cumbersome, I accept it with no further argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidbenjamin That would be my answer. I'd be rewriting that code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No action

@LeeTL1220 LeeTL1220 merged commit 55fa8b6 into master Aug 31, 2017
@LeeTL1220 LeeTL1220 deleted the ll_GetSampleName branch August 31, 2017 15:19
}

@Override
public boolean requiresReads() {return true;}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even trivial methods should be formatted on different lines. e.g.

public boolean requiresReads() {
    return true;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pshapiro4broad Note that I just let intelliJ write that...

throw new UserException.BadInput("The given input bam has no header or no read groups. Cannot determine a sample name.");
}

final List<String> sampleNames = getHeaderForReads().getReadGroups().stream().map(s -> s.getSample()).distinct().collect(Collectors.toList());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be simplified slightly if you collect as Set instead of List, which will let you avoid distinct(). Also I would prefer SAMReadGroupRecord::getSample over s -> s.getSample() for readability even though the lambda form is fewer characters.

throw new UserException.BadInput("The given bam input has no sample names.");
}

try (final FileWriter fileWriter = new FileWriter(outputSampleNameFile, false)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Files is more concise, e.g.

try {
    Files.write(outputSampleNameFile.toPath(), sampleNames.get(0).getBytes());
} catch (final IOException e) {
    ...

Changing outputSampleNameFile to be declared as Path would make this shorter as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pshapiro4broad Thanks, but get me before I merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants