Use ReadClipper in BaseQualityClipReadTransformer #3388

Merged
merged 3 commits into from
Aug 2, 2017

mwalker174 (Contributor)

The earlier version of BaseQualityClipReadTransformer clipped bases simply by removing them from the base and base-quality arrays, without adjusting the CIGAR, the alignment start coordinate, etc. (thanks to @SHuang-Broad for catching this in #3354).

This commit fixes the behavior by clipping through the ReadClipper class, which keeps the CIGAR and alignment start consistent with the clipped bases, and adds CIGAR checks to the unit test. It also makes some minor style changes to the transformer and test code.
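To make the original problem concrete: removing bases from the arrays alone leaves, for example, a read that still reports start 100 and CIGAR 10M while holding only 7 bases after a 3-base clip. The following is a minimal, self-contained sketch (not GATK's ReadClipper or ClippingOp) of a left hard clip that also rewrites the CIGAR and shifts the alignment start. SimpleRead and CigarElement are hypothetical stand-in records (Java 16+); the sketch assumes 0 < n < read length and does not handle an insertion left at the front of the CIGAR after clipping.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch, not GATK code: hard-clips the first n bases of a read and
 * updates the CIGAR and alignment start along with the bases.
 * SimpleRead and CigarElement are hypothetical stand-ins for GATKRead / htsjdk types.
 * Assumes 0 < n < read length; a leading insertion left after clipping is not handled.
 */
public final class LeftHardClipSketch {
    record CigarElement(int length, char op) {}
    record SimpleRead(int start, String bases, List<CigarElement> cigar) {}

    static boolean consumesRead(final char op) { return "MIS=X".indexOf(op) >= 0; }
    static boolean consumesRef(final char op)  { return "MDN=X".indexOf(op) >= 0; }

    static SimpleRead hardClipLeft(final SimpleRead read, final int n) {
        final List<CigarElement> in = read.cigar();
        int remaining = n;  // read bases still to clip
        int hard = 0;       // length of the new leading hard clip
        int refShift = 0;   // reference bases skipped by the clip
        int i = 0;
        final List<CigarElement> rest = new ArrayList<>();
        while (i < in.size()) {
            final CigarElement e = in.get(i);
            final char op = e.op();
            if (op == 'H') {                  // existing leading hard clip: merge it
                hard += e.length();
                i++;
            } else if (!consumesRead(op)) {   // D/N/P inside or at the edge of the clip: drop it
                if (consumesRef(op)) refShift += e.length();
                i++;
            } else if (remaining == 0) {      // first surviving read-consuming element
                break;
            } else {
                final int take = Math.min(remaining, e.length());
                remaining -= take;
                hard += take;
                if (consumesRef(op)) refShift += take;  // clipped S or I bases do not move the start
                i++;
                if (take < e.length()) {      // element only partly clipped: keep the remainder
                    rest.add(new CigarElement(e.length() - take, op));
                    break;
                }
            }
        }
        final List<CigarElement> newCigar = new ArrayList<>();
        newCigar.add(new CigarElement(hard, 'H'));
        newCigar.addAll(rest);
        newCigar.addAll(in.subList(i, in.size()));
        return new SimpleRead(read.start() + refShift, read.bases().substring(n), newCigar);
    }

    static String cigarString(final List<CigarElement> cigar) {
        final StringBuilder sb = new StringBuilder();
        for (final CigarElement e : cigar) sb.append(e.length()).append(e.op());
        return sb.toString();
    }

    public static void main(final String[] args) {
        final SimpleRead read = new SimpleRead(100, "ACGTACGTAC", List.of(new CigarElement(10, 'M')));
        final SimpleRead clipped = hardClipLeft(read, 3);
        // prints "103 3H7M TACGTAC"
        System.out.println(clipped.start() + " " + cigarString(clipped.cigar()) + " " + clipped.bases());
    }
}
```

Clipping 3 bases from the 10M read at position 100 gives start 103, CIGAR 3H7M, and bases TACGTAC. A deletion uncovered by the clip is dropped and its length added to the start shift, so 2S3M2D5M at position 100 clipped by 5 becomes 5H5M at position 105.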

@codecov-io commented Jul 31, 2017

Codecov Report

Merging #3388 into master will increase coverage by 0.081%.
The diff coverage is 100%.

@@               Coverage Diff               @@
##              master     #3388       +/-   ##
===============================================
+ Coverage     80.467%   80.549%   +0.081%     
- Complexity     17507     17639      +132     
===============================================
  Files           1173      1175        +2     
  Lines          63376     63749      +373     
  Branches        9878      9963       +85     
===============================================
+ Hits           50997     51349      +352     
- Misses          8431      8443       +12     
- Partials        3948      3957        +9
| Impacted Files | Coverage Δ | Complexity Δ | |
|---|---|---|---|
| ...r/transformers/BaseQualityClipReadTransformer.java | 100% <100%> (ø) | 14 <1> (ø) | ⬇️ |
| ...nstitute/hellbender/utils/clipping/ClippingOp.java | 84.365% <100%> (+1.629%) | 92 <0> (+3) | ⬆️ |
| ...rk/pathseq/PSPathogenReferenceTaxonProperties.java | 90% <0%> (-10%) | 13% <0%> (+12%) | |
| ...stitute/hellbender/tools/walkers/vqsr/Tranche.java | 62.921% <0%> (-7.349%) | 18% <0%> (ø) | |
| .../hellbender/tools/walkers/vqsr/TrancheManager.java | 67.347% <0%> (-2.983%) | 18% <0%> (+3%) | |
| ...ols/walkers/contamination/ContaminationRecord.java | 87.302% <0%> (-2.698%) | 9% <0%> (+4%) | |
| ...nder/transformers/SimpleRepeatMaskTransformer.java | 94.286% <0%> (ø) | 11% <0%> (?) | |
| ...ellbender/transformers/AdapterTrimTransformer.java | 92.857% <0%> (ø) | 12% <0%> (?) | |
| ...ools/spark/pathseq/PSFilterArgumentCollection.java | 80% <0%> (+1.429%) | 2% <0%> (ø) | ⬇️ |
| ...s/spark/pathseq/PSBuildReferenceTaxonomyUtils.java | 90.541% <0%> (+1.579%) | 80% <0%> (+41%) | ⬆️ |

... and 7 more

@mwalker174 mwalker174 closed this Jul 31, 2017
@mwalker174 mwalker174 reopened this Jul 31, 2017
@SHuang-Broad (Contributor) left a comment

Minor comments on how to reduce code duplication. Merge at your discretion. 👍

-    public GATKRead apply(GATKRead read) {
-        GATKRead readClippedRightEnd = clipReadRightEnd(read);
+    public GATKRead apply(final GATKRead read) {
+        final GATKRead readClippedRightEnd = clipReadRightEnd(read);
         return clipReadLeftEnd(readClippedRightEnd);
     }

Contributor

I know this predates this PR, but while you are at it, could you add documentation to getRightClipPoint(byte[]) and getLeftClipPoint(byte[]) explicitly stating that a negative return value means no clipping is necessary?

Contributor Author

Done
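For illustration, here is what the requested documentation could look like on two separate helpers. This is a hypothetical sketch, not the merged code: the running-sum logic is taken from the reviewer's getClipPoint suggestion in this thread, and the class name ClipPointSketch and its constructor are assumptions.

```java
/**
 * Hypothetical sketch of the two clip-point helpers with the requested Javadoc.
 * The running-sum logic follows the reviewer's getClipPoint suggestion; the class
 * name and constructor are illustrative assumptions.
 */
public final class ClipPointSketch {
    private final int qTrimmingThreshold;

    public ClipPointSketch(final int qTrimmingThreshold) {
        this.qTrimmingThreshold = qTrimmingThreshold;
    }

    /**
     * Scans base qualities from the left end of the read.
     *
     * @param quals base qualities (Phred scale)
     * @return the index of the last base to clip (bases 0..index are clipped),
     *         or a negative value (-1) if no clipping is necessary
     */
    public int getLeftClipPoint(final byte[] quals) {
        int clipSum = 0, lastMax = -1, clipPoint = -1; // -1 means no clip
        for (int i = 0; i < quals.length; i++) {
            clipSum += qTrimmingThreshold - quals[i];
            if (clipSum >= 0 && clipSum >= lastMax) {
                lastMax = clipSum;
                clipPoint = i;
            }
        }
        return clipPoint;
    }

    /**
     * Scans base qualities from the right end of the read.
     *
     * @param quals base qualities (Phred scale)
     * @return the index of the first base to clip (bases index..length-1 are clipped),
     *         or a negative value (-1) if no clipping is necessary
     */
    public int getRightClipPoint(final byte[] quals) {
        int clipSum = 0, lastMax = -1, clipPoint = -1; // -1 means no clip
        for (int i = quals.length - 1; i >= 0; i--) {
            clipSum += qTrimmingThreshold - quals[i];
            if (clipSum >= 0 && clipSum >= lastMax) {
                lastMax = clipSum;
                clipPoint = i;
            }
        }
        return clipPoint;
    }

    public static void main(final String[] args) {
        final ClipPointSketch sketch = new ClipPointSketch(15);
        final byte[] quals = {2, 2, 30, 30, 30, 3};
        System.out.println(sketch.getLeftClipPoint(quals));  // prints 1
        System.out.println(sketch.getRightClipPoint(quals)); // prints 5
    }
}
```

With a threshold of 15, the qualities {2, 2, 30, 30, 30, 3} give a left clip point of 1 (clip bases 0 and 1) and a right clip point of 5 (clip the last base), while an all-high-quality read such as {30, 30} returns -1 from both helpers.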

Contributor

Also, looking at these two functions, they are very similar.
Since they seem to be used only by this class, even though they are public, I suggest the following change, but it is entirely up to you:

    @Override
    public GATKRead apply(final GATKRead read) {
        // clip the right end first, then the left end of the right-clipped read
        return clipRead(clipRead(read, false), true);
    }

    /**
     * Scans the base qualities from the requested end and returns the index of the
     * innermost base to clip, or -1 if no clipping is necessary.
     */
    private int getClipPoint(final byte[] quals, final boolean fromLeft) {
        final int readLength = quals.length;
        final int start, end, step;
        if (fromLeft) {
            start = 0;              end = readLength; step = 1;
        } else {
            start = readLength - 1; end = -1;         step = -1;
        }
        int clipSum = 0, lastMax = -1, clipPoint = -1; // -1 means no clip
        for (int i = start; i != end; i += step) {
            // running sum of (threshold - quality); the clip point is the last
            // position where the sum reaches its non-negative maximum
            clipSum += (qTrimmingThreshold - quals[i]);
            if (clipSum >= 0 && clipSum >= lastMax) {
                lastMax = clipSum;
                clipPoint = i;
            }
        }
        return clipPoint;
    }

    private GATKRead clipRead(final GATKRead read, final boolean fromLeft) {
        final byte[] quals = read.getBaseQualities();
        final int clipPoint = getClipPoint(quals, fromLeft);
        if (clipPoint != -1) {
            // hard-clip through ReadClipper so the CIGAR and alignment start are updated too
            final ReadClipper readClipper = new ReadClipper(read);
            readClipper.addOp(fromLeft ? new ClippingOp(0, clipPoint) : new ClippingOp(clipPoint, read.getLength()));
            return readClipper.clipRead(ClippingRepresentation.HARDCLIP_BASES);
        } else {
            return read;
        }
    }

Contributor Author

That was my original approach, but I think it's easier to have two simple functions instead of one complicated function in this case.

import org.broadinstitute.hellbender.utils.read.GATKRead;

import java.util.Arrays;

/**
* Clips reads on both ends using base quality scores
*/
public class BaseQualityClipReadTransformer implements ReadTransformer {
Contributor

I forgot to mention that you probably intended this class to be final too, and its serialVersionUID to be private.

Contributor Author

Yes - done.
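A minimal sketch of the two requested modifiers: final on the class and private on serialVersionUID. It uses a hypothetical TrimTransformerSketch that implements UnaryOperator<String> and Serializable directly, as a stand-in for ReadTransformer (assumed here to be serializable, since the class declares a serialVersionUID). The round trip through Java serialization shows where the version field matters.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.UnaryOperator;

/**
 * Hypothetical stand-in transformer: final class, private serialVersionUID.
 * It works on Strings instead of GATKReads to stay self-contained.
 */
public final class TrimTransformerSketch implements UnaryOperator<String>, Serializable {
    // private: the declaration applies only to this class; serialization still reads it
    private static final long serialVersionUID = 1L;

    private final int maxLength;

    public TrimTransformerSketch(final int maxLength) {
        this.maxLength = maxLength;
    }

    @Override
    public String apply(final String s) {
        return s.length() <= maxLength ? s : s.substring(0, maxLength);
    }

    /** Serializes and deserializes t, as happens when a transformer is shipped to another JVM. */
    static TrimTransformerSketch roundTrip(final TrimTransformerSketch t)
            throws IOException, ClassNotFoundException {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(t);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (TrimTransformerSketch) in.readObject();
        }
    }

    public static void main(final String[] args) throws Exception {
        final TrimTransformerSketch copy = roundTrip(new TrimTransformerSketch(3));
        System.out.println(copy.apply("ACGTAC")); // prints "ACG"
    }
}
```

Declaring the class final prevents subclasses from changing its behavior, and the java.io.Serializable documentation advises making serialVersionUID private where possible because the declaration applies only to the declaring class.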

@mwalker174 mwalker174 merged commit 07b9f46 into master Aug 2, 2017
@mwalker174 mwalker174 deleted the mw_quality_clip branch August 2, 2017 21:45
3 participants