-
Notifications
You must be signed in to change notification settings - Fork 593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prevent users from requesting g.vcf.gz in Spark #4277
Conversation
lbergelson
commented
Jan 26, 2018
- this is currently broken, see HaplotypeCallerSpark doesn't write g.vcf.gz files #4274
- add a check to HaplotypeCallerSpark and VariantSparkSink and throw a clear exception in this case
- added test for GVCF writing in VariantSparkSink which previously didn't exist
- added new UserException.UnimplementedFeature class
- closes prevent running HaplotypeCallerSpark with broken output formats #4275
27e285b
to
bb0a5ba
Compare
Codecov Report
@@ Coverage Diff @@
## master #4277 +/- ##
==============================================
+ Coverage 79.087% 79.147% +0.06%
- Complexity 16612 16707 +95
==============================================
Files 1049 1051 +2
Lines 59580 59808 +228
Branches 9730 9758 +28
==============================================
+ Hits 47120 47336 +216
- Misses 8681 8685 +4
- Partials 3779 3787 +8
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review complete, back to @lbergelson, merge after addressing comments.
|
||
//TODO remove me when https://github.com/broadinstitute/gatk/issues/4274 is fixed | ||
if (writeGvcf && (AbstractFeatureReader.hasBlockCompressedExtension(outputFile) || outputFile.endsWith(IOUtil.BCF_FILE_EXTENSION))) { | ||
throw new UserException.UnimplementedFeature("It is currently not possible to write a compressed g.vcf or bcf.gz on spark. See https://github.com/broadinstitute/gatk/issues/4274 for more details."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this forbid writing any BCFs when in GVCF mode (not just bcf.gz
)? Was this the intent? Both the error message and the ticket #4274 seem incomplete, since they don't mention disallowing g.bcf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was the intent, it's currently not possible to write g.bcf at all, compressed or not. I think we are actually compressing bcf though automatically on spark, but it interacts strangely with the .gz codepaths. I've updated the comment and opened a new issue #4303
private static final long serialVersionUID = 0L; | ||
|
||
public UnimplementedFeature(String message){ | ||
super(message); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New UserException
subclasses should generally prepend some boilerplate text to the error message in their constructors (eg, "Feature is not currently implemented: %s")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this generally leads to more confusing error messages since you have to fight the existing text to get it to say what you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It leads to more consistent error messages. You don't have to "fight" the existing text so much as just look at it before adding your message.
if(hcArgs.emitReferenceConfidence == ReferenceConfidenceMode.GVCF | ||
&& (AbstractFeatureReader.hasBlockCompressedExtension(output) || output.endsWith(IOUtil.BCF_FILE_EXTENSION))) { | ||
throw new UserException.UnimplementedFeature("It is currently not possible to write a compressed g.vcf or bcf.gz from HaplotypeCallerSpark. " + | ||
"See https://github.com/broadinstitute/gatk/issues/4274 for more details."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you repeating this check so that the tool can fail early, or would the check in VariantSparkSink
not catch this?
Either way, my comment above also applies here: Doesn't this forbid writing any BCFs when in GVCF mode (not just bcf.gz)? Was this the intent? Both the error message and the ticket #4274 seem incomplete, since they don't mention disallowing g.bcf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Repeating it to catch it early, otherwise it will run the entire haplotype caller and then fail. Updated here as well.
@Test(expectedExceptions = UserException.UnimplementedFeature.class) | ||
public void testGVCF_GZ_isDisallowed() throws IOException { | ||
JavaSparkContext ctx = SparkContextFactory.getTestSparkContext(); | ||
VariantsSparkSink.writeVariants(ctx, createTempFile("test",".g.vcf.gz").toString(), null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also test that .g.bcf
and .g.bcf.gz
are disallowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added
public String getTestedToolName(){ | ||
return IndexFeatureFile.class.getSimpleName(); | ||
} | ||
}.runCommandLine(new String[]{"-F", output.getAbsolutePath()}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't new Main().instanceMain(args)
simpler here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is but it misses out on some things like setting the default test verbosity.
final String[] args = { | ||
"-I", NA12878_20_21_WGS_bam, | ||
"-R", b37_2bit_reference_20_21, | ||
"-O", createTempFile("testGVCF_GZ_throw_exception", ".g.vcf.gz").getAbsolutePath(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also test that .g.bcf
and .g.bcf.gz
are disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
* prevent users from requesting g.vcf.gz in Spark * this is currently broken, see #4274 * add a check to HaplotypeCallerSpark and VariantSparkSink and throw a clear exception in this case * added test for GVCF writing in VariantSparkSink which previously didn't exist * added new UserException.UnimplementedFeature class * closes #4275