
Increasing unit test coverage for VariantContextConverter #1276

Merged: 2 commits into bigdatagenomics:master on Nov 18, 2016

Conversation

@heuermh (Member) commented Nov 16, 2016

No description provided.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1617/

Build result: ABORTED

[...truncated 3 lines...]
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
Wiping out workspace first.
Cloning the remote Git repository
Cloning repository https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git init /home/jenkins/workspace/ADAM-prb # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/heads/:refs/remotes/origin/ # timeout=15
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
 > /home/jenkins/git2/bin/git config --add remote.origin.fetch +refs/heads/:refs/remotes/origin/ # timeout=10
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1276/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains 603d840 # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1276/merge^{commit} # timeout=10
Checking out Revision 603d840 (origin/pr/1276/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f 603d8409297f5eeba7f30846362c4933efeacaf5
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Touchstone configurations resulted in ABORTED, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'

@heuermh (Member, Author) commented Nov 17, 2016

If I ignore the hanging unit test, I see VCF header-related exceptions:

- don't lose any variants when piping as VCF !!! IGNORED !!!
2016-11-16 17:19:15 ERROR Utils:95 - Aborting task
java.lang.IllegalStateException: Key IndelQD found in VariantContext field FILTER at 1:14397 but this key isn't defined in the VCFHeader.  We require all VCFs to have complete VCF headers by default.
    at htsjdk.variant.vcf.VCFEncoder.fieldIsMissingFromHeaderError(VCFEncoder.java:173)
    at htsjdk.variant.vcf.VCFEncoder.getFilterString(VCFEncoder.java:154)
    at htsjdk.variant.vcf.VCFEncoder.encode(VCFEncoder.java:106)
    at htsjdk.variant.variantcontext.writer.VCFWriter.add(VCFWriter.java:222)
    at org.seqdoop.hadoop_bam.VCFRecordWriter.writeRecord(VCFRecordWriter.java:140)
    at org.seqdoop.hadoop_bam.KeyIgnoringVCFRecordWriter.write(KeyIgnoringVCFRecordWriter.java:60)
    at org.seqdoop.hadoop_bam.KeyIgnoringVCFRecordWriter.write(KeyIgnoringVCFRecordWriter.java:38)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply$mcV$sp(PairRDDFunctions.scala:1113)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1111)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12$$anonfun$apply$4.apply(PairRDDFunctions.scala:1111)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1277)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1119)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1091)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
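The strict-header error above points at the eventual fix: every FILTER key carried by a record must be declared in the VCFHeader before it reaches the writer. A hedged sketch against the htsjdk API (the helper name and description text are illustrative, not the actual ADAM fix that landed later):

```scala
import htsjdk.variant.vcf.{ VCFFilterHeaderLine, VCFHeader }

// Hypothetical helper: return a copy of the header with the IndelQD FILTER
// line declared, so VCFEncoder.getFilterString no longer rejects records
// that carry that filter key.
def withIndelQdFilter(header: VCFHeader): VCFHeader = {
  val patched = new VCFHeader(header.getMetaDataInInputOrder,
                              header.getGenotypeSamples)
  // The description is a placeholder, not the canonical GATK definition.
  patched.addMetaDataLine(
    new VCFFilterHeaderLine("IndelQD", "QD below the indel threshold"))
  patched
}
```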

@fnothaft (Member)

We'll need #1260 + a bit more to fix the header lines issue...

@fnothaft (Member) left a comment

That hang is kind of odd, but I have a guess. I might change the tee /dev/null command in VariantContextRDDSuite to tee to a file and see what you're writing out. I'm thinking that what's happening is that we're writing a VCF whose header is missing a FILTER line for the IndelQD filter. When we read that back from the pipe, we are probably getting an IllegalStateException from tribble/htsjdk about the header line. I'm guessing that this causes the writer thread to exit while blocking the piping thread pool from shutting down. (Yeah, that's a bug. Sigh!) Can you test this hypothesis? If that looks right, open an issue and I'll fix the pipe problems.
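The suggested diagnostic amounts to swapping the throwaway sink for a real file: tee passes the stream through to the downstream reader unchanged while keeping a copy to inspect (the path below is illustrative):

```shell
# Instead of piping through "tee /dev/null", keep a copy of the piped VCF:
printf '##fileformat=VCFv4.2\n' | tee /tmp/piped.vcf
# The downstream reader sees the same bytes, and /tmp/piped.vcf can now be
# checked for a FILTER header line declaring IndelQD:
grep -c '^##FILTER' /tmp/piped.vcf || true
```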

@@ -152,6 +152,22 @@ class ADAMContextSuite extends ADAMFunSuite {
assert(vcs.size === 6)

val vc = vcs.head

/*
@fnothaft (Member):

If all's the same to you, I'd nix this comment.

case (true, true) => vcb.passFilters
}

val somatic: java.lang.Boolean = Option(variant.getSomatic).getOrElse(false)
@fnothaft (Member):

I'd lose the : java.lang.Boolean. Is there a reason you need it?

@heuermh (Member, Author):

Yeah, it wouldn't compile without it. Odd that the lines above were ok.
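The ascription is needed because the Avro getter returns a boxed java.lang.Boolean while the literal false is a scala.Boolean; without an expected type, getOrElse infers their least upper bound. A minimal standalone sketch of the inference behavior (not the actual ADAM code; the null stands in for the nullable Avro field):

```scala
// getSomatic on the Avro Variant returns a nullable java.lang.Boolean;
// a plain null stands in for that here.
val fromAvro: java.lang.Boolean = null

// Without an expected type, getOrElse must unify java.lang.Boolean (an
// AnyRef) with scala.Boolean (an AnyVal); their least upper bound is Any.
val widened = Option(fromAvro).getOrElse(false) // inferred as Any

// Ascribing java.lang.Boolean gives the compiler an expected type, so the
// scala.Boolean default is boxed (via Predef.boolean2Boolean) and the
// expression typechecks where a java.lang.Boolean is required.
val somatic: java.lang.Boolean = Option(fromAvro).getOrElse(false)
```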

@heuermh (Member, Author) commented Nov 17, 2016

Yes, I believe that is what is happening with the hang. Teeing to another file results in an empty file.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1631/

test("Convert somatic htsjdk site-only SNV to ADAM") {
val converter = new VariantContextConverter

// not sure why this doesn't work
@fnothaft (Member):

This one too.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/1632/

@fnothaft fnothaft merged commit e0979a9 into bigdatagenomics:master Nov 18, 2016
@heuermh heuermh deleted the vcc-coverage branch November 18, 2016 19:06
3 participants