Develop branch: Error in rule vardict_tumor_normal #1286

ivadym · 2023-10-17T09:36:17Z

Describe the bug
When processing variants in specific regions a critical exception occurred: java.lang.IndexOutOfBoundsException with a negative bitIndex value.

After restarting the analysis, it succeeds.

If workflow, which rules

vardict_tumor_normal (balsamic TGA-TN and balsamic-umi TN)

Screenshots

There was Exception while processing variant on 55271360 on region 7:55271084-55271374. The processing will be continued from the next variant.
java.lang.IndexOutOfBoundsException: bitIndex < 0: -2039770131
	at java.util.BitSet.get(BitSet.java:623)
	at htsjdk.samtools.AbstractBAMFileIndex.query(AbstractBAMFileIndex.java:281)
	at htsjdk.samtools.DiskBasedBAMFileIndex.getSpanOverlapping(DiskBasedBAMFileIndex.java:61)
	at htsjdk.samtools.BAMFileReader.getFileSpan(BAMFileReader.java:935)
	at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:952)
	at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:612)
	at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:533)
	at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:538)
	at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:400)
	at com.astrazeneca.vardict.data.SamView.<init>(SamView.java:21)
	at com.astrazeneca.vardict.modules.VariationRealigner.noPassingReads(VariationRealigner.java:2060)
	at com.astrazeneca.vardict.modules.VariationRealigner.realigndel(VariationRealigner.java:617)
	at com.astrazeneca.vardict.modules.VariationRealigner.realignIndels(VariationRealigner.java:387)
	at com.astrazeneca.vardict.modules.VariationRealigner.process(VariationRealigner.java:88)
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
	at java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:628)
	at java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:1996)
	at com.astrazeneca.vardict.modes.AbstractMode.pipeline(AbstractMode.java:49)
	at com.astrazeneca.vardict.postprocessmodules.SomaticPostProcessModule.combineAnalysis(SomaticPostProcessModule.java:404)
	at com.astrazeneca.vardict.postprocessmodules.SomaticPostProcessModule.printVariationsFromFirstSample(SomaticPostProcessModule.java:198)
	at com.astrazeneca.vardict.postprocessmodules.SomaticPostProcessModule.callingForBothSamples(SomaticPostProcessModule.java:143)
	at com.astrazeneca.vardict.postprocessmodules.SomaticPostProcessModule.accept(SomaticPostProcessModule.java:87)
	at com.astrazeneca.vardict.postprocessmodules.SomaticPostProcessModule.accept(SomaticPostProcessModule.java:28)
	at java.util.concurrent.CompletableFuture.biAccept(CompletableFuture.java:1190)
	at java.util.concurrent.CompletableFuture.biAcceptStage(CompletableFuture.java:1206)
	at java.util.concurrent.CompletableFuture.thenAcceptBoth(CompletableFuture.java:2056)
	at com.astrazeneca.vardict.modes.SomaticMode.processBothBamsInPipeline(SomaticMode.java:125)
	at com.astrazeneca.vardict.modes.SomaticMode.access$000(SomaticMode.java:31)
	at com.astrazeneca.vardict.modes.SomaticMode$SomdictWorker.call(SomaticMode.java:96)
	at com.astrazeneca.vardict.modes.SomaticMode$SomdictWorker.call(SomaticMode.java:79)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
There was Exception while processing variant on 55271360 on region 7:55271084-55271374. The processing will be continued from the next variant.
java.lang.IndexOutOfBoundsException: bitIndex < 0: -2039770131
	at java.util.BitSet.get(BitSet.java:623)
	at htsjdk.samtools.AbstractBAMFileIndex.query(AbstractBAMFileIndex.java:281)
	at htsjdk.samtools.DiskBasedBAMFileIndex.getSpanOverlapping(DiskBasedBAMFileIndex.java:61)
	at htsjdk.samtools.BAMFileReader.getFileSpan(BAMFileReader.java:935)
	at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:952)
	at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:612)
	at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:533)
	at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:538)
	at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:400)
	at com.astrazeneca.vardict.data.SamView.<init>(SamView.java:21)
	at com.astrazeneca.vardict.modules.VariationRealigner.noPassingReads(VariationRealigner.java:2060)
	at com.astrazeneca.vardict.modules.VariationRealigner.realigndel(VariationRealigner.java:617)
	at com.astrazeneca.vardict.modules.VariationRealigner.realignIndels(VariationRealigner.java:387)
	at com.astrazeneca.vardict.modules.VariationRealigner.process(VariationRealigner.java:88)
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
	at java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:628)
	at java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:1996)
	at com.astrazeneca.vardict.modes.AbstractMode.pipeline(AbstractMode.java:49)
	at com.astrazeneca.vardict.postprocessmodules.SomaticPostProcessModule.combineAnalysis(SomaticPostProcessModule.java:404)
	at com.astrazeneca.vardict.postprocessmodules.SomaticPostProcessModule.printVariationsFromFirstSample(SomaticPostProcessModule.java:198)
	at com.astrazeneca.vardict.postprocessmodules.SomaticPostProcessModule.callingForBothSamples(SomaticPostProcessModule.java:143)
	at com.astrazeneca.vardict.postprocessmodules.SomaticPostProcessModule.accept(SomaticPostProcessModule.java:87)
	at com.astrazeneca.vardict.postprocessmodules.SomaticPostProcessModule.accept(SomaticPostProcessModule.java:28)
	at java.util.concurrent.CompletableFuture.biAccept(CompletableFuture.java:1190)
	at java.util.concurrent.CompletableFuture.biAcceptStage(CompletableFuture.java:1206)
	at java.util.concurrent.CompletableFuture.thenAcceptBoth(CompletableFuture.java:2056)
	at com.astrazeneca.vardict.modes.SomaticMode.processBothBamsInPipeline(SomaticMode.java:125)
	at com.astrazeneca.vardict.modes.SomaticMode.access$000(SomaticMode.java:31)
	at com.astrazeneca.vardict.modes.SomaticMode$SomdictWorker.call(SomaticMode.java:96)
	at com.astrazeneca.vardict.modes.SomaticMode$SomdictWorker.call(SomaticMode.java:79)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

Version (please complete the following information):
balsamic --version develop

Additional context
Previously we had a similar issue: #1271

The text was updated successfully, but these errors were encountered:

ivadym · 2023-11-01T08:05:13Z

We have also vardict failing in tumor-only now:

Critical exception occurs on region: 19:6827911-6829086, program will be stopped.
java.util.concurrent.CompletionException: java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:618)
	at java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:628)
	at java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:1996)
	at com.astrazeneca.vardict.modes.AbstractMode.pipeline(AbstractMode.java:49)
	at com.astrazeneca.vardict.modes.SimpleMode.processBamInPipeline(SimpleMode.java:103)
	at com.astrazeneca.vardict.modes.SimpleMode.access$000(SimpleMode.java:29)
	at com.astrazeneca.vardict.modes.SimpleMode$VardictWorker.call(SimpleMode.java:86)
	at com.astrazeneca.vardict.modes.SimpleMode$VardictWorker.call(SimpleMode.java:72)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
	at htsjdk.samtools.AbstractBAMFileIndex.query(AbstractBAMFileIndex.java:297)
	at htsjdk.samtools.DiskBasedBAMFileIndex.getSpanOverlapping(DiskBasedBAMFileIndex.java:61)
	at htsjdk.samtools.BAMFileReader.getFileSpan(BAMFileReader.java:935)
	at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:952)
	at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:612)
	at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:533)
	at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:538)
	at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:400)
	at com.astrazeneca.vardict.data.SamView.<init>(SamView.java:21)
	at com.astrazeneca.vardict.modules.VariationRealigner.noPassingReads(VariationRealigner.java:2060)
	at com.astrazeneca.vardict.modules.VariationRealigner.realigndel(VariationRealigner.java:617)
	at com.astrazeneca.vardict.modules.VariationRealigner.realignIndels(VariationRealigner.java:387)
	at com.astrazeneca.vardict.modules.VariationRealigner.process(VariationRealigner.java:88)
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
	... 11 more
[Tue Oct 31 19:04:24 2023]
Error in rule vardict_tumor_only:

There was Exception while processing variant on 112154530 on region 5:112154487-112155216. The processing will be continued from the next variant.
java.lang.IllegalArgumentException
	at java.nio.Buffer.position(Buffer.java:244)
	at htsjdk.samtools.MemoryMappedFileBuffer.skipBytes(MemoryMappedFileBuffer.java:49)
	at htsjdk.samtools.AbstractBAMFileIndex.skipBytes(AbstractBAMFileIndex.java:451)
	at htsjdk.samtools.AbstractBAMFileIndex.query(AbstractBAMFileIndex.java:291)
	at htsjdk.samtools.DiskBasedBAMFileIndex.getSpanOverlapping(DiskBasedBAMFileIndex.java:61)
	at htsjdk.samtools.BAMFileReader.getFileSpan(BAMFileReader.java:935)
	at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:952)
	at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:612)
	at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:533)
	at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:538)
	at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:400)
	at com.astrazeneca.vardict.data.SamView.<init>(SamView.java:21)
	at com.astrazeneca.vardict.modules.VariationRealigner.noPassingReads(VariationRealigner.java:2060)
	at com.astrazeneca.vardict.modules.VariationRealigner.realigndel(VariationRealigner.java:617)
	at com.astrazeneca.vardict.modules.VariationRealigner.realignIndels(VariationRealigner.java:387)
	at com.astrazeneca.vardict.modules.VariationRealigner.process(VariationRealigner.java:88)
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
	at java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:628)
	at java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:1996)
	at com.astrazeneca.vardict.modes.AbstractMode.pipeline(AbstractMode.java:49)
	at com.astrazeneca.vardict.modes.SimpleMode.processBamInPipeline(SimpleMode.java:103)
	at com.astrazeneca.vardict.modes.SimpleMode.access$000(SimpleMode.java:29)
	at com.astrazeneca.vardict.modes.SimpleMode$VardictWorker.call(SimpleMode.java:86)
	at com.astrazeneca.vardict.modes.SimpleMode$VardictWorker.call(SimpleMode.java:72)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

ivadym · 2023-11-15T10:24:20Z

Errors appearing in the integration tests:

TN-WGS

Critical exception occurs on region: 10:43604378-43612279, program will be stopped.
java.util.concurrent.CompletionException: java.lang.IllegalArgumentException
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)
	at com.astrazeneca.vardict.collection.DirectThreadExecutor.execute(DirectThreadExecutor.java:8)
	at java.util.concurrent.CompletableFuture.asyncSupplyStage(CompletableFuture.java:1618)
	at java.util.concurrent.CompletableFuture.supplyAsync(CompletableFuture.java:1843)
	at com.astrazeneca.vardict.modes.AbstractMode.pipeline(AbstractMode.java:46)
	at com.astrazeneca.vardict.modes.SomaticMode.processBothBamsInPipeline(SomaticMode.java:122)
	at com.astrazeneca.vardict.modes.SomaticMode.access$000(SomaticMode.java:31)
	at com.astrazeneca.vardict.modes.SomaticMode$SomdictWorker.call(SomaticMode.java:96)
	at com.astrazeneca.vardict.modes.SomaticMode$SomdictWorker.call(SomaticMode.java:79)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalArgumentException
	at java.nio.Buffer.position(Buffer.java:244)
	at htsjdk.samtools.MemoryMappedFileBuffer.skipBytes(MemoryMappedFileBuffer.java:49)
	at htsjdk.samtools.AbstractBAMFileIndex.skipBytes(AbstractBAMFileIndex.java:451)
	at htsjdk.samtools.AbstractBAMFileIndex.query(AbstractBAMFileIndex.java:291)
	at htsjdk.samtools.DiskBasedBAMFileIndex.getSpanOverlapping(DiskBasedBAMFileIndex.java:61)
	at htsjdk.samtools.BAMFileReader.getFileSpan(BAMFileReader.java:935)
	at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:952)
	at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:612)
	at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:533)
	at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:538)
	at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:400)
	at com.astrazeneca.vardict.data.SamView.<init>(SamView.java:21)
	at com.astrazeneca.vardict.modules.RecordPreprocessor.nextReader(RecordPreprocessor.java:76)
	at com.astrazeneca.vardict.modules.RecordPreprocessor.<init>(RecordPreprocessor.java:50)
	at com.astrazeneca.vardict.modules.SAMFileParser.process(SAMFileParser.java:14)
	at com.astrazeneca.vardict.modes.AbstractMode.lambda$pipeline$0(AbstractMode.java:47)
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
	... 12 more

UMI TN-WGS

java.util.concurrent.CompletionException: java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
	at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
	at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:618)
	at java.util.concurrent.CompletableFuture.uniApplyStage(CompletableFuture.java:628)
	at java.util.concurrent.CompletableFuture.thenApply(CompletableFuture.java:1996)
	at com.astrazeneca.vardict.modes.AbstractMode.pipeline(AbstractMode.java:49)
	at com.astrazeneca.vardict.postprocessmodules.SomaticPostProcessModule.combineAnalysis(SomaticPostProcessModule.java:404)
	at com.astrazeneca.vardict.postprocessmodules.SomaticPostProcessModule.printVariationsFromFirstSample(SomaticPostProcessModule.java:198)
	at com.astrazeneca.vardict.postprocessmodules.SomaticPostProcessModule.callingForBothSamples(SomaticPostProcessModule.java:143)
	at com.astrazeneca.vardict.postprocessmodules.SomaticPostProcessModule.accept(SomaticPostProcessModule.java:87)
	at com.astrazeneca.vardict.postprocessmodules.SomaticPostProcessModule.accept(SomaticPostProcessModule.java:28)
	at java.util.concurrent.CompletableFuture.biAccept(CompletableFuture.java:1190)
	at java.util.concurrent.CompletableFuture.biAcceptStage(CompletableFuture.java:1206)
	at java.util.concurrent.CompletableFuture.thenAcceptBoth(CompletableFuture.java:2056)
	at com.astrazeneca.vardict.modes.SomaticMode.processBothBamsInPipeline(SomaticMode.java:125)
	at com.astrazeneca.vardict.modes.SomaticMode.access$000(SomaticMode.java:31)
	at com.astrazeneca.vardict.modes.SomaticMode$SomdictWorker.call(SomaticMode.java:96)
	at com.astrazeneca.vardict.modes.SomaticMode$SomdictWorker.call(SomaticMode.java:79)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
	at htsjdk.samtools.AbstractBAMFileIndex.query(AbstractBAMFileIndex.java:297)
	at htsjdk.samtools.DiskBasedBAMFileIndex.getSpanOverlapping(DiskBasedBAMFileIndex.java:61)
	at htsjdk.samtools.BAMFileReader.getFileSpan(BAMFileReader.java:935)
	at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:952)
	at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:612)
	at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:533)
	at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:538)
	at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.queryOverlapping(SamReader.java:400)
	at com.astrazeneca.vardict.data.SamView.<init>(SamView.java:21)
	at com.astrazeneca.vardict.modules.VariationRealigner.noPassingReads(VariationRealigner.java:2060)
	at com.astrazeneca.vardict.modules.VariationRealigner.realigndel(VariationRealigner.java:617)
	at com.astrazeneca.vardict.modules.VariationRealigner.realignIndels(VariationRealigner.java:387)
	at com.astrazeneca.vardict.modules.VariationRealigner.process(VariationRealigner.java:88)
	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:616)
	... 19 more

mathiasbio · 2023-11-15T10:47:37Z

In this issue which sounds similar AstraZeneca-NGS/VarDict#167 the author says resorting the bamfile would solve their issue. I'll make some tests!

ivadym · 2023-11-15T10:58:50Z

I'm not sure how transferable are the issues in VarDict to VarDictJava though 🤔

mathiasbio · 2023-11-15T11:05:35Z

I don't know either, but it is one of the areas of the analysis where there may have been some changes between the previous version of balsamic and the current release, with the removal of concatenation and parallel alignment. Could be worth a try!

mathiasbio · 2023-11-23T17:04:55Z

I have tried 3 different things now, independently. TLDR: Fiddled with the bamfiles - still confused - still using a lot more vmem than in production

I tried adding sort by position: Add resort bam #1331

Ran the same case 6 times, and once it failed vardict

Critical exception occurs on region: 1:120461884-120462174, program will be stopped.
java.util.concurrent.CompletionException: java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)

Me and @ivadym looked at the benchmarks of the VarDict rule and saw that in some cases the virtual memory had increased by roughly 150X since previous validation. This seemed like a likely explanation behind this random error, that maybe sometimes the rules which took up an unexpected amount of memory was put on an already heavily loaded node and then crashed. (though one could have hoped for more informative errors)

Since the only thing that seemed to have changed since last validation was the bam-files I decided to try test some of the post-processing steps that exists in the production version:

I tried removing the extra read groups that was added during parallel alignment using:

singularity exec --bind /home align_qc.sif picard -Xmx150g AddOrReplaceReadGroups -RGPU ILLUMINAi -RGID NORMAL -RGSM NORMAL -RGPL ILLUMINAi -RGLB ILLUMINAi -MAX_RECORDS_IN_RAM 1000000 -CREATE_INDEX true -CREATE_MD5_FILE true -INPUT normal.ACC.dedup_sorted.bam -OUTPUT normal.ACC.dedup_sorted_rg.bam

But the max_vms from the benchmarks looks similar...

I tried adding FixMate step.

This apparently also does:

INFO    2023-11-23 14:04:38     FixMateInformation      Sorting input into queryname order.
INFO    2023-11-23 14:10:54     FixMateInformation      Sorting by queryname complete.
INFO    2023-11-23 14:10:54     FixMateInformation      Output will be sorted by coordinate
INFO    2023-11-23 14:10:54     FixMateInformation      Traversing query name sorted records and fixing up mate pair information.
INFO    2023-11-23 14:10:54     FixMateInformation      Seen many non-increasing record positions. Printing Read-names as well.

And Sorting input into queryname order. seemed significant, as maybe VarDict performs better if the reads are in queryname rather than position. (this is also interesting with regards to downstream tools in WGS if this is what we're doing in production, maybe it slows down analysis substantially as it's mostly standard to sort by position I think).

Anyway...I ran this (similar to what we run in production):
singularity exec --bind /home align_qc.sif picard -Xmx150g FixMateInformation -ADD_MATE_CIGAR true -CREATE_INDEX true -CREATE_MD5_FILE true -INPUT normal.ACC.dedup_sorted.bam -OUTPUT normal.ACC.dedup_sorted_matefix2.bam

But again...the max_vms from VarDict looks the same.

fevac · 2023-11-23T17:10:08Z

I guess there are no changes in numbers of cpus or tmp folder being used?

fevac · 2023-11-23T17:13:29Z

sorry, not sure I follow everything, is the conclusion that the issue is in the bam files? all the rest is the same?
but also, does it complete successfully after retrying? or is it failing consistently?

mathiasbio · 2023-11-24T06:30:22Z

I didn't check the CPUs, or the tmp folders. Basically just saw the greatly increased memory and it seemed like a likely culprit. Basically the error seems random, and it also completes when retrying, so not failing consistently. I still wonder if it has something to do with the bamfiles but I'm not sure how since I have tested most things that seemed relevant in production:

Removal of unmapped reads (not tested)
1 read group in bam (tested, no effect)
sort bam by query in fixmate (tested, no effect)
add mate MQ info from fixmate (tested, no effect)
concatenate fastqs before...(not tested)

I think I'll try concatenating the fastqs just as a sanity check

fevac · 2023-11-24T07:52:33Z

but it's weird that is not consistent and finishes when retrying. Is it related to specific nodes?

mathiasbio · 2023-11-24T08:22:34Z

We checked that a little too, but maybe not as extensively as we should, but it seems to occur more frequently on node2 and 4, but we have seen other nodes as well.

mathiasbio · 2023-11-24T09:29:36Z

Tried concatenating fastq inputs before starting. Same increased vmem, and multiple vardict errors:

Critical exception occurs on region: 16:20237573-20237774, program will be stopped.
java.util.concurrent.CompletionException: java.lang.InternalError: a fault occurred in a recent unsafe memory access operation in compiled Java code
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:273)
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:280)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1606)

on node 13, 21, 21 (again), and 13 again

mathiasbio · 2023-11-24T09:47:10Z

comparing benchmarks all these metrics is increased from recent validation to my implement parallel alignment branch:

max_rss: Maximum "Resident Set Size”, this is the non-swapped physical memory a process has used.
max_vms: Maximum “Virtual Memory Size”, this is the total amount of virtual memory used by the process.
max_uss: Unique Set Size”, this is the memory which is unique to a process and which would be freed if the process was terminated right now.
max_pss: “Proportional Set Size”, is the amount of memory shared with other processes, accounted in a way that the amount is divided evenly between the processes that share it (Linux only)

To give some values. Average values for all VarDict jobs from recent validation (it's run 1 time per chromosome)
max_rss = 8,7812
max_vms = 217,7804
max_uss = 7,3076
max_pss = 7,7916
Compared to same case run in my old implement parallel alignment branch:
max_rss = 9430,9336
max_vms = 36240,0828
max_uss = 9383,656
max_pss = 9386,9888

ivadym · 2023-11-24T14:07:22Z

A production environment in Stage behaves as expected: vardict doesn't fail, and the memory is consistent with what we see in production.

I've also increased the Java heap space to 64G (Xmx64G) and the number of cores:

acetuna: completed succesfully
setamoeba: completed sucessfully

unitedbeagle: failed

Critical exception occurs on region: 20:39794721-39795011, program will be stopped.
java.util.concurrent.CompletionException: java.lang.InternalError: a fault occurred in a recent unsafe memory access 
operation in compiled Java code

savinghorse: failed

There was Exception while processing variant on 407630 on region 6:407355-407699. The processing will be continued 
from the next variant.
java.lang.IndexOutOfBoundsException: bitIndex < 0: -1588120723

We're getting two different errors, and the IndexOutOfBoundsException, which we though was solved in #1271

mathiasbio · 2023-11-24T15:15:52Z

Apparently there's no evidence that we have increased the memory usage in VarDict in the new release, instead the differences in numbers between current production and the release version is due to us updating Snakemake which solved this bug: snakemake/snakemake#1671 where benchmark stats were not accurately reported from jobs using singularity...

So back to square one with the VarDict issue.

ivadym · 2023-11-24T16:08:43Z

Excluding the container as the cause of this issue, vardict also fails when using a varcall_py3 container from production

mathiasbio · 2023-11-25T12:17:55Z

Yes I saw that too. I also tried:

using an old bamfile from the previous validation, but I only ran it once, but that worked (however checking the benchmarks using this old file showed similar mem values as when using a bamfile from the release-version).
I tried reverting the workflow back to the production, at least partially, by reverting to doing concatenation, using bwa-mem, and picard markduplicates. But the error remains there. I'll try next adding the final steps of the post-processing and run again.

mathiasbio · 2023-11-27T15:12:47Z

Summary:

Vardicts fails on bamfiles from:

Sentieon, Dedup, Sort (roughly 1 in 6)
Sentieon, Dedup, Sort + FixMate (1/1)
Sentieon, Dedup, Sort (on concatenated files) (1/1)
Sentieon, Dedup, Sort (using current production container for VarDict) (roughly 1 in 6)
Concatenate, Bwa mem, Picard MarkDups (roughly 2 / 6)
Concatenate, Bwa mem, Picard MarkDups, FixMate, remove unmapped, AddRG (roughly 1/6) (BASICALLY BACK TO PRODUCTION)
Sentieon, Dedup, Sort, VarDict with increased memory and threads (3/6)

Succeeds:

Resubmitting case on previous validation bamfile (2/2)
Sentieon, Dedup, Sort + AddRG (5/5)
Sentieon, Dedup, Sort, AddRG, VarDict with increased memory and threads (12/12) fix: add postprocess bam and memory to vardict #1332

mathiasbio · 2023-11-28T10:26:48Z

At the moment there is no evidence that removing unmapped reads helps in this issue. So maybe this can be removed in a future cleanup PR when things are less hectic.

fevac · 2023-11-28T10:28:36Z

Agree. Amazing job @mathiasbio ! 🌟

ivadym · 2023-11-30T15:26:38Z

❤️ 🎖️ @mathiasbio

Closing! #1332

ivadym added Bug Something isn't working Urgency Large labels Oct 17, 2023

ivadym added this to BALSAMIC Oct 17, 2023

github-project-automation bot moved this to Todo in BALSAMIC Oct 17, 2023

ivadym moved this from Todo to Planned in BALSAMIC Oct 17, 2023

ivadym added this to the Release 13 milestone Oct 24, 2023

ivadym mentioned this issue Nov 13, 2023

feat: Release v13.0.0 #1320

Merged

ivadym pinned this issue Nov 24, 2023

mathiasbio mentioned this issue Nov 28, 2023

fix: add postprocess bam and memory to vardict #1332

Merged

6 tasks

ivadym closed this as completed Nov 30, 2023

github-project-automation bot moved this from Planned to Completed in BALSAMIC Nov 30, 2023

ivadym unpinned this issue Dec 1, 2023

ivadym moved this from Completed to Archived in BALSAMIC Dec 6, 2023

mathiasbio mentioned this issue Dec 9, 2024

[Bug] VarDict fails AGAIN with Critical exception occurs on region #1511

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Develop branch: Error in rule vardict_tumor_normal #1286

Develop branch: Error in rule vardict_tumor_normal #1286

ivadym commented Oct 17, 2023

ivadym commented Nov 1, 2023

ivadym commented Nov 15, 2023

mathiasbio commented Nov 15, 2023

ivadym commented Nov 15, 2023

mathiasbio commented Nov 15, 2023

mathiasbio commented Nov 23, 2023

fevac commented Nov 23, 2023

fevac commented Nov 23, 2023 •

edited

Loading

mathiasbio commented Nov 24, 2023

fevac commented Nov 24, 2023

mathiasbio commented Nov 24, 2023

mathiasbio commented Nov 24, 2023 •

edited

Loading

mathiasbio commented Nov 24, 2023

ivadym commented Nov 24, 2023

mathiasbio commented Nov 24, 2023

ivadym commented Nov 24, 2023

mathiasbio commented Nov 25, 2023

mathiasbio commented Nov 27, 2023 •

edited

Loading

mathiasbio commented Nov 28, 2023

fevac commented Nov 28, 2023

ivadym commented Nov 30, 2023

Develop branch: Error in rule vardict_tumor_normal #1286

Develop branch: Error in rule vardict_tumor_normal #1286

Comments

ivadym commented Oct 17, 2023

ivadym commented Nov 1, 2023

ivadym commented Nov 15, 2023

TN-WGS

UMI TN-WGS

mathiasbio commented Nov 15, 2023

ivadym commented Nov 15, 2023

mathiasbio commented Nov 15, 2023

mathiasbio commented Nov 23, 2023

fevac commented Nov 23, 2023

fevac commented Nov 23, 2023 • edited Loading

mathiasbio commented Nov 24, 2023

fevac commented Nov 24, 2023

mathiasbio commented Nov 24, 2023

mathiasbio commented Nov 24, 2023 • edited Loading

mathiasbio commented Nov 24, 2023

ivadym commented Nov 24, 2023

mathiasbio commented Nov 24, 2023

ivadym commented Nov 24, 2023

mathiasbio commented Nov 25, 2023

mathiasbio commented Nov 27, 2023 • edited Loading

mathiasbio commented Nov 28, 2023

fevac commented Nov 28, 2023

ivadym commented Nov 30, 2023

fevac commented Nov 23, 2023 •

edited

Loading

mathiasbio commented Nov 24, 2023 •

edited

Loading

mathiasbio commented Nov 27, 2023 •

edited

Loading