Develop branch: Error in rule vardict_tumor_normal #1286
We now also have VarDict failing in tumor-only:
Errors appearing in the integration tests:
- TN-WGS
- UMI TN-WGS
In this similar-sounding issue, AstraZeneca-NGS/VarDict#167, the author says re-sorting the BAM file would solve their issue. I'll run some tests!
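For reference, a minimal sketch of what re-sorting and re-indexing a BAM could look like with samtools; the file names are placeholders, not the actual case files:

```bash
# Re-sort the BAM by coordinate and rebuild the index (placeholder file names)
samtools sort -@ 8 -o tumor.resorted.bam tumor.bam
samtools index tumor.resorted.bam
```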
I'm not sure how transferable the issues in VarDict are to VarDictJava though 🤔
I don't know either, but it is one of the areas of the analysis where there may have been some changes between the previous version of balsamic and the current release, with the removal of concatenation and parallel alignment. Could be worth a try!
I have now tried 3 different things, independently. TL;DR: fiddled with the BAM files; still confused; still using a lot more vmem than in production.
Ran the same case 6 times, and it failed on VarDict once.
@ivadym and I looked at the benchmarks of the VarDict rule and saw that in some cases the virtual memory had increased by roughly 150X since the previous validation. This seemed like a likely explanation for this random error: maybe the rules that took up an unexpected amount of memory were sometimes placed on an already heavily loaded node and then crashed (though one could have hoped for more informative errors). Since the only thing that seemed to have changed since the last validation was the BAM files, I decided to test some of the post-processing steps that exist in the production version:
But the max_vms from the benchmarks looks similar...
This apparently also does:
And "Sorting input into queryname order." seemed significant, as maybe VarDict performs better if the reads are sorted by queryname rather than by position. (This is also interesting with regard to downstream tools in WGS: if this is what we're doing in production, maybe it slows down the analysis substantially, as it's mostly standard to sort by position, I think.) Anyway... I ran this (similar to what we run in production): But again... the max_vms from VarDict looks the same.
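For context, a hedged sketch of what sorting a BAM into queryname order can look like with samtools; this is an illustration only, not the actual production command, and the file names are placeholders:

```bash
# Sort by read name (queryname) instead of coordinate
# Note: a queryname-sorted BAM cannot be coordinate-indexed with samtools index
samtools sort -n -@ 8 -o tumor.queryname_sorted.bam tumor.bam
```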
I guess there are no changes in the number of CPUs or the tmp folder being used?
Sorry, not sure I follow everything. Is the conclusion that the issue is in the BAM files? All the rest is the same?
I didn't check the CPUs or the tmp folders; I basically just saw the greatly increased memory, and it seemed like a likely culprit. The error seems random, and it also completes when retrying, so it's not failing consistently. I still wonder if it has something to do with the BAM files, but I'm not sure how, since I have tested most things that seemed relevant in production:
I think I'll try concatenating the fastqs, just as a sanity check.
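A minimal sketch of the kind of concatenation meant here, assuming gzipped per-lane fastq pairs; the file names are hypothetical:

```bash
# Concatenate per-lane fastqs into one pair (gzip streams can be concatenated directly)
cat sample_L001_R1.fastq.gz sample_L002_R1.fastq.gz > sample_concat_R1.fastq.gz
cat sample_L001_R2.fastq.gz sample_L002_R2.fastq.gz > sample_concat_R2.fastq.gz
```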
But it's weird that it is not consistent and finishes when retrying. Is it related to specific nodes?
We checked that a little too, though maybe not as extensively as we should have. It seems to occur more frequently on nodes 2 and 4, but we have seen it on other nodes as well.
Tried concatenating fastq inputs before starting. Same increased vmem, and multiple vardict errors:
On nodes 13, 21, 21 (again), and 13 (again).
Comparing benchmarks, all of these metrics have increased from the recent validation to my implement-parallel-alignment branch: max_rss: maximum "Resident Set Size", i.e. the non-swapped physical memory a process has used. To give some values, here are the average values for all VarDict jobs from the recent validation (it's run once per chromosome):
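As an illustration of how such averages can be pulled out of Snakemake benchmark files, here is a hedged bash/awk sketch; it assumes the default Snakemake benchmark TSV layout (max_rss in column 3, max_vms in column 4, both in MB) and hypothetical file paths:

```bash
# Average max_rss and max_vms across all per-chromosome VarDict benchmark files
awk -F'\t' 'FNR > 1 { rss += $3; vms += $4; n++ }
            END { if (n) printf "jobs=%d  mean max_rss=%.1f MB  mean max_vms=%.1f MB\n", n, rss/n, vms/n }' \
    benchmarks/vardict_tumor_normal.*.benchmark.tsv
```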
A production environment in Stage behaves as expected: VarDict doesn't fail, and the memory is consistent with what we see in production. I've also increased the Java heap space to 64G (-Xmx64G) and the number of cores:
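A hedged sketch of what raising the JVM heap and thread count for VarDictJava can look like; the gradle-generated launcher typically honors JAVA_OPTS, and -th sets VarDict's worker threads. The reference, BED, and BAM paths are placeholders, not the actual BALSAMIC rule:

```bash
# Raise the Java heap for the vardict-java launcher and use more worker threads
export JAVA_OPTS="-Xmx64G"
vardict-java -G reference.fasta -f 0.01 -N TUMOR \
    -b "tumor.bam|normal.bam" \
    -c 1 -S 2 -E 3 -g 4 -th 16 regions.bed
```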
We're getting two different errors, and the
Apparently there's no evidence that we have increased the memory usage in VarDict in the new release; instead, the differences in numbers between current production and the release version are due to us updating Snakemake, which fixed this bug: snakemake/snakemake#1671, where benchmark stats were not accurately reported for jobs using Singularity... So back to square one with the VarDict issue.
Excluding the container as the cause of this issue, vardict also fails when using a
Yes, I saw that too. I also tried:
Summary: VarDict fails on BAM files from:
Succeeds on BAM files from:
At the moment there is no evidence that removing unmapped reads helps with this issue, so maybe this can be removed in a future cleanup PR when things are less hectic.
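For context, a hedged sketch of what removing unmapped reads typically looks like with samtools; an illustration only, not the actual BALSAMIC rule, with placeholder file names:

```bash
# Drop reads carrying the UNMAP flag (0x4) and keep everything else
samtools view -b -F 4 -o tumor.mapped_only.bam tumor.bam
samtools index tumor.mapped_only.bam
```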
Agree. Amazing job @mathiasbio! 🌟
❤️ 🎖️ @mathiasbio Closing! #1332
Describe the bug
When processing variants in specific regions a critical exception occurred: java.lang.IndexOutOfBoundsException with a negative bitIndex value. After restarting the analysis, it succeeds.

If workflow, which rules
vardict_tumor_normal (balsamic TGA-TN and balsamic-umi TN)

Screenshots

Version (please complete the following information):
balsamic --version: develop

Additional context
Previously we had a similar issue: #1271