Variants Removed but Not Filtered #531
In case anyone comes across this, I realised that the issue was with the prioritisation filter. The line in the config is:
This is prioritising roughly the top 50% of variants for speed. As per the manual:
As a general suggestion, it would be helpful to document this more clearly in the HTML output, so that it is obvious how many variants have been removed for this reason.
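To make the effect concrete, here is a minimal sketch of how a minimum priority score cutoff can silently drop variants before any variant-level filter reports on them. This is not Exomiser code: the `Variant` class, the gene names, the scores, the 0.5 cutoff and the `>=` comparison are all hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str   # hypothetical gene symbol
    label: str  # hypothetical variant identifier


def apply_min_priority_score(variants, gene_scores, min_score):
    """Split variants into (passed, failed) using the score of their gene.

    Assumption: genes without a score are treated as scoring 0.0.
    """
    passed, failed = [], []
    for variant in variants:
        score = gene_scores.get(variant.gene, 0.0)
        (passed if score >= min_score else failed).append(variant)
    return passed, failed


variants = [
    Variant("GENE_A", "v1"),
    Variant("GENE_A", "v2"),
    Variant("GENE_B", "v3"),
    Variant("GENE_C", "v4"),
]
gene_scores = {"GENE_A": 0.82, "GENE_B": 0.40}  # made-up phenotype scores

passed, failed = apply_min_priority_score(variants, gene_scores, min_score=0.5)
print(f"passed={len(passed)} removed={len(failed)}")
```

Reporting the `removed` count next to the other filter counts is the kind of summary the suggestion above is asking for.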
Hi @SophieS9 thanks for the suggestion and the detailed bug report. There are a few issues here:
steps: [
    failedVariantFilter: { },
    variantEffectFilter: {
        remove: [
            FIVE_PRIME_UTR_EXON_VARIANT,
            FIVE_PRIME_UTR_INTRON_VARIANT,
            THREE_PRIME_UTR_EXON_VARIANT,
            THREE_PRIME_UTR_INTRON_VARIANT,
            NON_CODING_TRANSCRIPT_EXON_VARIANT,
            NON_CODING_TRANSCRIPT_INTRON_VARIANT,
            CODING_TRANSCRIPT_INTRON_VARIANT,
            UPSTREAM_GENE_VARIANT,
            DOWNSTREAM_GENE_VARIANT,
            INTERGENIC_VARIANT,
            REGULATORY_REGION_VARIANT
        ]
    },
    frequencyFilter: { maxFrequency: 2.0 },
    pathogenicityFilter: { keepNonPathogenic: true },
    inheritanceFilter: { },
    omimPrioritiser: { },
    hiPhivePrioritiser: { }
]
and this is probably what you want, tweaking the frequency filters. Note that the
Thanks for your reply @julesjacobsen. This was super helpful! I simply overlooked the prioritisation filter because I reused the config from our WGS analysis (where we want the filter for speed). In this scenario, I don't need it anymore. Thanks for the feedback on the config too. I've made those changes and it looks like the annotations are now correct. From what I could tell, the main difference was the use of spaces between the curly brackets?
@SophieS9 I originally thought the issue might be due to the difference in YAML style. However, it turns out there was a bug in the way the steps were being run, so that the frequency and pathogenicity filters would not end up with data to filter on. I've opened a new issue to explain this: #534. Both the reporting of the priority score filter pass/fail counts (still only for genes) and the missing annotations will be fixed in the 13.4.0 release, which will come out once tested. In the meantime, you should review your analysis scripts and their output. If this is a script you have been using frequently, you should see that the annotations for frequency and pathogenicity are missing and that you have a lot of unfiltered data. Mostly this will mean that missense SNVs are not properly scored for potential pathogenicity, and there will be no frequency filtering, which will further affect all variant scores. Known ClinVar variants will still be prioritised, assuming you're using the ClinVar whitelist.
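One way to review existing output for the missing annotations is to count empty cells in the relevant columns of a results TSV. The following is only a sketch: the column names `FREQ` and `PATH` and the set of "missing" markers are placeholders, not the actual Exomiser output schema, so substitute the headers and markers found in your own files.

```python
import csv
import io


def count_missing(tsv_text, columns, missing_values=("", ".", "-")):
    """Return (row_count, {column: number of rows with a missing value}).

    Assumption: a cell counts as missing if it is empty or one of the
    placeholder markers in `missing_values`.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = {column: 0 for column in columns}
    total = 0
    for row in reader:
        total += 1
        for column in columns:
            if (row.get(column) or "").strip() in missing_values:
                counts[column] += 1
    return total, counts


# Made-up rows with placeholder column names.
example = (
    "GENE\tFREQ\tPATH\n"
    "GENE_A\t0.01\t0.9\n"
    "GENE_B\t\t.\n"
    "GENE_C\t-\t0.2\n"
)
total, counts = count_missing(example, ["FREQ", "PATH"])
print(total, counts)
```

If nearly every row is missing both kinds of annotation, the output was probably affected by the bug described above.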
@josephhalstead I think this may be a consequence of the HTML filtering summary not being updated (and also being on a per-gene basis). As an example from the same sample, here is the HTML:
And here is the log printed to standard out:
And it has removed variants based on the frequency score in this case.
@josephhalstead sorry, I've been distracted and forgot to start the release process for this. It will be fixed in 13.4.0 and I'll aim to get this out next week, although this is a bit close to Christmas... @SophieS9 is correct in what she says.
…ent in logs and HTML output. Also fixed bug where frequency and pathogenicity filters would not be provided with data when run after the initial variant load & filter step.
Moved analysis.FilterStats to new filters.FilterResultsCounter
Add new FilterResultCount data class
Add AnalysisResults.filterResultCounts field
Add new FilterRunner.filterCounts and FilterRunner.logFilterResult methods
Remove brittle logic for FilterStats from AbstractAnalysisRunner
Add Filterable.failedFilter method to enable tracking of both passed and failed filters (previously only passed was exposed)
@SophieS9, @josephhalstead this is fixed in Exomiser v14.0.0
Thank you!
Hi Exomiser Team,
Firstly, apologies if this is documented somewhere and I've missed it! I have a scenario where a WGS VCF is run through Exomiser and the scores are collated from the TSV files and passed into an in-house database.
We have a scenario where we get updated phenotype information on a patient and want to re-rank the variants via Exomiser, but not the whole VCF, just a small subset of variants which pass internal filters so that it's fast.
I'm making a small VCF of these variants on the fly and passing it to Exomiser, but only 30/69 variants are being analysed. This VCF has a dummy header and dummy "QUAL" and "INFO" values, but all other values are taken from the original VCF. When looking at the HTML, it's not clear why they aren't being analysed, as it suggests that only 30 variants were input. All of these variants were analysed by Exomiser when in the WGS VCF. The config YAML is also set to look at 1000 variants.
I'd like to know why 39 of the variants are being excluded, if possible. I'm wondering if it's how I make my on-the-fly VCF. From looking at the stdout, it says variants are failing the frequency, pathogenicity and inheritance filters. However, my config is set to keep all frequencies and non-pathogenic variants, and all genotypes are 0/1.
I've attached the VCF and config YAML (both changed to txt for upload). Running Exomiser 13.1.0 with dataset 2209_hg38.
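For reference, an on-the-fly VCF of the kind described above could be built roughly as follows. This is a sketch rather than the script actually used: the function names, the sample name and the coordinates are made up, and the dummy values (`.` for ID, QUAL and INFO, `PASS` for FILTER) are assumptions. Counting the data records before running Exomiser confirms how many variants really go in.

```python
def build_minimal_vcf(variants, sample="SAMPLE"):
    """Build VCF text from (chrom, pos, ref, alt, genotype) tuples.

    Assumptions: VCFv4.2 with a single sample, '.' as the dummy
    ID/QUAL/INFO value and PASS as the FILTER value.
    """
    header = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL",
                   "FILTER", "INFO", "FORMAT", sample]),
    ]
    records = [
        "\t".join([chrom, str(pos), ".", ref, alt, ".", "PASS", ".", "GT", gt])
        for chrom, pos, ref, alt, gt in variants
    ]
    return "\n".join(header + records) + "\n"


def count_records(vcf_text):
    """Count data lines, i.e. non-empty lines that do not start with '#'."""
    return sum(1 for line in vcf_text.splitlines()
               if line and not line.startswith("#"))


# Made-up variants for illustration only.
subset = [
    ("chr1", 12345, "A", "G", "0/1"),
    ("chr2", 67890, "CT", "C", "0/1"),
]
vcf_text = build_minimal_vcf(subset)
print(count_records(vcf_text))
```

Comparing this record count with the number of variants reported in the logs shows whether variants are lost while writing the VCF or later during the analysis.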
14777_exomiser.txt
14777_exomiser_template.txt