Replace alt_allele_in_normal filter in TNscope #1254
Comments
I agree, this sounds way too strict. Do you know if there is a setting for this in TNscope?
I don't know! I remember I was in contact with Sentieon about this in Gbg and I actually saved the response from Don Freed:
Back then I concluded that we should simply remove this filter and create our own using bcftools. Based on this email response, at least, there did not seem to be any real attempt to argue for the usefulness of this filter. Today I did a little investigation of the allele frequencies in the TNscope VCF; see the plots in the issue description.
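The removal step could be sketched in Python as below. This is a minimal sketch assuming a plain-text (uncompressed) VCF in which FILTER is the seventh tab-separated column, as the VCF specification defines it. The helper name `drop_filter_tag` is hypothetical: it removes only the alt_allele_in_normal tag, sets FILTER to PASS when no other filter remains, and leaves header lines and unfiltered (".") records untouched. The plan in the thread is to do this with bcftools in the pipeline itself.

```python
def drop_filter_tag(vcf_line: str, tag: str = "alt_allele_in_normal") -> str:
    """Remove one tag from the FILTER column of a VCF data line (hypothetical helper).

    Header lines and records with FILTER "." are returned unchanged.
    If no other filter remains after removal, FILTER is set to PASS.
    """
    if vcf_line.startswith("#"):
        return vcf_line
    newline = "\n" if vcf_line.endswith("\n") else ""
    fields = vcf_line.rstrip("\n").split("\t")
    filters = fields[6].split(";")  # FILTER is the 7th column
    if filters == ["."]:
        return vcf_line
    remaining = [f for f in filters if f != tag]
    fields[6] = ";".join(remaining) if remaining else "PASS"
    return "\t".join(fields) + newline
```

For example, a record filtered as `MLrejected;alt_allele_in_normal` keeps only `MLrejected`, while a record whose only filter was alt_allele_in_normal becomes PASS.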
That is far from ideal... Great to have data for this. Thanks Mathias!
I'm not sure. What confuses me a little is how similar the distributions are for the PASS and alt_allele_in_normal statuses. This indicates that the presence of the variant in the normal isn't the ONLY requirement for setting this filter; otherwise all PASS variants would have an af_n of 0, which they do not. So it seems that the filter takes some additional parameters into account. But if that's the case, and these variants really should be filtered out, I'd prefer that we design a more informative filter for them. My suspicion right now, however, is that these variants are mainly true somatic variants, and that we could remove the filter and design our own, which we would have more control over. Something like this could maybe work: setting a filter if AF_T / AF_N >= 0.5, which would be a very relaxed filter.

[Plot: only PASS]
[Plot: only alt_allele_in_normal]
[Table: the same results in table format]
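The comment writes the ratio as AF_T / AF_N. Because the filter names used later in the thread (high_n_frac, medium_n_frac) refer to the fraction in the normal, the Python sketch below assumes the intent is to flag a call when the normal AF is at least half the tumor AF, i.e. AF_N / AF_T >= 0.5. The function names, and using high_n_frac as the tag, are illustrative assumptions rather than the thread's final definition.

```python
def high_normal_fraction(af_tumor: float, af_normal: float, threshold: float = 0.5) -> bool:
    """Return True if a call should be flagged by the ratio-based filter.

    Assumption: the ratio is normal AF over tumor AF (AF_N / AF_T), so a call
    is flagged when the normal AF is at least `threshold` times the tumor AF.
    """
    if af_tumor <= 0.0:
        # No tumor allele fraction to compare against; flag instead of dividing by zero.
        return True
    return af_normal / af_tumor >= threshold


def add_filter(filter_field: str, tag: str) -> str:
    """Add a tag to a VCF FILTER value, replacing PASS or '.'."""
    if filter_field in ("PASS", "."):
        return tag
    return f"{filter_field};{tag}"
```

With the default threshold, a call with AF_T = 0.30 and AF_N = 0.01 would stay PASS, while one with AF_T = 0.20 and AF_N = 0.15 would get the high_n_frac tag. A stricter tier such as medium_n_frac would simply use a lower threshold, whose value the thread does not state.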
It seems that this additional filter would not remove many of the variants currently marked PASS, but it would add either 985 + 402 variants if we choose to implement the "high_n_frac" filter, or 985 variants if we implement the "medium_n_frac" filter (in this example case). If, for instance, some heterozygous germline variants are duplicated in the tumor, we would probably keep many of them with the high_n_frac filter, provided they slipped past the other filters already present in Sentieon, such as "germline_risk".

On a related note, the germline_risk filter is also interesting, but one step at a time. Here are the AFs for the variants tagged only with the germline_risk filter:

It is funny that so many germline_risk variants have a normal_af of 0. I remember Sentieon explaining this to me: these variants are at risk of being germline because there was not considered to be sufficient support in the normal to rule out that they are germline, for example because of low coverage, or because the normal by chance did not capture enough reads supporting a heterozygous germline variant. Here are the coverages for the germline_risk variants with AF_N = 0:

There are quite a few variants with low coverage in the normal, but also quite a few with fairly substantial coverage.
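The coverage check for germline_risk calls with AF_N = 0 could be reproduced with a Python sketch along these lines. It assumes per-sample FORMAT fields named AF and DP, and that the normal sample is in the 11th column (index 10); the actual sample order should be checked against the VCF header. The function name is hypothetical, and only the first alt allele's AF is considered.

```python
from typing import Iterable, List


def normal_depths_without_support(
    vcf_lines: Iterable[str], tag: str = "germline_risk", normal_idx: int = 10
) -> List[int]:
    """Return normal-sample DP for records whose only filter is `tag` and whose normal AF is 0."""
    depths = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[6] != tag:
            continue
        keys = fields[8].split(":")
        normal = dict(zip(keys, fields[normal_idx].split(":")))
        af = normal.get("AF", ".").split(",")[0]  # first alt allele only
        dp = normal.get("DP", ".")
        if af in (".", "") or dp in (".", ""):
            continue  # skip records with missing values
        if float(af) == 0.0:
            depths.append(int(dp))
    return depths
```

The returned depths can then be plotted as a histogram, or split at a coverage cutoff of your choice (for example `sum(d < 10 for d in depths)` with a hypothetical cutoff of 10 reads) to separate low-coverage cases from well-covered ones.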
Perhaps in the end I'd implement this filter:
Closing this and replacing with user story: #1335 |
Need
The alt_allele_in_normal filter in TNscope seems far too strict about evidence of the tumor allele in the normal. Sometimes even a single-base sequencing error can be enough to set this filter. I think we should consider changing this soon.
Some further background investigation to highlight this issue:
I took a random unfiltered TNscope VCF from a WGS T/N case:
For this bar plot, I excluded all filter combinations with fewer than 900 variants:
It is hard to see in the plot, but there are 1511 variants with only the alt_allele_in_normal filter set.
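The counts behind the bar plot could be reproduced roughly as follows. This Python sketch, with a hypothetical function name, tallies the FILTER string of each record in an uncompressed VCF and keeps combinations seen at least 900 times. Sorting the tags within a combination is an added assumption so that differently ordered combinations are counted together.

```python
from collections import Counter
from typing import Dict, Iterable


def filter_combination_counts(vcf_lines: Iterable[str], min_count: int = 900) -> Dict[str, int]:
    """Count records per FILTER combination, keeping those seen at least min_count times."""
    counts = Counter(
        # Sort tags so that "a;b" and "b;a" are treated as the same combination.
        ";".join(sorted(line.rstrip("\n").split("\t")[6].split(";")))
        for line in vcf_lines
        if line.strip() and not line.startswith("#")
    )
    return {combo: n for combo, n in counts.most_common() if n >= min_count}
```

Calling it with `min_count=1` and looking up the key `"alt_allele_in_normal"` gives the number of records where that is the only filter set.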
Next, I made some scatterplots of tumor_af vs. normal_af for a few different filters:
Here is "PASS":
Here is the most common combination of filters, "MLrejected;alt_allele_in_normal;t_lod_fstar":
Finally here are the variants with the unique filter "alt_allele_in_normal":
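The (tumor_af, normal_af) points for such scatterplots could be extracted with a Python sketch like the one below. It assumes an uncompressed VCF with a per-sample AF FORMAT field, the tumor in column 10 and the normal in column 11 (indices 9 and 10); check the sample order in the header. Only the first alt allele's AF is used, and FILTER must match exactly, including tag order. The function names are hypothetical; the resulting pairs can be passed to any plotting library.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def _first_af(sample: Dict[str, str]) -> Optional[float]:
    """First alt-allele AF from a parsed sample, or None if missing."""
    value = sample.get("AF", ".").split(",")[0]
    return None if value in (".", "") else float(value)


def af_pairs(
    vcf_lines: Iterable[str], filter_value: str, tumor_idx: int = 9, normal_idx: int = 10
) -> List[Tuple[float, float]]:
    """Collect (tumor AF, normal AF) for records whose FILTER equals filter_value exactly."""
    pairs = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[6] != filter_value:
            continue
        keys = fields[8].split(":")
        tumor_af = _first_af(dict(zip(keys, fields[tumor_idx].split(":"))))
        normal_af = _first_af(dict(zip(keys, fields[normal_idx].split(":"))))
        if tumor_af is not None and normal_af is not None:
            pairs.append((tumor_af, normal_af))
    return pairs
```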
Of these 1511 variants with only "alt_allele_in_normal", 1002 had a total of just one read supporting the variant.
Unless Sentieon has a brilliant and mysterious reason for this filter, I really think we risk filtering out relevant variants with it...
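The count of single-read calls could be reproduced with a Python sketch like this one. It interprets "total number of reads supporting the variant" as the sum of alt-allele AD values across all samples, which is an assumption, as is the presence of a standard AD FORMAT field (reference depth first, then one value per alt allele). The function names are hypothetical.

```python
from typing import Iterable, Tuple


def alt_read_count(format_keys: str, sample: str) -> int:
    """Sum of alt-allele read depths from a sample's AD field (ref depth is the first value)."""
    values = dict(zip(format_keys.split(":"), sample.split(":")))
    ad = values.get("AD", ".")
    if ad in (".", ""):
        return 0
    return sum(int(x) for x in ad.split(",")[1:] if x != ".")


def count_single_read_calls(
    vcf_lines: Iterable[str], tag: str = "alt_allele_in_normal"
) -> Tuple[int, int]:
    """Return (records whose FILTER is exactly `tag`, those with one alt read in total)."""
    total = single = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[6] != tag:
            continue
        total += 1
        # Sum alt-supporting reads over every sample column (tumor and normal).
        supporting = sum(alt_read_count(fields[8], s) for s in fields[9:])
        if supporting == 1:
            single += 1
    return total, single
```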