Replace alt_allele_in_normal filter in TNscope #1254

Closed
mathiasbio opened this issue Sep 15, 2023 · 6 comments · Fixed by #1289

@mathiasbio
Collaborator

mathiasbio commented Sep 15, 2023

Need

The alt_allele_in_normal filter in TNscope seems far too strict in flagging variants whose alternate allele also appears in the normal sample. Sometimes even a single-base sequencing error can be enough to set this filter. I think we should consider changing this soon.

Some more background investigations to highlight this issue:

I took a random unfiltered TNscope VCF from a WGS T/N case:

For this barplot I excluded all filter combinations with fewer than 900 variants:

filtered_counts_per_filter_plot

Can't see it there really, but there are 1511 variants with only the alt_allele_in_normal filter set.
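Counts like these can be tallied directly from the FILTER column of the VCF. The following is a minimal sketch, not the script used for the plot: the function name and the example records are made up for illustration, and it assumes an uncompressed, tab-separated VCF whose FILTER values are joined with `;`.

```python
from collections import Counter


def count_filter_combinations(vcf_lines):
    """Count how often each FILTER value (e.g. 'PASS' or 'a;b;c') occurs."""
    counts = Counter()
    for line in vcf_lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # FILTER is the 7th VCF column. Sort the names so that 'a;b' and
        # 'b;a' are counted as the same combination.
        key = ";".join(sorted(fields[6].split(";")))
        counts[key] += 1
    return counts


# Hypothetical records, only to show the input shape.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tT\t.\tPASS\t.",
    "1\t200\t.\tG\tC\t.\talt_allele_in_normal\t.",
    "1\t300\t.\tC\tG\t.\tt_lod_fstar;MLrejected;alt_allele_in_normal\t.",
    "1\t400\t.\tT\tA\t.\talt_allele_in_normal\t.",
]

filter_counts = count_filter_combinations(example_vcf)
for combination, n in filter_counts.most_common():
    print(n, combination)
```

Looking up `filter_counts["alt_allele_in_normal"]` then gives the number of variants that carry only that filter, which is the figure that gets lost in a barplot with a count cutoff.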

I next made some scatterplots with tumor_af x normal_af for a few different filters:

Here is "PASS"

t_af_n_af_PASS

Here is the most popular combination of filters "MLrejected;alt_allele_in_normal;t_lod_fstar"

t_af_n_af_MLrejected_alt_allele_in_normal_t_lod_fstar

Finally here are the variants with the unique filter "alt_allele_in_normal":

t_af_n_af_alt_allele_in_normal

Out of these 1511 variants with only "alt_allele_in_normal", 1002 of them had a total number of reads supporting the variant = 1

t_af_n_af_alt_allele_in_normal_ad1
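A count like this can be reproduced from the FORMAT/AD values. The sketch below is an illustration only and rests on assumptions the thread does not confirm: that "reads supporting the variant" means the ALT depth in the normal sample, that the normal is the second sample column, and that AD is stored as `REF,ALT`. The function names and example records are hypothetical.

```python
def alt_reads(format_field, sample_field):
    """Return the ALT read count from a sample's AD value, or None if unavailable."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    if "AD" not in keys:
        return None
    index = keys.index("AD")
    if index >= len(values):
        return None  # trailing FORMAT fields may be omitted in a VCF
    depths = values[index].split(",")
    if len(depths) < 2 or not depths[1].isdigit():
        return None
    return int(depths[1])


def count_single_read_calls(vcf_lines, filter_name="alt_allele_in_normal", sample_column=10):
    """Count variants with exactly `filter_name` set, and how many have ALT depth 1.

    sample_column is the 0-based column index of the sample to inspect;
    10 is the second sample, assumed here to be the normal.
    """
    total = single = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[6] != filter_name:
            continue
        total += 1
        if alt_reads(fields[8], fields[sample_column]) == 1:
            single += 1
    return total, single


# Hypothetical records: tumor in column 9, normal in column 10.
example_vcf = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTUMOR\tNORMAL",
    "1\t100\t.\tA\tT\t.\talt_allele_in_normal\t.\tGT:AD:AF\t0/1:30,10:0.25\t0/0:39,1:0.025",
    "1\t200\t.\tG\tC\t.\talt_allele_in_normal\t.\tGT:AD:AF\t0/1:20,20:0.5\t0/0:36,4:0.1",
    "1\t300\t.\tC\tG\t.\tPASS\t.\tGT:AD:AF\t0/1:25,15:0.375\t0/0:40,0:0",
]

total, single = count_single_read_calls(example_vcf)
print(total, single)
```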

Unless Sentieon has a brilliant and mysterious reason for setting this filter, I really think we are at risk of filtering out relevant variants with it...

Suggested approach

A few sentences about the intended solution

Considered alternatives

Were there alternative approaches which have been rejected?

Requests/suggestions/bugs solved by the feature

Link any feature requests/bug reports or other issues which this would solve

Can be closed when

Link the issues needed to be closed for this to be implemented

Blockers

Anything preventing this from happening?

@vwirta

vwirta commented Sep 15, 2023

I agree, this sounds way too strict. Do you know if there is a setting for this in TNscope?

@mathiasbio
Collaborator Author

mathiasbio commented Oct 17, 2023

I don't know! I remember I was in contact with Sentieon about this in Gbg and I actually saved the response from Don Freed:

There are three filters to remove germline variants that have reads in the matched normal sample. “germline_risk” is set if the variant is present at a dbSNP site and the NLOD is low. “normal_LOD” is set if the NLOD is low, regardless if the site is present in dbSNP or not. “alt_allele_in_normal” is set if the number of reads supporting the alternate allele in the normal sample is above some threshold and if their base-quality score is sufficiently high. Importantly, for “germline_risk” and “normal_LOD” the NLOD may be too low due to low coverage in the tumor sample (even if no reads support the variant in the normal sample).

The interpretation of these filters really depends on your experimental setup. If you have some contamination of tumor variants into the normal sample or (less likely) the somatic mutation occurred early enough in development to also be present in your normal sample, then “alt_allele_in_normal” may be set for real somatic variants.

The conclusion back then was to just remove this filter and create our own using bcftools. At least based on this email response, there did not seem to be any strong argument for the utility of this filter.

I did a little investigation of the allele-frequencies in the TNscope VCF today. See above!

@fevac
Contributor

fevac commented Oct 18, 2023

Out of these 1511 variants with only "alt_allele_in_normal", 1002 of them had a total number of reads supporting the variant = 1

That is far from ideal... Great to have data for this. Thanks Mathias!
What's your suggestion then: removing this filter completely, or adding a different post-filter with bcftools?

@mathiasbio
Collaborator Author

mathiasbio commented Oct 18, 2023

I'm not sure. What confuses me a little is how similar the distributions are for the PASS and alt_allele_in_normal statuses. This indicates that the presence of the variant in the normal isn't the ONLY requirement for setting this filter, otherwise the PASS variants would all have an af_n of 0, which they do not.

So it seems that the filter takes some additional parameters into account. If that's the case, and these variants really should be filtered out, then I'd prefer that we design a more informative filter for them. But my suspicion right now is that these variants are probably mainly true somatic variants, and that we could remove the filter and design our own, which we would have more control over.

Something like this could maybe work: setting a filter if AF_N / AF_T >= 0.5, which would be a very relaxed filter:
bcftools filter -s high_normal_frac -e '(FORMAT/AF[1] / FORMAT/AF[0]) >= 0.5' -m
Or, to be more stringent, allowing for 25% tumor-in-normal contamination:
bcftools filter -s high_normal_frac -e '(FORMAT/AF[1] / FORMAT/AF[0]) >= 0.25' -m
All filters:

custom_filt_afs

Only PASS:

custom_filt_afs_PASS

Only alt_allele_in_normal:

custom_filt_afs_altinnormal

In table format:

| Custom filter | alt_allele_in_normal variants | PASS variants |
| --- | --- | --- |
| none | 985 | 10523 |
| medium_n_frac (0.25 --> 0.5) | 402 | 76 |
| high_n_frac (0.5 --> 1) | 124 | 42 |

It seems that this additional filter would not remove many variants that are currently marked PASS (42 with the "high_n_frac" threshold, or 76 + 42 with the "medium_n_frac" threshold). In return we would keep either 985 + 402 of the alt_allele_in_normal-only variants if we implement the "high_n_frac" filter, or 985 if we implement the "medium_n_frac" filter (in this example case...).
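The binning and the arithmetic behind these numbers can be written out as a small sketch. This is an illustration rather than the code used for the plots: the function name and the "undefined" label are made up, and it assumes the ratio is normal AF divided by tumor AF, as the filter name high_normal_frac suggests.

```python
def classify_normal_fraction(af_tumor, af_normal, medium=0.25, high=0.5):
    """Bin a variant by the ratio of normal AF to tumor AF."""
    if af_tumor <= 0:
        return "undefined"  # the ratio cannot be computed without tumor support
    ratio = af_normal / af_tumor
    if ratio >= high:
        return "high_n_frac"
    if ratio >= medium:
        return "medium_n_frac"
    return "none"


# Counts copied from the table in this comment.
alt_in_normal = {"none": 985, "medium_n_frac": 402, "high_n_frac": 124}
passed = {"none": 10523, "medium_n_frac": 76, "high_n_frac": 42}

# Filtering only at ratio >= 0.5 keeps the "none" and "medium_n_frac" bins.
kept_relaxed = alt_in_normal["none"] + alt_in_normal["medium_n_frac"]
# Filtering at ratio >= 0.25 keeps only the "none" bin.
kept_strict = alt_in_normal["none"]
# PASS variants that the new filter would flag under each threshold.
lost_pass_relaxed = passed["high_n_frac"]
lost_pass_strict = passed["medium_n_frac"] + passed["high_n_frac"]

print(kept_relaxed, kept_strict, lost_pass_relaxed, lost_pass_strict)
# prints: 1387 985 42 118
```

So in this example case the relaxed threshold would keep 1387 of the 1511 alt_allele_in_normal-only variants at the cost of flagging 42 PASS variants.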

If there are heterozygous germline variants that are duplicated in the tumor, we would probably keep a lot of them with the high_n_frac filter, provided they managed to slip past the other filters that are already present in Sentieon, such as "germline_risk".

On a connected note...the germline_risk filter is also interesting, but one step at a time.

Here's the AFs for the variants only tagged with the germline_risk filter:

germline_risk_afs

It's funny that there are so many germline_risk variants with a normal_af of 0. But I remember Sentieon explaining this to me: these variants are considered at risk of being germline because there is not sufficient support in the normal to rule out that they are germline. One example is low coverage, where by chance no reads supporting a heterozygous germline variant are captured.
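The low-coverage argument can be made concrete with a rough calculation. This is a minimal sketch under a simplifying assumption that is not from Sentieon: each read covering a heterozygous germline site independently carries the alt allele with probability 0.5, with no reference bias or sequencing errors. It is not how NLOD is computed, and the function name is made up.

```python
def prob_no_alt_reads(depth, alt_fraction=0.5):
    """Probability that none of `depth` reads carries the alt allele (binomial model)."""
    return (1.0 - alt_fraction) ** depth


# Chance of observing a normal AF of 0 at a truly heterozygous site.
for depth in (5, 10, 20, 30):
    print(depth, prob_no_alt_reads(depth))
```

Under this model the chance of seeing no alt reads is about 3% at a depth of 5 and drops below 0.1% at a depth of 10, so coverage alone would mainly explain a normal AF of 0 at poorly covered sites.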

Here are the coverages for the germline_risk filter for variants with an AF_N = 0. Quite a few variants with a low coverage in the normal, but also quite a few with pretty substantial coverage.

germline_risk_covs

@mathiasbio
Collaborator Author

Perhaps in the end I'd implement this filter:
bcftools filter -s high_normal_frac -e '(FORMAT/AF[1] / FORMAT/AF[0]) >= 0.5' -m
And complement it with the loqusDB germline database for filtering, just to reduce the risk of filtering out variants affected by tumor-in-normal contamination.

@github-project-automation github-project-automation bot moved this to Todo in BALSAMIC Oct 20, 2023
@mathiasbio mathiasbio moved this from Todo to In Progress in BALSAMIC Oct 20, 2023
@mathiasbio mathiasbio self-assigned this Oct 20, 2023
@mathiasbio mathiasbio linked a pull request Oct 20, 2023 that will close this issue
62 tasks
@pbiology pbiology modified the milestone: TBD Oct 24, 2023
@mathiasbio mathiasbio added this to the Release 14 milestone Oct 31, 2023
@mathiasbio mathiasbio moved this from In Progress to Planned in BALSAMIC Oct 31, 2023
@mathiasbio
Collaborator Author

Closing this and replacing with user story: #1335

@github-project-automation github-project-automation bot moved this from Planned to Completed in BALSAMIC Dec 1, 2023
@mathiasbio mathiasbio removed this from the Release 14 milestone Dec 1, 2023
@pbiology pbiology removed this from BALSAMIC Dec 5, 2023