Increased number of SVs in versions after 9.0.1 #1118

Closed
mathiasbio opened this issue Mar 23, 2023 · 3 comments · Fixed by #1120

@mathiasbio

Is your feature request related to a problem? Please describe.

I am not sure whether this is a relevant issue, but I thought I would bring it up for discussion.

Context: At a GMS-BT meeting it was noted that for case chiefgull (run with 11.2.0; a re-analysis of masterflea, which was run with 9.0.1) the number of PASS variants in the final SV VCF uploaded to Scout had increased from 197 to 8404.

This raised the question of why the number had increased so much. I learned that 8032 of the variants unique to this re-analysis came from TIDDIT, which was added to the WGS workflow in version 10.0.0 (https://github.com/Clinical-Genomics/BALSAMIC/pull/947).

To see whether this was just an outlier, I checked a few other cases from before and after the addition of TIDDIT. The table below summarises, for a few cases in versions 9.0.1, 10.0.5 and 11.2.0 (the current latest version), the number of variants in the final SV VCF with filter PASS and the number of those PASS variants that come from TIDDIT (PASS + TIDDIT).

In summary, in many cases TIDDIT adds a large number of SVs.

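| Version | Case | PASS | PASS + TIDDIT |
|---|---|---:|---:|
| 9.0.1 | fleetearwig | 616 | 0 |
| 9.0.1 | betterbeagle | 662 | 0 |
| 9.0.1 | exactmole | 1059 | 0 |
| 9.0.1 | fairant | 781 | 0 |
| 9.0.1 | likedguinea | 222 | 0 |
| 9.0.1 | notedstork | 1871 | 0 |
| 9.0.1 | uphornet | 137 | 0 |
| 10.0.5 | firmraptor | 16832 | 13883 |
| 10.0.5 | frankmagpie | 14916 | 13497 |
| 10.0.5 | dearboa | 16385 | 15499 |
| 10.0.5 | jointmako | 14847 | 14597 |
| 10.0.5 | crackbaboon | 14473 | 14242 |
| 10.0.5 | quickgoat | 15489 | 15098 |
| 10.0.5 | novelbream | 19669 | 15212 |
| 11.2.0 (clinical SV VCF) | expertsatyr | 25508 | 1410 |
| 11.2.0 (clinical SV VCF) | amplewasp | 31941 | 1474 |
| 11.2.0 (clinical SV VCF) | ableant | 7153 | 7011 |
| 11.2.0 (clinical SV VCF) | topsdonkey | 8106 | 7959 |
| 11.2.0 (clinical SV VCF) | suiteddrake | 10958 | 8292 |
| 11.2.0 (clinical SV VCF) | hardyweevil | 8101 | 6739 |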
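
For anyone who wants to reproduce per-case counts like the ones above, here is a minimal Python sketch. It is not how the numbers in this issue were produced. It assumes an uncompressed, tab-separated VCF and that the originating caller is recorded in an INFO key; the key name `CALLERS` is hypothetical and must be replaced with whatever the merged VCF actually uses.

```python
from collections import Counter


def parse_info(info_field):
    """Turn a VCF INFO string such as 'SVTYPE=DEL;END=200' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        if not entry or entry == ".":
            continue
        key, _, value = entry.partition("=")
        info[key] = value  # flags without '=' get an empty string
    return info


def count_pass_variants(vcf_lines, caller="tiddit", origin_key="CALLERS"):
    """Count PASS records, and PASS records whose origin mentions `caller`.

    `origin_key` is a hypothetical INFO key naming the caller(s) behind
    each record; adjust it to match the real merged VCF.
    """
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if fields[6] != "PASS":  # column 7 of a VCF record is FILTER
            continue
        counts["PASS"] += 1
        origin = parse_info(fields[7]).get(origin_key, "")
        if caller.lower() in origin.lower():
            counts["PASS + " + caller] += 1
    return counts
```

Usage would be along the lines of `count_pass_variants(open("case.sv.vcf"))`, where the file name is only a placeholder.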

Each variant in the VCF carries a value for how many files it was observed in, probably taken from the SVDB merge step. However, neither this value nor any other quality-based metric is available for filtering in Scout, so the number of variants cannot be reduced to an amount that is manageable to interpret.
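
As an illustration of what filtering on that per-variant value could look like before upload, here is a small Python sketch. It is only a sketch under assumptions: the INFO key name `N_FILES` is hypothetical (use the key the merged VCF really contains), and the threshold of 2 is an arbitrary example, not a recommended cutoff.

```python
def filter_by_support(vcf_lines, min_files=2, count_key="N_FILES"):
    """Keep header lines and records observed in at least `min_files` files.

    `count_key` is a hypothetical INFO key holding the number of input
    files a variant was seen in; records without it are treated as 0.
    """
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)  # always keep the VCF header
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        # INFO is column 8; split 'KEY=VALUE' pairs into a dict.
        info = dict(
            entry.partition("=")[::2]
            for entry in fields[7].split(";")
            if entry
        )
        try:
            n_files = int(info.get(count_key, 0))
        except ValueError:
            n_files = 0  # non-numeric value: treat as unsupported
        if n_files >= min_files:
            kept.append(line)
    return kept
```

Whether such a cutoff is appropriate for single-caller TIDDIT variants would still need to be evaluated on real cases.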

Describe the solution you'd like

Either more filtering of the SV variants before upload to Scout, or more options for manual filtration in Scout, in which case we need to identify good parameters to filter by.

SOMATICSCORE which we're planning to introduce to Scout (#1107) is only available for variants called with Manta, and would not enable us to filter TIDDIT variants.

Describe alternatives you've considered

Is TIDDIT necessary? Why was it introduced?

Current BALSAMIC version
balsamic --version 11.2.0

@mathiasbio

mathiasbio commented Mar 23, 2023

I spoke to Jesper about TIDDIT and we reached two main conclusions, each with a fairly simple implementation that would probably reduce the number of variants significantly:

  1. Apparently we call SVs on both the normal and the tumor sample, but we do not filter on whether an SV is present in the normal sample. In essence we are just adding the normal variants to the tumor variants, when the point is to use the normal variants to filter the somatic ones.
  2. For BNDs, TIDDIT calls two variants per mutation, roughly the forward and the reverse version of the same variant. This means we could keep one variant per mutation and probably remove a couple of thousand additional variants before upload to Scout.
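
The two ideas above can be sketched in Python as follows. This is an illustration only and not the implementation from the linked pull request. It assumes each record has been reduced to a `(chrom, pos, alt)` tuple, that BND ALT fields follow the VCF bracket notation (for example `N[2:500[`), and that a variant counts as "present in the normal" only on an exact match. Real SV comparison would need a position tolerance, so the exact matching here is a deliberate simplification.

```python
import re

# Matches the mate locus inside a VCF breakend ALT such as 'N[2:500[' or ']1:100]N'.
BND_ALT = re.compile(r"[\[\]](.+):(\d+)[\[\]]")


def breakend_key(chrom, pos, alt):
    """Return an order-independent key for a BND record, or None if ALT is not a breakend."""
    match = BND_ALT.search(alt)
    if match is None:
        return None
    mate = (match.group(1), int(match.group(2)))
    # Sorting makes the forward and reverse records share one key.
    return tuple(sorted([(chrom, int(pos)), mate]))


def dedupe_bnd_mates(records):
    """Keep the first record of each breakend pair; non-BND records pass through."""
    seen = set()
    kept = []
    for chrom, pos, alt in records:
        key = breakend_key(chrom, pos, alt)
        if key is not None:
            if key in seen:
                continue  # the mate of this breakend was already kept
            seen.add(key)
        kept.append((chrom, pos, alt))
    return kept


def drop_seen_in_normal(tumor_records, normal_records):
    """Remove tumor records that also occur (exact match) in the normal sample."""
    normal = set(normal_records)
    return [record for record in tumor_records if record not in normal]
```

Applying `drop_seen_in_normal` first and `dedupe_bnd_mates` second would mirror the order of the two points, although the order does not matter for this simplified version as long as both samples list both mates.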

@fevac

fevac commented Mar 24, 2023

Nice find! 🕵️

@mathiasbio mathiasbio linked a pull request Mar 24, 2023 that will close this issue
@mathiasbio mathiasbio moved this from Todo to Testing in BALSAMIC Mar 28, 2023
@pbiology

Fixed with #1120

@github-project-automation github-project-automation bot moved this from Testing to Completed in BALSAMIC Jul 14, 2023