Calculate GC/AT dropout for WGS cases #1240

pbiology · 2023-08-31T13:32:13Z

Need

It would be good to start tracking GC and AT dropout for all WGS cases. Currently we only do this for panels.

Suggested approach

Also run Picard HsMetrics (CollectHsMetrics) on WGS cases, preferably with the same exome BED file as used in rare disease (RD). It could be worthwhile asking the RD team whether we should update which file is used for the exome.
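A minimal sketch of the suggested approach, assuming Picard is available as a `picard` wrapper on PATH and a sequence dictionary (`.dict`) sits next to the reference FASTA. CollectHsMetrics takes Picard interval lists rather than BED files, so the exome BED is converted with BedToIntervalList first. The helper name `hsmetrics_commands` and all paths are hypothetical; the `KEY=value` argument style follows Picard's own usage examples. Passing the same intervals as both bait and target is a simplification for WGS, where there is no bait design.

```python
from pathlib import Path


def hsmetrics_commands(
    bam: Path, reference: Path, bed: Path, workdir: Path
) -> list[list[str]]:
    """Build (but do not run) the Picard commands for HsMetrics on a WGS BAM.

    Hypothetical helper: assumes `picard` is on PATH and that the sequence
    dictionary is the FASTA path with its suffix replaced by `.dict`.
    """
    intervals = workdir / (bed.stem + ".interval_list")
    seq_dict = reference.with_suffix(".dict")
    # Step 1: convert the exome BED to a Picard interval list.
    to_intervals = [
        "picard", "BedToIntervalList",
        f"I={bed}", f"O={intervals}", f"SD={seq_dict}",
    ]
    # Step 2: collect HsMetrics (includes AT_DROPOUT and GC_DROPOUT).
    hs_metrics = [
        "picard", "CollectHsMetrics",
        f"I={bam}",
        f"O={workdir / (bam.stem + '.hsmetrics.txt')}",
        f"R={reference}",
        # WGS has no bait design, so the exome intervals stand in for both.
        f"BAIT_INTERVALS={intervals}",
        f"TARGET_INTERVALS={intervals}",
    ]
    return [to_intervals, hs_metrics]
```

Each command could then be run with `subprocess.run(cmd, check=True)`; keeping construction separate from execution makes the argument lists easy to inspect and test.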

Considered alternatives

Alternatively, could we compute the dropout over the whole genome rather than only the exome?

Requests/suggestions/bugs solved by the feature

Can be closed when

BALSAMIC captures GC and AT dropout in the multiqc.json file.
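One way to check this closing criterion, sketched under assumptions: MultiQC's Picard module typically stores HsMetrics rows under `report_saved_raw_data` → `multiqc_picard_HsMetrics`, keyed by sample. Both key names, and the exact JSON file BALSAMIC writes, are assumptions to verify against a real run; the function names are hypothetical.

```python
import json
from pathlib import Path


def dropout_by_sample(
    multiqc: dict, section: str = "multiqc_picard_HsMetrics"
) -> dict[str, tuple[float, float]]:
    """Return {sample: (AT_DROPOUT, GC_DROPOUT)} from parsed MultiQC JSON.

    The 'report_saved_raw_data' wrapper and the section name are assumptions.
    Samples missing either metric are skipped rather than raising.
    """
    rows = multiqc.get("report_saved_raw_data", {}).get(section, {})
    result = {}
    for sample, metrics in rows.items():
        if "AT_DROPOUT" in metrics and "GC_DROPOUT" in metrics:
            result[sample] = (
                float(metrics["AT_DROPOUT"]),
                float(metrics["GC_DROPOUT"]),
            )
    return result


def load_multiqc(path: Path) -> dict:
    """Read the JSON file; kept separate so the logic above works on plain dicts."""
    with path.open() as handle:
        return json.load(handle)
```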

Blockers

pbiology · 2023-09-01T09:24:24Z

Let's implement it as in MIP for release 13, and then for the next release re-evaluate exactly which regions should underlie the GC/AT dropout calculations.

mathiasbio · 2023-09-14T07:59:00Z

RD seems to be using an old BED file from the Twist exome prep: "twistexomerefseq_9.1_hg19_design.bed.pad100.interval_list". I wonder whether it would simply be better for us to use the RefGene BED file we already use in BALSAMIC, as it seems to be a more general, unmodified file.

Using the RefGene file would also be easier to implement, since using the same file as RD would require us to either:

- add another reference file to download in our init, and then modify it to get the "pad100" version (as I don't think this file is available online), or
- save this exome BED file in a reference folder and add it as an argument to balsamic config case.
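If the RD-style file has to be reproduced, the "pad100" step could look like the sketch below. It assumes "pad100" means extending every region by 100 bp on each side and merging regions that then overlap; the exact RD recipe is not stated in this thread. BED coordinates are 0-based and half-open, so starts are clamped at 0; clamping ends to chromosome length would need a sequence dictionary and is left out. The function name is hypothetical.

```python
from itertools import groupby


def pad_and_merge(regions, pad=100):
    """Pad (chrom, start, end) BED regions by `pad` bp per side, then merge.

    Coordinates are 0-based, half-open as in BED. Ends are not clamped to
    the chromosome length. Output is sorted by chromosome name, then start.
    """
    padded = sorted(
        (chrom, max(0, start - pad), end + pad) for chrom, start, end in regions
    )
    merged = []
    for chrom, group in groupby(padded, key=lambda region: region[0]):
        current_start = current_end = None
        for _, start, end in group:
            if current_end is not None and start <= current_end:
                # Overlapping or touching the previous region: extend it.
                current_end = max(current_end, end)
            else:
                if current_end is not None:
                    merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        merged.append((chrom, current_start, current_end))
    return merged
```

Merging after padding keeps nearby exons from being counted twice where their padded flanks overlap.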

To start, I will test-run with the exome file we have AND the RD one, and compare the stats we get. If they are similar, which I expect since the regions in these files are so large that the values should converge, I think we can just run with the RefGene file. I'll post the results here.
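For comparing runs side by side, a small parser for Picard's metrics text output is enough. This sketch assumes the usual layout: `#`-prefixed header lines, a `## METRICS CLASS` line, then a tab-separated column header and value rows, ending at the first blank line (any histogram section after that is ignored). Values are kept as strings because some columns can be empty. The function name is hypothetical.

```python
def read_picard_metrics(text: str) -> list[dict[str, str]]:
    """Parse the METRICS section of a Picard metrics file into row dicts."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            rows = []
            for row in lines[i + 2:]:
                if not row.strip():
                    break  # A blank line ends the metrics table.
                rows.append(dict(zip(header, row.split("\t"))))
            return rows
    raise ValueError("no '## METRICS CLASS' section found")
```

With one HsMetrics file per BED file, `float(read_picard_metrics(text)[0]["GC_DROPOUT"])` pulls out the value to compare.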

pbiology · 2023-09-14T11:17:55Z

Sounds like a good way forward. We should not skip aligning with the RD group before making any decisions. Having comparable values seems quite valuable.

mathiasbio · 2023-10-09T08:36:36Z

I ran some tests with three different BED files: the RefGene BED file we already use in BALSAMIC, the unmodified Twist BED file for exome analysis, and the Twist BED file that RD uses. While the coverage values are very similar across the BED files, GC_DROPOUT differs, substantially so between the unmodified Twist v10 file and the other two. I don't know why this is; perhaps it is related to this file including many small BED regions, which is the one distinguishing feature of this BED file I can think of at the moment. In the end I think the most reasonable way to implement this is to use the RefGene BED file: its results are similar to those we get with the RD BED file, and if we can achieve any standardisation between the pipelines, it is more reasonable to build that foundation on RefGene than on a particular exome panel.

BED file	PCT_TARGET_BASES_10X	PCT_TARGET_BASES_20X	PCT_TARGET_BASES_30X	MEAN_TARGET_COVERAGE	AT_DROPOUT	GC_DROPOUT
twistexomerefseq_9.1_hg19_design.bed.pad100.modifiedheader.interval_list	0.98063	0.931417	0.600774	31.370124	2.997022	0.061446
twistexomecomprehensive_10.1_hg19_design.bed	0.979294	0.92972	0.606413	31.482487	2.703384	0.441655
refGene.flat.bed	0.983179	0.95696	0.606675	31.776462	2.253385	0.054916
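To help interpret these numbers, here is a simplified sketch of the dropout calculation as Picard's HsMetrics documentation describes it: targets are binned by GC content (0 to 100%), and for each bin the share of target territory is compared with the share of aligned reads landing there. Bins that receive less than their share contribute their shortfall (in percentage points), summed over GC 0 to 50 for AT_DROPOUT and GC 50 to 100 for GC_DROPOUT. This is not Picard's code: in particular, how Picard assigns a GC value to each target, which is where many small regions could plausibly shift the bins, is not modelled. Input names are hypothetical.

```python
def dropout(target: list[float], aligned: list[float]) -> tuple[float, float]:
    """Return (AT_DROPOUT, GC_DROPOUT) in percentage points.

    target[i] is the target territory and aligned[i] the aligned reads in
    targets with i% GC, for i = 0..100. Bin 50 counts toward both metrics.
    """
    if len(target) != 101 or len(aligned) != 101:
        raise ValueError("expected one value per GC percentage, 0..100")
    total_target = sum(target)
    total_aligned = sum(aligned)
    at_dropout = gc_dropout = 0.0
    for gc, (t, a) in enumerate(zip(target, aligned)):
        # Positive shortfall: this GC bin got less than its share of reads.
        shortfall = (t / total_target - a / total_aligned) * 100
        if shortfall > 0:
            if gc <= 50:
                at_dropout += shortfall
            if gc >= 50:
                gc_dropout += shortfall
    return at_dropout, gc_dropout
```

For example, if two equally sized targets sit at 40% and 60% GC but the 60% one receives only a quarter of the reads, the sketch reports an AT_DROPOUT of 0 and a GC_DROPOUT of 25.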

pbiology added Feature New feature Effort Small Urgency Small Gain Medium Needs Refinement labels Aug 31, 2023

pbiology added this to the TBD milestone Aug 31, 2023

pbiology added this to BALSAMIC Aug 31, 2023

github-project-automation bot moved this to Todo in BALSAMIC Aug 31, 2023

pbiology added Urgency Medium and removed Urgency Small Needs Refinement labels Sep 1, 2023

khurrammaqbool added the Needs Refinement label Sep 1, 2023

pbiology modified the milestones: TBD, Release 13 Sep 1, 2023

pbiology assigned mathiasbio Sep 1, 2023

khurrammaqbool removed the Needs Refinement label Sep 1, 2023

mathiasbio moved this from Todo to Planned in BALSAMIC Sep 12, 2023

mathiasbio moved this from Planned to In Progress in BALSAMIC Sep 12, 2023

mathiasbio linked a pull request Oct 20, 2023 that will close this issue: feat: add extra qc metrics #1288 (Merged)

mathiasbio mentioned this issue in feat: add extra qc metrics #1288 (Merged) Oct 25, 2023

mathiasbio moved this from In Progress to Review in BALSAMIC Oct 27, 2023

ivadym moved this from Review to WIP in BALSAMIC Nov 1, 2023

mathiasbio moved this from WIP to Completed in BALSAMIC Nov 6, 2023

mathiasbio closed this as completed Nov 6, 2023

pbiology moved this from Completed to Archived in BALSAMIC Nov 10, 2023
