Calculate GC/AT dropout for WGS cases #1240
Comments
Let's implement this as in MIP for release 13. For the next release we should re-evaluate exactly which regions should underlie the GC/AT dropout calculations.
RD seems to be using an old BED file from the Twist exome prep: "twistexomerefseq_9.1_hg19_design.bed.pad100.interval_list". I wonder whether it would be better for us to use the RefGene BED file that we already use in balsamic, as it seems to be a more general, untampered file. It would also be easier to implement, since using the same file as RD would require us to either:
To start, I will do test runs with both the exome file we have and the RD one, and compare the stats we get. I expect them to be similar: the regions included in these files are so large that the metrics should converge on the same value. If they are, I think we can just run with the RefGene one. I'll write the results here.
Sounds like a good way forward. We should not skip aligning with the RD group before making any decisions. Having comparable values seems quite valuable.
I ran some tests with three different BED files: the RefGene BED file we already use in balsamic, the untampered Twist BED file for exome analysis, and the Twist BED file that RD is using. While the coverage values are very similar across the BED files, GC_DROPOUT differs quite a lot, and substantially so between the untampered Twist v10 file and the others. I don't know why this is. Perhaps it has to do with the many small BED regions included in that file, which is the one defining feature of it that I can think of at the moment. In the end I think the most reasonable way to implement this is to use the RefGene BED file. The results are similar to the ones we get with the RD BED file, and if there is any standardisation we could achieve between the pipelines, it is more reasonable to build that foundation on RefGene than on a particular exome panel.
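For anyone who wants to repeat the comparison, here is a minimal sketch of how the dropout columns could be pulled out of several Picard metrics files and put side by side. It assumes the standard Picard metrics layout (a `## METRICS CLASS` line followed by a tab-separated header row and one value row) and the `AT_DROPOUT` and `GC_DROPOUT` columns that CollectHsMetrics writes. The function names and the labels used as dictionary keys are hypothetical, not something from balsamic.

```python
from typing import Dict


def parse_picard_metrics(text: str) -> Dict[str, str]:
    """Return the first metrics row of a Picard metrics file as a dict.

    Assumes the usual layout: a '## METRICS CLASS' line, then a
    tab-separated header line, then one tab-separated value line.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            values = lines[i + 2].split("\t")
            return dict(zip(header, values))
    raise ValueError("no '## METRICS CLASS' section found")


def dropout_table(metrics_by_label: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Map each label (e.g. a BED file name) to its AT/GC dropout values.

    `metrics_by_label` maps a hypothetical label to the *contents* of the
    corresponding metrics file.
    """
    table = {}
    for label, text in metrics_by_label.items():
        row = parse_picard_metrics(text)
        table[label] = {
            "AT_DROPOUT": float(row["AT_DROPOUT"]),
            "GC_DROPOUT": float(row["GC_DROPOUT"]),
        }
    return table
```

Reading each metrics file with `open(path).read()` and passing the contents in under a label such as the BED file name gives one row per BED file, which makes differences like the one seen for the untampered Twist file easy to spot.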
Need
It would be good to start tracking GC and AT dropout for all WGS cases. Right now we only do it for panels.
Suggested approach
Run Picard HsMetrics on WGS cases as well, preferably with the same exome BED file as used in rare disease. It could be worthwhile asking the RD team whether we should update which file is used for the exome.
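A rough sketch of what that could look like, written as Python helpers that only build the command lines. It assumes a `picard` wrapper on the PATH and Picard's legacy `KEY=VALUE` argument style, and it passes the same interval list as both bait and target intervals because a WGS sample has no capture baits. That choice, the function names, and all file paths are assumptions for illustration, not how balsamic or MIP actually do it.

```python
from typing import List


def bed_to_interval_list_cmd(bed: str, sequence_dict: str, out: str) -> List[str]:
    """Build a Picard BedToIntervalList command.

    CollectHsMetrics expects interval_list files, so a BED file has to be
    converted first, using the reference sequence dictionary.
    """
    return [
        "picard", "BedToIntervalList",
        f"I={bed}",
        f"O={out}",
        f"SD={sequence_dict}",
    ]


def collect_hs_metrics_cmd(bam: str, reference: str,
                           intervals: str, out: str) -> List[str]:
    """Build a Picard CollectHsMetrics command for a WGS BAM.

    Assumption: the same interval list is used for BAIT_INTERVALS and
    TARGET_INTERVALS, since there are no real baits for WGS.
    """
    return [
        "picard", "CollectHsMetrics",
        f"I={bam}",
        f"O={out}",
        f"R={reference}",
        f"BAIT_INTERVALS={intervals}",
        f"TARGET_INTERVALS={intervals}",
    ]
```

The returned lists can be handed to `subprocess.run(cmd, check=True)` or joined into a shell line for a workflow rule. The reference FASTA is passed because Picard needs it to compute the GC-related metrics.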
Considered alternatives
Could we instead somehow calculate this over the whole genome, and not only the exome?
Requests/suggestions/bugs solved by the feature
Can be closed when
Blockers