For MAF: upload SNVs of normal samples to Genotype #882

Closed
moahaegglund opened this issue Feb 17, 2022 · 13 comments

Comments

@moahaegglund

Description

When a tumor/normal analysis is ordered with BALSAMIC analysis only, we need to create a case for the MIP analysis of the normal sample under cust000, so that it can be compared with the results of MAF. See, for example, the samples in ticket 110945. Today, prod bioinfo has to create these cases manually.

@henrikstranneheim

@moahaegglund @ashwini06 Would it be possible to use the normal sample SNVs from Balsamic and upload them to Genotype instead of analyzing the same sample again using MIP?

@moahaegglund
Author

> @moahaegglund @ashwini06 Would it be possible to use the normal sample SNVs from Balsamic and upload them to Genotype instead of analyzing the same sample again using MIP?

I will let you answer this question, @ashwini06, but this would be the best solution for us.

@moahaegglund moahaegglund transferred this issue from Clinical-Genomics/cg Mar 9, 2022
@moahaegglund moahaegglund changed the title from "For MAF: create mip cases for normal samples analysed with BALSAMIC" to "For MAF: upload SNVs of normal samples to Genotype" Mar 9, 2022
@moahaegglund
Author

@ashwini06 would it be possible to upload the SNVs of the normal samples to Genotype?

@ashwini06
Contributor

ashwini06 commented Mar 9, 2022

@moahaegglund @henrikstranneheim: Somehow I missed answering this issue, sorry about that! Yes, it might be possible. Currently, BALSAMIC reports germline SNV variants from the normal sample. We have those results generated by two separate variant callers (Sentieon's DNAscope and GATK HaplotypeCaller). Maybe we can use one of those for genotype?

@Vince-janv

I tested uploading /home/proj/production/housekeeper-bundles/manydoe/2022-04-26/SNV.germline.normal.dnascope.vcf.gz to genotype-stage. Using the match sequence function genotype then matched it against the vcf uploaded from the mip-analysis.

@ashwini06 However genotype gets the sample name from the VCF header and in BALSAMIC the field is called NORMAL for the normal sample. Can this easily be changed to the sample internal_id?
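
To make the point about the header concrete: the sample name lives in the `#CHROM` line of the VCF, in the columns after `FORMAT`. The following is a minimal sketch for inspecting it, using only the Python standard library. The function name `vcf_sample_names` is hypothetical and this is not how genotype itself reads the file.

```python
import gzip
from pathlib import Path


def vcf_sample_names(vcf_path):
    """Return the sample column names from the #CHROM header line of a VCF."""
    path = Path(vcf_path)
    # bgzip output is a valid multi-member gzip stream, so gzip.open can read it.
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                columns = line.rstrip("\n").split("\t")
                # The first nine columns are fixed (CHROM to FORMAT); samples follow.
                return columns[9:]
            if not line.startswith("#"):
                break  # reached the records without finding a #CHROM line
    return []
```

For the BALSAMIC germline file discussed here, a call such as `vcf_sample_names("SNV.germline.normal.dnascope.vcf.gz")` would be expected to return the `NORMAL` label rather than the sample internal_id.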

@ashwini06
Contributor

@Vince-janv :

> I tested uploading /home/proj/production/housekeeper-bundles/manydoe/2022-04-26/SNV.germline.normal.dnascope.vcf.gz to genotype-stage. Using the match sequence function genotype then matched it against the vcf uploaded from the mip-analysis.
>
> @ashwini06 However genotype gets the sample name from the VCF header and in BALSAMIC the field is called NORMAL for the normal sample. Can this easily be changed to the sample internal_id?

@Vince-janv: Yes, it is possible, but currently we don't have the functionality in BALSAMIC to make this change. With a simple command, however, one can easily change the sample names in the output VCF files.

bcftools reheader -s change_sample_names.txt SNV.germline.normal.dnascope.vcf.gz -o SNV.germline.normal.dnascope.changednamed.vcf.gz

change_sample_names.txt is a two-column text file (old name, new name). For example:
NORMAL ACCxxxx

So are you only interested in changing the sample name in this output VCF: SNV.germline.normal.dnascope.vcf.gz?
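
The manual re-header step above could be scripted roughly as follows. This is a sketch under stated assumptions: the helper names and the internal ID `ACC0000A1` are hypothetical, and only the `-s` and `-o` options of `bcftools reheader` from the command above are used (placed before the input file here).

```python
import shutil
import subprocess
from pathlib import Path


def write_sample_map(map_path, old_name, new_name):
    """Write the 'old_name new_name' pair that bcftools reheader -s accepts."""
    Path(map_path).write_text(f"{old_name} {new_name}\n")


def reheader_command(map_path, vcf_in, vcf_out):
    """Build the bcftools reheader call, with options before the input file."""
    return ["bcftools", "reheader", "-s", str(map_path), "-o", str(vcf_out), str(vcf_in)]


def rename_normal(vcf_in, vcf_out, internal_id, map_path="change_sample_names.txt"):
    """Rename the NORMAL sample column to the given internal ID."""
    write_sample_map(map_path, "NORMAL", internal_id)
    cmd = reheader_command(map_path, vcf_in, vcf_out)
    if shutil.which("bcftools") is None:
        raise RuntimeError("bcftools not found on PATH")
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


# Example call (hypothetical internal ID, requires bcftools):
# rename_normal(
#     "SNV.germline.normal.dnascope.vcf.gz",
#     "SNV.germline.normal.dnascope.changednamed.vcf.gz",
#     "ACC0000A1",
# )
```

Afterwards, `bcftools query -l` on the new file lists its sample names, which is a quick way to confirm the rename.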

@Vince-janv

@ashwini06 Thanks for the quick reply! 😊 I think there is an argument for consistency (i.e. using the same header format in all VCFs). That being said, I know very little about how BALSAMIC uses the VCF headers. For genotype we just need a VCF for the normal, and I chose one of the ones you suggested.

The "manual" change you suggested will be great for testing, but for use in production we would need another solution.

@ashwini06
Contributor

@Vince-janv: Agreed, this change needs to be made in BALSAMIC for the germline DNAscope files, so that you can use them in genotype directly. I will get back to you on this once we have a solution implemented in BALSAMIC.

@Vince-janv

@ivadym we would also need to add the tag genotype to the germline DNAscope file in housekeeper 🙇‍♂️

@Vince-janv

Vince-janv commented Jun 16, 2022

Hi. I will summarise some discussions we've had over the past weeks.

@ashwini06 informed me that changing the VCF headers is more problematic than initially thought.

Discussing with @Mropat and @ivadym we concluded that implementing changes in genotype would be beneficial, but would take a lot of work. With the current lack of developers I think it might not happen for a while if not prioritised.

As a quick and dirty fix I suggest that @ivadym adds the tag in housekeeper and I'll merge a small PR in cg to adapt the current genotype upload to balsamic. Production can then re-header the file manually and upload via the cg CLI.

Let me stress that this should not be a permanent solution!

@ashwini06
Contributor

ashwini06 commented Jun 16, 2022

@Vince-janv: Thanks for the update! I have two possible solutions that might work, but I am still working on them and want to test before I promise you anything.
a) Within BALSAMIC, just rename the sample in the header of the normal VCF file intended for genotype and save that file specifically for genotype. This way we only create an extra VCF file for genotype, without disturbing the format of the sample names in the rest of the output VCF and BAM files.

b) Use the same standardized format for sample names (i.e. LIMSID instead of TUMOR/NORMAL) in all output VCF and BAM files. This requires a huge refactoring of most of the BALSAMIC snakemake rules and of the handling of some wildcards, but it could be a permanent solution. It would also need some changes to the creation of the Scout config file for BALSAMIC, I guess. For example, the following sample_id would need to be set to the LIMSID. @ivadym: can you confirm whether this is done automatically based on the cram file header or in some other way?

samples:
- alignment_path: /home/proj/stage/cancer/cases/sweetelf_test_APJ/analysis/bam/tumor.merged.cram
  analysis_type: panel
  phenotype: affected
  sample_id: TUMOR
  sample_name: GMSmyeloid-control-HD829-1
  sex: unknown
  tissue_type: cell line
  tumor_purity: 0
track: cancer

samples:
- alignment_path: /home/proj/stage/cancer/cases/sweetelf_test_APJ/analysis/bam/tumor_namechanged.merged.bam
  analysis_type: panel
  phenotype: affected
  sample_id: ACC6637A1
  sample_name: GMSmyeloid-control-HD829-1
  sex: unknown
  tissue_type: cell line
  tumor_purity: 0
track: cancer

@Vince-janv @ivadym I am in the middle of testing my solutions. If it's OK, can I ask you both to wait a bit before you start implementing your proposed solutions in cg and HK?

@ivadym
Contributor

ivadym commented Jun 16, 2022

@ashwini06 The default values for sample_id are actually the lims_id; for BALSAMIC this is changed in CG to tumor/normal. So it's independent of the cram files, but I will look in more detail at what would be affected by these changes.
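
As an illustration of the relabelling discussed above, the sketch below swaps the `sample_id` values in a config represented as a plain Python dict. It is a hypothetical example, not the actual CG code: the function name `relabel_sample_ids` is invented, and the dict only mirrors a few fields of the config shown earlier in this thread.

```python
def relabel_sample_ids(config, id_by_label):
    """Return a copy of a load config with sample_id labels replaced."""
    samples = [
        # Labels without an entry in id_by_label are left unchanged.
        {**sample, "sample_id": id_by_label.get(sample["sample_id"], sample["sample_id"])}
        for sample in config["samples"]
    ]
    return {**config, "samples": samples}


config = {
    "samples": [
        {
            "sample_id": "TUMOR",
            "sample_name": "GMSmyeloid-control-HD829-1",
            "phenotype": "affected",
        },
    ],
    "track": "cancer",
}

updated = relabel_sample_ids(config, {"TUMOR": "ACC6637A1"})
print(updated["samples"][0]["sample_id"])  # ACC6637A1
```

The original `config` is left untouched, so the TUMOR/NORMAL labels remain available to whatever still depends on them.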

@ashwini06
Contributor

Fixed in #958
