-
Notifications
You must be signed in to change notification settings - Fork 417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Concatenating germline vcfs #792
Conversation
|
I think it's best to start small, and just do germline snps/indels for now.
I do think it's better to have that optional, people can have different usage downstream.
Yes, in my opinion, as long as we produce a vcf.gz, we should have it tabix indexed. |
Thanks for the feedback, @maxulysse. Much appreciated. I'll make the concatenation optional somehow :-) Concerning your idea about the text-file - the vcf-file produced by
I'd say that makes the text-file redundant, right? |
|
…vcf-files from deepvariant.
… for vcf-files from freebayes.
… for vcf-files from tiddit.
…th the other variant-calling-parameters.
@FriederikeHanssen @maxulysse : Can I get you guys to do a preliminary review of this PR? If this PR looks okay, then I'll update the corresponding modules in github.com/nf-core/modules. I've tested this PR with the following cmd:
and it gives me a concatenated germline-vcf-file which was made by this
(N.B. The cnvkit doesn't produce a vcf-file, so no variants from cnvkit in the concatenated vcf-file.) In fact, two concatenated vcf-files were produced, since the input-samplesheet contains to bam-files:
The vcf-files are sorted and have corresponding tbi-files. Warning: This PR contains some real clumsy code: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the overall looks good. I had one comment on sorting --- is that something that has been discussed? I mean post-concatenate sorting?
Also -- I think interpretation wise it does make sense to talk about 'an intersection' of results with long variants, cnvs, e.g. as when including Manta etc. Few long variant callers will agree on exactly where the break ends are of a variant, so it's not really going to be an intersection more like an union with slightly different border repeats of the same CNV.
🙈 oh no |
Ok, so I introduced a local module for adding the With the CLI-options
In the attached concatenated vcf-files, there are no variant from manta or tiddit. What do you guys think about this solution? I'm still passing the index-files from the variant-caller-modules all the way back to |
…nd patient-id correspond to id in bam-files. See nf-core#872
I'm still passing the index-files from the variant-caller-modules all the way back to sarek.nf; that is actually not necessary with the usage of the local module. Should I get rid of the code passing the index-files from the variant-caller-modules back to sarek.nf? The fastest and easiest solution would be just to get rid of the (new) code which is passing the index-files back to |
@maxulysse asked me to get rid of the redundant code, and so I did. I now - finally - have all CI-tests passing. Let's merge this thing! |
uh nice 🥳 🚀 |
Issue #738.
Adding option
--concatenate_vcfs
for concatenating the vcf-files from all the germline variant-callers, except cnvkit which doesn't produce a vcf-file.I've set it up so that Sarek puts the concatenated
.vcf.gz
-file here:The output-vcf-files are sorted and have index-files. Added an INFO-field named
SOURCE
in the output-vcf-fils giving the name of the (source) vcf-file from whence the variant came. (The filename also indicate which variant-caller was used.)I updated
nextflow_schema.json
- perhapsdocs/usage.md
anddocs/output.md
should also be updated?PR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).