Feature request for VariantsToTable: option to preserve numeric genotypes #8160

Open
bensprung opened this issue Jan 12, 2023 · 4 comments · May be fixed by #8219
@bensprung

bensprung commented Jan 12, 2023

Following the discussion here, I'd like to request a feature for VariantsToTable. Currently, when genotypes are specified via -GF GT, the numeric genotypes (e.g. 0/1) from the input VCF are replaced in the output TSV by the actual alleles (e.g. CTACCCT/AACCCA). I can see why that behavior would be desirable sometimes, but in cases where there are a lot of long variants and a lot of samples, this makes the resulting TSV file quite a bit more hefty than it needs to be (since that info is already in the REF and ALT fields). Therefore I think an option to output the original numeric genotypes from the VCF would be useful. Thanks!
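To make the redundancy concrete, here is a minimal sketch of the substitution described above. It assumes a simple site with one REF and a list of ALT alleles; the function name `gt_to_bases` is hypothetical and this is not GATK's actual code. The point is that the base-level string can always be rebuilt from the numeric GT plus REF and ALT, so the numeric form loses no information.

```python
from typing import Sequence


def gt_to_bases(gt: str, ref: str, alts: Sequence[str]) -> str:
    """Translate a numeric VCF GT such as '0/1' into allele bases.

    Hypothetical helper for illustration only. Assumes the genotype uses a
    single separator throughout ('/' unphased or '|' phased).
    """
    # Index 0 is REF; indices 1..n are the ALT alleles in order.
    alleles = [ref, *alts]
    sep = "|" if "|" in gt else "/"
    return sep.join(
        "." if idx == "." else alleles[int(idx)]  # '.' marks a no-call
        for idx in gt.split(sep)
    )


print(gt_to_bases("0/1", "CTACCCT", ["AACCCA"]))
```

With the alleles from the example in this issue, `0/1` (3 characters) expands to `CTACCCT/AACCCA` (14 characters), and that growth repeats for every sample column.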

@lbergelson
Member

@bensprung So I thought this would be a trivial change. It turns out that encoding the Genotype as something like 1/1 is done way down in the depths of the VCF encoder and isn't exposed in an accessible way. It's going to need a (hopefully simple) change to the underlying htsjdk library to expose that machinery. It shouldn't be hard, it just means it will take a bit longer to get to than I expected.
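For readers unfamiliar with what "encoding the Genotype as something like 1/1" involves, the idea can be sketched roughly as follows. This is an illustrative assumption about the general technique, not htsjdk's implementation; `encode_gt` and its parameters are hypothetical names. The encoder needs the site's full allele list to turn each called allele into its index, which is why the logic lives with the code that writes the whole VCF record.

```python
from typing import Sequence


def encode_gt(called: Sequence[str], ref: str, alts: Sequence[str],
              phased: bool = False) -> str:
    """Encode called alleles as a numeric GT string (hypothetical sketch)."""
    # Map each allele at the site to its VCF index: REF -> 0, ALTs -> 1..n.
    index_of = {allele: str(i) for i, allele in enumerate([ref, *alts])}
    index_of["."] = "."  # a no-call stays a no-call
    sep = "|" if phased else "/"
    return sep.join(index_of[allele] for allele in called)


print(encode_gt(["AACCCA", "AACCCA"], "CTACCCT", ["AACCCA"]))
```

A called allele that is not in the site's allele list raises a `KeyError` here; a real encoder would need to decide how to handle that case.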

@bensprung
Author

Well, no worries, thanks so much for working on it!

lbergelson added a commit to samtools/htsjdk that referenced this issue Jan 27, 2023
* Expose two public methods in VCFEncoder writeGtField and encodeGtField
* Supports broadinstitute/gatk#8160 but seems like a useful thing to be able to do in general
* minor breaking change in VCFEncoder, made methods formatVCFField and buildAlleleStrings static
  It is unlikely anyone overrides either of these methods so it should not be a problem.
@droazen
Contributor

droazen commented Jan 30, 2023

Blocked on samtools/htsjdk#1648

lbergelson added a commit to samtools/htsjdk that referenced this issue Jan 30, 2023
lbergelson added a commit to samtools/htsjdk that referenced this issue Jan 31, 2023
* Expose the ability to encode a Genotype into a GT field by exposing
 two public methods in VCFEncoder: writeGtField and encodeGtField
* Supports broadinstitute/gatk#8160 but seems like a useful thing to be able to do in general
* minor breaking change in VCFEncoder, made methods formatVCFField and buildAlleleStrings static
  It is unlikely anyone overrides either of these methods so it should not be a problem.
@lbergelson
Member

The necessary change went into htsjdk, so we can do this now whenever we pick up a new release of htsjdk.

@lbergelson lbergelson linked a pull request Feb 22, 2023 that will close this issue
lbergelson added a commit that referenced this issue Feb 22, 2023
* add a new option to VariantsToTable to allow outputting VCF-style numeric GT fields;
previously it always output the actual bases of the Allele in the GT spot
* resolves #8160
* updates htsjdk to 3.0.5
lbergelson added a commit that referenced this issue Aug 16, 2024