Update variation property to account for multiple alleles #58
In order to capture zygosity (and genotype) in CaseLevelVariants completely, we need to be able to account for the situation where a caseLevelVariant contains two alternate alleles, neither of which is the reference.
Add examples of simple and compound heterozygosity
Thank you for the suggestion. In addition to the |
I can add those changes, but as I was about to make them, I realized that I'm still not completely clear on whether genomicVariant is meant to represent genotype-level data at all. The top-level description, "Schema for a genomic variant entry," doesn't say. If genomicVariant is meant to represent everything that could be in a VCF record, then caseLevelData would include sample-level diploid data, as in the beacon-ri example, and variation would have to be an array, with identifiers, molecularAttributes, etc., following as arrays too. Alternatively, these could be nested into an array of objects, with each of the properties represented per object. The other scenario is that genomicVariant is only meant to represent a single variation, in which case sample-level diploid data would not be represented here at all, and there would need to be a completely separate endpoint(?) for sample-level genotypic data, possibly referring to variations by ID.
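To make the two array-based options concrete, here is a rough Python sketch. The field values, the `bases` and `label` keys, and the helper `nest_per_allele` are hypothetical placeholders for illustration and are not taken from the Beacon schema. Only the property names variation, identifiers, and molecularAttributes come from the comment above.

```python
# Hypothetical shapes only; values are placeholders, not real Beacon records.

# Option A: each property becomes a parallel array, one slot per allele.
parallel_arrays = {
    "variation": [
        {"bases": "A"},  # index 0
        {"bases": "G"},  # index 1
        {"bases": "T"},  # index 2
    ],
    "identifiers": [None, {"label": "alt-G"}, {"label": "alt-T"}],
    "molecularAttributes": [
        None,
        {"geneIds": ["GENE1"]},
        {"geneIds": ["GENE1"]},
    ],
}


def nest_per_allele(entry):
    """Option B: turn the parallel arrays into one object per allele."""
    keys = list(entry)
    columns = (entry[key] for key in keys)
    return [dict(zip(keys, values)) for values in zip(*columns)]


nested = nest_per_allele(parallel_arrays)
```

Both shapes carry the same information. The nested form keeps everything about one allele together, while the parallel form relies on index alignment across properties.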
Beacon v2 has a different function than VCFs. The Beacon v2 specification was built to facilitate data discovery (or semantic interoperability), whereas the VCF specification is meant for data analysis, storage or sharing. As you may have noticed, in the current version of the Beacon specification, There are other important factors to consider, such as the lack of a term/property to store variant quality or depth, neither at the variant level nor at the GT level. As an implementer, my suggestion would be to simply split your multiallelic VCFs into biallelic ones and go from there. Another valid option is to use the beacon2-ri-tools software (which I developed), which will perform the VCF to JSON transformation for you, including the split to biallelic. Hope this helps. Thx, m
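As a rough illustration of what splitting a multiallelic call into biallelic ones involves, here is a minimal Python sketch. It is not how beacon2-ri-tools or any other tool does it. The function name `split_multiallelic`, the record shape, and the choice to write alleles belonging to another alternate as "." are all assumptions made for this example.

```python
def split_multiallelic(ref, alts, genotype):
    """Split one multiallelic call into one biallelic call per alternate.

    genotype is a VCF-style GT string such as "1/2" or "0|1".
    Alleles that point at a different alternate are written as "."
    (an assumption of this sketch, not a rule from any tool).
    """
    sep = "|" if "|" in genotype else "/"
    indices = genotype.split(sep)
    records = []
    for alt_number, alt in enumerate(alts, start=1):
        remapped = []
        for idx in indices:
            if idx == "0":
                remapped.append("0")  # reference stays reference
            elif idx == str(alt_number):
                remapped.append("1")  # this record's alternate
            else:
                remapped.append(".")  # other alternate or missing
        records.append({"ref": ref, "alt": alt, "gt": sep.join(remapped)})
    return records


calls = split_multiallelic("A", ["G", "T"], "1/2")
```

With this remapping, the original 1/2 call becomes 1/. in the first record and ./1 in the second, so neither record on its own shows that the sample carries two different alternate alleles.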
So Beacon is not meant to facilitate discovery of genotypic data at all? That seems odd, since there is quite a lot of schema devoted to individuals and cases.
Beacon v2's purpose is to facilitate data discovery (genomic data and phenoclinic data).
But isn't genotypic data, like cases of compound heterozygosity, something that one might want to discover? Beacon just won't address that at all?
The current version is 2.0, which was achieved through a huge community effort. The plan is for future iterations to address practical issues. I am speaking as an implementer. Changes in the spec are decided by a working group.
Addresses issue #57. In order to capture zygosity (and genotype) in CaseLevelVariants completely, we need to be able to account for the situation where a caseLevelVariant contains two alternate alleles, neither of which is the reference. I would recommend requiring the first element of a variations array, element 0, to be the reference allele, and subsequent alternate alleles to be numbered accordingly. Then zygosity can be represented as in the beacon-ri implementation:
with the labeling schema extended in the style of VCF, with values like 1/2. More specifically, this allows for the specification of the GENO:0000402 value for zygosity:
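A minimal Python sketch of how VCF-style labels such as 1/2 could be interpreted once index 0 is reserved for the reference allele. The function name `describe_genotype` and the returned wording are hypothetical. The sketch handles diploid calls only and does not reproduce the beacon-ri label format.

```python
def describe_genotype(gt):
    """Classify a diploid VCF-style GT string, with allele 0 as reference."""
    sep = "|" if "|" in gt else "/"
    alleles = gt.split(sep)
    if len(alleles) != 2 or "." in alleles:
        return "unknown"
    if alleles[0] == alleles[1]:
        if alleles[0] == "0":
            return "homozygous reference"
        return "homozygous alternate"
    if "0" in alleles:
        return "heterozygous, one reference and one alternate allele"
    # Neither allele is the reference: the case this PR ties to GENO:0000402.
    return "heterozygous, two different alternate alleles"


label = describe_genotype("1/2")
```

The last branch is the one that a single-alternate representation cannot express, since it needs at least two alternate indices to be addressable.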