cidgoh/pathogen-mutation-functionalannotation-package

The Mutation Functional Annotation Contextual Data Specification

About

The Mutation Functional Annotation Data Specification captures information about viral mutations and their correlated functional impacts as described in the literature. The spec is based on the data in Pokay, a manually curated repository of SARS-CoV-2 and MPOX mutation impacts. It lets researchers validate the format of their own mutation functional annotation data so that the data can be overlaid on the Virus-MVP heatmap, and it gives curators an easy way to contribute new functional annotations to Pokay.

What are ontologies and how do they improve data quality for functional annotation of mutations?

Labs collect, encode and store information in different ways. They use different fields, terms and formats, they categorize variables in different ways, and the meanings of words change depending on the focus of the organization. Think of the word “plant”: to someone in agriculture, “plant” could mean an organism that carries out photosynthesis, while a food regulator might understand “plant” to mean a factory where food products are made. This variability makes comparing, integrating and analyzing data generated by different organizations like trying to compare apples, oranges and bananas.

Ontologies are collections of controlled vocabulary that are arranged in a hierarchy, where all the terms are linked using logical relationships. Ontologies are open source and meant to represent “universal truth” as much as possible (so they are not tied to one organization’s vocabulary or use case). Ontologies encode synonyms, which enables mapping between the specific languages used by different organizations, and every term in the ontology is assigned a globally unique and persistent identifier. Using ontology terms to standardize functional annotation contextual data not only helps make data more interoperable by using a common language, it also helps to make contextual data FAIR (Findable, Accessible, Interoperable, Reusable).
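The synonym-to-identifier mapping described above can be sketched in a few lines of Python. This is only an illustration: the identifiers (EX:0000001, EX:0000002), labels, synonyms and function names are hypothetical placeholders and are not taken from any real ontology or from this specification.

```python
from typing import Dict, Optional

# Hypothetical terms: the IDs, labels and synonyms are placeholders,
# not entries from a real ontology.
TERMS = {
    "EX:0000001": {
        "label": "plant (organism)",
        "synonyms": ["crop plant", "photosynthetic organism"],
    },
    "EX:0000002": {
        "label": "plant (facility)",
        "synonyms": ["processing plant", "factory"],
    },
}


def build_lookup(terms: Dict[str, dict]) -> Dict[str, str]:
    """Map every label and synonym (case-insensitively) to its term ID."""
    lookup: Dict[str, str] = {}
    for term_id, entry in terms.items():
        for name in [entry["label"], *entry["synonyms"]]:
            lookup[name.casefold()] = term_id
    return lookup


def resolve(name: str, lookup: Dict[str, str]) -> Optional[str]:
    """Return the term ID for a local name, or None if it is not known."""
    return lookup.get(name.strip().casefold())


LOOKUP = build_lookup(TERMS)
```

With this lookup, two labs that write "Factory" and "processing plant" both resolve to the same identifier, while the bare word "plant" resolves to nothing because it is ambiguous without context.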

The Mutation Functional Annotation Contextual Data Specification Package

This specification will be implemented via a DataHarmonizer validation template, accompanying Field and Term reference guides (which provide definitions and additional specific guidance) and a curation Standard Operating Procedure (SOP). New terms and/or term changes can be requested using issue request forms, with additional guidance on how to do so outlined in the New Term Request (NTR) SOP. These resources are available in the files of this repository and are listed below under Package Contents.

Version Control

Please note that development of the specification is dynamic and it will be updated periodically to address user needs. Versions follow the format x.y.z, where:

x = Field level changes
y = Term value / ID level changes
z = Definition, guidance, example, formatting, or other uncategorized changes

Descriptions of changes are provided in release notes for every new version.
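As a rough illustration of how the x.y.z scheme above can be read, the sketch below compares two version strings and reports the most significant kind of change between them. The function names and the example version numbers are hypothetical, and the sketch assumes each component is a plain integer; the specification does not say whether lower components reset when a higher one changes, so the code makes no assumption about that.

```python
from typing import Tuple

# Position-to-meaning mapping, following the x.y.z description above.
LEVELS = ("field", "term", "documentation")


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split an 'x.y.z' string into three integers."""
    parts = version.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected a version in x.y.z form, got {version!r}")
    x, y, z = (int(part) for part in parts)
    return (x, y, z)


def highest_change_level(old: str, new: str) -> str:
    """Name the most significant component that differs between two versions."""
    for level, before, after in zip(LEVELS, parse_version(old), parse_version(new)):
        if before != after:
            return level
    return "none"
```

For example, comparing the made-up versions "1.2.3" and "1.3.0" reports a term-level change, which suggests checking term values and IDs before reusing data validated against the older version.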

Package Contents

Data Collection Template - TBA

Field and Term Reference Guides

Curation SOP - TBA

DataHarmonizer Instructions and SOP - TBA

New Term Request (NTR) SOP - TBA

Ontology models

The ontological relationships between the terms in the reference guide are captured in these ontology models.

The first model describes the relationships between information about mutations and the physical mutations themselves, which are located within a genome of some organism. This model is both flexible and robust enough to work with different variant naming conventions. Entities outlined in orange are captured in the DataHarmonizer template.

[Figure: mutation model]

The second model describes how one or more mutation symbols are correlated with a functional effect, as reported in the literature.

[Figure: evidence model]

Contacts

For more information and/or assistance, contact Madeline Iseminger at miseming at sfu dot ca or submit a repository issue request.

License

MIT License

Acknowledgements

Brought to you by the Centre for Infectious Disease Genomics and One Health (CIDGOH).
