Data manipulation
Building and supporting an Ontology requires the same key actions as any database: Create, Read, Update, Delete (CRUD). These operations make up the bulk of the activity in maintaining an Ontology and focus on changes to a single class/term. For this reason, the GitHub issue tracker is strongly preferred for non-bulk modifications. Where bulk manipulations (with grouping annotations) are required, other pipelines are recommended; these are discussed in 3.2.2 Patterns.
Note that 'class' is equivalent to 'term' or 'entity' within this documentation.
The first operation is the New Term Request (NTR), as it is commonly known among Ontology professionals. An NTR originates when anyone requests a new term through the GitHub issue tracker of an Ontology's repository. As shown by the AgrO>Issue_tracker>New_issue>Add_new_term link, only minimal information is required for Ontology Curators to form a class that is distinct from other potential classes. In general, the more information that is provided per term, the easier it is for curators to build relations with existing terms. Before being published, the curated class must have one or more annotations added by the curator, including label, definition, and parent_class. The reasons include consistency, community guidelines, distinction and clarity, publication use, etc. Oftentimes, specific subject areas define phrases with contextual meanings that differ from the meaning of a direct, word-for-word translation (for example, 'wind run').
The central annotations required to define a class are rdfs:label, IAO:0000115 (definition), and owl:equivalentClass or rdfs:subClassOf. Additional annotations are optional but highly recommended for building usable ontologies, including oboInOwl:hasExactSynonym, dc:creator (usually the ORCID of the class proposer), and oboInOwl:source.
- rdfs:label: human-readable label, single entry; defines the most common term label published or used
- IAO:0000115: definition of the class, which conceptualizes it and differentiates it from other similar classes; the 'genus-differentia' form is recommended
  - oboInOwl:hasDbXref: cross-database link that defines the source for the definition, usually added within the definition's annotation rather than the class's annotation
- owl:equivalentClass or rdfs:subClassOf: parent classes and/or other classes with relations, where
  - the 'Equivalent To' defines undirected relations
  - the 'SubClass Of' defines directed relations or a relation-undefined parent class
- oboInOwl:hasExactSynonym: alternative labels; a class can have multiple, with each synonym tagged individually
- dc:creator: ORCID link of the proposer of the class, used for micro-creditation
- oboInOwl:source: sub-annotation for IAO_0000115 (definition), often a website to reference
- dc:date: adds the time/date when the entity was created through the Protege tool
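The annotation list above can be read as a checklist for a draft term. The following is a minimal sketch of such a check, assuming a term is held as a plain Python dictionary keyed by the annotation names used above. The function name `check_term`, the dictionary layout, and the example values are illustrative assumptions, not part of any AgrO tooling.

```python
# Annotation keys follow the list above; everything else here is hypothetical.
REQUIRED = ("rdfs:label", "IAO:0000115")
PARENT_AXIOMS = ("owl:equivalentClass", "rdfs:subClassOf")
RECOMMENDED = ("oboInOwl:hasExactSynonym", "dc:creator", "oboInOwl:source")


def check_term(term):
    """Return (missing_required, missing_recommended) for a term dictionary."""
    missing_required = [key for key in REQUIRED if not term.get(key)]
    # A class needs at least one of the two parent-defining axioms.
    if not any(term.get(key) for key in PARENT_AXIOMS):
        missing_required.append("owl:equivalentClass or rdfs:subClassOf")
    missing_recommended = [key for key in RECOMMENDED if not term.get(key)]
    return missing_required, missing_recommended


# A made-up draft term: the definition and parent are placeholders.
draft_term = {
    "rdfs:label": "wind run",
    "IAO:0000115": "Placeholder definition text.",
    "rdfs:subClassOf": ["hypothetical parent class"],
}

missing_required, missing_recommended = check_term(draft_term)
print("missing required:", missing_required)
print("missing recommended:", missing_recommended)
```

A draft that reports no missing required annotations is not necessarily ready for publication; the check only covers presence, not the quality of the definition or the choice of parent.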
Finally, within the NTR workflow, placing the class in the hierarchy is the most critical task for a curator and for potential users. Each Ontology has a base structure that organizes classes into subgroups, which inherit the semantics of their parent class.
In the Agronomy Ontology, as in many similar Ontologies (ENVO, PATO, PO, etc.), the skeleton classes originate from the Basic Formal Ontology (BFO). How these core terms are used depends heavily on the curators: term definitions from external ontologies like BFO cannot be changed, but they are open to interpretation where there is ambiguity between the parent class's definition and semantics and the new class's definition and potentially inherited semantics. For instance, BFO defines 'independent continuant' and 'specifically dependent continuant', where by definition each class of the latter must have a relation with a class of the former. A 'specifically dependent continuant' has the child classes 'quality' and 'realizable entity', which group the qualities that inhere in an independent continuant or the 'role' that such continuants can take on. For example, fertilizer is a 'role', but it is also an independent continuant as agronomic fertilizer, which has relations that the 'role' cannot form according to the reasoner check.
Similar logical equivalences and semantic constraints commonly have to be satisfied in order to develop a model. These models are key to the use of an ontology.
To understand the focus of an ontology, it is important to look at its class distribution, that is, where the curators have created terms. For instance, the Agronomy Ontology focuses on the following:
- Entity
- Process
- Material and immaterial entity
- Quality
- Plan specification
- Role
- Unit
On the other hand, other ontologies may focus on defining more specific or more general classes than these, since each effort defines an ontology for its own scope rather than a complete ontology that encompasses everything.
To maintain interoperability between ontologies, common relations ('object properties') are used to connect classes. In particular, OWL properties are always used where an ontology is formatted in OWL. However, because OWL itself offers only a very limited number of properties, some property-oriented ontologies have been created to provide a range of more specialized properties. A popular one is the Relations Ontology (RO). Other ontologies that provide useful properties include IAO, BFO, and OBI.
The last distinction is the 'Equivalent To' vs 'SubClass Of' relation between classes. 'Equivalent To' defines a two-way equivalence between classes, while 'SubClass Of' is expected to define an expanding branch rather than a core part of a model. Choosing which relation fits best between classes is one of the most difficult challenges, partly because of the logical definitions and the implications that may arise. Specific ontologies may also prefer particular relations in order to maintain the usability of the model. It is therefore prudent to review pre-existing classes to understand where classes are placed and which relations are used.
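The difference between the two relations can be made concrete with a toy inference over class names. The sketch below is not a reasoner; it only assumes that 'SubClass Of' is a directed, transitive relation and that 'Equivalent To' entails a subclass relation in both directions. The class names and the helper `inferred_superclasses` are hypothetical.

```python
from collections import defaultdict


def inferred_superclasses(cls, subclass_of, equivalent_to):
    """Return every class that `cls` is entailed to be a subclass of."""
    edges = defaultdict(set)
    # 'SubClass Of' is directed: child -> parent only.
    for child, parent in subclass_of:
        edges[child].add(parent)
    # 'Equivalent To' is two-way: each side is a subclass of the other.
    for left, right in equivalent_to:
        edges[left].add(right)
        edges[right].add(left)

    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for parent in edges[current]:
            if parent != cls and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# Hypothetical axioms for illustration only.
subclass_axioms = [("fertilizer role", "role"), ("role", "realizable entity")]
equivalence_axioms = [("class A", "class B")]

print(inferred_superclasses("fertilizer role", subclass_axioms, equivalence_axioms))
print(inferred_superclasses("role", subclass_axioms, equivalence_axioms))
print(inferred_superclasses("class B", subclass_axioms, equivalence_axioms))
```

The subclass axiom never makes 'role' a kind of 'fertilizer role', whereas the equivalence axiom places each of 'class A' and 'class B' under the other, which is why an equivalence has wider logical consequences than a subclass placement.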
There are a few ways to search for existing classes:
The Ontology Lookup Service (OLS), hosted by EMBL-EBI, provides a user-friendly interface and search tool. This service is a popular online repository for finding pre-existing terms across most published ontologies. Its search also displays the ontology in which a term originates and which ontologies have imported the term.
Protege is a critical offline tool for working with ontologies (owl, rdf, ttl, obo) for any curator, akin to other key software like Microsoft Excel (delimited data), SQLite (SQL), Paint (jpeg), etc.
For the purposes of searching or querying terms, Protege has two features:
- Search (Ctrl+F)
- DL Query
SPARQL is an essential query language for filtering through RDF databases; its use focuses on multi-class patterns rather than surgical term searching. To run SPARQL queries, the swiss-army tool ROBOT and its query function are recommended, as ROBOT's other functions are also very useful.
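As an illustration of a multi-class pattern, the sketch below holds a SPARQL query that lists classes with a label but no IAO:0000115 definition, and builds the matching `robot query` command line. The file names (`agro.owl`, `missing_definitions.sparql`, `missing_definitions.csv`) and the helper `robot_query_command` are assumptions for this example; the command is only assembled here, not executed.

```python
# Classes that have a label but lack a definition (IAO:0000115).
MISSING_DEFINITION_QUERY = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX obo: <http://purl.obolibrary.org/obo/>

SELECT ?cls ?label WHERE {
  ?cls a owl:Class ;
       rdfs:label ?label .
  FILTER NOT EXISTS { ?cls obo:IAO_0000115 ?definition }
}
ORDER BY ?label
"""


def robot_query_command(ontology_file, query_file, output_file):
    """Build the argument list for `robot query` (paths are caller-supplied)."""
    return [
        "robot", "query",
        "--input", str(ontology_file),
        "--query", str(query_file), str(output_file),
    ]


# Hypothetical file names; save MISSING_DEFINITION_QUERY to the query file first.
command = robot_query_command(
    "agro.owl", "missing_definitions.sparql", "missing_definitions.csv"
)
print(" ".join(command))
```

With ROBOT installed and the query saved to disk, the list can be passed to `subprocess.run(command, check=True)`.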
The most active task in the curation of an Ontology is modification. Often, a request is made to add annotations to terms, based on the annotation fields (usually the required and optional annotations described above).
Within Ontologies, where classes are created with permanent identifiers (e.g. http://purl.obolibrary.org/obo/AGRO_01000015), deleting released classes is strongly discouraged. Instead, classes can be deprecated to indicate that they are no longer in use.
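The permanent identifier appears in two spellings in this documentation: as a full PURL (AGRO_01000015 at the end of the URL) and as a compact prefix:number form (such as IAO:0000115). A minimal sketch of converting between the two, assuming the usual OBO PURL base shown above, is given here; the function names are hypothetical.

```python
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def purl_to_curie(iri):
    """Convert an OBO PURL such as .../obo/AGRO_01000015 to AGRO:01000015."""
    if not iri.startswith(OBO_PURL_BASE):
        raise ValueError(f"not an OBO PURL: {iri}")
    local_id = iri[len(OBO_PURL_BASE):]
    # Split on the last underscore so the numeric part stays intact.
    prefix, separator, number = local_id.rpartition("_")
    if not separator or not prefix or not number:
        raise ValueError(f"unexpected identifier shape: {local_id}")
    return f"{prefix}:{number}"


def curie_to_purl(curie):
    """Convert a compact identifier such as AGRO:01000015 back to its PURL."""
    prefix, separator, number = curie.partition(":")
    if not separator or not prefix or not number:
        raise ValueError(f"unexpected identifier shape: {curie}")
    return f"{OBO_PURL_BASE}{prefix}_{number}"


print(purl_to_curie("http://purl.obolibrary.org/obo/AGRO_01000015"))
print(curie_to_purl("IAO:0000115"))
```

Because other ontologies and datasets may store either spelling, the identifier has to keep resolving after release, which is the motivation for deprecating rather than deleting.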
The reasons for deprecation range from class merges, insufficient use, and the import of a similar term from an external ontology to a term split in which none of the new terms can serve as the parent term.
The Protege software has a useful feature for deprecating entities (Edit>'deprecate entity'). This feature follows a template workflow that limits the breaking changes that may occur in relation to the deprecated entity.
For the Agronomy Ontology, the 'Gene Ontology (GO)' deprecation profile is used. With this workflow, a reason for deprecation is entered, along with potential entities for consumers to consider in place of the deprecated term.
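The effect of such a deprecation workflow on a single term can be sketched as follows. This is an illustration of conventions commonly seen in OBO ontologies (an 'obsolete ' label prefix, owl:deprecated set to true, oboInOwl:consider for alternatives, removal of parent axioms), not a description of what the Protege profile writes; in particular, storing the reason under rdfs:comment is an assumption, and the term and identifiers are made up.

```python
def deprecate(term, reason, consider=()):
    """Return a deprecated copy of `term`; the input dictionary is not modified."""
    updated = dict(term)
    label = updated.get("rdfs:label", "")
    if not label.startswith("obsolete "):
        updated["rdfs:label"] = "obsolete " + label
    updated["owl:deprecated"] = True
    # Assumption: the free-text reason is kept as a comment on the class.
    updated["rdfs:comment"] = reason
    if consider:
        updated["oboInOwl:consider"] = list(consider)
    # Assumption: a deprecated class no longer takes part in the hierarchy.
    updated.pop("rdfs:subClassOf", None)
    updated.pop("owl:equivalentClass", None)
    return updated


# Hypothetical term and replacement identifier.
old_term = {
    "rdfs:label": "example term",
    "IAO:0000115": "Placeholder definition text.",
    "rdfs:subClassOf": ["hypothetical parent class"],
}
obsolete_term = deprecate(
    old_term,
    reason="Merged into a similar imported term.",
    consider=["EX:0000002"],
)
print(obsolete_term)
```

The identifier and the definition stay in place, so existing references to the term still resolve and consumers can see both why it was retired and what to use instead.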