From 00bc2dbe4a7590ec4cffa4b6142f07774622bf33 Mon Sep 17 00:00:00 2001 From: Daniel Danis Date: Tue, 28 Nov 2023 17:09:41 -0500 Subject: [PATCH 1/4] Show how to run all tests. --- docs/setup.rst | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/docs/setup.rst b/docs/setup.rst index 084455b..085a978 100644 --- a/docs/setup.rst +++ b/docs/setup.rst @@ -31,14 +31,26 @@ the active Python environment, assuming you have privileges to install packages. Run tests ^^^^^^^^^ -The contributors may want to run the unit tests and the integration tests to ensure the features work as expected. -Hpo-toolkit's tests use a combination of `unittest` and `ddt`. The tests are run as:: +The contributors may want to run the unit tests and the integration tests to ensure all features work as expected. + +Before running tests, make sure you install HPO toolkit with `test` and `docs` dependencies:: + + python3 -m pip install .[test,docs] + +The unit tests and the integration tests can the be running by invoking:: python3 -m unittest discover -s src -p _test*.py python3 -m unittest discover -s tests +We go extra mile to ensure the documentation is always up-to-date, and, therefore, we also run the documentation tests. +The documentation tests are run by:: + + cd docs + sphinx-apidoc --separate --module-first -d 2 -H "API reference" -o apidocs ../src/hpotk + make doctest + .. note:: - The library *must* be installed in the environment before running the tests. Otherwise, the test discovery will fail. + The library *must* be installed in the environment before running all tests. Otherwise, the test discovery will fail. That's about it! \ No newline at end of file From c5c128c3d959299260f5cc5e9f6898bc5e360aa1 Mon Sep 17 00:00:00 2001 From: Daniel Danis Date: Tue, 28 Nov 2023 17:12:10 -0500 Subject: [PATCH 2/4] Remove the obsolete notebooks. --- notebooks/Tutorial.ipynb | 1304 -------------------------------------- 1 file changed, 1304 deletions(-) delete mode 100644 notebooks/Tutorial.ipynb diff --git a/notebooks/Tutorial.ipynb b/notebooks/Tutorial.ipynb deleted file mode 100644 index b70aca0..0000000 --- a/notebooks/Tutorial.ipynb +++ /dev/null @@ -1,1304 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "id": "def60811-bf0a-4acc-b907-922824f65fc0", - "metadata": { - "tags": [] - }, - "source": [ - "**HPO toolkit tutorial**\n", - "\n", - "This notebook shows how to install and use HPO toolkit for work with Human Phenotype Ontology (HPO).\n", - "\n", - "# Installation\n", - "\n", - "The toolkit is available at [PyPi](https://pypi.org/project/hpo-toolkit), so installation with `pip` is really easy:" - ] - }, - { - "cell_type": "raw", - "id": "a78a7544-7d66-468c-b2a6-8aec166ac406", - "metadata": {}, - "source": [ - "!pip install hpo-toolkit" - ] - }, - { - "cell_type": "markdown", - "id": "67f52cee-6b59-4284-a8b3-5bb004bde19b", - "metadata": {}, - "source": [ - "# API update\n", - "\n", - "As of `v0.1.5`, we re-export most commonly used classes from the top-level package to save some typing:\n", - "\n", - "| Class | Previous | New API (top-level reexport) |\n", - "|:-----------------|:-----------------------------------|:--------------------------------|\n", - "| TermId | `hpotk.model.TermId` | `hpotk.TermId` |\n", - "| Term | `hpotk.model.Term` | `hpotk.Term` |\n", - "| MinimalTerm | `hpotk.model.MinimalTerm` | `hpotk.MinimalTerm` |\n", - "| Synonym | `hpotk.model.Synonym` | `hpotk.Synonym` |\n", - "| SynonymType | `hpotk.model.SynonymType` | `hpotk.SynonymType` |\n", - "| SynonymCategory | `hpotk.model.SynonymCategory` | `hpotk.SynonymCategory` |\n", - "| OntologyGraph | `hpotk.graph.OntologyGraph` | `hpotk.OntologyGraph` |\n", - "| GraphAware | `hpotk.graph.GraphAware` | `hpotk.GraphAware` |\n", - "| Ontology | `hpotk.ontology.Ontology` | `hpotk.Ontology` |\n", - "| Ontology | `hpotk.ontology.Ontology` | `hpotk.Ontology` |\n", - "| MinimalOntology | `hpotk.ontology.MinimalOntology` | `hpotk.MinimalOntology` |\n" - ] - }, - { - "cell_type": "markdown", - "id": "6e41fb89-d16e-427f-8e53-ec92710af8c8", - "metadata": { - "tags": [] - }, - "source": [ - "# Load HPO\n", - "\n", - "HPO toolkit supports reading ontologies in [Obographs](https://github.com/geneontology/obographs) JSON format.\n", - "\n", - "We can download and open the latest HPO from\n", - "> https://raw.githubusercontent.com/obophenotype/human-phenotype-ontology/master/hp.json" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "id": "7df53743-202e-4786-9beb-be1322605e2b", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "from hpotk import Ontology\n", - "from hpotk.ontology.load.obographs import load_ontology\n", - "\n", - "o: Ontology = load_ontology('http://purl.obolibrary.org/obo/hp.json')" - ] - }, - { - "cell_type": "markdown", - "id": "5278e1d4-506b-4136-a90c-e45715534914", - "metadata": {}, - "source": [ - "We start by importing the relevant parts and use `load_ontology` to load the latest HPO in the Obographs format. We get an instance of `Ontology` which is a container for ontology data.\n", - "\n", - "Let's see how HPO toolkit models the ontology data." - ] - }, - { - "cell_type": "markdown", - "id": "a98ea987-741a-42fa-8ae1-5924d77647ff", - "metadata": { - "tags": [] - }, - "source": [ - "# Ontology data model and functionality\n", - "\n", - "HPO toolkit includes several classes that serve as building blocks in the data model. This section provides basic information, starting from the bottom of the class hierarchy.\n", - "\n", - "- `hpotk.TermId` - an identifier of an ontology concept.\n", - "- `hpotk.MinimalTerm` - represents minimal useful information of the ontology concept. `MinimalTerm` has the following attributes:\n", - " - `identifier`, `TermId` (e.g. `HP:0001166`)\n", - " - `name`, `str` (e.g. `Arachnodactyly`)\n", - " - `is_current`/`is_obsolete`, whether or not the concept has been obsoleted\n", - " - `alt_term_ids`, a sequence of obsolete `TermId`s that represented the term previously\n", - "- `hpotk.Term` - the complete info regarding the ontology concept. The `Term` has all attributes of the `MinimalTerm` plus the following:\n", - " - `definition` - an optional description of the term in slightly more verbiage\n", - " - `comment` - additional comment (optional)\n", - " - `synonyms` - alternative designations of the `Term` (optional)\n", - " - `xrefs` - a sequence of cross-references between the `Term` and concepts from different databases\n", - "- `hpotk.MinimalOntology` - the container for ontology data that uses `MinimalTerm`s\n", - "- `hpotk.Ontology` - the ontology data container that contains `Term`s\n", - "- `hpotk.OntologyGraph` - a specification of graph for storing the ontology concept hierarchy and the required graph functionality. As long as the graph implements the methods, it can work with the rest of the toolkit framework\n", - "\n", - "Now, let's go over some examples to explore the provided functionality." - ] - }, - { - "cell_type": "markdown", - "id": "db15d7f4-8e17-4806-8d8e-da43738e2c63", - "metadata": {}, - "source": [ - "## Get all `Term`s and `TermId`s\n", - "\n", - "All `TermId`s, both *primary* and *obsolete* can be iterated over via `o.term_ids` property:" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "id": "3808f2a0-607d-49ac-b6d3-299cc37c04a4", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "HP:0000001\n" - ] - } - ], - "source": [ - "print(next(iter(o.term_ids)))" - ] - }, - { - "cell_type": "markdown", - "id": "5f2e0318-98d6-412f-94cd-88a38dd2f2de", - "metadata": {}, - "source": [ - "Similarly, we can iterate over ontology `Term`s via `hpo.terms`:" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "id": "a7de11f9-c14f-4c7b-8b4c-2b35979b717a", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Term(identifier=HP:0000001, name=\"All\", definition=None, comment=Root of all terms in the Human Phenotype Ontology., synonyms=None, xrefs=(DefaultTermId(idx=4, value=UMLS:C0444868),), is_obsolete=False, alt_term_ids=\"()\")\n" - ] - } - ], - "source": [ - "print(next(iter(o.terms)))" - ] - }, - { - "cell_type": "markdown", - "id": "e15c75f3-4578-4a08-a7eb-3b5ff45b30f4", - "metadata": {}, - "source": [ - "and get the number of the primary (non-obsolete) `Term`s:" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "id": "3984e0ab-4540-4f78-b575-e7d18987a57b", - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "17490" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "len(o)" - ] - }, - { - "cell_type": "markdown", - "id": "fc5d1682-cbe0-4b9d-8d2d-77b8668e9a7f", - "metadata": {}, - "source": [ - "## Query `Term`\n", - "\n", - "### Test presence of a `TermId` in the ontology\n", - "\n", - "We can test presence of a `TermId` in the same way as we test presence of an item in a Python container:" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "id": "c986d756-6b64-400a-acf0-1e1215dd5149", - "metadata": {}, - "outputs": [], - "source": [ - "current_arachnodactyly_id = \"HP:0001166\" # as of Dec 28th, 2022\n", - "\n", - "assert current_arachnodactyly_id in o" - ] - }, - { - "cell_type": "markdown", - "id": "cead94e2-39b1-4b4a-a8b4-8e8a09a92367", - "metadata": {}, - "source": [ - "The test works for both *primary* and *obsolete* `TermId`s:" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "id": "d33352e5-ecc4-4246-bca4-4f537bdcd921", - "metadata": {}, - "outputs": [], - "source": [ - "obsolete_arachnodactyly_id = \"HP:0001505\"\n", - "\n", - "assert obsolete_arachnodactyly_id in o" - ] - }, - { - "cell_type": "markdown", - "id": "bc6012bc-f8e5-4cc7-918a-069c92fd4e62", - "metadata": {}, - "source": [ - "Queries work with a simple CURIE `str` (e.g. `HP:0001166`) or a `TermId`:" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "id": "b1d6b18d-b073-4e04-85ee-4f8198da149e", - "metadata": {}, - "outputs": [], - "source": [ - "from hpotk import TermId\n", - "assert current_arachnodactyly_id in o and TermId.from_curie(current_arachnodactyly_id) in o" - ] - }, - { - "cell_type": "markdown", - "id": "10ff2cc9-bb32-4c7e-9952-0402cd9f8543", - "metadata": {}, - "source": [ - "### Get a specific `Term`\n", - "\n", - "We can get ahold of a specific `Term` using the `get_term` method:" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "id": "5a4465bd-0bad-4b78-be82-79a4bc1a3dfe", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Term(identifier=HP:0001166, name=\"Arachnodactyly\", definition=Abnormally long and slender fingers (\"spider fingers\")., comment=None, synonyms=(Synonym(name=\"Long slender fingers\", category=SynonymCategory.EXACT, synonym_type=SynonymType.LAYPERSON_TERM, xrefs=\"None\"), Synonym(name=\"Long, slender fingers\", category=SynonymCategory.EXACT, synonym_type=None, xrefs=\"None\"), Synonym(name=\"Spider fingers\", category=SynonymCategory.EXACT, synonym_type=SynonymType.LAYPERSON_TERM, xrefs=\"None\")), xrefs=(DefaultTermId(idx=3, value=MSH:D054119), DefaultTermId(idx=11, value=SNOMEDCT_US:62250003), DefaultTermId(idx=4, value=UMLS:C0003706)), is_obsolete=False, alt_term_ids=\"(DefaultTermId(idx=2, value=HP:0001505),)\")\n" - ] - } - ], - "source": [ - "arachnodactyly = o.get_term(current_arachnodactyly_id)\n", - "print(arachnodactyly)" - ] - }, - { - "cell_type": "markdown", - "id": "70d5ee3d-2329-4094-8b9a-1c09dde7147f", - "metadata": {}, - "source": [ - "`get_term` always returns the primary `Term`, even for an obsolete `TermId`:" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "id": "7763a3f8-9487-4ccc-ac20-06333fe3fd37", - "metadata": {}, - "outputs": [], - "source": [ - "assert o.get_term(current_arachnodactyly_id) == o.get_term(obsolete_arachnodactyly_id)" - ] - }, - { - "cell_type": "markdown", - "id": "1f1250fa-6da0-4296-85c3-6e5907916e1a", - "metadata": {}, - "source": [ - "# Ontology algorithms\n", - "\n", - "The `hpo-toolkit` module provides several ontology algorithms.\n", - "\n", - "## Ontology traversal\n", - "\n", - "> As of `v0.1.6`, the ontology traversal should not be performed using methods from the `hpotk.algorithm` module.\n", - "> Use the methods defined on the ontology `graph` instead.\n", - "\n", - "Use `get_parents` and `get_ancestors` of the ontology `graph` to get an iterable with *parents* or *ancestors* of a `TermId`:" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "id": "fd90ffa7-9cd4-4557-8c36-e7c1ba7b5c64", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "#################### Parents ####################\n", - "HP:0100807 - Long fingers\n", - "HP:0001238 - Slender finger\n", - "\n", - "#################### Ancestors ##################\n", - "HP:0000924 - Abnormality of the skeletal system\n", - "HP:0000001 - All\n", - "HP:0040064 - Abnormality of limbs\n", - "HP:0002817 - Abnormality of the upper limb\n", - "HP:0040068 - Abnormality of limb bone\n", - "HP:0001167 - Abnormal finger morphology\n", - "HP:0033127 - Abnormality of the musculoskeletal system\n", - "HP:0100807 - Long fingers\n", - "HP:0011844 - Abnormal appendicular skeleton morphology\n", - "HP:0001238 - Slender finger\n", - "HP:0002813 - Abnormality of limb bone morphology\n", - "HP:0000118 - Phenotypic abnormality\n", - "HP:0001155 - Abnormality of the hand\n", - "HP:0011297 - Abnormal digit morphology\n", - "HP:0011842 - Abnormal skeletal morphology\n" - ] - } - ], - "source": [ - "arachnodactyly_id = arachnodactyly.identifier\n", - "\n", - "print('#' * 20 + ' Parents ' + '#' * 20)\n", - "for parent in o.graph.get_parents(arachnodactyly_id):\n", - " term = o.get_term(parent)\n", - " print(f\"{term.identifier.value} - {term.name}\")\n", - "\n", - "print('\\n'+'#' * 20 + ' Ancestors ' + '#' * 18)\n", - "for parent in o.graph.get_ancestors(arachnodactyly_id):\n", - " term = o.get_term(parent)\n", - " print(f\"{term.identifier.value} - {term.name}\")" - ] - }, - { - "cell_type": "markdown", - "id": "e238c177-3e72-406f-b1e3-450373c6aa31", - "metadata": {}, - "source": [ - "We can get the *children* or *descendants* by calling `get_children` and `get_descendants`:" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "id": "e74d9f8c-20af-4a1d-a8b2-81ec90708b30", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "#################### Children ####################\n", - "HP:0001182 - Tapered finger\n", - "HP:0001166 - Arachnodactyly\n", - "\n", - "#################### Descendants #################\n", - "HP:0006216 - Single interphalangeal crease of fifth finger\n", - "HP:0006077 - Absent proximal finger flexion creases\n", - "HP:0001032 - Absent distal interphalangeal creases\n", - "HP:0005780 - Absent fourth finger distal interphalangeal crease\n" - ] - } - ], - "source": [ - "print('#' * 20 + ' Children ' + '#' * 20)\n", - "for child in o.graph.get_children(TermId.from_curie('HP:0100807')): # Long fingers\n", - " term = o.get_term(child)\n", - " print(f\"{term.identifier.value} - {term.name}\")\n", - " \n", - "print('\\n'+'#' * 20 + ' Descendants ' + '#' * 17)\n", - "for descendant in o.graph.get_descendants(TermId.from_curie('HP:0006109')): # Absent phalangeal crease \n", - " term = o.get_term(descendant)\n", - " print(f\"{term.identifier.value} - {term.name}\")" - ] - }, - { - "cell_type": "markdown", - "id": "c219878d-ae6e-4279-b7b7-993b4e2110dc", - "metadata": {}, - "source": [ - "## Term relationship tests\n", - "\n", - "We can check if a term is a *parent* or an *ancestor* of another term using the ontology graph:" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "id": "9f02ea88-694b-4d4a-a2cd-8e604d40cf18", - "metadata": {}, - "outputs": [], - "source": [ - "arachnodactyly = TermId.from_curie('HP:0001166')\n", - "abnormality_of_the_hand = TermId.from_curie('HP:0001155')\n", - "long_fingers = TermId.from_curie('HP:0100807')\n", - "\n", - "assert o.graph.is_parent_of(long_fingers, arachnodactyly)\n", - "assert o.graph.is_ancestor_of(abnormality_of_the_hand, arachnodactyly)" - ] - }, - { - "cell_type": "markdown", - "id": "5efc4a21-b115-4511-80fd-cf02d3afdc46", - "metadata": {}, - "source": [ - "Similarly, we can check if a term is a *child* or a *descendant* of another term:" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "id": "df6344dd-9b72-4d94-88aa-201a9578fc7b", - "metadata": {}, - "outputs": [], - "source": [ - "assert o.graph.is_child_of(arachnodactyly, long_fingers)\n", - "assert o.graph.is_descendant_of(arachnodactyly, long_fingers)" - ] - }, - { - "cell_type": "markdown", - "id": "a94bcf99-fb72-4b83-bc3d-805fd7a30cb6", - "metadata": {}, - "source": [ - "# Validation\n", - "\n", - "The toolkit provides functions for performing multiple useful sanity checks.\n", - "\n", - "The validators validate a sequence of `Identified` (a thing that has a `TermId` identifier) or `TermId`s.\n", - "\n", - "## Obsolete term IDs\n", - "\n", - "We should always use the primary term IDs instead of the obsolete terms.\n", - "\n", - "`HP:0100807` and `HP:0006010` are the primary and obsolete term IDs for *Long fingers*. \n", - "The `ObsoleteTermIdsValidator` points out usage of the obsolete term:" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "id": "3b2f58cc-e008-4153-96b5-1abbd4db9e47", - "metadata": {}, - "outputs": [], - "source": [ - "from hpotk import MinimalTerm\n", - "\n", - "from hpotk.validate import ValidationLevel\n", - "from hpotk.validate import ObsoleteTermIdsValidator\n", - "\n", - "obso_validator = ObsoleteTermIdsValidator(o)\n", - "\n", - "# The term uses an obsolete term ID `HP:0006010` instead of the current `HP:0100807`.\n", - "inputs = [\n", - " TermId.from_curie('HP:0006010')\n", - "]\n", - "results = obso_validator.validate(inputs)\n", - "\n", - "# At least one error or warning was found\n", - "assert results.is_ok() == False\n", - "\n", - "# A sequence of errors/warnings is availabe via `results` property\n", - "assert len(results.results) == 1\n", - "\n", - "validation_result = results.results[0]\n", - "\n", - "# Obsolete term ID is a warning. The toolkit can map the term ID to the primary term ID and use it in the downstream analyses\n", - "assert validation_result.level == ValidationLevel.WARNING\n", - "\n", - "# A unique ID of the validation check\n", - "assert validation_result.category == 'obsolete_term_id_is_used'\n", - "\n", - "# A message for human consumption\n", - "assert validation_result.message == 'Using the obsolete HP:0006010 instead of HP:0100807 for Long fingers'" - ] - }, - { - "cell_type": "markdown", - "id": "0d3f8049-0c39-49fc-a27e-889d3bb3bb12", - "metadata": {}, - "source": [ - "## Violation of the annotation propagation rule\n", - "\n", - "A set of HPO terms should not contain both term and its ancestor because the annotation propagation rule implies presence of all term's ancestors." - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "id": "0e668a54-c146-47be-921e-72ab29ee7d28", - "metadata": {}, - "outputs": [], - "source": [ - "from hpotk.validate import AnnotationPropagationValidator\n", - "\n", - "# Long fingers is a parent of Arachnodactyly\n", - "inputs = [\n", - " TermId.from_curie('HP:0001166'), # Arachnodactyly\n", - " TermId.from_curie('HP:0100807'), # Long fingers\n", - "]\n", - "\n", - "ap_validator = AnnotationPropagationValidator(o)\n", - "results = ap_validator.validate(inputs)\n", - "\n", - "val_result = results.results[0]\n", - "\n", - "# Violation of the annotation propagation rule is an ERROR.\n", - "# Most analyses will yield biased/incorrect results when using both the term and its ancestor.\n", - "assert val_result.level == ValidationLevel.ERROR\n", - "\n", - "# A unique ID of the validation check\n", - "assert val_result.category == 'annotation_propagation'\n", - "\n", - "# A message for human consumption\n", - "assert val_result.message == 'Terms should not contain both present Arachnodactyly [HP:0001166] and its present or excluded ancestor Long fingers [HP:0100807]'" - ] - }, - { - "cell_type": "markdown", - "id": "d4352e03-9141-4d2d-8ab7-3b33b6f0e218", - "metadata": {}, - "source": [ - "## Terms are phenotypic features\n", - "\n", - "Most algorithms require a list of phenotypic features (i.e. descendants of [Phenotypic abnormality](https://hpo.jax.org/app/browse/term/HP:0000118)) \n", - "but HPO contains terms that are does not represent phenotypic abnormality, such as [Clinical modifier](https://hpo.jax.org/app/browse/term/HP:0012823), [Mode of inheritance](https://hpo.jax.org/app/browse/term/HP:0000005), etc.\n", - "\n", - "`PhenotypicAbnormalityValidator` reports all terms that are not descendants of [Phenotypic abnormality](https://hpo.jax.org/app/browse/term/HP:0000118)." - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "id": "b575229e-1c08-4d02-aaa3-012c9a8c80bd", - "metadata": {}, - "outputs": [], - "source": [ - "from hpotk.validate import PhenotypicAbnormalityValidator\n", - "\n", - "# Long fingers is a parent of Arachnodactyly\n", - "inputs = [\n", - " TermId.from_curie('HP:0000007'), # Autosomal recessive inheritance\n", - " TermId.from_curie('HP:0100807'), # Long fingers\n", - "]\n", - "\n", - "pa_validator = PhenotypicAbnormalityValidator(o)\n", - "results = pa_validator.validate(inputs)\n", - "\n", - "val_result = results.results[0]\n", - "\n", - "# Using non-phenotypic abnormality is a WARNING.\n", - "assert val_result.level == ValidationLevel.WARNING\n", - "\n", - "# A unique ID of the validation check\n", - "assert val_result.category == 'phenotypic_abnormality_descendant'\n", - "\n", - "# A message for human consumption\n", - "assert val_result.message == 'Autosomal recessive inheritance [HP:0000007] is not a descendant of Phenotypic abnormality [HP:0000118]'" - ] - }, - { - "cell_type": "markdown", - "id": "b5632af0-339d-4de1-ad82-9aea62e67f8c", - "metadata": {}, - "source": [ - "## Perform multiple checks at the same time\n", - "\n", - "The checks can be performed individually, as described in the previous sections, or together within a single validation run using `ValidationRunner`." - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "id": "654026c8-0e98-45ee-b245-0fa72f8498c3", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Using the obsolete HP:0006010 instead of HP:0100807 for Long fingers\n", - "Terms should not contain both present Arachnodactyly [HP:0001166] and its present or excluded ancestor Long fingers [HP:0100807]\n", - "Autosomal recessive inheritance [HP:0000007] is not a descendant of Phenotypic abnormality [HP:0000118]\n" - ] - } - ], - "source": [ - "from hpotk.validate import ValidationRunner\n", - "\n", - "inputs = [\n", - " TermId.from_curie('HP:0000007'), # Autosomal recessive inheritance\n", - " TermId.from_curie('HP:0006010'), # Long fingers\n", - " TermId.from_curie('HP:0001166') # Arachnodactyly\n", - "]\n", - "\n", - "runner = ValidationRunner(validators=[obso_validator, ap_validator, pa_validator])\n", - "results = runner.validate_all(inputs)\n", - "for issue in results.results:\n", - " print(issue.message)" - ] - }, - { - "cell_type": "markdown", - "id": "8f505032-013b-4cd3-a098-4d60834c3696", - "metadata": {}, - "source": [ - "# HPO constants\n", - "\n", - "The toolkit provides `TermId`s of the well-established and stable HPO concepts.\n", - "\n", - "## Base terms\n", - "\n", - "The terms that define the first level of HPO hierarchy (e.g. [Phenotypic abnormality](https://hpo.jax.org/app/browse/term/HP:0000118)):" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "id": "215903d1-5c3f-4b0f-8fa0-486ef272f5a8", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "HP:0000118\n" - ] - } - ], - "source": [ - "from hpotk.constants.hpo.base import PHENOTYPIC_ABNORMALITY\n", - "print(PHENOTYPIC_ABNORMALITY)" - ] - }, - { - "cell_type": "markdown", - "id": "d9c7c354-aab7-49f7-b7d8-e25eba3c6c8c", - "metadata": {}, - "source": [ - "## Frequency\n", - "\n", - "Concepts to represent frequency of phenotypic abnormalities within a patient cohort, the children of [Frequency](https://hpo.jax.org/app/browse/term/HP:0040279) (e.g. [Occasional](https://hpo.jax.org/app/browse/term/HP:0040283)):" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "id": "0b1e1b30-e70e-4ec9-89c9-5c702d24ea25", - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "HP:0040283\n" - ] - } - ], - "source": [ - "from hpotk.constants.hpo.frequency import OCCASIONAL\n", - "print(OCCASIONAL)" - ] - }, - { - "cell_type": "markdown", - "id": "85a44d5b-3e17-454a-a8b8-aee0ca043148", - "metadata": {}, - "source": [ - "On top of the frequency `TermId` constants, the toolkit provides the `HpoFrequency` class that groups frequency `TermId`s with the frequency bounds:" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "id": "5c4bf517-3e4c-4444-9ada-8bdd9cdcbbbe", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "data": { - "text/plain": [ - "HpoFrequency(identifier=HP:0040283, lower_bound=0.05, upper_bound=0.29)" - ] - }, - "execution_count": 20, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from hpotk.constants.hpo.frequency import parse_hpo_frequency\n", - "\n", - "parse_hpo_frequency(OCCASIONAL)" - ] - }, - { - "cell_type": "markdown", - "id": "c970b5e8-3ce6-4c92-8b85-dcd926577811", - "metadata": {}, - "source": [ - "The lookup works both for a `str` CURIE and a `TermId`:" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "id": "03e7bc93-a58e-422e-a9e1-b04f40956dfb", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "occasional_id = 'HP:0040283'\n", - "assert parse_hpo_frequency(OCCASIONAL) == parse_hpo_frequency(occasional_id)" - ] - }, - { - "cell_type": "markdown", - "id": "3d646525-854d-4b8a-bf49-f40f4eb0a977", - "metadata": {}, - "source": [ - "## Inheritance\n", - "\n", - "Selected descendents of [Mode of inheritance](https://hpo.jax.org/app/browse/term/HP:0000005) (e.g. [Autosomal dominant inheritance](https://hpo.jax.org/app/browse/term/HP:0000006)):" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "id": "7daaab3e-6ead-4230-88d5-b8c080b3730b", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "HP:0000006\n" - ] - } - ], - "source": [ - "from hpotk.constants.hpo.inheritance import AUTOSOMAL_DOMINANT_INHERITANCE\n", - "print(AUTOSOMAL_DOMINANT_INHERITANCE)" - ] - }, - { - "cell_type": "markdown", - "id": "b84fb3f0-5cb0-4ccd-bfea-698896e5bbbd", - "metadata": {}, - "source": [ - "## Onset\n", - "\n", - "All descendents of [Onset](https://hpo.jax.org/app/browse/term/HP:0003674) (e.g. [Congenital onset](https://hpo.jax.org/app/browse/term/HP:0003577)):" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "id": "d7bf2281-d8ed-4373-b094-ef7df0c60aa8", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "HP:0003577\n" - ] - } - ], - "source": [ - "from hpotk.constants.hpo.onset import CONGENITAL_ONSET\n", - "print(CONGENITAL_ONSET)" - ] - }, - { - "cell_type": "markdown", - "id": "b4c4ce00-09ec-4860-ad08-2353cb2a4df0", - "metadata": {}, - "source": [ - "## Organ system\n", - "\n", - "Children of [Phenotypic abnormality](https://hpo.jax.org/app/browse/term/HP:0000118) that correspond to abnormalities of organ systems (e.g. [Abnormality of the ear](https://hpo.jax.org/app/browse/term/HP:0000598)):" - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "id": "2ddccd09-43d8-448d-8df1-f55ae8eeb658", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "HP:0000598\n" - ] - } - ], - "source": [ - "from hpotk.constants.hpo.organ_system import ABNORMALITY_OF_THE_EAR\n", - "print(ABNORMALITY_OF_THE_EAR)" - ] - }, - { - "cell_type": "markdown", - "id": "51c144fc-39aa-4a39-abff-44d2f8b27c10", - "metadata": {}, - "source": [ - "## Severity\n", - "\n", - "All descendents of [Severity](https://hpo.jax.org/app/browse/term/HP:0012824) (e.g. [Mild](https://hpo.jax.org/app/browse/term/HP:0012825)):" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "id": "0c183a9c-802b-43a1-8bf9-b0e81e5a26f1", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "HP:0012825\n" - ] - } - ], - "source": [ - "from hpotk.constants.hpo.severity import MILD\n", - "print(MILD)" - ] - }, - { - "cell_type": "markdown", - "id": "e19beec0-d10d-488c-b550-30716629333d", - "metadata": {}, - "source": [ - "# HPO annotations\n", - "\n", - "HPO toolkit provides a data model and IO for working with Mendelian diseases.\n", - "\n", - "The data model consists of the following classes:\n", - "- `hpotk.annotations.HpoDiseaseAnnotation` - represents data available for a phenotypic feature of a disease, such as the feature identifier (`TermId`), frequency of occurrence (`hpotk.annotations.Ratio`), references (a list of `hpotk.annotations.AnnotationReference`s), and modifiers (a list of `TermId`s that should be the descendants of the [Clinical modifier](https://hpo.jax.org/app/browse/term/HP:0012823) HPO concept).\n", - "- `hpotk.annotations.HpoDisease` - the disease model that has an identifier (`TermId`), name (`str`), phenotypic features/annotations (a collection of `HpoDiseaseAnnotation`s), and modes of inheritance (a collection of `TermId`s from the [Mode of inheritance](https://hpo.jax.org/app/browse/term/HP:0000005) HPO sub-hierarchy)" - ] - }, - { - "cell_type": "markdown", - "id": "bf852b18-ef17-445a-bc50-3e53b728e399", - "metadata": {}, - "source": [ - "## Load HPO annotations\n", - "\n", - "The toolkit provides parser for reading HPO annotation file format. Get the latest annotation from the [HPO download section](https://hpo.jax.org/app/data/annotations):\n", - "> http://purl.obolibrary.org/obo/hp/hpoa/phenotype.hpoa" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "id": "9b5e5216-abde-4b8e-888b-fd12b43ce92e", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "data": { - "text/plain": [ - "'Parsed 12467 diseases'" - ] - }, - "execution_count": 26, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from hpotk.annotations import HpoDiseases\n", - "from hpotk.annotations.load.hpoa import SimpleHpoaDiseaseLoader\n", - "\n", - "loader = SimpleHpoaDiseaseLoader(o)\n", - "diseases: HpoDiseases = loader.load('http://purl.obolibrary.org/obo/hp/hpoa/phenotype.hpoa')\n", - "f'Parsed {len(diseases)} diseases'" - ] - }, - { - "cell_type": "markdown", - "id": "d04ec8a4-0c36-43b2-ac42-bb678fec5a36", - "metadata": {}, - "source": [ - "The loader parser the HPO annotation file into `HpoDiseases`; a container of disease models. \n", - "\n", - "The container supports iteration over diseases," - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "id": "b4303dc6-a653-4720-8a50-0600d655b679", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "HpoDisease(identifier=OMIM:619340, name=Developmental and epileptic encephalopathy 96, n_annotations=9)\n" - ] - } - ], - "source": [ - "print(next(iter(diseases.items)))" - ] - }, - { - "cell_type": "markdown", - "id": "bab5bd7d-b075-4ffa-a14f-678da78a9520", - "metadata": {}, - "source": [ - "iteration over disease IDs," - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "id": "45035d68-baad-4e69-ab51-5e9c9e0aad86", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "OMIM:619340\n" - ] - } - ], - "source": [ - "# Iterate over disease identifiers\n", - "print(next(iter(diseases.item_ids())))" - ] - }, - { - "cell_type": "markdown", - "id": "63df5d07-37e0-4235-b722-fe9382aff471", - "metadata": {}, - "source": [ - "and lookup of a disease by ID (either a CURIE `str` or a `TermId`)" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "id": "e061b854-2218-4355-8a9f-d6ef9988f588", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "HpoDisease(identifier=OMIM:154700, name=Marfan syndrome, n_annotations=69)\n" - ] - } - ], - "source": [ - "# Look up the disease by ID\n", - "print(diseases['OMIM:154700'])" - ] - }, - { - "cell_type": "markdown", - "id": "4fa861f1-1e74-4aa2-84c1-ed793c62857b", - "metadata": {}, - "source": [ - "Each `HpoDisease` has the following attributes:\n", - "- `identifier` - `TermId` corresponding to the identifier\n", - "- `name` - disease name (e.g. *Marfan syndrome*)\n", - "- `annotations` - a collection of disease annotations\n", - "- `modes_of_inheritance` - a collection of modes of inheritance" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "id": "2faad640-4765-49c5-9eba-535018782e5b", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ID: OMIM:154700\n", - "Name: Marfan syndrome\n", - "The annotation count: 69\n", - "The first 5 annotations: Astigmatism, Limited elbow extension, Strabismus, Mitral annular calcification, Flexion contracture\n", - "Modes of inheritance: ['Autosomal dominant inheritance']\n" - ] - } - ], - "source": [ - "marfan = diseases['OMIM:154700']\n", - "print(f'ID: {marfan.identifier}')\n", - "print(f'Name: {marfan.name}')\n", - "marfan_features = [o.get_term(a.identifier).name for a in marfan.annotations]\n", - "print(f'The annotation count: {len(marfan.annotations)}')\n", - "print(f'The first 5 annotations: {\", \".join(marfan_features[:5])}')\n", - "mois = [o.get_term(moi).name for moi in marfan.modes_of_inheritance]\n", - "print(f'Modes of inheritance: {mois}')" - ] - }, - { - "cell_type": "markdown", - "id": "4d950864-22b6-415c-9991-bec4ba1f435e", - "metadata": {}, - "source": [ - "The HPO annotations API is still evolving. Stay tuned for more features to come!" - ] - }, - { - "cell_type": "markdown", - "id": "2c94a367-e7a7-4b24-8aa8-3cdd9449bb12", - "metadata": {}, - "source": [ - "# Semantic similarity\n", - "\n", - "## Information content of ontology terms\n", - "\n", - "We can calculate information content (IC) for all ontology terms and obtain `AnnotationIcContainer`:" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "id": "fe09955e-9a7d-49dc-9cd7-e7db4b876af5", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "from hpotk.algorithm.similarity import calculate_ic_for_annotated_items, AnnotationIcContainer\n", - "\n", - "term_id2ic: AnnotationIcContainer = calculate_ic_for_annotated_items(diseases, o)" - ] - }, - { - "cell_type": "markdown", - "id": "000836e9-98da-4fdc-8c76-eab5c163d779", - "metadata": {}, - "source": [ - "`AnnotationIcContainer` is a mapping from `TermId` to IC (`float`):" - ] - }, - { - "cell_type": "code", - "execution_count": 32, - "id": "7ad52558-e398-413f-8d65-0e67d6099d42", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "IC of Arachnodactyly: 7.251653654960611 nats\n" - ] - } - ], - "source": [ - "ic_arachnodactyly = term_id2ic[arachnodactyly]\n", - "print(f'IC of Arachnodactyly: {ic_arachnodactyly} nats')" - ] - }, - { - "cell_type": "markdown", - "id": "341f7123-3606-4a6f-be77-2944af843412", - "metadata": {}, - "source": [ - "By default, the `base` parameter of the `calculate_ic_for_annotated_items` is set to $e$ (Euler's number), returning the IC values in [nats](https://en.wikipedia.org/wiki/Nat_(unit)). Set `base=2` to get the IC in bits." - ] - }, - { - "cell_type": "markdown", - "id": "caab19bd-c09a-4e2f-84d3-402f3ee40635", - "metadata": {}, - "source": [ - "### Information content of an ontology module\n", - "\n", - "We can calculate IC for an ontology module - a set of descendants of an ontology term. In case of HPO, it makes sense to analyze \n", - "descendants of [Phenotypic abnormality](https://hpo.jax.org/app/browse/term/HP:0000118):\n", - "\n", - "```python\n", - "from hpotk.constants.hpo.base import PHENOTYPIC_ABNORMALITY\n", - "term_id2ic = calculate_ic_for_annotated_items(diseases, o, module_root=PHENOTYPIC_ABNORMALITY)\n", - "```" - ] - }, - { - "cell_type": "markdown", - "id": "c4ac12fb-7057-4f5b-b907-ee74f808b2c9", - "metadata": {}, - "source": [ - "### Use pseudocount for the very rare terms\n", - "\n", - "By default the IC is calculated only for ontology terms that are used at least once to annotate the annotated items (e.g. diseases). However, we may want to use \"pseudocount\" technique and set `count=1` for the very rare terms, resulting in a large IC value:\n", - "\n", - "```python\n", - "term_id2ic = calculate_ic_for_annotated_items(diseases, o, use_pseudocount=True)\n", - "```" - ] - }, - { - "cell_type": "markdown", - "id": "255a4fbf-f1e9-40e8-8efa-6b91724968a5", - "metadata": {}, - "source": [ - "### Metadata\n", - "\n", - "The versions of the items and ontology used to calculate the ICs are preserved in the `metadata` attribute of the `AnnotationIcContainer`:" - ] - }, - { - "cell_type": "code", - "execution_count": 33, - "id": "deb939b7-847f-4b9f-a823-61ea8e9cbb1a", - "metadata": { - "tags": [] - }, - "outputs": [ - { - "data": { - "text/plain": [ - "{'annotated_items_version': '2023-07-21', 'ontology_version': '2023-07-21'}" - ] - }, - "execution_count": 33, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "term_id2ic.metadata" - ] - }, - { - "cell_type": "markdown", - "id": "cb372d93-34c1-48b1-a097-9b174dca54c8", - "metadata": {}, - "source": [ - "## Resnik similarity\n", - "\n", - "We can use the term IC to calculate semantic similarity of items (e.g. diseases, patients) annotated with ontology concepts.\n", - "\n", - "Resnik similarity uses the information content of the *most informative common ancestor* ($IC_{MICA}$ to calculate semantic similarity. `hpo-toolkit` offers a function for calculating $IC_{MICA}$ for all ontology concept pairs:" - ] - }, - { - "cell_type": "code", - "execution_count": 34, - "id": "6f4dec17-d7df-442a-b803-2aabe86ca1b4", - "metadata": {}, - "outputs": [], - "source": [ - "from hpotk.algorithm.similarity import precalculate_ic_mica_for_hpo_concept_pairs, SimilarityContainer\n", - "\n", - "# sc: SimilarityContainer = precalculate_ic_mica_for_hpo_concept_pairs(term_id2ic, o)" - ] - }, - { - "cell_type": "markdown", - "id": "b0b9de40-efa5-4b28-bc95-f86e933faf1e", - "metadata": {}, - "source": [ - "The function is commented out/not run in the notebook because it is a bit slow..\n", - "\n", - "However, for the purpose of the demonstration, let's assume we have the `SimilarityContainer` on hand." - ] - }, - { - "cell_type": "code", - "execution_count": 35, - "id": "bea2ed63-59bc-4081-a9e8-d81ec6d06d80", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "sc = SimilarityContainer()" - ] - }, - { - "cell_type": "markdown", - "id": "d6c99e9a-37e7-4fdd-96b8-dec09e6f5004", - "metadata": {}, - "source": [ - "We can query the $IC_{MICA}$ of two HPO terms by:" - ] - }, - { - "cell_type": "code", - "execution_count": 36, - "id": "0ade7bf4-7d4c-4f5d-b612-dd3f8d9e2c57", - "metadata": { - "tags": [] - }, - "outputs": [], - "source": [ - "a = 'HP:0001166' # Arachnodactyly\n", - "b = 'HP:0001182' # Tapered finger\n", - "ic_mica = sc.get_similarity(a, b)" - ] - }, - { - "cell_type": "markdown", - "id": "a3634603-e270-4b7b-a8c1-bc4813b3066a", - "metadata": {}, - "source": [ - "That's it for now!" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "HPO Toolkit (Python 3.10)", - "language": "python", - "name": "hpotk" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.10.6" - }, - "toc-autonumbering": true - }, - "nbformat": 4, - "nbformat_minor": 5 -} From 474e50f18661708d355de645473a017ecae8c2df Mon Sep 17 00:00:00 2001 From: Daniel Danis Date: Tue, 28 Nov 2023 17:13:57 -0500 Subject: [PATCH 3/4] Update README.md --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index b169074..0cb8893 100644 --- a/README.md +++ b/README.md @@ -4,7 +4,7 @@ ![PyPi downloads](https://img.shields.io/pypi/dm/hpo-toolkit.svg?label=Pypi%20downloads) ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/hpo-toolkit) -A toolkit for working with Human Phenotype Ontology in Python. +A toolkit for working with Human Phenotype Ontology (HPO) and HPO disease annotations in Python. Loading HPO is as simple as: @@ -14,7 +14,7 @@ import hpotk hpo = hpotk.load_ontology('http://purl.obolibrary.org/obo/hp.json') ``` -Loading HPO annotations is accomplished by running: +The HPO disease annotations are loaded by running: ```python from hpotk.annotations.load.hpoa import SimpleHpoaDiseaseLoader @@ -28,7 +28,7 @@ diseases = loader.load(hpoa_path) assert len(diseases) == 12_468 ``` -Check out the User guide and the API reference for more info: +Find more info in our detailed documentation: - [Stable documentation](https://thejacksonlaboratory.github.io/hpo-toolkit/stable) (last release on `main` branch) - [Latest documentation](https://thejacksonlaboratory.github.io/hpo-toolkit/latest) (bleeding edge, latest commit on `development` branch) From 82736bc7e6fa596fe97967f2c34dd86b71343743 Mon Sep 17 00:00:00 2001 From: Daniel Danis Date: Wed, 29 Nov 2023 16:10:00 -0500 Subject: [PATCH 4/4] Add a section to user guide regarding `hpotk.validate` package. --- docs/user-guide/index.rst | 1 + .../validate-phenotypic-features.rst | 150 ++++++++++++++++++ 2 files changed, 151 insertions(+) create mode 100644 docs/user-guide/validate-phenotypic-features.rst diff --git a/docs/user-guide/index.rst b/docs/user-guide/index.rst index c3a93fc..9bd4eb9 100644 --- a/docs/user-guide/index.rst +++ b/docs/user-guide/index.rst @@ -13,5 +13,6 @@ This guide includes self-contained tutorials for using HPO toolkit to work with load-ontology use-hierarchy load-hpo-annotations + validate-phenotypic-features sort-term-ids diff --git a/docs/user-guide/validate-phenotypic-features.rst b/docs/user-guide/validate-phenotypic-features.rst new file mode 100644 index 0000000..46a7f73 --- /dev/null +++ b/docs/user-guide/validate-phenotypic-features.rst @@ -0,0 +1,150 @@ +.. _validate-phenotypic-features: + +============================ +Validate phenotypic features +============================ + +One of the most convenient benefits of using ontology concepts for annotation of items with features and properties +is that one can leverage the information encoded in the semantic relationships of the ontology hierarchy. +Among other things, the semantic relationships empower fuzzy searches and similarity measures that have found +broad usage in many fields, such as biomedicine. + +HPO became a *de facto* standard for representing phenotypic features - the signs and symptoms of an individual. +However, unlike in the case of other ontologies, several unique rules should be followed to maximize the benefits +of HPO annotations. In the sections below, we describe the rules and show how HPO toolkit can reveal their violations. + +For the sake of this guide, let's assume we have an individual annotated with the following four phenotypic features: + +* *Arachnodactyly* +* *Seizure* +* *Focal clonic seizure* +* *Enuresis nocturna* + +.. doctest:: check-consistency + + >>> curies = [ + ... 'HP:0001505', # Arachnodactyly + ... 'HP:0001250', # Seizure + ... 'HP:0002266', # Focal clonic seizure + ... 'HP:0010677' # Enuresis nocturna + ... ] + +Let's convert the CURIEs into term ids: + +.. doctest:: check-consistency + + >>> import hpotk + >>> term_ids = [hpotk.TermId.from_curie(curie) for curie in curies] + +and let's finish the setup by loading the toy HPO shipped with the documentation. + +.. doctest:: check-consistency + + >>> hpo = hpotk.load_minimal_ontology('data/hp.toy.json') + + +Do not use obsolete term ids +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +As the first rule, the annotations should always use the current identifier. During ontology development, +some concepts may become obsolete and later be removed from the ontology altogether. +Most of the time, however, the removed concepts have a straightforward replacement. + +The :class:`hpotk.validate.ObsoleteTermIdsValidator` points out the usage of obsolete term ids +and suggests the replacement. + +Let's create the validator and check if the phenotypic features are OK: + +.. doctest:: check-consistency + + >>> from hpotk.validate import ObsoleteTermIdsValidator + >>> obs_val = ObsoleteTermIdsValidator(hpo) + + >>> vr = obs_val.validate(term_ids) + +The validator returns back an instance of :class:`hpotk.validate.ValidationResults` with the validation output. +We can check for presence of issues in the input: + +.. doctest:: check-consistency + + >>> vr.is_ok() + False + +The input is *not* OK, so we should look at the issues in greater detail: + +.. doctest:: check-consistency + + >>> for validation_result in vr.results: + ... print(validation_result) + ValidationResult(level=, category='obsolete_term_id_is_used', message='Using the obsolete HP:0001505 instead of HP:0001166 for Arachnodactyly') + +We see that the `HP:0001505` is obsolete and `HP:0001166` should be used as the new *Arachnodactyly* identifier. + + +Phenotypic features should be descendants of *Phenotypic abnormality* +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +HPO hierarchy has several major branches to uniquely represent concepts such as clinical modifiers, modes of inheritance, +and past medical medical history. However, the signs and symptoms should be encoded into descendants +of *Phenotypic abnormality*. + +:class:`hpotk.validate.PhenotypicAbnormalityValidator` checks that all identifiers correspond to descendants +of *Phenotypic abnormality*: + +Let's test that this is valid for the patient features: + +.. doctest:: check-consistency + + >>> from hpotk.validate import PhenotypicAbnormalityValidator + >>> pa_val = PhenotypicAbnormalityValidator(hpo) + + >>> vr = pa_val.validate(term_ids) + >>> vr.is_ok() + True + +Yes, the all term ids represent the descendants of *Phenotypic abnormality*. + + +Phenotypic features should not violate the annotation propagation rule +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Last and most importantly, let's discuss the concept of annotation redundancy. +HPO uses `is_a` to represent the edges of the ontology hierarchy graph. The edges model the "parent-child" +relationships between two ontology concepts (the object subsumes the subject). + +When using HPO concepts to encode clinical features of an individual, presence of a concept implies presence +of all ancestral concepts. This is also known as the "True path rule", where the annotation "propagates" across +the concept ancestors. + +In general, using the same annotation more than once is considered an error (e.g. annotate the subject with *Focal clonic seizure* +and *Focal clonic seizure*). However, thanks to the True path rule, using a concept and its ancestor is an offender +of a similar kind. + +:class:`hpotk.validate.AnnotationPropagationValidator` checks if a set of terms violate the annotation propagation rule +- if a collection of concepts contains a term and its ancestor. + +.. doctest:: check-consistency + + >>> from hpotk.validate import AnnotationPropagationValidator + >>> ap_val = AnnotationPropagationValidator(hpo) + + >>> vr = ap_val.validate(term_ids) + >>> vr.is_ok() + False + +There seems to an issue. Let's break it down: + +.. doctest:: check-consistency + + >>> for validation_result in vr.results: + ... print(validation_result.level) + ... print(validation_result.category) + ... print(validation_result.message) + ValidationLevel.ERROR + annotation_propagation + Terms should not contain both present Focal clonic seizure [HP:0002266] and its present or excluded ancestor Seizure [HP:0001250] + +The validator points out that *Seizure* is an ancestor of *Focal clonic seizure* and should, therefore, not be used +as an annotation of the individual. + +That's it for now. There are more validators to come!