Merge pull request #26 from TheJacksonLaboratory/release_v0.1.5
Release v0.1.5
ielis authored May 12, 2023
2 parents 12104ff + ed56cd8 commit 675479a
Showing 46 changed files with 24,326 additions and 309 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/notebooks.yml
@@ -13,7 +13,7 @@ jobs:
strategy:
fail-fast: false
matrix:
os: [ windows-latest, macOS-latest, ubuntu-latest ]
os: [ ubuntu-latest ]
python: [ "3.11" ]

steps:
2 changes: 1 addition & 1 deletion .github/workflows/python_ci.yml
@@ -17,7 +17,7 @@ jobs:
fail-fast: false
matrix:
os: [ windows-latest, macOS-latest, ubuntu-latest ]
python: [ "3.8", "3.10", "3.11" ]
python: [ "3.8", "3.9", "3.10", "3.11" ]

steps:
- uses: actions/checkout@v2
4 changes: 4 additions & 0 deletions README.md
@@ -1,5 +1,9 @@
# hpo-toolkit

![Build status](https://img.shields.io/github/actions/workflow/status/TheJacksonLaboratory/hpo-toolkit/python_ci.yml)
![PyPi downloads](https://img.shields.io/pypi/dm/hpo-toolkit.svg?label=Pypi%20downloads)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/hpo-toolkit)

A toolkit for working with Human Phenotype Ontology in Python

## Install
214 changes: 202 additions & 12 deletions notebooks/Tutorial.ipynb
@@ -24,6 +24,30 @@
"!pip install hpo-toolkit"
]
},
{
"cell_type": "markdown",
"id": "67f52cee-6b59-4284-a8b3-5bb004bde19b",
"metadata": {},
"source": [
"# API update\n",
"\n",
"As of `v0.1.5`, the most commonly used classes are re-exported from the top-level package to save some typing:\n",
"\n",
"| Class | Previous | New API (top-level re-export) |\n",
"|:-----------------|:-----------------------------------|:--------------------------------|\n",
"| TermId | `hpotk.model.TermId` | `hpotk.TermId` |\n",
"| Term | `hpotk.model.Term` | `hpotk.Term` |\n",
"| MinimalTerm | `hpotk.model.MinimalTerm` | `hpotk.MinimalTerm` |\n",
"| Synonym | `hpotk.model.Synonym` | `hpotk.Synonym` |\n",
"| SynonymType | `hpotk.model.SynonymType` | `hpotk.SynonymType` |\n",
"| SynonymCategory | `hpotk.model.SynonymCategory` | `hpotk.SynonymCategory` |\n",
"| OntologyGraph | `hpotk.graph.OntologyGraph` | `hpotk.OntologyGraph` |\n",
"| GraphAware | `hpotk.graph.GraphAware` | `hpotk.GraphAware` |\n",
"| Ontology | `hpotk.ontology.Ontology` | `hpotk.Ontology` |\n",
"| MinimalOntology | `hpotk.ontology.MinimalOntology` | `hpotk.MinimalOntology` |\n"
]
},
{
"cell_type": "markdown",
"id": "6e41fb89-d16e-427f-8e53-ec92710af8c8",
@@ -48,7 +72,7 @@
},
"outputs": [],
"source": [
"from hpotk.ontology import Ontology\n",
"from hpotk import Ontology\n",
"from hpotk.ontology.load.obographs import load_ontology\n",
"\n",
"o: Ontology = load_ontology('http://purl.obolibrary.org/obo/hp.json')"
@@ -75,20 +99,20 @@
"\n",
"HPO toolkit includes several classes that serve as building blocks in the data model. This section provides basic information, starting from the bottom of the class hierarchy.\n",
"\n",
"- `hpotk.model.TermId` - an identifier of an ontology concept.\n",
"- `hpotk.model.MinimalTerm` - represents minimal useful information of the ontology concept. `MinimalTerm` has the following attributes:\n",
"- `hpotk.TermId` - an identifier of an ontology concept.\n",
"- `hpotk.MinimalTerm` - represents minimal useful information of the ontology concept. `MinimalTerm` has the following attributes:\n",
" - `identifier`, `TermId` (e.g. `HP:0001166`)\n",
" - `name`, `str` (e.g. `Arachnodactyly`)\n",
" - `is_current`/`is_obsolete`, whether or not the concept has been obsoleted\n",
" - `alt_term_ids`, a sequence of obsolete `TermId`s that represented the term previously\n",
"- `hpotk.model.Term` - the complete info regarding the ontology concept. The `Term` has all attributes of the `MinimalTerm` plus the following:\n",
"- `hpotk.Term` - the complete info regarding the ontology concept. The `Term` has all attributes of the `MinimalTerm` plus the following:\n",
" - `definition` - an optional description of the term in slightly more verbiage\n",
" - `comment` - additional comment (optional)\n",
" - `synonyms` - alternative designations of the `Term` (optional)\n",
" - `xrefs` - a sequence of cross-references between the `Term` and concepts from different databases\n",
"- `hpotk.ontology.MinimalOntology` - the container for ontology data that uses `MinimalTerm`s\n",
"- `hpotk.ontology.Ontology` - the ontology data container that contains `Term`s\n",
"- `hpotk.graph.OntologyGraph` - a specification of graph for storing the ontology concept hierarchy and the required graph functionality. As long as the graph implements the methods, it can work with the rest of the toolkit framework\n",
"- `hpotk.MinimalOntology` - the container for ontology data that uses `MinimalTerm`s\n",
"- `hpotk.Ontology` - the ontology data container that contains `Term`s\n",
"- `hpotk.OntologyGraph` - a specification of graph for storing the ontology concept hierarchy and the required graph functionality. As long as the graph implements the methods, it can work with the rest of the toolkit framework\n",
"\n",
"Now, let's go over some examples to explore the provided functionality."
]
@@ -235,7 +259,7 @@
"metadata": {},
"outputs": [],
"source": [
"from hpotk.model import TermId\n",
"from hpotk import TermId\n",
"assert current_arachnodactyly_id in o and TermId.from_curie(current_arachnodactyly_id) in o"
]
},
@@ -400,6 +424,8 @@
"\n",
"The toolkit provides functions for performing multiple useful sanity checks.\n",
"\n",
"Each validator checks a sequence of `Identified` objects (anything that has a `TermId` identifier) or `TermId`s.\n",
"\n",
"## Obsolete term IDs\n",
"\n",
"We should always use the primary term IDs instead of the obsolete terms.\n",
@@ -415,7 +441,7 @@
"metadata": {},
"outputs": [],
"source": [
"from hpotk.model import MinimalTerm\n",
"from hpotk import MinimalTerm\n",
"\n",
"from hpotk.validate import ValidationLevel\n",
"from hpotk.validate import ObsoleteTermIdsValidator\n",
@@ -424,7 +450,7 @@
"\n",
"# The term uses an obsolete term ID `HP:0006010` instead of the current `HP:0100807`.\n",
"inputs = [\n",
" MinimalTerm.create_minimal_term(TermId.from_curie('HP:0006010'), name='Long fingers', alt_term_ids=[], is_obsolete=False)\n",
" TermId.from_curie('HP:0006010')\n",
"]\n",
"results = obso_validator.validate(inputs)\n",
"\n",
@@ -891,7 +917,7 @@
}
],
"source": [
"print(next(iter(diseases.diseases)))"
"print(next(iter(diseases.items)))"
]
},
{
@@ -920,7 +946,7 @@
],
"source": [
"# Iterate over disease identifiers\n",
"print(next(iter(diseases.disease_ids)))"
"print(next(iter(diseases.item_ids())))"
]
},
{
@@ -1003,6 +1029,170 @@
"The HPO annotations API is still evolving. Stay tuned for more features to come!"
]
},
{
"cell_type": "markdown",
"id": "2c94a367-e7a7-4b24-8aa8-3cdd9449bb12",
"metadata": {},
"source": [
"# Semantic similarity\n",
"\n",
"## Information content of ontology terms\n",
"\n",
"We can calculate information content (IC) for all ontology terms and obtain `AnnotationIcContainer`:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "fe09955e-9a7d-49dc-9cd7-e7db4b876af5",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from hpotk.algorithm.similarity import calculate_ic_for_annotated_items, AnnotationIcContainer\n",
"\n",
"term_id2ic: AnnotationIcContainer = calculate_ic_for_annotated_items(diseases, o)"
]
},
{
"cell_type": "markdown",
"id": "000836e9-98da-4fdc-8c76-eab5c163d779",
"metadata": {},
"source": [
"`AnnotationIcContainer` is a mapping from `TermId` to IC (`float`):"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "7ad52558-e398-413f-8d65-0e67d6099d42",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"IC of Arachnodactyly: 7.259769077238713 nats\n"
]
}
],
"source": [
"ic_arachnodactyly = term_id2ic[arachnodactyly.identifier]\n",
"print(f'IC of {arachnodactyly.name}: {ic_arachnodactyly} nats')"
]
},
{
"cell_type": "markdown",
"id": "341f7123-3606-4a6f-be77-2944af843412",
"metadata": {},
"source": [
"By default, the `base` parameter of the `calculate_ic_for_annotated_items` is set to $e$ (Euler's number), returning the IC values in [nats](https://en.wikipedia.org/wiki/Nat_(unit)). Set `base=2` to get the IC in bits."
]
},
{
"cell_type": "markdown",
"id": "255a4fbf-f1e9-40e8-8efa-6b91724968a5",
"metadata": {},
"source": [
"The versions of the items and ontology used to calculate the ICs are preserved in the `metadata` attribute of the `AnnotationIcContainer`:"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "deb939b7-847f-4b9f-a823-61ea8e9cbb1a",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"{'annotated_items_version': '2023-04-05', 'ontology_version': '2023-04-06'}"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"term_id2ic.metadata"
]
},
{
"cell_type": "markdown",
"id": "cb372d93-34c1-48b1-a097-9b174dca54c8",
"metadata": {},
"source": [
"## Resnik similarity\n",
"\n",
"We can use the term IC to calculate semantic similarity of items (e.g. diseases, patients) annotated with ontology concepts.\n",
"\n",
"Resnik similarity uses the information content of the *most informative common ancestor* ($IC_{MICA}$) to calculate semantic similarity. `hpo-toolkit` offers a function for calculating $IC_{MICA}$ for all ontology concept pairs:"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "4e4afd86-1c26-454e-a627-07b29eb00f01",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from hpotk.algorithm.similarity import precalculate_ic_mica_for_hpo_concept_pairs, SimilarityContainer\n",
"\n",
"#sc: SimilarityContainer = precalculate_ic_mica_for_hpo_concept_pairs(term_id2ic, o)"
]
},
{
"cell_type": "markdown",
"id": "b0b9de40-efa5-4b28-bc95-f86e933faf1e",
"metadata": {},
"source": [
"The call is commented out and not run in the notebook because it is a bit slow.\n",
"\n",
"However, for the purpose of the demonstration, let's assume we have the `SimilarityContainer` on hand."
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "bea2ed63-59bc-4081-a9e8-d81ec6d06d80",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"sc = SimilarityContainer()"
]
},
{
"cell_type": "markdown",
"id": "d6c99e9a-37e7-4fdd-96b8-dec09e6f5004",
"metadata": {},
"source": [
"We can query the $IC_{MICA}$ of two HPO terms by:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "0ade7bf4-7d4c-4f5d-b612-dd3f8d9e2c57",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"a = 'HP:0001166' # Arachnodactyly\n",
"b = 'HP:0001182' # Tapered finger\n",
"ic_mica = sc.get_similarity(a, b)"
]
},
{
"cell_type": "markdown",
"id": "a3634603-e270-4b7b-a8c1-bc4813b3066a",
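The information-content cells in the notebook hunk above report IC in nats by default and in bits with `base=2`. As a rough illustration of what those units mean, the sketch below uses the textbook definition `IC(t) = -log_b(p(t))`, where `p(t)` is the fraction of annotated items that carry term `t`. This is a toy in plain Python and not hpo-toolkit code: `information_content` and `nats_to_bits` are hypothetical helper names, and it is an assumption that `calculate_ic_for_annotated_items` follows exactly this formula.

```python
import math


def information_content(n_with_term: int, n_total: int, base: float = math.e) -> float:
    """Textbook IC: -log_base(n_with_term / n_total). Hypothetical helper."""
    if not 0 < n_with_term <= n_total:
        raise ValueError("need 0 < n_with_term <= n_total")
    # Change of base: log_b(x) = ln(x) / ln(b)
    return -math.log(n_with_term / n_total) / math.log(base)


def nats_to_bits(ic_nats: float) -> float:
    """Convert an IC value from nats (base e) to bits (base 2)."""
    return ic_nats / math.log(2)


# A term annotated to 1 of 8 items: rare terms get a high IC.
ic_nats = information_content(1, 8)          # natural-log units (nats)
ic_bits = information_content(1, 8, base=2)  # base-2 units (bits)
```

Converting the nats value with `nats_to_bits` gives the same number as computing with `base=2` directly, so either route can be used when comparing against values published in bits.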
8 changes: 6 additions & 2 deletions pyproject.toml
@@ -10,12 +10,16 @@ authors = [
]

readme = "README.md"
requires-python = ">=3.6"
requires-python = ">=3.8"
keywords = ["human phenotype ontology", "HPO", "library"]
license = { file = "LICENSE" }
classifiers = [
"Programming Language :: Python :: 3",
"Development Status :: 3 - Alpha",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
]

dependencies = [
2 changes: 1 addition & 1 deletion recipe/meta.yaml
@@ -1,5 +1,5 @@
{% set name = "hpo-toolkit" %}
{% set version = "0.1.4" %}
{% set version = "0.1.5" %}

package:
name: {{ name|lower }}
14 changes: 9 additions & 5 deletions src/hpotk/__init__.py
@@ -1,10 +1,14 @@
__version__ = "0.1.4"
__version__ = "0.1.5"

from . import model
from . import algorithm
from . import annotations
from . import constants
from . import graph
from . import model
from . import ontology
from . import algorithm
from . import annotations
from . import validate
from . import util
from . import validate

from .graph import OntologyGraph, GraphAware
from .model import TermId, Term, MinimalTerm, Synonym, SynonymType, SynonymCategory
from .ontology import Ontology, MinimalOntology
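Because the `__init__.py` hunk above adds the top-level names on top of the existing submodule imports, the same class should be reachable under both the old path (for example `hpotk.model.TermId`) and the new one (`hpotk.TermId`). Code that has to run against releases before and after `v0.1.5` could resolve a name from whichever location exists. The helper below is a hypothetical sketch, not part of hpo-toolkit; it is demonstrated with standard-library modules so that it runs without the package installed.

```python
import importlib
from typing import Any, Sequence


def import_first(name: str, module_paths: Sequence[str]) -> Any:
    """Return attribute `name` from the first module in `module_paths` that has it."""
    for path in module_paths:
        try:
            return getattr(importlib.import_module(path), name)
        except (ImportError, AttributeError):
            # Module missing, or present without the attribute: try the next path.
            continue
    raise ImportError(f"{name!r} not found in any of {list(module_paths)}")


# Stdlib demonstration: the first path does not exist, the second one does.
OrderedDict = import_first("OrderedDict", ["no_such_module_xyz", "collections"])

# With hpo-toolkit installed, the same pattern would cover both layouts
# (shown as a comment because it needs the package):
# TermId = import_first("TermId", ["hpotk", "hpotk.model"])
```

Listing the top-level package first prefers the new API and falls back to the older submodule path only when the re-export is absent.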
1 change: 1 addition & 0 deletions src/hpotk/algorithm/__init__.py
@@ -1,3 +1,4 @@
from ._traversal import get_ancestors, get_parents
from ._traversal import get_children, get_descendents, get_descendants
from ._traversal import exists_path
from . import similarity
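The `similarity` module exposed here is what the notebook uses for Resnik similarity, which scores two terms by the information content of their most informative common ancestor (MICA). The sketch below works through that idea on a made-up five-node hierarchy in plain Python. The node labels, IC values, and the helpers `ancestors_and_self` and `ic_mica` are all hypothetical; it does not reproduce how `precalculate_ic_mica_for_hpo_concept_pairs` is implemented.

```python
from typing import Dict, Set

# Toy hierarchy (child -> parents). Labels are made up, not HPO terms.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "A": {"root"},
    "B": {"A"},
    "C": {"A"},
    "D": {"root"},
}

# Made-up IC values: more specific terms get a larger IC.
IC: Dict[str, float] = {"root": 0.0, "A": 1.2, "B": 3.0, "C": 2.5, "D": 1.8}


def ancestors_and_self(term: str) -> Set[str]:
    """Collect the term and all of its ancestors by walking parent links."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ic_mica(a: str, b: str) -> float:
    """IC of the most informative common ancestor of `a` and `b`."""
    common = ancestors_and_self(a) & ancestors_and_self(b)
    return max((IC[t] for t in common), default=0.0)


sibling_score = ic_mica("B", "C")  # share "A" and "root"; "A" is more informative
distant_score = ic_mica("B", "D")  # share only "root"
```

Each term counts as its own ancestor in this sketch, so comparing a term with itself returns that term's IC.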