Release v0.1.5 #26

Merged on May 12, 2023 (43 commits)
Commits
07fe092
Next development iteration `v0.1.5dev0`.
ielis Apr 14, 2023
13a331f
The rule validators can work with a sequence of `Identified` or `Term…
ielis Apr 25, 2023
740782a
Introduce annotated items concept.
ielis Apr 27, 2023
678aaf9
Use auto enum numbering for hpotk.annotation enums.
ielis Apr 27, 2023
efd5eb9
Make annotated item container iterable.
ielis Apr 27, 2023
eefd5cd
Update docs.
ielis Apr 28, 2023
efb61f5
Move algorithm.traversal tests into a dedicated folder.
ielis Apr 28, 2023
b6d21c6
Add new test files - an example HPOA for Marfan syndrome + Hyperekple…
ielis Apr 28, 2023
011fd83
Make algorithm tests a module.
ielis Apr 28, 2023
647e27e
Create an `algorithm.similarity` package.
ielis Apr 28, 2023
1518246
Add functions for augmenting a set of `TermId`s with ancestors/descen…
ielis Apr 28, 2023
aa490b6
Calculate IC for `TermId`s of annotated items.
ielis Apr 28, 2023
701df3f
Implement calculation and persistence of Resnik similarity.
ielis May 1, 2023
e8daa25
Separate data classes.
ielis May 2, 2023
e3962cf
Implement serialization of the TermId2IC container.
ielis May 3, 2023
2ca04ad
Implement incremental building of the ontology graph. Deprecate `CsrG…
ielis May 9, 2023
6cb5020
Add badges with build status and PyPi downloads.
ielis May 9, 2023
64d07b4
Remove local tests.
ielis May 9, 2023
8f9c70b
Add instance validation utilities.
ielis May 10, 2023
382579b
Export the most commonly used data model classes at the top-level.
ielis May 10, 2023
89ebbe7
Use binary search to look up node index in CSR ontology graph. Ontolo…
ielis May 10, 2023
84fc8ac
Implement getting ancestors and descendants of an ontology graph node…
ielis May 10, 2023
c6b8679
Use graph methods for traversal algorithms.
ielis May 10, 2023
192ede5
Use `set.update` instead of iterative addition in `hpotk.algorithm._a…
ielis May 11, 2023
f36421f
Remove type subscription on `np.ndarray`
ielis May 11, 2023
d978c20
Remove type subscription on `np.ndarray` on the remaining instances.
ielis May 11, 2023
93ebe5d
Use `typing.FrozenSet[TermId]` instead of `frozenset[TermId]`.
ielis May 11, 2023
fc193e2
Use `typing.FrozenSet[TermId]` instead of `frozenset[TermId]` in Resn…
ielis May 11, 2023
dddee30
Put `disease_ids()` back to `SimpleHpoDiseases` to maintain backward …
ielis May 12, 2023
44492bf
Update the documentation tutorial.
ielis May 12, 2023
d15affa
Store ontology version in `AnnotationIcContainer`. Reword HPO IC MICA…
ielis May 12, 2023
de44c98
Describe `hpotk.algorithm.similarity` in the doc notebook.
ielis May 12, 2023
6f6e029
Merge pull request #22 from TheJacksonLaboratory/implement_semsim
ielis May 12, 2023
ba86b65
Update classifiers to indicate the supported Python versions. Run tes…
ielis May 12, 2023
68162d9
Run documentation on Ubuntu only.
ielis May 12, 2023
afb0ce0
Run CI on all supported Python versions.
ielis May 12, 2023
997910a
Run notebooks on one OS only.
ielis May 12, 2023
2578457
Merge pull request #24 from TheJacksonLaboratory/update_classifiers
ielis May 12, 2023
43efe70
Merge branch 'development' into accept_identified_and_term_ids_in_rul…
ielis May 12, 2023
9838e4d
Fix bug in obsolete term usage validator, update doc notebook.
ielis May 12, 2023
d103c89
Merge pull request #25 from TheJacksonLaboratory/accept_identified_an…
ielis May 12, 2023
659483b
Set version to `v0.1.5`.
ielis May 12, 2023
ed56cd8
Update the version in the conda recipe.
ielis May 12, 2023
2 changes: 1 addition & 1 deletion .github/workflows/notebooks.yml
@@ -13,7 +13,7 @@ jobs:
strategy:
fail-fast: false
matrix:
os: [ windows-latest, macOS-latest, ubuntu-latest ]
os: [ ubuntu-latest ]
python: [ "3.11" ]

steps:
2 changes: 1 addition & 1 deletion .github/workflows/python_ci.yml
@@ -17,7 +17,7 @@ jobs:
fail-fast: false
matrix:
os: [ windows-latest, macOS-latest, ubuntu-latest ]
python: [ "3.8", "3.10", "3.11" ]
python: [ "3.8", "3.9", "3.10", "3.11" ]

steps:
- uses: actions/checkout@v2
4 changes: 4 additions & 0 deletions README.md
@@ -1,5 +1,9 @@
# hpo-toolkit

![Build status](https://img.shields.io/github/actions/workflow/status/TheJacksonLaboratory/hpo-toolkit/python_ci.yml)
![PyPI downloads](https://img.shields.io/pypi/dm/hpo-toolkit.svg?label=Pypi%20downloads)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/hpo-toolkit)

A toolkit for working with Human Phenotype Ontology in Python

## Install
214 changes: 202 additions & 12 deletions notebooks/Tutorial.ipynb
@@ -24,6 +24,30 @@
"!pip install hpo-toolkit"
]
},
{
"cell_type": "markdown",
"id": "67f52cee-6b59-4284-a8b3-5bb004bde19b",
"metadata": {},
"source": [
"# API update\n",
"\n",
"As of `v0.1.5`, the most commonly used classes are re-exported from the top-level package to save some typing:\n",
"\n",
"| Class | Previous | New API (top-level re-export) |\n",
"|:-----------------|:-----------------------------------|:--------------------------------|\n",
"| TermId | `hpotk.model.TermId` | `hpotk.TermId` |\n",
"| Term | `hpotk.model.Term` | `hpotk.Term` |\n",
"| MinimalTerm | `hpotk.model.MinimalTerm` | `hpotk.MinimalTerm` |\n",
"| Synonym | `hpotk.model.Synonym` | `hpotk.Synonym` |\n",
"| SynonymType | `hpotk.model.SynonymType` | `hpotk.SynonymType` |\n",
"| SynonymCategory | `hpotk.model.SynonymCategory` | `hpotk.SynonymCategory` |\n",
"| OntologyGraph | `hpotk.graph.OntologyGraph` | `hpotk.OntologyGraph` |\n",
"| GraphAware | `hpotk.graph.GraphAware` | `hpotk.GraphAware` |\n",
"| Ontology | `hpotk.ontology.Ontology` | `hpotk.Ontology` |\n",
"| MinimalOntology | `hpotk.ontology.MinimalOntology` | `hpotk.MinimalOntology` |\n"
]
},
{
"cell_type": "markdown",
"id": "6e41fb89-d16e-427f-8e53-ec92710af8c8",
@@ -48,7 +72,7 @@
},
"outputs": [],
"source": [
"from hpotk.ontology import Ontology\n",
"from hpotk import Ontology\n",
"from hpotk.ontology.load.obographs import load_ontology\n",
"\n",
"o: Ontology = load_ontology('http://purl.obolibrary.org/obo/hp.json')"
@@ -75,20 +99,20 @@
"\n",
"HPO toolkit includes several classes that serve as building blocks in the data model. This section provides basic information, starting from the bottom of the class hierarchy.\n",
"\n",
"- `hpotk.model.TermId` - an identifier of an ontology concept.\n",
"- `hpotk.model.MinimalTerm` - represents minimal useful information of the ontology concept. `MinimalTerm` has the following attributes:\n",
"- `hpotk.TermId` - an identifier of an ontology concept.\n",
"- `hpotk.MinimalTerm` - represents minimal useful information of the ontology concept. `MinimalTerm` has the following attributes:\n",
" - `identifier`, `TermId` (e.g. `HP:0001166`)\n",
" - `name`, `str` (e.g. `Arachnodactyly`)\n",
" - `is_current`/`is_obsolete`, whether or not the concept has been obsoleted\n",
" - `alt_term_ids`, a sequence of obsolete `TermId`s that represented the term previously\n",
"- `hpotk.model.Term` - the complete info regarding the ontology concept. The `Term` has all attributes of the `MinimalTerm` plus the following:\n",
"- `hpotk.Term` - the complete info regarding the ontology concept. The `Term` has all attributes of the `MinimalTerm` plus the following:\n",
" - `definition` - an optional description of the term in slightly more verbiage\n",
" - `comment` - additional comment (optional)\n",
" - `synonyms` - alternative designations of the `Term` (optional)\n",
" - `xrefs` - a sequence of cross-references between the `Term` and concepts from different databases\n",
"- `hpotk.ontology.MinimalOntology` - the container for ontology data that uses `MinimalTerm`s\n",
"- `hpotk.ontology.Ontology` - the ontology data container that contains `Term`s\n",
"- `hpotk.graph.OntologyGraph` - a specification of graph for storing the ontology concept hierarchy and the required graph functionality. As long as the graph implements the methods, it can work with the rest of the toolkit framework\n",
"- `hpotk.MinimalOntology` - the container for ontology data that uses `MinimalTerm`s\n",
"- `hpotk.Ontology` - the ontology data container that contains `Term`s\n",
"- `hpotk.OntologyGraph` - a specification of graph for storing the ontology concept hierarchy and the required graph functionality. As long as the graph implements the methods, it can work with the rest of the toolkit framework\n",
"\n",
"Now, let's go over some examples to explore the provided functionality."
]
@@ -235,7 +259,7 @@
"metadata": {},
"outputs": [],
"source": [
"from hpotk.model import TermId\n",
"from hpotk import TermId\n",
"assert current_arachnodactyly_id in o and TermId.from_curie(current_arachnodactyly_id) in o"
]
},
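The validator hunks that follow document that validators now accept a sequence of `Identified` objects or bare `TermId`s, and switch the obsolete-ID example from a `MinimalTerm` to a bare `TermId`. The minimal pure-Python sketch below shows one way a validator can accept both input kinds by normalizing everything to a term ID first. `TermId`, `Identified`, `Term`, `to_term_id`, `find_obsolete_ids`, and the `primary_of` lookup table are hypothetical stand-ins written for illustration; they are not hpotk's classes or the `ObsoleteTermIdsValidator` API.

```python
import typing
from dataclasses import dataclass


@dataclass(frozen=True)
class TermId:
    """Hypothetical stand-in for a term identifier such as `HP:0100807`."""
    prefix: str
    local_id: str

    @staticmethod
    def from_curie(curie: str) -> "TermId":
        prefix, _, local_id = curie.partition(":")
        return TermId(prefix, local_id)

    @property
    def value(self) -> str:
        return f"{self.prefix}:{self.local_id}"


class Identified(typing.Protocol):
    """Anything that exposes its `TermId` via an `identifier` attribute."""
    identifier: TermId


@dataclass(frozen=True)
class Term:
    """Hypothetical `Identified` implementation: a term with a name."""
    identifier: TermId
    name: str


def to_term_id(item: typing.Union[Identified, TermId]) -> TermId:
    # Bare TermIds pass through unchanged; Identified objects are unwrapped.
    return item if isinstance(item, TermId) else item.identifier


def find_obsolete_ids(
    items: typing.Iterable[typing.Union[Identified, TermId]],
    primary_of: typing.Mapping[TermId, TermId],
) -> typing.List[typing.Tuple[TermId, TermId]]:
    """Return (obsolete ID, primary ID) pairs for inputs that use an obsolete ID.

    `primary_of` maps each obsolete TermId to the current primary TermId.
    """
    issues = []
    for item in items:
        term_id = to_term_id(item)
        if term_id in primary_of:
            issues.append((term_id, primary_of[term_id]))
    return issues


# Usage: mix bare TermIds and Identified objects in one input sequence.
primary_of = {TermId.from_curie("HP:0006010"): TermId.from_curie("HP:0100807")}
inputs = [
    TermId.from_curie("HP:0006010"),                        # obsolete, bare TermId
    Term(TermId.from_curie("HP:0006010"), "Long fingers"),  # obsolete, wrapped in a Term
    TermId.from_curie("HP:0001166"),                        # current ID, no issue
]
for obsolete, primary in find_obsolete_ids(inputs, primary_of):
    print(f"{obsolete.value} is obsolete; use {primary.value}")
# prints "HP:0006010 is obsolete; use HP:0100807" twice
```

Normalizing to a term ID up front keeps the validation logic itself independent of whether the caller passes terms, annotated items, or plain identifiers.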
@@ -400,6 +424,8 @@
"\n",
"The toolkit provides functions for performing multiple useful sanity checks.\n",
"\n",
"The validators accept a sequence of `Identified` objects (anything that has a `TermId` identifier) or a sequence of `TermId`s.\n",
"\n",
"## Obsolete term IDs\n",
"\n",
"We should always use the primary term IDs instead of the obsolete terms.\n",
@@ -415,7 +441,7 @@
"metadata": {},
"outputs": [],
"source": [
"from hpotk.model import MinimalTerm\n",
"from hpotk import MinimalTerm\n",
"\n",
"from hpotk.validate import ValidationLevel\n",
"from hpotk.validate import ObsoleteTermIdsValidator\n",
@@ -424,7 +450,7 @@
"\n",
"# `HP:0006010` is an obsolete term ID; the current primary ID is `HP:0100807`.\n",
"inputs = [\n",
" MinimalTerm.create_minimal_term(TermId.from_curie('HP:0006010'), name='Long fingers', alt_term_ids=[], is_obsolete=False)\n",
" TermId.from_curie('HP:0006010')\n",
"]\n",
"results = obso_validator.validate(inputs)\n",
"\n",
@@ -891,7 +917,7 @@
}
],
"source": [
"print(next(iter(diseases.diseases)))"
"print(next(iter(diseases.items)))"
]
},
{
@@ -920,7 +946,7 @@
],
"source": [
"# Iterate over disease identifiers\n",
"print(next(iter(diseases.disease_ids)))"
"print(next(iter(diseases.item_ids())))"
]
},
{
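The two hunks above rename `diseases.diseases` to `diseases.items` and replace `diseases.disease_ids` with the `item_ids()` method, in line with the annotated-items concept introduced in this release (commit 740782a); commit efd5eb9 also makes the container iterable. The sketch below is a hypothetical, stripped-down container showing how those three access patterns can fit together. `AnnotatedItem`, `AnnotatedItems`, and the `DISEASE:0001`-style IDs are illustration names, not hpotk's classes or data.

```python
import typing
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedItem:
    """Hypothetical item (e.g. a disease) annotated with ontology term IDs."""
    identifier: str
    annotations: typing.FrozenSet[str]


class AnnotatedItems:
    """Hypothetical container of annotated items, not hpotk's implementation."""

    def __init__(self, items: typing.Iterable[AnnotatedItem]):
        self._items = tuple(items)

    @property
    def items(self) -> typing.Collection[AnnotatedItem]:
        # A generic name that fits any annotated item (diseases, patients, ...).
        return self._items

    def item_ids(self) -> typing.Iterator[str]:
        # Derive the IDs from the items on demand instead of storing them twice.
        return (item.identifier for item in self._items)

    def __iter__(self) -> typing.Iterator[AnnotatedItem]:
        # Iterating over the container yields the items themselves.
        return iter(self._items)

    def __len__(self) -> int:
        return len(self._items)


# Usage with two hypothetical disease records.
diseases = AnnotatedItems([
    AnnotatedItem("DISEASE:0001", frozenset({"HP:0001166"})),
    AnnotatedItem("DISEASE:0002", frozenset({"HP:0001182"})),
])
print(next(iter(diseases.items)).identifier)   # DISEASE:0001
print(next(iter(diseases.item_ids())))         # DISEASE:0001
print([item.identifier for item in diseases])  # ['DISEASE:0001', 'DISEASE:0002']
```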
@@ -1003,6 +1029,170 @@
"The HPO annotations API is still evolving. Stay tuned for more features to come!"
]
},
{
"cell_type": "markdown",
"id": "2c94a367-e7a7-4b24-8aa8-3cdd9449bb12",
"metadata": {},
"source": [
"# Semantic similarity\n",
"\n",
"## Information content of ontology terms\n",
"\n",
"We can calculate the information content (IC) of all ontology terms and obtain an `AnnotationIcContainer`:"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "fe09955e-9a7d-49dc-9cd7-e7db4b876af5",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from hpotk.algorithm.similarity import calculate_ic_for_annotated_items, AnnotationIcContainer\n",
"\n",
"term_id2ic: AnnotationIcContainer = calculate_ic_for_annotated_items(diseases, o)"
]
},
{
"cell_type": "markdown",
"id": "000836e9-98da-4fdc-8c76-eab5c163d779",
"metadata": {},
"source": [
"`AnnotationIcContainer` is a mapping from `TermId` to IC (`float`):"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "7ad52558-e398-413f-8d65-0e67d6099d42",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"IC of Arachnodactyly: 7.259769077238713 nats\n"
]
}
],
"source": [
"ic_arachnodactyly = term_id2ic[arachnodactyly.identifier]\n",
"print(f'IC of {arachnodactyly.name}: {ic_arachnodactyly} nats')"
]
},
{
"cell_type": "markdown",
"id": "341f7123-3606-4a6f-be77-2944af843412",
"metadata": {},
"source": [
"By default, the `base` parameter of `calculate_ic_for_annotated_items` is set to $e$ (Euler's number), so the IC values are returned in [nats](https://en.wikipedia.org/wiki/Nat_(unit)). Set `base=2` to get the IC in bits."
]
},
{
"cell_type": "markdown",
"id": "255a4fbf-f1e9-40e8-8efa-6b91724968a5",
"metadata": {},
"source": [
"The versions of the items and ontology used to calculate the ICs are preserved in the `metadata` attribute of the `AnnotationIcContainer`:"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "deb939b7-847f-4b9f-a823-61ea8e9cbb1a",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"{'annotated_items_version': '2023-04-05', 'ontology_version': '2023-04-06'}"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"term_id2ic.metadata"
]
},
{
"cell_type": "markdown",
"id": "cb372d93-34c1-48b1-a097-9b174dca54c8",
"metadata": {},
"source": [
"## Resnik similarity\n",
"\n",
"We can use the term IC to calculate the semantic similarity of items (e.g. diseases, patients) annotated with ontology concepts.\n",
"\n",
"Resnik similarity uses the information content of the *most informative common ancestor* ($IC_{MICA}$) to calculate semantic similarity. `hpo-toolkit` offers a function for precalculating $IC_{MICA}$ for all ontology concept pairs:"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "4e4afd86-1c26-454e-a627-07b29eb00f01",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from hpotk.algorithm.similarity import precalculate_ic_mica_for_hpo_concept_pairs, SimilarityContainer\n",
"\n",
"#sc: SimilarityContainer = precalculate_ic_mica_for_hpo_concept_pairs(term_id2ic, o)"
]
},
{
"cell_type": "markdown",
"id": "b0b9de40-efa5-4b28-bc95-f86e933faf1e",
"metadata": {},
"source": [
"The function call is commented out in the notebook because it is slow to run.\n",
"\n",
"For the purpose of this demonstration, let's assume we already have a `SimilarityContainer` on hand."
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "bea2ed63-59bc-4081-a9e8-d81ec6d06d80",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"sc = SimilarityContainer()"
]
},
{
"cell_type": "markdown",
"id": "d6c99e9a-37e7-4fdd-96b8-dec09e6f5004",
"metadata": {},
"source": [
"We can query the $IC_{MICA}$ of two HPO terms:"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "0ade7bf4-7d4c-4f5d-b612-dd3f8d9e2c57",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"a = 'HP:0001166' # Arachnodactyly\n",
"b = 'HP:0001182' # Tapered finger\n",
"ic_mica = sc.get_similarity(a, b)"
]
},
{
"cell_type": "markdown",
"id": "a3634603-e270-4b7b-a8c1-bc4813b3066a",
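The notebook cells above compute term IC from disease annotations and then use $IC_{MICA}$ for Resnik similarity. The sketch below spells out the usual definitions on a toy ontology, assuming the common annotation-frequency formulation: an item counts toward a term if it is annotated with that term or any of its descendants, $IC(t) = -\log_b(n_t / N)$ with $N$ the number of items, and $IC_{MICA}(a, b)$ is the largest IC among the common ancestors of $a$ and $b$ (each term counting as its own ancestor). Whether `calculate_ic_for_annotated_items` and `precalculate_ic_mica_for_hpo_concept_pairs` follow exactly these conventions (for example, how never-annotated terms are scored) is an assumption; `PARENTS`, `ancestors`, `calculate_ic`, and `ic_mica` are hypothetical names.

```python
import math
import typing

# Hypothetical toy ontology (term -> parents); not the real HPO hierarchy.
PARENTS: typing.Dict[str, typing.FrozenSet[str]] = {
    "root": frozenset(),
    "finger": frozenset({"root"}),
    "long_finger": frozenset({"finger"}),
    "arachnodactyly": frozenset({"long_finger"}),
    "tapered_finger": frozenset({"finger"}),
    "eye": frozenset({"root"}),
}


def ancestors(term: str) -> typing.Set[str]:
    """Return `term` plus all of its ancestors (each term is its own ancestor)."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def calculate_ic(annotations: typing.Mapping[str, typing.Iterable[str]],
                 base: float = math.e) -> typing.Dict[str, float]:
    """IC(t) = -log_base(n_t / N), where n_t counts items annotated with t or a descendant."""
    counts: typing.Dict[str, int] = {}
    for terms in annotations.values():
        # Augment the item's terms with their ancestors so each item counts once per term.
        augmented: typing.Set[str] = set()
        for term in terms:
            augmented.update(ancestors(term))
        for term in augmented:
            counts[term] = counts.get(term, 0) + 1
    n_items = len(annotations)
    return {term: -math.log(n / n_items, base) for term, n in counts.items()}


def ic_mica(a: str, b: str, ic: typing.Mapping[str, float]) -> float:
    """Resnik similarity: IC of the most informative common ancestor of `a` and `b`.

    Terms missing from `ic` (never annotated) are scored 0.0 here, which is an assumption.
    """
    return max(ic.get(term, 0.0) for term in ancestors(a) & ancestors(b))


# Hypothetical items (e.g. diseases) and their annotations.
diseases = {
    "disease_1": ["arachnodactyly"],
    "disease_2": ["tapered_finger"],
    "disease_3": ["long_finger", "eye"],
    "disease_4": ["eye"],
}
ic_nats = calculate_ic(diseases)          # natural logarithm: IC in nats
ic_bits = calculate_ic(diseases, base=2)  # base-2 logarithm: IC in bits
print(round(ic_mica("arachnodactyly", "tapered_finger", ic_nats), 4))  # 0.2877 = ln(4/3), the IC of "finger"
```

Changing `base` only rescales the values: IC in bits equals IC in nats divided by $\ln 2$, so the 7.26-nat IC of Arachnodactyly printed in the notebook corresponds to roughly 10.47 bits.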
8 changes: 6 additions & 2 deletions pyproject.toml
@@ -10,12 +10,16 @@ authors = [
]

readme = "README.md"
requires-python = ">=3.6"
requires-python = ">=3.8"
keywords = ["human phenotype ontology", "HPO", "library"]
license = { file = "LICENSE" }
classifiers = [
"Programming Language :: Python :: 3",
"Development Status :: 3 - Alpha",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
]

dependencies = [
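Raising `requires-python` to `>=3.8`, with the CI matrix now covering 3.8 through 3.11, plausibly explains commits 93ebe5d and fc193e2, which switch from `frozenset[TermId]` to `typing.FrozenSet[TermId]`: subscripting built-in collections at runtime is only supported from Python 3.9 onward (PEP 585), and annotations are evaluated when a function is defined unless `from __future__ import annotations` is in effect. A minimal illustration, using a hypothetical `collect_ids` helper:

```python
import sys
import typing


def collect_ids(curies: typing.Iterable[str]) -> typing.FrozenSet[str]:
    """Deduplicate term IDs into an immutable set (hypothetical helper).

    The `typing.FrozenSet[str]` annotation evaluates fine on Python 3.8; writing
    `frozenset[str]` instead would raise TypeError on 3.8 when this function is defined.
    """
    return frozenset(curies)


def builtin_generics_supported() -> bool:
    """Return True if this interpreter can evaluate `frozenset[str]` (PEP 585)."""
    try:
        frozenset[str]  # evaluated only to probe for support
    except TypeError:
        return False
    return True


ids = collect_ids(["HP:0001166", "HP:0001182", "HP:0001166"])
print(sorted(ids))  # ['HP:0001166', 'HP:0001182']
print(builtin_generics_supported() == (sys.version_info >= (3, 9)))  # True
```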
2 changes: 1 addition & 1 deletion recipe/meta.yaml
@@ -1,5 +1,5 @@
{% set name = "hpo-toolkit" %}
{% set version = "0.1.4" %}
{% set version = "0.1.5" %}

package:
name: {{ name|lower }}
14 changes: 9 additions & 5 deletions src/hpotk/__init__.py
@@ -1,10 +1,14 @@
__version__ = "0.1.4"
__version__ = "0.1.5"

from . import model
from . import algorithm
from . import annotations
from . import constants
from . import graph
from . import model
from . import ontology
from . import algorithm
from . import annotations
from . import validate
from . import util
from . import validate

from .graph import OntologyGraph, GraphAware
from .model import TermId, Term, MinimalTerm, Synonym, SynonymType, SynonymCategory
from .ontology import Ontology, MinimalOntology
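The new `from .graph import ...`, `from .model import ...` and `from .ontology import ...` lines re-export classes that still live in their submodules, while the `from . import model`-style submodule imports are kept. As a result, `hpotk.TermId` and `hpotk.model.TermId` should name the same class, so code written against the "Previous" paths in the notebook's API table keeps working. The self-contained sketch below reproduces the pattern with a throwaway package, `demo_pkg` (a hypothetical name), written to a temporary directory:

```python
import importlib
import pathlib
import sys
import tempfile

# Write a throwaway package, `demo_pkg` (hypothetical), that mirrors the layout:
# the class lives in a submodule and is re-exported from the package root.
root = pathlib.Path(tempfile.mkdtemp())
(root / "demo_pkg" / "model").mkdir(parents=True)
(root / "demo_pkg" / "model" / "__init__.py").write_text(
    "class TermId:\n"
    "    def __init__(self, curie):\n"
    "        self.curie = curie\n"
)
(root / "demo_pkg" / "__init__.py").write_text(
    "from . import model\n"        # the submodule stays importable (old path)
    "from .model import TermId\n"  # re-export at the top level (new path)
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_pkg = importlib.import_module("demo_pkg")

# Both spellings resolve to the very same class object, so old imports keep working.
print(demo_pkg.TermId is demo_pkg.model.TermId)  # True
```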
1 change: 1 addition & 0 deletions src/hpotk/algorithm/__init__.py
@@ -1,3 +1,4 @@
from ._traversal import get_ancestors, get_parents
from ._traversal import get_children, get_descendents, get_descendants
from ._traversal import exists_path
from . import similarity