Commit

Merge pull request #62 from TheJacksonLaboratory/add-ontology-store
Add `OntologyStore` API
ielis authored Mar 13, 2024
2 parents f8d513b + 77e512c commit c11224d
Showing 11 changed files with 517 additions and 90 deletions.
5 changes: 3 additions & 2 deletions README.md
@@ -13,10 +13,11 @@ Loading HPO is as simple as:
```python
import hpotk

hpo = hpotk.load_ontology('http://purl.obolibrary.org/obo/hp.json')
store = hpotk.configure_ontology_store()
hpo = store.load_hpo()
```

Now you have HPO concepts and the ontology hierarchy at your fingertips.
Now you have the concepts and the hierarchy of the latest HPO release at your fingertips.

Next, load the HPO disease annotations by running:

1 change: 1 addition & 0 deletions docs/user-guide/index.rst
@@ -11,6 +11,7 @@ This guide includes self-contained tutorials for using HPO toolkit to work with
:caption: Contents:

load-ontology
use-ontology
use-hierarchy
load-hpo-annotations
validate-phenotypic-features
124 changes: 37 additions & 87 deletions docs/user-guide/load-ontology.rst
@@ -8,9 +8,17 @@ Loading HPO is the first item of all analysis task lists.
HPO toolkit supports loading the ontology data from an `Obographs <https://github.com/geneontology/obographs>`_
JSON file which is available for download from the `HPO website <https://hpo.jax.org/app/data/ontology>`_.
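
To give a feel for what the loader has to parse, here is a minimal sketch that reads the class nodes and `is_a` edges
of an Obographs-style JSON document into plain Python dictionaries. It is not HPO toolkit's parser: the tiny inline
document, the hypothetical `parse_obographs` and `to_curie` helpers, and the assumption that term IRIs take the
`http://purl.obolibrary.org/obo/HP_...` form are all illustrative.

```python
from __future__ import annotations

import json

# Hand-written toy document in the general shape of an Obographs JSON file.
# Real HPO files carry much more (definitions, synonyms, xrefs, metadata).
TOY_DOC = json.loads("""
{
  "graphs": [{
    "nodes": [
      {"id": "http://purl.obolibrary.org/obo/HP_0000001", "lbl": "All", "type": "CLASS"},
      {"id": "http://purl.obolibrary.org/obo/HP_0000118", "lbl": "Phenotypic abnormality", "type": "CLASS"}
    ],
    "edges": [
      {"sub": "http://purl.obolibrary.org/obo/HP_0000118", "pred": "is_a",
       "obj": "http://purl.obolibrary.org/obo/HP_0000001"}
    ]
  }]
}
""")

PURL_PREFIX = "http://purl.obolibrary.org/obo/"


def to_curie(iri: str) -> str:
    """Convert an OBO PURL such as .../obo/HP_0000118 into the CURIE HP:0000118."""
    local = iri[len(PURL_PREFIX):] if iri.startswith(PURL_PREFIX) else iri
    prefix, _, accession = local.partition("_")
    return f"{prefix}:{accession}"


def parse_obographs(doc: dict) -> tuple[dict[str, str], list[tuple[str, str]]]:
    """Return (term ID -> label, [(child, parent), ...]) for the first graph."""
    graph = doc["graphs"][0]
    labels = {
        to_curie(node["id"]): node.get("lbl", "")
        for node in graph.get("nodes", [])
        if node.get("type") == "CLASS"  # skip properties and other node types
    }
    is_a = [
        (to_curie(edge["sub"]), to_curie(edge["obj"]))
        for edge in graph.get("edges", [])
        if edge.get("pred") == "is_a"  # keep only subclass edges
    ]
    return labels, is_a


labels, is_a = parse_obographs(TOY_DOC)
print(labels["HP:0000118"])  # Phenotypic abnormality
print(is_a)                  # [('HP:0000118', 'HP:0000001')]
```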

Minimal ontology
Ontology loaders
****************

HPO toolkit provides two ways to load an ontology: a low-level loader function and the high-level :class:`hpotk.util.store.OntologyStore`.

Low-level loader
^^^^^^^^^^^^^^^^

The low-level loader function loads a :class:`hpotk.ontology.MinimalOntology` from a local or remote resource.
The loader will open the resource and parse its contents into an ontology. Any failure is reported as an exception.
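
The "open, parse, and raise on failure" behavior described above can be sketched with the standard library alone.
The helper names `is_remote`, `open_text_resource`, and `load_json_resource` are hypothetical; HPO toolkit ships its own
helpers in `hpotk.util` (for example `looks_like_url` and `open_text_io_handle_for_reading`), whose exact behavior
may differ from this sketch.

```python
import gzip
import io
import json
import typing
import urllib.request


def is_remote(path: str) -> bool:
    # Hypothetical rule: anything with an http(s) or ftp scheme is fetched over the network.
    return path.startswith(("http://", "https://", "ftp://"))


def open_text_resource(path: str, encoding: str = "utf-8") -> typing.IO[str]:
    """Open a local path or URL as text, decompressing it if the name ends with `.gz`."""
    gzipped = path.endswith(".gz")
    if is_remote(path):
        # Read the whole payload so no network handle stays open; raises URLError on failure.
        with urllib.request.urlopen(path) as response:
            payload = response.read()
        if gzipped:
            payload = gzip.decompress(payload)
        return io.StringIO(payload.decode(encoding))
    if gzipped:
        return gzip.open(path, "rt", encoding=encoding)
    # Missing local files raise FileNotFoundError here.
    return open(path, "rt", encoding=encoding)


def load_json_resource(path: str) -> dict:
    """Open the resource and parse it; any failure surfaces as an exception."""
    with open_text_resource(path) as handle:
        return json.load(handle)  # malformed content raises json.JSONDecodeError
```

With this sketch, a gzipped local copy such as `hp.json.gz` and a plain URL go through the same entry point, and the
caller only has to handle exceptions.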

Let's load the HPO version released on Oct 9th, 2023:

.. doctest:: load-minimal-ontology
@@ -19,112 +27,54 @@ Let's load the HPO version released on Oct 9th, 2023:

>>> url = 'https://github.com/obophenotype/human-phenotype-ontology/releases/download/v2023-10-09/hp.json'
>>> hpo = hpotk.load_minimal_ontology(url)

The loader fetches the Obographs JSON file and loads the data into :class:`hpotk.ontology.MinimalOntology`.

.. note::

The loader can fetch the HPO from a URL, and it transparently handles gzipped files
if the file name ends with `.gz` suffix.

Having `MinimalOntology`, we can do several checks. We can check the HPO version:

.. doctest:: load-minimal-ontology

>>> hpo.version
'2023-10-09'

check that the Oct 9th release has *17,664* terms:
We use the :func:`hpotk.ontology.load.obographs.load_minimal_ontology` function to fetch the Obographs JSON file
and to load the data into :class:`hpotk.ontology.MinimalOntology`.

.. doctest:: load-minimal-ontology

>>> len(hpo)
17664

check that `HP:0001250` is/was a valid identifier:
.. note::

.. doctest:: load-minimal-ontology
The loader can fetch the HPO from a local path (relative or absolute), or from a URL,
and it transparently handles decompression of gzipped files if the file name has a `.gz` suffix.

>>> 'HP:0001250' in hpo
True
A similar loader function :func:`hpotk.ontology.load.obographs.load_ontology` exists
to load an :class:`hpotk.ontology.Ontology`.

check that `HP:0001250` in fact represents *Seizure*:

.. doctest:: load-minimal-ontology
Ontology store
^^^^^^^^^^^^^^

>>> seizure = hpo.get_term('HP:0001250')
>>> seizure.name
'Seizure'
Alternatively, we can use :class:`hpotk.util.store.OntologyStore`, a class that wraps the low-level loader
and adds local caching and release selection on top of it.

or print names of its children in alphabetical order:
Using `OntologyStore` provides several benefits. First, `OntologyStore` caches the ontology data files in a local directory,
so each HPO release is downloaded only once. This saves time, especially on slow network connections.

.. doctest:: load-minimal-ontology

>>> for child in sorted(hpo.get_term_name(child)
... for child in hpo.graph.get_children(seizure)):
... print(child)
Bilateral tonic-clonic seizure
Dialeptic seizure
Focal-onset seizure
Generalized-onset seizure
Infection-related seizure
Maternal seizure
Motor seizure
Neonatal seizure
Nocturnal seizures
Non-motor seizure
Reflex seizure
Status epilepticus
Symptomatic seizures

The terms of :class:`hpotk.ontology.MinimalOntology` are instances of :class:`hpotk.model.MinimalTerm` and contain a subset
of the term metadata such as identifier, labels, and alternative IDs. The simplified are useful for tasks that
use the ontology hierarchy. However, the tasks that need the full term metadata should use `Ontology`.

Ontology
^^^^^^^^

Unsurprisingly, loading ontology is very similar to loading minimal ontology. We use `hpotk.load_ontology`
loader function:

.. testsetup:: load-ontology

import hpotk
url = 'https://github.com/obophenotype/human-phenotype-ontology/releases/download/v2023-10-09/hp.json'

.. doctest:: load-ontology

>>> hpo = hpotk.load_ontology(url)
>>> store = hpotk.configure_ontology_store()
>>> hpo = store.load_minimal_hpo(release='v2023-10-09')
>>> hpo.version
'2023-10-09'

Same as above, the loader parses the Obographs JSON file and returns an ontology. However, this time
it is an instance :class:`hpotk.ontology.Ontology` with :class:`hpotk.model.Term` - the term with full metadata.
The store will download the ontology file the first time a release (e.g. `v2023-10-09`) is requested, and subsequent
loads will skip the download. The `release` must be a release tag, as defined
in the tag section of the `HPO release page <https://github.com/obophenotype/human-phenotype-ontology/tags>`_.
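
The download-once behavior can be pictured with the following sketch. It is a hypothetical stand-in, not
`OntologyStore`'s implementation: the `CachingFetcher` class, the `<cache_dir>/<release>/hp.json` layout, and the
injected `fetch` callable are assumptions, and the URL template is simply the release URL from the low-level loader
example with the tag made variable.

```python
from __future__ import annotations

import pathlib
import tempfile
import typing

# URL template derived from the release URL used in the low-level loader example (assumption).
RELEASE_URL = ("https://github.com/obophenotype/human-phenotype-ontology"
               "/releases/download/{release}/hp.json")


class CachingFetcher:
    """Hypothetical download-once cache; not the actual OntologyStore implementation."""

    def __init__(self, cache_dir: pathlib.Path, fetch: typing.Callable[[str], bytes]):
        self._cache_dir = cache_dir
        self._fetch = fetch  # injected so the sketch runs without network access

    def path_for(self, release: str) -> pathlib.Path:
        # One sub-directory per release tag, e.g. <cache_dir>/v2023-10-09/hp.json
        return self._cache_dir / release / "hp.json"

    def get(self, release: str) -> pathlib.Path:
        """Return the cached file for `release`, downloading it only if missing."""
        target = self.path_for(release)
        if not target.is_file():
            target.parent.mkdir(parents=True, exist_ok=True)
            data = self._fetch(RELEASE_URL.format(release=release))
            # Write under a temporary name first so an interrupted download never looks cached.
            partial = target.with_name(target.name + ".part")
            partial.write_bytes(data)
            partial.replace(target)
        return target


fetched_urls: list[str] = []


def fake_fetch(url: str) -> bytes:
    # Stand-in for an HTTP download that records every request.
    fetched_urls.append(url)
    return b'{"graphs": []}'


with tempfile.TemporaryDirectory() as tmp:
    fetcher = CachingFetcher(pathlib.Path(tmp), fake_fetch)
    first = fetcher.get("v2023-10-09")
    second = fetcher.get("v2023-10-09")  # served from the cache
    print(first == second, len(fetched_urls))  # True 1
```

In practice `fetch` would perform the HTTP download; injecting it keeps the sketch testable offline.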

So, now we can access the definition of the seizure:
Moreover, `OntologyStore` loads the *latest* release if the `release` option is omitted.

.. doctest:: load-ontology

>>> seizure = hpo.get_term('HP:0001250')
>>> definition = seizure.definition
>>> definition.definition
'A seizure is an intermittent abnormality of nervous system physiology characterised by a transient occurrence of signs and/or symptoms due to abnormal excessive or synchronous neuronal activity in the brain.'
>>> definition.xrefs
('https://orcid.org/0000-0002-0736-9199', 'PMID:15816939')


or check out seizure's synonyms:
.. doctest:: load-minimal-ontology

.. doctest:: load-ontology
>>> hpo_latest = store.load_minimal_hpo() # doctest: +SKIP
>>> hpo_latest.version # doctest: +SKIP
'2024-03-06'

>>> for synonym in seizure.synonyms:
... print(synonym.name)
Epileptic seizure
Seizures
Epilepsy
At the time of writing, ``2024-03-06`` is the latest HPO release.
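
Choosing between an explicit release tag and the latest release can be approximated as below. This is only a sketch
under stated assumptions: the list of available tags is taken as given (how `OntologyStore` discovers it is not shown
here), tags are assumed to follow the `vYYYY-MM-DD` pattern seen in this guide, and `pick_release` and `version_of`
are hypothetical names. `version_of` mirrors the doctests above, where release `v2023-10-09` yields the version
`'2023-10-09'`.

```python
from __future__ import annotations

import re

# Assumed tag format, based on the tags used in this guide (e.g. v2023-10-09).
TAG_PATTERN = re.compile(r"^v(\d{4}-\d{2}-\d{2})$")


def pick_release(tags: list[str], release: str | None = None) -> str:
    """Return the requested release tag, or the newest tag when `release` is None."""
    dated = [tag for tag in tags if TAG_PATTERN.match(tag)]
    if release is not None:
        if release not in dated:
            raise ValueError(f"Unknown release tag: {release!r}")
        return release
    if not dated:
        raise ValueError("No release tags available")
    # vYYYY-MM-DD tags sort lexicographically in date order, so max() is the newest.
    return max(dated)


def version_of(tag: str) -> str:
    """Drop the leading 'v': 'v2023-10-09' -> '2023-10-09'."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"Not a release tag: {tag!r}")
    return match.group(1)


tags = ["v2023-10-09", "v2024-03-06", "not-a-release"]  # toy list of tags
print(pick_release(tags))                 # v2024-03-06
print(pick_release(tags, "v2023-10-09"))  # v2023-10-09
print(version_of(pick_release(tags)))     # 2024-03-06
```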

.. note::

Since `Ontology` is a subclass of `MinimalOntology`, any function that needs `MinimalOntology` will work just fine
when provided with `Ontology`.
Next steps
**********

Loading an ontology is a prerequisite for doing anything useful with the ontology data. Check out
the :ref:`use-ontology` section for an overview of the functionality.
133 changes: 133 additions & 0 deletions docs/user-guide/use-ontology.rst
@@ -0,0 +1,133 @@
.. _use-ontology:

============
Use ontology
============

HPO toolkit simplifies working with the Human Phenotype Ontology from Python by providing APIs
for accessing the ontology data. This section shows how to use them.

We assume the reader is familiar with loading the ontology from an Obographs JSON file, as described
in the :ref:`rstload-ontology` section.

HPO toolkit represents the ontology data either as :class:`hpotk.ontology.MinimalOntology`
or as its subclass :class:`hpotk.ontology.Ontology`.
The two classes are mostly equivalent, but `MinimalOntology` terms carry less metadata than `Ontology` terms.
We recommend `MinimalOntology` for applications that mostly care about the ontology hierarchy, and
`Ontology` for applications that use the definitions, synonyms, or cross-references of the ontology terms,
such as natural language processing applications.


Minimal ontology
^^^^^^^^^^^^^^^^

Let's see what we can do with a `MinimalOntology`.

We start by loading the release `v2023-10-09` using :class:`hpotk.util.store.OntologyStore`:

.. doctest:: load-minimal-ontology

>>> import hpotk
>>> store = hpotk.configure_ontology_store()

>>> hpo = store.load_minimal_hpo(release='v2023-10-09')

We can check the HPO version:

.. doctest:: load-minimal-ontology

>>> hpo.version
'2023-10-09'

check that the release has *17,664* terms:

.. doctest:: load-minimal-ontology

>>> len(hpo)
17664

check that `HP:0001250` is/was a valid term id:

.. doctest:: load-minimal-ontology

>>> 'HP:0001250' in hpo
True

check that `HP:0001250` in fact represents *Seizure*:

.. doctest:: load-minimal-ontology

>>> seizure = hpo.get_term('HP:0001250')
>>> seizure.name
'Seizure'

or print the names of its children in alphabetical order:

.. doctest:: load-minimal-ontology

>>> for child in sorted(hpo.get_term_name(child)
... for child in hpo.graph.get_children(seizure)):
... print(child)
Bilateral tonic-clonic seizure
Dialeptic seizure
Focal-onset seizure
Generalized-onset seizure
Infection-related seizure
Maternal seizure
Motor seizure
Neonatal seizure
Nocturnal seizures
Non-motor seizure
Reflex seizure
Status epilepticus
Symptomatic seizures

The terms of :class:`hpotk.ontology.MinimalOntology` are instances of :class:`hpotk.model.MinimalTerm` and contain a subset
of the term metadata, such as the identifier, labels, and alternative IDs. These simplified terms are useful for tasks that
only need the ontology hierarchy.
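
For readers curious how a children lookup such as `hpo.graph.get_children(seizure)` can work, the sketch below stores
`is_a` edges in two adjacency maps and answers child and ancestor queries. It is an illustration, not how HPO toolkit
represents its graph; the four-term fragment of the HPO top level and the helper names are assumptions chosen for brevity.

```python
from __future__ import annotations

from collections import defaultdict

# A tiny fragment of the HPO top level; the Oct 9th, 2023 release has 17,664 terms.
LABELS = {
    "HP:0000001": "All",
    "HP:0000118": "Phenotypic abnormality",
    "HP:0000005": "Mode of inheritance",
    "HP:0000707": "Abnormality of the nervous system",
}
IS_A = [  # (child, parent) pairs
    ("HP:0000118", "HP:0000001"),
    ("HP:0000005", "HP:0000001"),
    ("HP:0000707", "HP:0000118"),
]

children: dict[str, set[str]] = defaultdict(set)
parents: dict[str, set[str]] = defaultdict(set)
for child, parent in IS_A:
    children[parent].add(child)
    parents[child].add(parent)


def child_names(term_id: str) -> list[str]:
    """Names of the direct children, sorted alphabetically."""
    return sorted(LABELS[c] for c in children.get(term_id, ()))


def ancestors(term_id: str) -> set[str]:
    """All transitive parents of `term_id`, excluding the term itself."""
    seen: set[str] = set()
    stack = list(parents.get(term_id, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


print(child_names("HP:0000001"))        # ['Mode of inheritance', 'Phenotypic abnormality']
print(sorted(ancestors("HP:0000707")))  # ['HP:0000001', 'HP:0000118']
```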

Ontology
^^^^^^^^

Unsurprisingly, loading an `Ontology` is very similar to loading a `MinimalOntology`. As above,
we use :class:`hpotk.util.store.OntologyStore`:

.. doctest:: load-ontology

>>> import hpotk
>>> store = hpotk.configure_ontology_store()

>>> hpo = store.load_hpo(release='v2023-10-09')
>>> hpo.version
'2023-10-09'

As before, the ontology store checks the local cache for the data file of the requested release
and fetches the file from the HPO release page if it is missing. Then, the file is parsed into :class:`hpotk.ontology.Ontology`,
where the ontology terms are represented as :class:`hpotk.model.Term`.

Thanks to the additional metadata present in a `Term`, we can also access the definition of *Seizure*:

.. doctest:: load-ontology

>>> seizure = hpo.get_term('HP:0001250')
>>> definition = seizure.definition
>>> definition.definition
'A seizure is an intermittent abnormality of nervous system physiology characterised by a transient occurrence of signs and/or symptoms due to abnormal excessive or synchronous neuronal activity in the brain.'
>>> definition.xrefs
('https://orcid.org/0000-0002-0736-9199', 'PMID:15816939')

or its synonyms:

.. doctest:: load-ontology

>>> for synonym in seizure.synonyms:
... print(synonym.name)
Epileptic seizure
Seizures
Epilepsy

.. note::

Since `Ontology` is a subclass of `MinimalOntology`, any function that needs `MinimalOntology` will work just fine
when provided with `Ontology`.
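
The subclass relationship behind this note can be illustrated with two small dataclasses. `MinimalTermSketch` and
`TermSketch` are hypothetical stand-ins that only mimic the idea of :class:`hpotk.model.MinimalTerm` and
:class:`hpotk.model.Term`; their fields are assumptions, not HPO toolkit's actual attributes. A function written
against the minimal type keeps working when handed the richer subclass.

```python
from __future__ import annotations

import dataclasses
import typing


@dataclasses.dataclass(frozen=True)
class MinimalTermSketch:
    identifier: str
    name: str
    alt_term_ids: tuple[str, ...] = ()


@dataclasses.dataclass(frozen=True)
class TermSketch(MinimalTermSketch):
    # Extra metadata only the "full" term carries.
    definition: typing.Optional[str] = None
    synonyms: tuple[str, ...] = ()


def sorted_names(terms: typing.Iterable[MinimalTermSketch]) -> list[str]:
    # Uses only fields shared by both classes, so subclass instances are accepted too.
    return sorted(term.name for term in terms)


seizure = TermSketch("HP:0001250", "Seizure", synonyms=("Epileptic seizure", "Seizures", "Epilepsy"))
minimal = MinimalTermSketch("HP:0000118", "Phenotypic abnormality")
print(sorted_names([seizure, minimal]))        # ['Phenotypic abnormality', 'Seizure']
print(isinstance(seizure, MinimalTermSketch))  # True
```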

1 change: 1 addition & 0 deletions src/hpotk/__init__.py
@@ -18,3 +18,4 @@
from .ontology import Ontology, MinimalOntology

from .ontology.load.obographs import load_minimal_ontology, load_ontology
from .util.store import OntologyType, OntologyStore, configure_ontology_store
2 changes: 1 addition & 1 deletion src/hpotk/util/__init__.py
@@ -1,4 +1,4 @@
from . import sort
from . import sort # TODO: probably not necessary

from ._io import looks_like_url, looks_gzipped
from ._io import open_text_io_handle, open_text_io_handle_for_reading, open_text_io_handle_for_writing
11 changes: 11 additions & 0 deletions src/hpotk/util/store/__init__.py
@@ -0,0 +1,11 @@
"""
The `hpotk.util.store` package provides
"""

from ._api import OntologyType, OntologyStore
from ._config import configure_ontology_store

__all__ = [
'OntologyType', 'OntologyStore',
'configure_ontology_store',
]