Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance the documentation #48

Merged
merged 4 commits into from
Nov 29, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 3 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@
![PyPi downloads](https://img.shields.io/pypi/dm/hpo-toolkit.svg?label=Pypi%20downloads)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/hpo-toolkit)

A toolkit for working with Human Phenotype Ontology in Python.
A toolkit for working with Human Phenotype Ontology (HPO) and HPO disease annotations in Python.

Loading HPO is as simple as:

Expand All @@ -14,7 +14,7 @@ import hpotk
hpo = hpotk.load_ontology('http://purl.obolibrary.org/obo/hp.json')
```

Loading HPO annotations is accomplished by running:
The HPO disease annotations are loaded by running:

```python
from hpotk.annotations.load.hpoa import SimpleHpoaDiseaseLoader
Expand All @@ -28,7 +28,7 @@ diseases = loader.load(hpoa_path)
assert len(diseases) == 12_468
```

Check out the User guide and the API reference for more info:
Find more info in our detailed documentation:

- [Stable documentation](https://thejacksonlaboratory.github.io/hpo-toolkit/stable) (last release on `main` branch)
- [Latest documentation](https://thejacksonlaboratory.github.io/hpo-toolkit/latest) (bleeding edge, latest commit on `development` branch)
18 changes: 15 additions & 3 deletions docs/setup.rst
Original file line number Diff line number Diff line change
Expand Up @@ -31,14 +31,26 @@ the active Python environment, assuming you have privileges to install packages.
Run tests
^^^^^^^^^

The contributors may want to run the unit tests and the integration tests to ensure the features work as expected.
Hpo-toolkit's tests use a combination of `unittest` and `ddt`. The tests are run as::
The contributors may want to run the unit tests and the integration tests to ensure all features work as expected.

Before running tests, make sure you install HPO toolkit with `test` and `docs` dependencies::

python3 -m pip install .[test,docs]

The unit tests and the integration tests can the be running by invoking::

python3 -m unittest discover -s src -p _test*.py
python3 -m unittest discover -s tests

We go extra mile to ensure the documentation is always up-to-date, and, therefore, we also run the documentation tests.
The documentation tests are run by::

cd docs
sphinx-apidoc --separate --module-first -d 2 -H "API reference" -o apidocs ../src/hpotk
make doctest

.. note::

The library *must* be installed in the environment before running the tests. Otherwise, the test discovery will fail.
The library *must* be installed in the environment before running all tests. Otherwise, the test discovery will fail.

That's about it!
1 change: 1 addition & 0 deletions docs/user-guide/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -13,5 +13,6 @@ This guide includes self-contained tutorials for using HPO toolkit to work with
load-ontology
use-hierarchy
load-hpo-annotations
validate-phenotypic-features
sort-term-ids

150 changes: 150 additions & 0 deletions docs/user-guide/validate-phenotypic-features.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,150 @@
.. _validate-phenotypic-features:

============================
Validate phenotypic features
============================

One of the most convenient benefits of using ontology concepts for annotation of items with features and properties
is that one can leverage the information encoded in the semantic relationships of the ontology hierarchy.
Among other things, the semantic relationships empower fuzzy searches and similarity measures that have found
broad usage in many fields, such as biomedicine.

HPO became a *de facto* standard for representing phenotypic features - the signs and symptoms of an individual.
However, unlike in the case of other ontologies, several unique rules should be followed to maximize the benefits
of HPO annotations. In the sections below, we describe the rules and show how HPO toolkit can reveal their violations.

For the sake of this guide, let's assume we have an individual annotated with the following four phenotypic features:

* *Arachnodactyly*
* *Seizure*
* *Focal clonic seizure*
* *Enuresis nocturna*

.. doctest:: check-consistency

>>> curies = [
... 'HP:0001505', # Arachnodactyly
... 'HP:0001250', # Seizure
... 'HP:0002266', # Focal clonic seizure
... 'HP:0010677' # Enuresis nocturna
... ]

Let's convert the CURIEs into term ids:

.. doctest:: check-consistency

>>> import hpotk
>>> term_ids = [hpotk.TermId.from_curie(curie) for curie in curies]

and let's finish the setup by loading the toy HPO shipped with the documentation.

.. doctest:: check-consistency

>>> hpo = hpotk.load_minimal_ontology('data/hp.toy.json')


Do not use obsolete term ids
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

As the first rule, the annotations should always use the current identifier. During ontology development,
some concepts may become obsolete and later be removed from the ontology altogether.
Most of the time, however, the removed concepts have a straightforward replacement.

The :class:`hpotk.validate.ObsoleteTermIdsValidator` points out the usage of obsolete term ids
and suggests the replacement.

Let's create the validator and check if the phenotypic features are OK:

.. doctest:: check-consistency

>>> from hpotk.validate import ObsoleteTermIdsValidator
>>> obs_val = ObsoleteTermIdsValidator(hpo)

>>> vr = obs_val.validate(term_ids)

The validator returns back an instance of :class:`hpotk.validate.ValidationResults` with the validation output.
We can check for presence of issues in the input:

.. doctest:: check-consistency

>>> vr.is_ok()
False

The input is *not* OK, so we should look at the issues in greater detail:

.. doctest:: check-consistency

>>> for validation_result in vr.results:
... print(validation_result)
ValidationResult(level=<ValidationLevel.WARNING: 1>, category='obsolete_term_id_is_used', message='Using the obsolete HP:0001505 instead of HP:0001166 for Arachnodactyly')

We see that the `HP:0001505` is obsolete and `HP:0001166` should be used as the new *Arachnodactyly* identifier.


Phenotypic features should be descendants of *Phenotypic abnormality*
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

HPO hierarchy has several major branches to uniquely represent concepts such as clinical modifiers, modes of inheritance,
and past medical medical history. However, the signs and symptoms should be encoded into descendants
of *Phenotypic abnormality*.

:class:`hpotk.validate.PhenotypicAbnormalityValidator` checks that all identifiers correspond to descendants
of *Phenotypic abnormality*:

Let's test that this is valid for the patient features:

.. doctest:: check-consistency

>>> from hpotk.validate import PhenotypicAbnormalityValidator
>>> pa_val = PhenotypicAbnormalityValidator(hpo)

>>> vr = pa_val.validate(term_ids)
>>> vr.is_ok()
True

Yes, the all term ids represent the descendants of *Phenotypic abnormality*.


Phenotypic features should not violate the annotation propagation rule
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Last and most importantly, let's discuss the concept of annotation redundancy.
HPO uses `is_a` to represent the edges of the ontology hierarchy graph. The edges model the "parent-child"
relationships between two ontology concepts (the object subsumes the subject).

When using HPO concepts to encode clinical features of an individual, presence of a concept implies presence
of all ancestral concepts. This is also known as the "True path rule", where the annotation "propagates" across
the concept ancestors.

In general, using the same annotation more than once is considered an error (e.g. annotate the subject with *Focal clonic seizure*
and *Focal clonic seizure*). However, thanks to the True path rule, using a concept and its ancestor is an offender
of a similar kind.

:class:`hpotk.validate.AnnotationPropagationValidator` checks if a set of terms violate the annotation propagation rule
- if a collection of concepts contains a term and its ancestor.

.. doctest:: check-consistency

>>> from hpotk.validate import AnnotationPropagationValidator
>>> ap_val = AnnotationPropagationValidator(hpo)

>>> vr = ap_val.validate(term_ids)
>>> vr.is_ok()
False

There seems to an issue. Let's break it down:

.. doctest:: check-consistency

>>> for validation_result in vr.results:
... print(validation_result.level)
... print(validation_result.category)
... print(validation_result.message)
ValidationLevel.ERROR
annotation_propagation
Terms should not contain both present Focal clonic seizure [HP:0002266] and its present or excluded ancestor Seizure [HP:0001250]

The validator points out that *Seizure* is an ancestor of *Focal clonic seizure* and should, therefore, not be used
as an annotation of the individual.

That's it for now. There are more validators to come!
Loading
Loading