Overhaul Ontology Annotations #120

Merged on Sep 11, 2024 (4 commits)
ARC specification.md: 2 changes (1 addition, 1 deletion)
@@ -156,7 +156,7 @@ The `study` file MUST follow the [ISA-XLSX study file specification](ISA-XLSX.md

Protocols that are necessary to describe the sample or material creation process can be placed in the protocols directory.

Further explications about data entities defined in the assay MAY be stored in [ISA-XLSX](#isa-xlsx-format) format in a `isa.datamap.xlsx` file, which SHOULD exist for each assay. Further details on `isa.datamap.xlsx` are specified [in the isa-xlsx specification](ISA-XLSX.md#datamap-file).
Further explanations about data entities defined in the study MAY be stored in [ISA-XLSX](#isa-xlsx-format) format in an `isa.datamap.xlsx` file, which SHOULD exist for each study. Details on `isa.datamap.xlsx` are specified [in the ISA-XLSX specification](ISA-XLSX.md#datamap-file).

## Assay Data and Metadata

ISA-XLSX.md: 75 changes (63 additions, 12 deletions)
@@ -692,29 +692,61 @@ In this example, there is a measurement of two `Samples`, namely `input1` and `i

## Ontology Annotations

Where a value is an `Ontology Annotation` in a table file, `Term Accession Number` and `Term Source REF` fields MUST follow the column cell in which the value is entered. These two columns SHOULD contain further ontological information about the header. In this case, following the static header string, separated by a single space, there MUST be a short ontology term identifier formatted as CURIEs (prefixed identifiers) of the form `<IDSPACE>:<LOCALID>` (specified [here](http://obofoundry.org/id-policy)) inside `()` brackets.
Where a value is an `Ontology Annotation` in an annotation table, `Term Accession Number` and `Term Source REF` columns MUST follow the main column.

An `Ontology Annotation` MAY be applied to any appropriate `Characteristic`, `Parameter`, `Factor`, `Component` or `Protocol Type`.

This implements `Ontology Annotation` from the ISA Abstract Model.

#### Ontology Annotation Headers

The header of the main column MUST contain the structural column type followed by the `name` of the ontology term in `[]` brackets.
There SHOULD be a `space` between the column type and the `[` bracket.
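The header rule above can be checked mechanically. The Python sketch below is a hypothetical helper (not part of ARCtrl or any published tool) that splits a main column header into its structural column type and term name, and reports whether the recommended space before `[` is present. The regular expression, in particular which characters a column type may contain, is an assumption for illustration.

```python
import re
from typing import Tuple

# Hypothetical pattern for a main column header such as "Characteristic [organism]":
# a structural column type, an optional single space, then the term name in [].
MAIN_HEADER = re.compile(
    r"^(?P<type>[A-Za-z][A-Za-z ]*?)(?P<space> ?)\[(?P<name>[^\[\]]+)\]$"
)


def parse_main_header(header: str) -> Tuple[str, str, bool]:
    """Split a main header into (column type, term name, has_space).

    has_space tells whether the recommended (SHOULD) space before "[" is
    present; a missing space is tolerated because it is not a MUST.
    """
    match = MAIN_HEADER.match(header.strip())
    if match is None:
        raise ValueError(f"not of the form '<Type> [<name>]': {header!r}")
    return (
        match.group("type").rstrip(),
        match.group("name"),
        match.group("space") == " ",
    )


# Usage with headers modelled on the examples in this document:
print(parse_main_header("Characteristic [organism]"))  # ('Characteristic', 'organism', True)
print(parse_main_header("Parameter[temperature]"))     # ('Parameter', 'temperature', False)
```

Making the space optional in the pattern, and reporting it separately, keeps the SHOULD-level recommendation distinct from the MUST-level bracket syntax.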

The headers of the two annotation columns SHOULD contain further ontological information about the ontology term of the main header.
In this case, following the static header string, separated by a single space, there MUST be a short ontology term identifier formatted as a CURIE (prefixed identifier) of the form `<IDSPACE>:<LOCALID>` (specified [here](http://obofoundry.org/id-policy)) inside `()` brackets.

Otherwise, i.e. when the annotation columns do not contain further ontological information, the static header strings MUST be followed either by a single space and empty `()` brackets, or by nothing.
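As a rough illustration of these header rules, the sketch below accepts `Term Source REF` and `Term Accession Number` headers followed by a CURIE in `()`, by empty `()`, or by nothing. The function name and the simplified CURIE pattern are assumptions for illustration; the OBO ID policy linked above remains the authoritative definition of a CURIE.

```python
import re
from typing import Optional, Tuple

# Simplified <IDSPACE>:<LOCALID> check; the OBO ID policy is more detailed,
# so this is only an approximation.
CURIE = re.compile(r"^[A-Za-z][A-Za-z0-9_]*:[A-Za-z0-9_]+$")

# Static header string, optionally followed by exactly one space and "(...)".
ANNOTATION_HEADER = re.compile(
    r"^(?P<kind>Term Source REF|Term Accession Number)(?: \((?P<curie>[^()]*)\))?$"
)


def parse_annotation_header(header: str) -> Tuple[str, Optional[str]]:
    """Return (static header string, CURIE), with CURIE None if none is given."""
    match = ANNOTATION_HEADER.match(header)
    if match is None:
        raise ValueError(f"unexpected annotation header: {header!r}")
    curie = match.group("curie")
    if not curie:  # no brackets at all, or empty "()"
        return match.group("kind"), None
    if not CURIE.match(curie):
        raise ValueError(f"not an <IDSPACE>:<LOCALID> CURIE: {curie!r}")
    return match.group("kind"), curie


# Usage: the first two headers are valid; the third uses "_" instead of ":".
print(parse_annotation_header("Term Source REF (PATO:0000146)"))
print(parse_annotation_header("Term Accession Number ()"))
# parse_annotation_header("Term Source REF (NCIT_C81182)") would raise ValueError
```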

#### Ontology Annotation Values

The value in the main column MUST contain the name of the ontology term.

The value in the `Term Source REF` column MUST either contain a short identifier for the `IDSPACE`, which identifies the ontology containing the term, or be left empty.

The value in the `Term Accession Number` column MUST either be in one of the following formats, or be left empty:
- `LOCALID` of the ontology, which is only applicable if the matching `IDSPACE` is given in the `Term Source REF` column
Review comment (Contributor): LOCALID => do we support this in ARCtrl?
Reply (Member Author): Nope, will open an Issue

- short ontology term identifier formatted as a CURIE (prefixed identifier) of the form `<IDSPACE>:<LOCALID>` (specified [here](http://obofoundry.org/id-policy))
- `URL` pointing to the ontology term
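A minimal sketch of how a reader might tell the three accepted `Term Accession Number` formats apart. It assumes that a `URL` starts with `http://` or `https://` and reuses a simplified CURIE pattern; `classify_accession` is a hypothetical name. It also enforces the rule that a bare `LOCALID` is only applicable when `Term Source REF` gives the `IDSPACE`.

```python
import re
from typing import Optional

# Simplified patterns; URL detection only looks at the scheme prefix.
CURIE = re.compile(r"^[A-Za-z][A-Za-z0-9_]*:[A-Za-z0-9_]+$")
LOCALID = re.compile(r"^[A-Za-z0-9_]+$")


def classify_accession(accession: str, term_source_ref: str) -> Optional[str]:
    """Return "URL", "CURIE" or "LOCALID", or None for an empty cell."""
    accession = accession.strip()
    if not accession:
        return None
    if accession.startswith(("http://", "https://")):
        return "URL"
    if CURIE.match(accession):
        return "CURIE"
    if LOCALID.match(accession):
        # A bare LOCALID is only usable if Term Source REF gives the IDSPACE.
        if not term_source_ref.strip():
            raise ValueError(f"LOCALID {accession!r} requires a Term Source REF")
        return "LOCALID"
    raise ValueError(f"unrecognised Term Accession Number: {accession!r}")


# Values taken from the examples in this document:
print(classify_accession("D008099", "MeSH"))  # LOCALID
print(classify_accession("http://purl.obolibrary.org/obo/UO_0000012", "UO"))  # URL
```

Checking for a URL first avoids any overlap with the CURIE pattern, since both contain a colon.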

#### Ontology Annotation Example

For example, a characteristic type `organism` with a value of `Homo sapiens` can be qualified with an `Ontology Annotation` of a term from NCBI Taxonomy as follows:

| Characteristic [organism] | Term Source REF (OBI:0100026) | Term Accession Number (OBI:0100026) |
|-----------------------------|-------------------|------------------------------------------------------|
| Homo sapiens | NCBITaxon | [http://…/NCBITAXON_9606](http://.../NCBITAXON_9606) |

An `Ontology Annotation` MAY be applied to any appropriate `Characteristic`, `Parameter`, `Factor`, `Component` or `Protocol Type`.
| Homo sapiens | NCBITaxon | [http://…/NCBITAXON_9606](http://purl.obolibrary.org/obo/NCBITAXON_9606) |

This implements `Ontology Annotation` from the ISA Abstract Model.
> [!NOTE]
> In this example, the value in the `Term Accession Number` column is formatted as a `URL`, but shortened for display in Markdown.

## Unit

Where a value is numeric, a `Unit` MAY be used to qualify the quantity. In this case, following the column in which a `Unit`
is used, a `Unit` heading MUST be present, and MAY be further annotated as an [`Ontology Annotation`](#ontology-annotations).
Where a value is numeric, a `Unit` MAY be used to qualify the quantity.
In this case, the main column MUST be followed by a `Unit` column, which in turn SHOULD be further annotated as an [`Ontology Annotation`](#ontology-annotations), i.e. followed by `Term Accession Number` and `Term Source REF` columns.

- The headers of the annotation columns then refer to the header of the main column.
- The values of the annotation columns then refer to the unit, and not to the numeric value of the main column.
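The two rules above (annotation headers refer to the main column, annotation values refer to the unit) can be sketched as a hypothetical helper that assembles the four columns of a unit-annotated value. The column order follows the example table in this section; the `UnitCell` type and `unit_columns` function are illustrative assumptions, not an existing API.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class UnitCell:
    """Hypothetical container for one numeric value and its annotated unit."""
    value: Union[int, float]
    unit_name: str       # e.g. "Kelvin"
    unit_source: str     # IDSPACE of the unit term, e.g. "UO"
    unit_accession: str  # URL, CURIE or LOCALID of the unit term


def unit_columns(main_header: str, header_curie: str, cell: UnitCell) -> Tuple[List[str], list]:
    """Return (headers, row) for a main column followed by Unit and annotation columns.

    The CURIE in the annotation headers qualifies the main header term, while
    the annotation values qualify the unit, not the numeric value.
    """
    headers = [
        main_header,
        "Unit",
        f"Term Source REF ({header_curie})",
        f"Term Accession Number ({header_curie})",
    ]
    row = [cell.value, cell.unit_name, cell.unit_source, cell.unit_accession]
    return headers, row


# Rebuilds the temperature example from this section.
headers, row = unit_columns(
    "Parameter [temperature]",
    "PATO:0000146",
    UnitCell(300, "Kelvin", "UO", "http://purl.obolibrary.org/obo/UO_0000012"),
)
print(headers)
print(row)
```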

For example, to qualify the value `300` with a `Unit` `Kelvin` qualified as an [`Ontology Annotation`](#ontology-annotations) from the Units Ontology declared
in the Ontology Sources with `UO`:
In the following example, the header ontology term `temperature` is further qualified with the CURIE `PATO:0000146`.
The value `300` is qualified with a `Unit` `Kelvin`, which is further qualified as an [`Ontology Annotation`](#ontology-annotations) from the Units Ontology declared in the Ontology Sources with `UO`:

| Parameter [temperature] | Unit | Term Source REF (PATO:0000146) | Term Accession Number (PATO:0000146) |
|--------------------------------|--------|-------------------|------------------------------------------------------|
| 300 | Kelvin | UO | [http://…/obo/UO_0000012](http://.../obo/UO_0000012) |
| 300 | Kelvin | UO | [http://…/obo/UO_0000012](http://purl.obolibrary.org/obo/UO_0000012) |

> [!NOTE]
> In this example, the value in the `Term Accession Number` column is formatted as a `URL`, but shortened for display in Markdown.


## Characteristics
@@ -728,6 +760,9 @@ For example, a characteristic type Organism with a value of Homo sapiens can be
|-------------------------------|-------------------|-------------------------|
| Liver | MeSH | D008099 |

> [!NOTE]
> In this example, the value in the `Term Accession Number` column is formatted as a `LOCALID`. The associated `IDSPACE` to identify the ontology term is given in the `Term Source REF` column.

## Factors

A `Factor` is an independent variable manipulated by an experimentalist with the intention to affect biological systems in a way that can be measured by an assay. This field holds the actual data for the `Factor` named between the square brackets, for example `Factor [compound]`; that name MUST match a factor declared in the `Study Factors` section of a top-level metadata sheet. The value MUST be free text, numeric, or an [`Ontology Annotation`](#ontology-annotations).
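Because the name between the brackets has to match a declared study factor, a consistency check is straightforward. The sketch below is hypothetical: it assumes the declared factor names have already been read from the `Study Factors` section into a plain Python set, and it compares names case-sensitively, which may be stricter or looser than a real implementation.

```python
import re
from typing import Iterable, List, Set

# Hypothetical pattern for "Factor [<name>]" headers (space before "[" optional).
FACTOR_HEADER = re.compile(r"^Factor ?\[(?P<name>[^\[\]]+)\]$")


def undeclared_factors(headers: Iterable[str], declared: Set[str]) -> List[str]:
    """Return Factor names used in table headers but absent from Study Factors."""
    missing = []
    for header in headers:
        match = FACTOR_HEADER.match(header.strip())
        # Headers that are not Factor columns are skipped.
        if match and match.group("name") not in declared:
            missing.append(match.group("name"))
    return missing


# "compound" is declared, "sex" is not.
print(undeclared_factors(
    ["Characteristic [organism]", "Factor [compound]", "Factor [sex]"],
    {"compound"},
))
```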
@@ -736,23 +771,33 @@ A `Factor` is an independent variable manipulated by an experimentalist with the
|------------------------|-------------------|-------------------------|
| Male | MeSH | D008297 |

> [!NOTE]
> In this example, the value in the `Term Accession Number` column is formatted as a `LOCALID`. The associated `IDSPACE` to identify the ontology term is given in the `Term Source REF` column.

## Components

A `Component` is a consumable or reusable physical entity used in the experimental workflow. It is formatted in the pattern `Component [<category term>]`. The value MUST be free text, numeric, or an [`Ontology Annotation`](#ontology-annotations).

| Component [Measurement Device] | Term Source REF (NCIT_C81182) | Term Accession Number (NCIT_C81182) |
| Component [Measurement Device] | Term Source REF (NCIT:C81182) | Term Accession Number (NCIT:C81182) |
|------------------------|-------------------|-------------------------|
| Illumina MiniSeq | OBI | [http://…/obo/OBI_0003114](http://purl.obolibrary.org/obo/OBI_0003114) |

> [!NOTE]
> In this example, the value in the `Term Accession Number` column is formatted as a `URL`, but shortened for display in Markdown.


## Parameters

A `Parameter` can be used to specify any additional information about the experimental setup that does not fall under the three categories above. It is formatted in the pattern `Parameter [<category term>]`. The value MUST be free text, numeric, or an [`Ontology Annotation`](#ontology-annotations).

| Parameter [time] | Unit | Term Source REF (PATO_0000165) | Term Accession Number (PATO:0000165) |
| Parameter [temperature] | Unit | Term Source REF (NCRO:0000029) | Term Accession Number (NCRO:0000029) |
|--------------------------------|--------|-------------------|------------------------------------------------------|
| 300 | Kelvin | UO | [http://…/obo/UO_0000012](http://purl.obolibrary.org/obo/UO_0000012) |

> [!NOTE]
> In this example, the value in the `Term Accession Number` column is formatted as a `URL`, but shortened for display in Markdown.


## Comments

A `Comment` can be used to provide some additional information. Columns headed with `Comment[<comment name>]` MAY appear anywhere in the Annotation Table. The comment always refers to the Annotation Table. The value MUST be free text.
@@ -824,6 +869,9 @@ Every `Datamap Table sheet` SHOULD contain an `Unit` column. The `Unit` adds a u
|------------------------|-------------------|-------------------------|
| milligram per milliliter | UO | [http://…/obo/UO_0000176](http://purl.obolibrary.org/obo/UO_0000176) |

> [!NOTE]
> In this example, the value in the `Term Accession Number` column is formatted as a `URL`, but shortened for display in Markdown.

## Object Type column

Every `Datamap Table sheet` SHOULD contain an `Object Type` column. The `Object Type` defines the shape or format in which the data node is represented. The value MUST be free text, or an [`Ontology Annotation`](#ontology-annotations).
@@ -832,6 +880,9 @@ Every `Datamap Table sheet` SHOULD contain an `Object Type` column. The `Object
|------------------------|-------------------|-------------------------|
| Float | NCIT | [http://…/obo/NCIT_C48150](http://purl.obolibrary.org/obo/NCIT_C48150) |

> [!NOTE]
> In this example, the value in the `Term Accession Number` column is formatted as a `URL`, but shortened for display in Markdown.

## Description column

Every `Datamap Table sheet` SHOULD contain a `Description` column. The `Description` gives additional, human-readable context about the data node. The value MUST be free text.