Tutorial
Phenopackets can be encoded in either JSON or YAML; the two encodings are equivalent. We will use YAML here for compactness.
Our example involves a case study with three people. (Phenopackets can also describe other kinds of entities, such as variants; examples of these cases will follow.)
We list all people inside a `persons` block:

    persons:
      - id: "#1"
        date_of_birth: 1999-01-01
        sex: M
      - id: "#2"
        sex: M
      - id: "#3"
        sex: M

Note that in YAML, a `-` denotes an element in a list. The value of the `persons` property is always a list.
Here we are providing a DOB for the first person, and biological sexes for all persons.
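Because the JSON and YAML encodings are equivalent, the same `persons` block can be written as JSON. The following is a minimal sketch using only Python's standard `json` module; the variable names are illustrative, and keeping `date_of_birth` as a string is an assumption (a YAML parser may load the unquoted value as a date object instead).

```python
import json

# The persons block from the tutorial, written as plain Python data.
# Assumption: date_of_birth is kept as a string here.
packet = {
    "persons": [
        {"id": "#1", "date_of_birth": "1999-01-01", "sex": "M"},
        {"id": "#2", "sex": "M"},
        {"id": "#3", "sex": "M"},
    ]
}

# Serialize to JSON text; this is the JSON counterpart of the YAML block.
as_json = json.dumps(packet, indent=2)
print(as_json)

# Reading the JSON back yields the same structure.
round_tripped = json.loads(as_json)
```

The value of `persons` stays a list in both encodings, even if only one person is described.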
Note the identifiers used. There are strict rules on the structure of identifiers used in phenopackets, and on how these identifiers map to real-world entities. We will return to these in more detail later. In this particular case we are using hash identifiers; we use these when the identifiers are local to the packet and are not intended to be referenced from outside. If we had a global identifier for the person, we could use this instead.
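As a rough illustration of the local-identifier convention, the sketch below treats any identifier starting with `#` as local to the packet. The helper name `is_local_id` is hypothetical, and the check covers only the hash convention described here, not the full identifier rules.

```python
def is_local_id(identifier: str) -> bool:
    """Hypothetical helper: True if the identifier uses the local hash form."""
    return identifier.startswith("#")


person_ids = ["#1", "#2", "#3"]

# All three persons in the example use packet-local identifiers.
all_local = all(is_local_id(i) for i in person_ids)

# A prefixed identifier such as the OMIM one used later is not local.
omim_is_local = is_local_id("OMIM:615426")
```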
Next we will describe the conditions for these persons. In phenopackets, there are two distinct types of conditions: phenotypes and diseases.
We first list any disease diagnoses. We only have one, for person number 1:

    disease_diagnoses:
      - entity: "#1"
        disease_occurrence:
          types:
            - id: OMIM:615426
              label: amyotrophic lateral sclerosis type 20

Note that the `disease_diagnoses` block is separate from the `persons` block. Each diagnosis refers back to an individual in the `persons` block by id, through its `entity` field, rather than being nested inside the person. This allows for more flexibility in how persons and diagnoses are exchanged as messages.
The value for `types` is a list. Although there will typically be only one disease here, there are reasons for having a uniform list representation, which we will return to later.
We have not specified any diseases for persons 2 and 3. Although we might assume these are not disease carriers, this is not explicitly stated and cannot be known for sure. We will return to how to make negative assertions later on.
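The id-based linking between the two blocks can be sketched in a few lines of Python. This is an illustration only: the helper `diagnoses_by_person` is hypothetical, and the data is the tutorial example written as plain dictionaries. An empty list in the result means that no diagnosis was recorded, not that the person is known to be unaffected.

```python
persons = [
    {"id": "#1", "date_of_birth": "1999-01-01", "sex": "M"},
    {"id": "#2", "sex": "M"},
    {"id": "#3", "sex": "M"},
]

disease_diagnoses = [
    {
        "entity": "#1",
        "disease_occurrence": {
            "types": [
                {"id": "OMIM:615426",
                 "label": "amyotrophic lateral sclerosis type 20"}
            ]
        },
    }
]


def diagnoses_by_person(persons, disease_diagnoses):
    """Hypothetical helper: map each person id to its recorded disease ids."""
    result = {person["id"]: [] for person in persons}
    for diagnosis in disease_diagnoses:
        entity = diagnosis["entity"]
        if entity not in result:
            # The reference does not match any person in this packet.
            raise KeyError(f"unknown entity: {entity}")
        for disease_type in diagnosis["disease_occurrence"]["types"]:
            result[entity].append(disease_type["id"])
    return result


by_person = diagnoses_by_person(persons, disease_diagnoses)
# by_person == {"#1": ["OMIM:615426"], "#2": [], "#3": []}
```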
Next come the phenotype associations. Let's start with two phenotypes for person 1:

    phenotype_profile:
      - entity: "#1"
        phenotype:
          types:
            - id: HP:0003560
              label: Muscular dystrophy
      - entity: "#1"
        phenotype:
          types:
            - id: HP:0007354
              label: Amyotrophic lateral sclerosis

Each element of the list is a phenotype association. The concept of an association is a recurring feature of the phenopacket format.
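Since every association carries its own `entity` reference, the phenotypes of one person are collected by grouping associations on that field. A minimal sketch, assuming the profile has been loaded into plain Python dictionaries (the variable names are illustrative):

```python
from collections import defaultdict

phenotype_profile = [
    {
        "entity": "#1",
        "phenotype": {
            "types": [{"id": "HP:0003560", "label": "Muscular dystrophy"}]
        },
    },
    {
        "entity": "#1",
        "phenotype": {
            "types": [{"id": "HP:0007354",
                       "label": "Amyotrophic lateral sclerosis"}]
        },
    },
]

# Group phenotype term ids by the entity each association points to.
grouped = defaultdict(list)
for association in phenotype_profile:
    for term in association["phenotype"]["types"]:
        grouped[association["entity"]].append(term["id"])

phenotypes_by_entity = dict(grouped)
# phenotypes_by_entity == {"#1": ["HP:0003560", "HP:0007354"]}
```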
Although the structure may appear overly nested here, this is because we use a highly normalized model that allows maximal hooks for extensibility. For example, we can add `onset` into the phenotype object, where onset is described either with an ontology term or with a quantitative range. We can also attach a natural language description to the phenotype, to complement or extend the ontological one.
Similarly, the association itself can have evidence and additional provenance and audit information associated with it. This is shown in the following example:

    phenotype_profile:
      - entity: "#1"
        phenotype:
          types:
            - id: HP:0003560
              label: Muscular dystrophy
          onset:
            types:
              - id: HP:0003584
                label: Late onset
          description: additional notes on this phenotype here
        evidence:
          - types:
              id: TAS
            source:
              id: PMID:23455423
              title: Mutations in prion-like domains in hnRNPA2B1 and hnRNPA1 cause multisystem proteinopathy and ALS
      - entity: "#1"
        phenotype:
          types:
            - id: HP:0007354
              label: Amyotrophic lateral sclerosis

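Because `onset`, `description`, and `evidence` are optional, code that reads an association should not assume they are present. The sketch below shows one way to handle this with plain dictionaries; the helper `summarize` is hypothetical, and the placement of `description` under `phenotype` and of `source` under each evidence item follows the example above.

```python
def summarize(association):
    """Hypothetical helper: flatten one phenotype association for display."""
    phenotype = association["phenotype"]
    onset = phenotype.get("onset", {})  # optional
    return {
        "entity": association["entity"],
        "phenotypes": [t["label"] for t in phenotype["types"]],
        "onset": [t["label"] for t in onset.get("types", [])],
        "description": phenotype.get("description"),  # None if absent
        "sources": [
            item["source"]["id"]
            for item in association.get("evidence", [])  # optional
            if "source" in item
        ],
    }


detailed = {
    "entity": "#1",
    "phenotype": {
        "types": [{"id": "HP:0003560", "label": "Muscular dystrophy"}],
        "onset": {"types": [{"id": "HP:0003584", "label": "Late onset"}]},
        "description": "additional notes on this phenotype here",
    },
    "evidence": [
        {"types": {"id": "TAS"}, "source": {"id": "PMID:23455423"}}
    ],
}

minimal = {
    "entity": "#1",
    "phenotype": {
        "types": [{"id": "HP:0007354",
                   "label": "Amyotrophic lateral sclerosis"}]
    },
}

detailed_summary = summarize(detailed)
minimal_summary = summarize(minimal)
```

The minimal association produces empty lists and a `None` description rather than an error, so both forms can be processed by the same code.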
The example can be visualized as: