Data Browser
The Data Browser UI is designed to validate the data architecture and values for custom cohorts.
The Data Browser UI accepts a user-supplied data dictionary. The app auto-detects whether the file is in the phenotree or the terms table format.
Data requirements for both formats:
- All term_ids must be unique.
- Terms must not appear at different levels of the same parent branch. For example, in the phenotree format, term C and term D in this branch
Level_1 | Level_2 | Level_3 | Level_4 |
---|---|---|---|
term A | term B | term C | term D |
must not also appear at different levels under the same parent, as they do here:
Level_1 | Level_2 | Level_3 | Level_4 |
---|---|---|---|
term A | term C | term D | - |
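The two requirements above can be checked before uploading a file. The following is a minimal sketch, not the app's own validation: the function names are hypothetical, and it assumes that "same parent branch" means rows sharing the same Level_1 term.

```python
from collections import Counter, defaultdict

PLACEHOLDERS = {"", "-"}  # blank or '-' marks a non-applicable level


def find_duplicate_ids(term_ids):
    """Return the term IDs that occur more than once, sorted."""
    counts = Counter(term_ids)
    return sorted(t for t, n in counts.items() if n > 1)


def find_level_conflicts(rows):
    """Return terms that occur at more than one level under one top-level term.

    rows: list of lists holding the level values, ordered Level_1..Level_N.
    Assumption: a "parent branch" is identified by its Level_1 term.
    """
    levels_seen = defaultdict(set)  # (top-level term, term) -> level numbers
    for row in rows:
        names = [cell.strip() for cell in row]
        top = names[0]
        for depth, name in enumerate(names, start=1):
            if name not in PLACEHOLDERS:
                levels_seen[(top, name)].add(depth)
    return {key: sorted(lv) for key, lv in levels_seen.items() if len(lv) > 1}


rows = [
    ["term A", "term B", "term C", "term D"],
    ["term A", "term C", "term D", "-"],
]
conflicts = find_level_conflicts(rows)
```

Here `conflicts` reports term C and term D, each found at two different levels under term A.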
Phenotree format: tab-delimited file with the following columns:
- Variable: Required. Variable (i.e. ID) for each term.
- type: Required. Types are integer, float, and categorical.
- Level_[##]: Optional. Creates the hierarchy (i.e. tree branches). Order the columns left to right, from highest to lowest level, with no extraneous columns in between, using the naming convention Level_[##].
- Categories: Optional. Specify term labels, show/hide, and other options for individual values in JSON format. Example: {"1":{"label":"Yes"}, "0":{"label":"No"}}
- Unit: Optional. Specify the unit. Only applies to numeric terms.
Options:
- label: string. Required. The value is displayed as the label in the charts
- uncomputable: true/false. Optional. True removes the term from chart displays
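As an illustration of the JSON options described above, the sketch below parses one cell and checks that every entry carries a label. The function name `parse_categories` is hypothetical, and the lowercase `label` key is taken from the examples in this document; the app's own parser may behave differently.

```python
import json


def parse_categories(cell):
    """Parse a JSON options cell and check each entry has a 'label'.

    Returns an empty dict for a blank cell, since the column is optional.
    """
    cell = cell.strip()
    if not cell:
        return {}
    parsed = json.loads(cell)  # raises ValueError on malformed JSON
    if not isinstance(parsed, dict):
        raise ValueError("options must be a JSON object keyed by value")
    for key, opts in parsed.items():
        if not isinstance(opts, dict) or "label" not in opts:
            raise ValueError(f"value {key!r} is missing the required 'label'")
    return parsed


categories = parse_categories('{"1":{"label":"Yes"}, "0":{"label":"No"}}')
```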
Below is an example of a tab-delimited phenotree.
Level_1 | Level_2 | Level_3 | Level_4 | Variable | type | Categories | Unit |
---|---|---|---|---|---|---|---|
Genomic Profiling Status | Whole Genome Sequencing | - | - | wgs_sequenced | categorical | {"1":{"label":"Yes"}} | |
Genomic Profiling Status | SNP Array 6.0 | - | - | snp6_genotyped | categorical | {"1":{"label":"Yes"}, "0":{"label":"No"}} | |
Cancer-related Variables | Treatment | Alkylating Agents, mg/m2 | Cyclophosphamide | cyclophosphamide_5 | float | {"0":{"label":"not exposed;"}, "-8888":{"label":"exposed, dose unknown"}, "-9999":{"label":"unknown"} } | |
Cancer-related Variables | Treatment | Alkylating Agents, mg/m2 | Cumulative Alkylating Agents | aaclassic_5 | float | {"0":{"label":"not exposed;"}, "-8888":{"label":"exposed, dose unknown"}, "-9999":{"label":"unknown exposure"} } | |
Genomic Profiling Status | Age (years) at SNP Array 6.0 sample collection | - | - | snp6_sample_age | integer | {"-994":{"label":"N/A:CCSS"}} | years |
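For readers preparing such a file, here is a hedged sketch of reading a tab-delimited phenotree with Python's standard `csv` module. The function name `read_phenotree` and the output shape are hypothetical; the column names (`Variable`, `type`, `Level_` prefix) are assumed to match the example header above.

```python
import csv
import io


def read_phenotree(text):
    """Read tab-delimited phenotree text into a list of term dicts."""
    # QUOTE_NONE keeps the double quotes inside JSON cells untouched.
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    level_cols = [c for c in reader.fieldnames if c.startswith("Level_")]
    terms = []
    for row in reader:
        # Keep only real level names; blank or '-' means not applicable.
        path = [
            (row[c] or "").strip()
            for c in level_cols
            if (row[c] or "").strip() not in ("", "-")
        ]
        terms.append(
            {
                "id": (row["Variable"] or "").strip(),
                "type": (row["type"] or "").strip(),
                "path": path,
            }
        )
    return terms


sample = (
    "Level_1\tLevel_2\tLevel_3\tVariable\ttype\tCategories\tUnit\n"
    "Genomic Profiling Status\tWhole Genome Sequencing\t-\t"
    'wgs_sequenced\tcategorical\t{"1":{"label":"Yes"}}\t\n'
)
terms = read_phenotree(sample)
```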
Only use blanks or ‘-’ for non-applicable level columns. Do not leave blanks or ‘-’ between levels.
Don’t:
Level_1 | Level_2 | Level_3 |
---|---|---|
term A | term B | -- |
-- | -- | term C |
The dashes in the second row will throw an error.
Do:
Level_1 | Level_2 | Level_3 |
---|---|---|
term A | term B | - |
term A | term B | term C |
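The rule about blanks and ‘-’ can be expressed as a small check on one row's level values. This is a minimal sketch with a hypothetical function name, not the app's own validation; it treats only a blank or a single ‘-’ as a placeholder, as the rule above states.

```python
def has_level_gap(levels):
    """Return True if a real term follows a blank or '-' placeholder.

    levels: the level values of one row, ordered Level_1..Level_N.
    """
    seen_placeholder = False
    for cell in levels:
        if cell.strip() in ("", "-"):
            seen_placeholder = True
        elif seen_placeholder:
            # A term name appears after a placeholder: levels were skipped.
            return True
    return False


ok_row = has_level_gap(["term A", "term B", "-"])
bad_row = has_level_gap(["-", "-", "term C"])
```

Trailing placeholders are accepted, while a placeholder followed by a term is flagged.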
Terms table format: tab-delimited file with the following columns:
- term_id: Required. Unique ID for the term
- parent_id: Required. Immediate parent ID
- name: Required. Label for the term
- type: Required. One of:
  - non graphable: applies to parent terms without values
  - categorical: string or uncomputable values
  - integer
  - float
- values: Optional. Value labels for the term, separated by semicolons. E.g. 1=Yes; 2=No
Below is an example of a tab-delimited data dictionary.
term_id | parent_id | name | type | values |
---|---|---|---|---|
gps | root | Genomic Profiling Status | non graphable | |
wgs_sequenced | gps | Whole Genome Sequencing | categorical | 1=Yes; |
snp6_genotyped | gps | Affymetrix Genome-Wide Human SNP Array 6.0 | categorical | 1=Yes; 0=No; -994=N/A: CCSS |
wgs_curated | gps | Whole Genome Sequencing Curated Variant Calls | categorical | 1=Yes; -9999=Pending review |
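The semicolon-separated value labels in the example above can be split into key/label pairs. The sketch below is illustrative only; `parse_values` is a hypothetical helper, and it assumes each entry has the form key=label with the first "=" acting as the separator.

```python
def parse_values(cell):
    """Split a 'key=label; key=label' string into a dict of labels."""
    labels = {}
    for part in cell.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon, as in "1=Yes;"
        key, sep, label = part.partition("=")
        if not sep:
            raise ValueError(f"expected key=label, got {part!r}")
        labels[key.strip()] = label.strip()
    return labels


value_labels = parse_values("1=Yes; 0=No; -994=N/A: CCSS")
```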
Data requirements:
The parent_id for a top-level term (the term at the start of a branch) is root. For example, the parent IDs for the terms in the branch below are root, term_1, term_2, and term_3.
term_id | parent_id | name |
---|---|---|
term_1 | root | Term 1 |
term_2 | term_1 | Term 2 |
term_3 | term_2 | Term 3 |
term_4 | term_3 | Term 4 |
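The parent/child relationship in the table above can be turned into a tree with a single pass over the rows. This is a sketch under stated assumptions: `build_children` is a hypothetical helper, and it assumes every parent_id must be either root or a term_id present in the same file.

```python
def build_children(rows):
    """Map each term (and 'root') to the list of its direct children.

    rows: list of (term_id, parent_id) pairs.
    """
    ids = [term_id for term_id, _ in rows]
    if len(set(ids)) != len(ids):
        raise ValueError("term_ids must be unique")
    children = {"root": []}
    for term_id in ids:
        children[term_id] = []
    for term_id, parent_id in rows:
        if parent_id not in children:
            raise ValueError(f"unknown parent_id {parent_id!r} for {term_id!r}")
        children[parent_id].append(term_id)
    return children


rows = [
    ("term_1", "root"),
    ("term_2", "term_1"),
    ("term_3", "term_2"),
    ("term_4", "term_3"),
]
tree = build_children(rows)
```

Terms with an empty child list, such as term_4 here, are the leaves of the branch.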
The user interface is available from the Data Browser card on our homepage. Submitting the dictionary file for a custom cohort displays a new UI with a suite of tools to explore the data. The dictionary appears first.
The terms appear in a collapsible list, shown in the example below. Terms with a white background are not linked to any data; clicking the ‘+’ next to one shows the terms underneath it. Think of these terms as headers and subheaders for the collapsible list.
Terms shown as clickable blue pills are intended to link to data. The example below depicts a collapsible list with blue pills and white, non-data-linked terms.
Clicking on a blue pill will show an error.