Data Browser

Colleen Reilly edited this page Nov 15, 2024 · 8 revisions

The Data Browser UI is designed to validate data architecture and values for custom cohorts.

File Formats

The Data Browser UI accepts a user-supplied data dictionary file. The app auto-detects whether the file is in the phenotree or the terms table format.
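The app's actual detection logic is not described on this page. As a rough illustration only, the sketch below guesses the format from the header row, using the column names listed in the format sections of this page. The function name `guess_format` and the decision rule are assumptions, not the app's implementation.

```python
def guess_format(header_line):
    """Guess the dictionary format from a tab-delimited header line (hypothetical rule)."""
    columns = {c.strip().lower() for c in header_line.split("\t")}
    # Assumption: a terms table always carries term_id and parent_id columns.
    if {"term_id", "parent_id"} <= columns:
        return "terms table"
    # Assumption: a phenotree always carries a Variable column.
    if "variable" in columns:
        return "phenotree"
    return "unknown"
```

Comparing lowercased names keeps the check tolerant of header casing such as `Type` versus `type`.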

Data requirements for both formats:

  1. All term_ids must be unique.
  2. Terms must not appear at different levels of the same parent branch. For example, in the phenotree format, if term C and term D appear at Level_3 and Level_4 of this branch:

Level_1 Level_2 Level_3 Level_4
term A term B term C term D

they must not appear at different levels of the same branch elsewhere in the file:

Level_1 Level_2 Level_3 Level_4
term A term C term D -
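Both requirements can be checked before uploading a file. The sketch below is one possible reading of them, not the app's own validator: `find_duplicate_ids` reports repeated IDs, and `find_level_conflicts` reports any term that shows up at two different depths under the same Level_1 term. Both function names are hypothetical, and each row is assumed to be a list of Level_N values in column order.

```python
def find_duplicate_ids(term_ids):
    """Return each ID that occurs more than once, in order of first repeat."""
    seen, dupes = set(), []
    for tid in term_ids:
        if tid in seen and tid not in dupes:
            dupes.append(tid)
        seen.add(tid)
    return dupes


def find_level_conflicts(rows):
    """Return (root, term, first_depth, other_depth) for terms seen at two depths."""
    first_depth = {}  # (Level_1 term, term) -> depth where the term was first seen
    conflicts = []
    for levels in rows:
        root = None
        for depth, term in enumerate(levels, start=1):
            if term in ("", "-"):  # blank or '-' means no term at this level
                continue
            if root is None:
                root = term
            first = first_depth.setdefault((root, term), depth)
            if first != depth:
                conflicts.append((root, term, first, depth))
    return conflicts
```

Run on the two example rows above, `find_level_conflicts` flags both term C and term D.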

Phenotree

Tab delimited file with the following columns:

Variable: STR

Required. The variable (i.e. the ID) for each term.

Type: STR

Required. Types are integer, float, and categorical.

Level_[##]: STR

Optional. Creates the hierarchy (i.e. tree branches). Order the columns left to right, from the highest level to the lowest, with no extraneous columns in between, using the naming convention Level_[##].

Categories: JSON

Optional. Per-term options, such as value labels and show/hide settings, in JSON format. Example: {"1":{"label":"Yes"}, "0":{"label":"No"}}

Unit: STR

Optional. The unit for the term. Applies only to numeric terms.

Options for each Categories entry:

  • label: STR. Required. Value will be displayed as the label in the charts.
  • uncomputable: true/false. Optional. True removes the term from chart displays.
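To make the Categories options concrete, here is a minimal sketch that parses one Categories cell with Python's standard `json` module. It collects the labels and the keys flagged as uncomputable. The function name `parse_categories` is hypothetical, and the lowercase `label` key follows the JSON example above rather than any documented schema.

```python
import json


def parse_categories(cell):
    """Split a Categories cell into {key: label} and the set of uncomputable keys."""
    categories = json.loads(cell) if cell.strip() else {}
    labels, uncomputable = {}, set()
    for key, options in categories.items():
        if "label" not in options:
            raise ValueError(f"category {key!r} is missing the required label")
        labels[key] = options["label"]
        if options.get("uncomputable", False):
            uncomputable.add(key)
    return labels, uncomputable
```

A malformed cell raises `json.JSONDecodeError`, which is a quick way to catch quoting mistakes before upload.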

Below is an example of a tab-delimited phenotree.

Level_1	Level_2	Level_3	Level_4	Variable	type	Categories	Unit
Genomic Profiling Status	Whole Genome Sequencing	-	-	wgs_sequenced	categorical	{"1":{"label":"Yes"}}
Genomic Profiling Status	SNP Array 6.0	-	-	snp6_genotyped	categorical	{"1":{"label":"Yes"}, "0":{"label":"No"}}
Cancer-related Variables	Treatment	Alkylating Agents, mg/m2	Cyclophosphamide	cyclophosphamide_5	float	{"0":{"label":"not exposed;"}, "-8888":{"label":"exposed, dose unknown"}, "-9999":{"label":"unknown"} }
Cancer-related Variables	Treatment	Alkylating Agents, mg/m2	Cumulative Alkylating Agents	aaclassic_5	float	{"0":{"label":"not exposed;"}, "-8888":{"label":"exposed, dose unknown"}, "-9999":{"label":"unknown exposure"} }
Genomic Profiling Status	Age (years) at SNP Array 6.0 sample collection	-	-	snp6_sample_age	integer	{"-994":{"label":"N/A:CCSS"}}	years
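A file shaped like the example can be given a quick sanity check with Python's standard `csv` module. The sketch below is illustrative only. The function name `check_phenotree_rows` is hypothetical, and treating a Unit on a categorical term as a problem is an assumption drawn from the statement that Unit applies only to numeric terms.

```python
import csv
import io

ALLOWED_TYPES = {"integer", "float", "categorical"}


def check_phenotree_rows(text):
    """Return (line_number, message) pairs for rows that break the column rules."""
    problems = []
    # QUOTE_NONE keeps the double quotes inside the Categories JSON untouched.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, raw in enumerate(reader, start=2):  # line 1 is the header
        row = {k.strip().lower(): (v or "").strip() for k, v in raw.items() if k}
        if not row.get("variable"):
            problems.append((line_no, "missing Variable"))
        if row.get("type") not in ALLOWED_TYPES:
            problems.append((line_no, f"unknown type {row.get('type')!r}"))
        elif row["type"] == "categorical" and row.get("unit"):
            # Assumption: a unit on a non-numeric term is worth flagging.
            problems.append((line_no, "Unit given for a non-numeric term"))
    return problems
```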

Use blanks or ‘-’ only for level columns that do not apply to a term. Do not leave blanks or ‘-’ between filled-in levels.

Don’t:

Level_1 Level_2 Level_3
term A term B --
-- -- term C

The dashes in the second row will throw an error.

Do:

Level_1 Level_2 Level_3
term A term B -
term A term B term C
Both Level_1 and Level_2 columns are complete.
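The rule illustrated by the Don’t and Do examples amounts to saying that once a row reaches an empty level, every level after it must also be empty. A minimal sketch of that check follows. The function name `has_level_gap` is hypothetical, and treating any run of dashes (such as ‘--’) as empty is an assumption.

```python
def has_level_gap(levels):
    """Return True if a blank or dash-only level comes before a filled-in level."""
    seen_empty = False
    for value in levels:
        # Assumption: blanks and any run of '-' characters both count as empty.
        if value.strip().strip("-") == "":
            seen_empty = True
        elif seen_empty:
            return True
    return False
```

Each argument is one row's Level_N values in column order, so the second row of the Don’t example returns True and both rows of the Do example return False.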

Terms table

Tab delimited file with the following columns:

term_id: STR

Required.

parent_id: STR

Required. Immediate parent ID

name: STR

Required. Label for the term

type: STR

Required.

  • non graphable: applies to parent terms without values
  • categorical: string or uncomputable values
  • integer
  • float

values: STR

Optional. Value labels for the term, separated by semicolons. E.g. 1=Yes; 2=No

Below is an example of a tab-delimited data dictionary.


term_id	parent_id	name	type	values
gps	root	Genomic Profiling Status	non graphable
wgs_sequenced	gps	Whole Genome Sequencing	categorical	1=Yes;
snp6_genotyped	gps	Affymetrix Genome-Wide Human SNP Array 6.0	categorical	1=Yes; 0=No; -994=N/A: CCSS
wgs_curated	gps	Whole Genome Sequencing Curated Variant Calls	categorical	1=Yes; -9999=Pending review
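The values column packs several key=label pairs into one cell. As a sketch of how such a cell could be read, the hypothetical helper below splits on semicolons and then on the first ‘=’ only, so labels that contain other punctuation (such as "N/A: CCSS") stay intact. Tolerating a trailing semicolon, as in "1=Yes;", is an assumption based on the example rows.

```python
def parse_values(cell):
    """Turn a values cell such as '1=Yes; 0=No' into a {key: label} dict."""
    labels = {}
    for pair in cell.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # skip the empty piece left by a trailing semicolon
        key, sep, label = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=label, got {pair!r}")
        labels[key.strip()] = label.strip()
    return labels
```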

Data requirements:

The parent_id of the top-level term at the start of each branch is root. For example, the parent_ids for the four terms in the branch below are root, term_1, term_2, and term_3, respectively.

term_id parent_id name
term_1 root Term 1
term_2 term_1 Term 2
term_3 term_2 Term 3
term_4 term_3 Term 4
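One way to confirm that every branch starts at root is to follow each term's parent_id upward until root is reached. The sketch below does that for a single term. The function name `lineage` is hypothetical, and `parent_of` is assumed to be a dict built from the term_id and parent_id columns.

```python
def lineage(term_id, parent_of):
    """Return the chain [term_id, parent, grandparent, ..., 'root']."""
    path = [term_id]
    while path[-1] != "root":
        parent = parent_of.get(path[-1])
        if parent is None:
            raise KeyError(f"{path[-1]!r} has no parent_id, so the branch never reaches root")
        if parent in path:
            raise ValueError(f"cycle detected at {parent!r}")
        path.append(parent)
    return path
```

For the branch above, `lineage("term_4", parent_of)` walks term_4, term_3, term_2, term_1, and finally root.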

User Interface

The user interface is available from the Data Browser card on our homepage. Submitting the dictionary file for a custom cohort opens a new UI with a suite of tools for exploring the data. The dictionary appears first.

The terms appear in a collapsible list, shown in the example below. Terms with a white background are not linked to any data; clicking the ‘+’ next to one shows the terms underneath it. Think of these terms as headers and subheaders for the collapsible list.

Terms shown as clickable blue pills are intended to link to data. The example below depicts a collapsible list with blue pills and white, non-data-linked terms.

[Screenshot: collapsible term list with blue pills and white, non-data-linked terms]

Note: Clicking on a blue pill will show an error.