Below is a data dictionary for data stored in ID3C. Data is organized by its location within the ID3C schema. Relational linkages have been preserved where possible.
- Warehouse
- Receiving
- Shipping
- HCov19 Presence Absence Result v1
- Genome Submission Metadata v1
- Sample with Best Available Encounter Data v1
- Incidence Model Observation v4
- Observation With Presence Absence Result V3
- FHIR Encounter Details V2
- SCAN Enrollments V1
- SCAN Encounters with Best Available Vaccination Data v1
- SCAN Encounters V1
- UW Reopening Encounters V1
- UW Reopening Enrollment FHIR Encounter Details V1
- Operations
The warehouse schema is where raw data from projects ingested into ID3C lives. The data in this schema is standardized and analysis ready.
Rows in the encounter
table represent: An interaction with an individual to collect point-in-time information or samples
encounter_id
: Internal id of this encounter; integeridentifier
: External identifier for this encounter; case-sensitive; textindividual_id
: Internal id of the individual who was encountered; integersite_id
: Internal id of the site where the encounter occurred; integerencountered
: When the encounter occurred; timestampdetails
: Additional information about this encounter which does not have a place in the relational schema; jsonbage
: Age of individual at the time of this encounter; intervalcreated
: When this encounter record was first processed by ID3C; timestampmodified
: When this encounter record was last modified; timestamp
Rows in the individual
table represent: A single, real person (or other member of a population)
individual_id
: Internal id of this individual; integeridentifier
: External identifier for this individual (e.g. study participant id); textsex
: Sex assigned at birth; assigned-sex (see fhir documentation here)details
: Additional information about this individual which does not have a place in the relational schema; jsonbcreated
: When this individual record was first processed by ID3C; timestampmodified
: When this individual record was last modified; timestamp
Rows in the sample
table represent: A sample collected from an individual during a specific encounter
sample_id
: Internal id of this sample; integeridentifier
: A unique external identifier assigned to this sample; textencounter_id
: The encounter where the sample was collected; integerdetails
: Additional information about this sample which does not have a place in the relational schema; jsonbcollection_identifier
: A unique external identifier assigned to the collection of this sample; textcollected
: The date when this sample was collected; datecreated
: When this sample was first processed by ID3C; timestampmodified
: When this sample was last modified; timestamp
Rows in the presence_absence
table represent: A presence/absence test result from an individual sample
presence_absence_id
: Internal id of this presence/absence test; integeridentifer
: External id of this presence/absence test; integersample_id
: Internal id of the sample that is being tested for presence/absence of a target; integertarget_id
: Internal id of the target for the presence/absence test; integerpresent
: The result of a presence/absence test. True if the target is found, False if not, null if inconclusive; booleandetails
: The details for each replicate of a presence/absence test; integercreated
: When this presence/absence test was first processed by ID3C; timestampmodified
: When this presence/absence test was last modified; timestamp
Rows in the organism
table represent: Hierarchical classification of taxa
organism_id
: Internal id of this organism; used only for internal foreign keys; integerlineage
: Label tree of ancestors with this organism as the leaf/final label; ltreeidentifiers
: Key-value pairs of external identifiers (i.e. dbxrefs) for this organism; keys should use well-known database or ontology prefix (e.g. NCBITaxon); hstoredetails
: Additional information about this organism which does not have a place in the relational schema
Rows in the target
table represent: A thing (e.g. pathogen) that can be detected in presence-absence testing
target_id
: Internal id of this target; integeridentifier
: An identifier assigned to this target by the testing center, assumed to be stable and unique; textcontrol
: Whether the this target is a control. True if it is, False if not; booleanorganism_id
: Internal id of the organism detected by this target; most-specific available; integer
Rows in the location
table represent: Hierarchical geospatial locations. Note these locations use EPSG:4326, a latitude/longitude coordinate system used by traditional GPS services
location_id
: Internal id of this location; integeridentifier
: External identifier by which this location is known; citextscale
: Relative size or extent of this location (e.g. country, state, city); citexthierarchy
: Set of key-value pairs describing where this location fits within a hierarchy; hstorepoint
: Representative point geometry in WGS84 (EPSG:4326); geometry(Point,4326)polygon
: Multi-polygon geometry in WGS84 (EPSG:4326); geometry(MultiPolygon,4326)simplified_polygon
: Multi-polygon geometry in WGS84 (EPSG:4326) with reduced complexity/accuracy, intended for cartographic use; geometry(MultiPolygon,4326)details
: Additional information about this location which does not have a place in the relational schema; jsonbcreated
: When this location was first processed by ID3C; timestampmodified
: When this location was last modified; timestamp
Rows in the site
table represent: A real-world or virtual/logical location where individuals are encountered
site_id
: Internal id of this site; integeridentifier
: External identifier for this site; case-preserving but must be unique case-insensitively; textdetails
: Additional information about this site which does not have a place in the relational schema; jsonb
Rows in the encounter_location
table represent: The association between an encounter and location
encounter_id
: Internal id of the encounter; integerlocation_id
: Internal id of the location; integerrelation
: The relation between the encounter and location, e.g. collection site, residence, workplace, etc; citextdetails
: Additional information about this encounter-location which does not have a place in the relational schema; jsonbcreated
: When this encounter/location relation was first processed by ID3C; timestampmodified
: When this encounter/location relation was last modified; timestamp
Rows in the encounter_location_relation
table represent: Controlled vocabulary for encounter/location relationships
relation
: A relation between an encounter and location, e.g. collection site, residence, workplace, etc; citextpriority
: Arbitrary inter-relation ranking, where smaller numbers mean greater importance within this ID3C instance. Used to determine the default "primary" location related to an encounter when there is more than one; integer
Rows in the identifier_set
table represent: A conceptual group of identifiers used for a common purpose
identifier_set_id
: Internal id of this identifier set; integername
: The well-known, unique name of the set (e.g.collections-scan
,collections-airs
, etc.); textdescription
: A plain text description of the purpose and usage of this set; textuse
: A standard identifier use type, e.g. sample, collection, clia; citext
Rows in the identifier
table represent: Globally unique ids and their CualID barcodes, used for tracking physical things
uuid
: UUIDv4 that should be used in programmatic and downstream data processing; relates to both theidentifier
andcollection-identifier
of sample records; uuidbarcode
: Short suffix of UUID (CualID) used as a tracking barcode; citextidentifier_set_id
: Internal id of the identifier set this identifier belongs to; integergenerated
: When this identifier was generated; timestamp
Rows in the identifier_set_use
table represent: Controlled vocabularly defining how an identifier set is used
use
: A standard identifier use type, e.g. sample, collection, clia; citextdescription
: A plain text description of this identifier use type; text
Rows in the sequence_read_set
table represent: A set of references to sequence reads from an individual sample
sequence_read_set_id
: Internal id of this set of sequence reads; integersample_id
: Internal id of the sample from which the sequence reads were generated; integerurls
: External object store urls that hold the sequence reads; text[]details
: Additional information about this sequence read set which does not have a place in the relational schema; jsonbcreated
: When this sequence read set was first processed by ID3C; timestampmodified
: When this sequence read set was last modified; timestamp
Rows in the consensus_genome
table represent: A whole consensus genome for an organism extracted from a sample that may have been generated from a sequence read set
consensus_genome_id
: Internal id of this consensus genome; integersample_id
: Internal id of the sample from which this consensus genome was extracted; integerorganism_id
: Internal id of the organism to which this consensus genome belongs to; integersequence_read_set_id
: Internal id of the sequence read set that was used to generate this consensus genome; may be null if the genome was generated from an external source; integerdetails
: Additional information about this consensus genome which does not have a place in the relational schema; jsonb
Rows in the genomic_sequence
table represent: A genomic sequence from a genome. There may be multiple sequences per genome if the genome is segmented (e.g. Influenza genomes)
genomic_sequence_id
: Internal id of this genomic sequence; integeridentifier
: An external, unqiue identifier for this genomic sequence; textsegment
: The segment of the genome that this genomic sequence represents; citextseq
: A succession of letters indicating the order of nucleotides of the genomic sequence; textconsensus_genome_id
: Internal id of the consensus genome that this genomic sequence belongs to; integerdetails
: Additional information about this genomic sequence which does not have a place in the relational schema; jsonbcreated
: When this genomic sequence was first processed by ID3C; timestampmodified
: When this genomic sequence was last modified; timestamp
Rows in the kit
table represent: A self-test kit connected to an encounter and 2 different samples
kit_id
: Internal id of this kit; integeridentifier
: A unique exteranl identifier assigned to this kit; textencounter_id
: Internal id of the encounter to which this kit is associated; integerrdt_sample_id
: Internal id of the sample collected from the residual RDT extractant; integerutm_sample_id
: Internal id of the sample collected from the second swap; integerdetails
: Additional information about this kit which does not have a place in the relational schema; jsonb
The receiving table stores non-relational (or minimally-relational) data before it is transformed and loaded into the warehouse by one of our ETL processes
Rows in the clinical
table represent: A set of clinical documents; append only
clinical_id
: Internal id of this clinical record set; integerdocument
: JSON document created from a pre-processed Excel file; jsonreceived
: When this clinical document was received; timestampprocessing_log
: Event log recording details of ETL into data warehouse; jsonb
Rows in the consensus_genome
table represent: A set of consensus genome documents; append only
consensus_genome_id
: Internal id of this consensus genome; integerdocument
: Original consensus genome JSON document; jsonreceived
: When this consensus genome document was received; timestampprocessing_log
: Event log recording details of ETL into data warehouse; jsonb
Rows in the enrollment
table represent: A set of enrollment documents; append only
enrollment_id
: Internal id of this enrollment document; integerdocument
: Original enrollment JSON document; jsonreceived
: When this enrollment document was received; timestampprocessing_log
: Event log recording details of ETL into data warehouse; jsonb
Rows in the fhir
table represent: A set of FHIR documents; append only
fhir_ud
: Internal id of this fhir record set; integerdocument
: JSON document in FHIR format; jsonreceived
: When this FHIR document was received; timestampprocessing_log
: Event log recording details of ETL into data warehouse; jsonb
Rows in the longitudinal
table represent: A set of longitudianl documents; append only
longitudinal_id
: Internal id of this longitudinal document; integerdocument
: JSON document created from a pre-processed Excel file; jsonreceived
: When this longitudinal document was received; timestampprocessing_log
: Event log recording details of ETL into data warehouse; jsonb
Rows in the manifest
table represent: A set of manifest records; append only
manifest_id
: Internal id of this manifest document; integerdocument
: Manifest record as a JSON document; jsonreceived
: When this manifest document was received; timestampprocessing_log
: Event log recording details of ETL into data warehouse; jsonb
Rows in the presence_absence
table represent: A set of presence/absence documents; append only
presence_absence_id
: Internal id of this presence/absence document; integerdocument
: Original presence/absence JSON document; jsonreceived
: When this presence/absence document was received; timestampprocessing_log
: Event log recording details of ETL into data warehouse; jsonb
Rows in the redcap_det
table represent: A set of REDCap DET request data documents; append only
redcap_det_id
: Internal id of this REDCap DET document; integerdocument
: JSON document created from REDCap DET request data; jsonreceived
: When this REDCap DET document was received; timestampprocessing_log
: Event log recording details of ETL into data warehouse; jsonb
Rows in the sequence_read_set
table represent: A set of references to sequence reads; append only
sequence_read_set_id
: Internal id of this sequence read set document; integerdocument
: Original sequence read set JSON document; jsonreceived
: When this sequence read set document was received; timestampprocessing_log
: Event log recording details of ETL into data warehouse; jsonb
Outgoing warehouse data prepared for external consumers
fhir_questionnaire_responses_v1
is a view of FHIR Questionnaire Responses stored in an encounter's details
encounter_id
: Internal id of the encounter this response is linked with; integerlink_id
: Identifier of the question this response relates to; textstring_response
: If the response to the question is a string, it is present in this field; text[]boolean_response
: If the response to the question is a boolean, it is present in this field; booleandate_response
: If the response to the question is a date, it is present in this field; text[]integer_response
: If the response to the question is an integer, it is present in this field; integer[]numeric_response
: If the response to the question is numeric, it is present in this field; numeric[]code_response
: If the response to this question is encoded, it is present in this field; text[]
genome_submission_metadata_v1
is a view of minimal metadata used for consensus genome submissions
sfs_sample_identifier
: A unique external identifier assigned to this sample; textsfs_sample_barcode
: Short suffix of UUID (CualID) used as a tracking barcode, relates to the sample identifier; citextsfs_collection_barcode
: Short suffix of UUID (CualID) used as a tracking barcode, relates to the collection identifier; citextcollection_date
: The date on which this sample was collected; timestampswab_type
: The type of swab used to collect the sample (e.g.ans
,mtp
,np
); textsource
: The study group for which this sample belongs to (e.g.SCH
,SFS
,AIRS
, etc.); textpuma
: The puma code associated with the collection of this sample; textcounty
: The county to which the census tract associated with the collection of this sample; textstate
: The state associated with the collection of this sample; textbaseline_surveillance
: Whether or not this sample can be considered baseline surveillance; This includes samples that were randomly sampled for genomic surveillance, not identified in a targeted sampling effort, and/or sampled across targeted efforts in order to better represent a community; For more details see descriptions here; boolean
hcov19_presence_absence_result_v1
is a view of hCov-19 Samples with Non-Clinical Presence / Absence Results
presence_absence_id
: Internal id of the presence / absence result this result is linked with; integerresult_ts
: The timestamp associated with when this result was released. Note that this is the internal samplify timestamp (in PST) when the result was released if it is after 2022-07-18, the internal samplify timestamp (in GMT) if it is after 2022-06-09 (but before 2022-07-18), or the last modified date of the record if it is before 2022-06-09; timestamphcov19_result_release_date
: The timestamp when this record was created; timestamptarget
: Internal id of the target for the presence/absence test; integerorganism
: Internal id of the organism detected by this target; most-specific available; integer
incidence_model_observation_v4
is a view of warehoused samples and important questionnaire responses for modeling and visualization
encounter
: External identifier for this encounter; case-sensitive; textencountered_week
: The week during which this encounter occurred; textsite
: External identifier for the site at which this encounter occurred; textsite_type
: Classification of the role of a site (e.g. retrospective, clinical); textindividual
: External identifier for this individual (e.g. study participant id); textsex
: Sex assigned at birth; assigned-sex (see fhir documentation here)age_bin_fine
: Five year range into which an individual's age at encounter falls; intervalrangeage_range_fine_lower
: Lower bound of the above five year range; inclusive; numericage_range_fine_upper
: Upper bound of the above five year range; exclusive; numericage_range_coarse
: Coarse age range into which an individual's age at encounter falls; intervalrangeage_range_coarse_lower
: Lower bound of the above coarse age range; inclusive; numericage_range_coarse_upper
: Upper bound of the above coarse age range; exclusive; numericaddress_identifier
: External identifier by which this residence location is known; citextresidence_census_tract
: Census tract containing this residence in the location hierearchy; textresidence_puma
: Puma code associated with this residence in the location hierarchy; textflu_shot
: Whether or not this individual had received a flu shot when encountered; booleansymptoms
: Symptoms experienced by the individual at the time of encounter; text arraymanifest_origin
: The origin of the sample (e.g.hmc_retro
,sch_retro
); textswab_type
: The type of swab used to collect the sample (e.g.ans
,mtp
,np
); textcollection_matrix
: The collection matrix method for this sample (e.g.dry
,utm_vtm
); textsample
: A unique external identifier assigned to this sample; text
observation_with_presence_absence_result_v3
is a view joining the incidence_model_observation_v4
, the presence_absence_result_v2
, and the hcov19_observation_v1
views.
target
: Internal id of the target for the presence/absence observation; integerorganism
: Internal id of the organism detected by this target; most-specific available; integerpresent
: The result of a presence/absence test. True if the target is found, False if not, null if inconclusive; booleanpresence
: The result of a presence/absence test. 1 if the target is found, 0 if not, null if inconclusive; integerdevice
: The device this presence/absence test was run on (e.g.TaqmanQPCR
,OpenArray
); textassay_type
: The category of the assay for this test (e.g.Clia
,Research
); text- all other columns are defined above in the documentation for
incidence_model_observation_v4
.
sample_with_best_available_encounter_data_v1
is a view of warehoused samples combined with best available envounter date and site details.
sample_id
: Internal id of this sample; integerhas_encounter_data
: Whether or not this sample has associated encounter data; booleanbest_available_encounter_date
: The date on which the best available encounter occurred; dateseason
: Season during which this encounter occurred; Y1 is prior to October 1st 2019, Y2 prior to November 1st 2020, Y3 prior to November 1st 2021, and Y4 prior to November 1st 2022; textbest_available_site_id
: Internal identifier for the site where this encounter took place; integerbest_available_site
: External identifier for the site where this encounter took place; textbest_available_site_type
: Classification of the role of a site (e.g.retrospective
,clinical
) where this encounter took place; textbest_available_site_category
: Classification of the category of a site (e.g.community
,hospital
) where this encounter took place; text
fhir_encounter_details_v2
is a view of SCAN encounter data in the FHIR format.
For additional details about columns in this view, check out the REDCap codebook for the project at this location. The full column list is excluded here due to its length.
scan_enrollments_v1
is a view of enrollment data from the SCAN project.
illness_questionnaire_date
: The date on which the illness questionnaire for this enrollment was completed; textscan_study_arm
: The study arm under which this enrollment occurred; textpriority_code
: The priority code, if it exists, used by the participant when creating this enrollment; textpuma
: The puma code for the location associated with this enrollment record; textneighborhood_district
: The neighborhood associated with this location (e.g.ballard
,downtown
,lake_union
); textgeo_location_name
: The area name based off the Puma location associated with this enrollment. See descriptions; textkit_received
: Whether or not a kit was sent along with the enrollment for this participant; boolean
scan_encounters_with_best_available_vaccination_data_v1
is a view of SCAN encounters with best available vaccination data transformation applied
individual
: External identifier for this individual (e.g. study participant id); textencounter_id
: Internal id of this encounter; integerencountered
: Date on which this encounter occurred; datept_entered_covid_vax
: Whether or not an individual reported covid vaccine information for this encounter; textpt_entered_covid_doses
: The number of doses of a covid vaccine an individual reported having had for this encounter; textcalculated_covid_doses
: The number of doses it was calculated an individual had at this encounter; textcalculated_vac_date_1
: The calculated date at which an individual got their first dose of the vaccine (most common first dose date); textcalculated_vac_name_1
: The calculated vaccine name of an individual's first dose (most common name); textcalculated_vac_date_2
:The calculated date at which an individual got their second dose of the vaccine (most common second dose date); textcalculated_vac_name_2
: The calculated vaccine name of an individual's second dose (most common name); textcalculated_vac_date_3
: The calculated date at which an individual got their third dose of the vaccine (most common third dose date); textcalculated_vac_name_3
: The calculated vaccine name of an individual's third dose (most common name); textvac_date_1_out_of_range
: Whether the first dose date is out of order given the other calculated dates; This can nullify or alter the calculatedbest_available_vac_status
for an individual; integervac_date_2_out_of_range
: Whether the second dose date is out of order given the other calculated dates; This can nullify or alter the calculatedbest_available_vac_status
for an individual; integervac_date_3_out_of_range
: Whether the third dose date is out of order given the other calculated dates; This can nullify or alter the calculatedbest_available_vac_status
for an individual; integerbest_available_vac_status
: The best available vaccination status for an individual given the calculations above; possible values arenot_vaccinated
,boosted
,fully_vaccinated
,invalid
,partially_vaccindated
,unknown
; text
scan_encounters_v1
is a view of encounter data from the SCAN project. Note that duplicate encounters in this view may exist if an encounter encompasses multiple tests. This can happen in rare occasions when 2 kits are sent out for an encounter and registered in REDCap, with one kit on the incident report and not-tested and one kit tested.
For additional details about columns in this view, check out the REDCap codebook for the project at this location. The full column list is excluded here due to its length.
uw_reopening_encounters_v1
is a view of encounter data tied to enrollment questionnaire data for HCT encounters.
For additional details about columns in this view, check out the REDCap codebook for the project at this location. The full column list is excluded here due to its length.
uw_reopening_enrollment_fhir_encounter_details_v1
is a view of enrollment details in the FHIR format.
For additional details about the columns in this view, check out the REDCap codebook for the project at this location. The full column list is excluded here due to its length.
Rows in the deliverables_log
table represent: Deliverables sent by either our Linelist or Return of Results jobs
deliverables_log_id
: Internal id of this deliverable; integersample_barcode
: The sample barcode associated with this deliverable; citextcollection_barcode
: The collection barcode associated with this deliverable; citextdetails
: Additional information about this deliverable which does not have a place in the relational schema; jsonbprocess_name
: The process which sent this deliverable; textsent
: When this deliverable was sent; timestamp
Rows in the test_quota
table represent: Quota entries for HCT testing invites
name
: Name of the process using this quota; texttimespan
: Timespan during which this quota is active; timestamp rangemax
: The maximum number of testing entries alloted to this entry timespan; integerused
: The number of testing entries currently used during this entry timespan; integer