Split data ingestion into process and validate #54

pipliggins · 2024-06-25T12:31:12Z

Most of the time spent in the ingestion pipeline goes on validating data against the FHIRflat format. Validation requires creating a pydantic class object for each resource and then re-creating the FHIRflat parquet files.

Splitting the pipeline into two steps (which can still be run together with a single command if desired) should allow greater parallelisation (see #44). It would also let data be read in and edited before validation, so that a few validation errors no longer require re-running the whole pipeline.
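A minimal sketch of the proposed split, in plain Python and under stated assumptions: every name here (`process`, `validate`, `ingest`, `ValidationReport`, `validate_row`) is hypothetical and is not the fhirflat API. The per-row check stands in for building a pydantic model for each resource, and the in-memory rows stand in for the intermediate FHIRflat parquet files.

```python
"""Hypothetical sketch of a two-stage ingestion pipeline: process, then validate.

None of these names come from fhirflat; they only illustrate the split.
"""
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ValidationReport:
    valid: list[dict] = field(default_factory=list)
    # (row index, error message) pairs, so bad rows can be located and fixed
    errors: list[tuple[int, str]] = field(default_factory=list)


def process(raw_rows: list[dict]) -> list[dict]:
    """Stage 1: map raw input onto flat resource rows, with no validation."""
    return [
        {"id": row["id"], "gender": row.get("sex", "").strip().lower()}
        for row in raw_rows
    ]


def validate_row(row: dict) -> None:
    """Stand-in for building a pydantic model per resource; raises on bad data."""
    if row["gender"] not in {"male", "female", "other", "unknown"}:
        raise ValueError(f"invalid gender {row['gender']!r}")


def validate(rows: list[dict]) -> ValidationReport:
    """Stage 2: validate already-processed rows; can be re-run on its own."""
    report = ValidationReport()
    for i, row in enumerate(rows):
        try:
            validate_row(row)
        except ValueError as exc:
            report.errors.append((i, str(exc)))
        else:
            report.valid.append(row)
    return report


def ingest(raw_rows: list[dict]) -> ValidationReport:
    """Single command that runs both stages back to back."""
    return validate(process(raw_rows))
```

Because the output of `process` is kept, a user who hits a few validation errors can edit just the flagged rows and call `validate` again without re-running `process`, while `ingest` preserves the single-command path. Keeping each row's validation independent is also what would make that stage straightforward to split across workers.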

pipliggins self-assigned this Jun 25, 2024

pipliggins mentioned this issue Jul 30, 2024

Split ingestion pipeline #61

Merged

pipliggins closed this as completed in #61 Aug 6, 2024
