Split data ingestion into process and validate #54

Closed
pipliggins opened this issue Jun 25, 2024 · 0 comments · Fixed by #61

@pipliggins
Collaborator

Most of the ingestion pipeline's run time is spent validating the FHIRflat format, which requires creating a pydantic class object for each resource and then re-creating the FHIRflat parquet file.

Splitting the pipeline into two steps, process and validate (which can still be run together with a single command if desired), should allow for greater parallelisation (see #44). It also allows data to be read in and edited before validation, so a few validation errors no longer mean running the whole pipeline again.
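As a rough illustration of the proposed split, here is a minimal sketch using only the standard library. All names (`process`, `validate`, `ingest`, `check_patient`, `ValidationReport`) are hypothetical and are not the project's actual API; the plain-Python `check_patient` stands in for building a pydantic object per resource, and lists of dicts stand in for the parquet files.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass
class ValidationReport:
    valid: list[Row]
    errors: list[tuple[int, str]]  # (row index, error message)


def process(raw_rows: list[Row], mapping: dict[str, str]) -> list[Row]:
    """Step 1: rename source columns to target fields. No validation here."""
    return [
        {mapping[key]: value for key, value in row.items() if key in mapping}
        for row in raw_rows
    ]


def check_patient(row: Row) -> None:
    """Hypothetical validator; a stand-in for constructing a pydantic model."""
    if not isinstance(row.get("id"), str) or not row["id"]:
        raise ValueError("id must be a non-empty string")
    if row.get("gender") not in {"male", "female", "other", "unknown"}:
        raise ValueError(f"invalid gender: {row.get('gender')!r}")


def validate(rows: list[Row], validator: Callable[[Row], None]) -> ValidationReport:
    """Step 2: validate already-processed rows, collecting errors per row."""
    valid: list[Row] = []
    errors: list[tuple[int, str]] = []
    for index, row in enumerate(rows):
        try:
            validator(row)
            valid.append(row)
        except ValueError as exc:
            errors.append((index, str(exc)))
    return ValidationReport(valid, errors)


def ingest(raw_rows, mapping, validator) -> ValidationReport:
    """Both steps behind one call, for when the split is not needed."""
    return validate(process(raw_rows, mapping), validator)


# Process once, then validate, fix the bad row, and re-run validation only.
raw = [{"subjid": "p1", "sex": "female"}, {"subjid": "p2", "sex": "F"}]
flat = process(raw, {"subjid": "id", "sex": "gender"})
first_report = validate(flat, check_patient)

flat[1]["gender"] = "female"  # edit the processed data in place
second_report = validate(flat, check_patient)
```

Because `validate` only takes already-processed rows, it can be re-run (or run in parallel over chunks) without repeating the processing step.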
