Pipeline Configuration Improvements #155

Open
2 tasks
mmcdermott opened this issue Aug 13, 2024 · 1 comment
Labels
MEDS-Extract MEDS-Transform Issues for the data pre-processing transformations in MEDS_transforms Needs Clarification This issue needs further clarification before it can be operationalized Pipeline Configuration and Stage Management Issues relating to proper definition and usability of different stages in a pipeline priority:low A low priority issue. Usability / Interface

Comments

@mmcdermott
Owner

Right now, pipeline configuration across multiple stages works well overall but has some non-trivial problems:

  • Each stage can only produce data or metadata, not both, because of how output directories work. To fix this, every stage should store data outputs in $output_dir/data and metadata outputs in $output_dir/metadata, mirroring the overall MEDS directory layout.
  • Stages that end up doing nothing (e.g., metadata extraction when there is no metadata block; see "Exit metadata extraction if there is no _metadata in the event configs", #154) yield empty directories that confuse subsequent stages. Possible fixes: subsequent stages could (somehow) look backwards through prior stages to find their input when an output directory is empty or not properly constructed, or empty stages could simply symlink their inputs to their outputs. Which is better is still unclear.
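
To make the two ideas above concrete, here is a minimal sketch of what the per-stage data/metadata layout and the "look backwards" option might look like. It is not the MEDS_transforms implementation: the helper names (`stage_dirs`, `has_files`, `resolve_input`), the `kind` argument, and the fallback to the pipeline's original input are all hypothetical and only illustrate the proposal.

```python
from pathlib import Path


def stage_dirs(output_dir: Path) -> tuple[Path, Path]:
    """Return the (data, metadata) subdirectories of one stage's output directory."""
    return output_dir / "data", output_dir / "metadata"


def has_files(directory: Path) -> bool:
    """True if the directory exists and contains at least one file at any depth."""
    return directory.is_dir() and any(p.is_file() for p in directory.rglob("*"))


def resolve_input(prior_stage_dirs: list[Path], kind: str, fallback: Path) -> Path:
    """Find the most recent non-empty output of the given kind.

    Walks backwards through the prior stages' output directories (hypothetical
    convention: each has `data` and `metadata` subdirectories) and returns the
    first one that actually contains files. If no prior stage wrote anything of
    this kind, falls back to the pipeline's original input directory.
    """
    if kind not in ("data", "metadata"):
        raise ValueError(f"kind must be 'data' or 'metadata', got {kind!r}")
    for stage_dir in reversed(prior_stage_dirs):
        candidate = stage_dir / kind
        if has_files(candidate):
            return candidate
    return fallback / kind
```

With this shape, a stage that did nothing is simply skipped when the next stage resolves its input, so no symlinks are needed. The trade-off is that every stage must know the full list of prior stages; the symlink option keeps stages independent but leaves extra links on disk.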
@mmcdermott mmcdermott added priority:low A low priority issue. MEDS-Extract Pipeline Configuration and Stage Management Issues relating to proper definition and usability of different stages in a pipeline Usability / Interface MEDS-Transform Issues for the data pre-processing transformations in MEDS_transforms Needs Clarification This issue needs further clarification before it can be operationalized labels Aug 13, 2024
@mmcdermott
Owner Author

Tagging @Oufattole for tracking
