Multirun capabilities and other improvements #10

mmcdermott · 2024-06-11T13:07:35Z

Summary by CodeRabbit

Documentation
- Enhanced installation instructions and detailed steps for running the MEDS extraction ETL process.
- Added a README for the eICU-CRD v2.0 dataset extraction example.
New Features
- Introduced a batch script for job execution with defined resource allocations.
- Added integration tests to verify ETL pipeline correctness.
- Added new configuration files for various pipeline stages and data processing.
Bug Fixes
- Updated file handling and processing conditions to improve reliability.
Refactor
- Restructured configuration files to define stages and parameters more clearly.
- Adjusted extraction configuration keys and values for consistency.
Chores
- Updated pre-commit hooks for code cleanup and unused import removal.

Updated configs and added a resolver to get informative help messages from either the config or the script docstrings.
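
The fallback described above (prefer a description from the config, otherwise use the script's docstring) can be sketched in plain Python. This is only an illustration under assumptions: `resolve_help`, the `description` key, and `extract_events` are hypothetical names. The PR's commits mention custom OmegaConf resolvers, which this sketch deliberately does not use.

```python
import inspect
from typing import Any, Callable, Mapping


def resolve_help(stage_cfg: Mapping[str, Any], script_main: Callable[..., Any]) -> str:
    """Return help text from the config if present, else from the docstring."""
    # Hypothetical key: assume a stage may carry its own "description" entry.
    description = stage_cfg.get("description")
    if description:
        return str(description).strip()
    # Fall back to the cleaned docstring of the script's entry point.
    doc = inspect.getdoc(script_main)
    return doc if doc else "No description available."


def extract_events(cfg: Mapping[str, Any]) -> None:
    """Convert raw shards into MEDS events."""


# The config entry wins when it exists; otherwise the docstring is used.
print(resolve_help({"description": "Custom text from the config."}, extract_events))
print(resolve_help({}, extract_events))
```

Keeping the lookup order in one function means a stage without a config description still gets a usable help message, as long as its entry point is documented.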

coderabbitai · 2024-06-11T13:07:53Z

Warning

Review failed

The pull request is closed.

Walkthrough

The changes primarily enhance the MEDS data extraction and preprocessing pipeline, adding new configurations, scripts, and instructions for handling MIMIC-IV and eICU datasets. Key updates include improved installation commands, parallel processing options, and detailed guidance for running ETL processes locally, in parallel, or over Slurm. Additionally, new configuration files and scripts streamline the data processing stages, ensuring efficient and accurate data handling.

Changes

| File(s) | Change Summary |
| --- | --- |
| `.pre-commit-config.yaml` | Added arguments `--in-place` and `--remove-all-unused-imports` to the `autoflake` hook. |
| `MIMIC-IV_Example/README.md` | Updated installation and MEDS extraction ETL process instructions, including local, parallel, and Slurm options. |
| `MIMIC-IV_Example/joint_script.sh` | No changes to exported entities; script processes MIMIC-IV data through various steps. |
| `MIMIC-IV_Example/joint_script_slurm.sh` | No changes to exported entities; script processes MIMIC-IV data using `submitit` Hydra launcher. |
| `MIMIC-IV_Example/pre_MEDS.py` | Modified `fix_static_data` function, added conditional checks and file handling logic in `main`. |
| `MIMIC-IV_Example/sbatch_joint_script.sh` | Introduced a batch script for running jobs with defined resource allocations and output configurations. |
| `README.md` | Enhanced MEDS data extraction and preprocessing pipeline instructions, added integration tests, updated configuration files, and provided details on running the pipeline in parallel. |
| `configs/extraction.yaml` | Restructured to define a pipeline for extracting raw MEDS events, added stages, and updated configurations. |
| `configs/pipeline.yaml` | Introduced configurations for a MEDS pipeline ETL process, defining global IO settings and Hydra settings for job execution. |
| `configs/preprocess.yaml` | Restructured pipeline stages, introduced new stages, and updated parameters. |
| `eICU_Example/README.md` | Provided steps for extracting a MEDS dataset from eICU-CRD v2.0 dataset, outlining installation, data preparation, and ETL process. |
| `eICU_Example/configs/event_configs.yaml` | Introduced structured configuration for various patient events and data points in eICU datasets. |
| `eICU_Example/configs/pre_MEDS.yaml` | Introduced configuration settings for data processing pipeline, including Hydra configuration for job naming and logging. |
| `eICU_Example/configs/table_preprocessors.yaml` | Introduced configurations for medical data tables, specifying offset and pseudotime columns, output data columns, and warning items. |
| `eICU_Example/joint_script.sh` | No changes to exported entities; script automates processing of eICU data through various steps. |
| `eICU_Example/joint_script_slurm.sh` | No changes to exported entities; script facilitates processing of eICU data using `submitit` Hydra launcher. |
| `eICU_Example/pre_MEDS.py` | Added functions for pre-MEDS data wrangling for eICU datasets, utilizing Hydra for configuration management and logging. |
| `tests/test_extraction.py` | Modified extraction configuration kwargs, updated directory paths and assertions, adjusted directory names, and error handling in test logic. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Script
    participant ETL_Process
    participant Slurm
    User->>Script: Initiate ETL Process
    Script->>ETL_Process: Start Local/Parallel Processing
    Script->>Slurm: Submit Job (if Slurm)
    Slurm->>ETL_Process: Execute ETL Stages
    ETL_Process->>Script: Process Data
    Script->>User: Completion Notification
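
To make the local/parallel/Slurm branching in the diagram concrete, here is a small Python sketch that builds the command line for one stage. The script name `extract_stage.py`, the `input_dir` override, and the helper `build_stage_command` are hypothetical. The `--multirun` flag and the `hydra/launcher=joblib` and `hydra/launcher=submitit_slurm` overrides come from Hydra and its separately installed launcher plugins. The PR's own joint scripts may pass different arguments.

```python
import shlex
from typing import List

# Launcher overrides provided by Hydra's joblib and submitit plugins.
_LAUNCHERS = {
    "parallel": "hydra/launcher=joblib",
    "slurm": "hydra/launcher=submitit_slurm",
}


def build_stage_command(script: str, mode: str, overrides: List[str]) -> List[str]:
    """Build the argv for running one ETL stage in the given mode."""
    if mode == "local":
        # Plain single-process run: no launcher, no multirun.
        return ["python", script, *overrides]
    if mode not in _LAUNCHERS:
        raise ValueError(f"unknown mode: {mode!r}")
    # Parallel and Slurm runs both go through Hydra's multirun mode.
    return ["python", script, "--multirun", _LAUNCHERS[mode], *overrides]


# Hypothetical script name and override, for illustration only.
cmd = build_stage_command("extract_stage.py", "slurm", ["input_dir=/data/raw"])
print(shlex.join(cmd))
```

Returning a list of arguments rather than a single string keeps the command safe to hand to `subprocess.run` without shell quoting concerns.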

Poem

In data's maze, we find our way,
With scripts and configs, night and day.
From MIMIC to eICU's embrace,
Extraction flows with newfound grace.
Parallel paths and Slurm's might,
Data's journey, a wondrous flight.
Through pipelines clear, our goals in sight.
🌟🐇✨

mmcdermott added 30 commits May 27, 2024 12:09

partial thoughts -- not working

4f18745

New structure based on conversation with Nassim

84eaf5f

Updated configs further and started README documentation for this.

2cde29a

Got the custom OmegaConf resolvers working for populating the stage config.

aaec6f3

Got tests to pass (including integration) on the extraction scripts with the new hydra setup and custom resolvers

732a000

Updated MIMIC examples

841f661

Added some content to README that still needs to be re-worked a bit

bbd673d

Added joint script demonstrating joblib launcher

67f8b6c

Minor changes mostly to joint script

7b58581

Made the locking process more robust

8aa1db7

Added a slurm script -- yet untested

42bc74e

Updates to pipeline.yaml

f844168

cleaned files

6f910b9

Not remotely working; moving to local for dev

4eadda5

Updated configs and added a resolver to get informative help messages from the right sources

7c2e767

Starting eICU scripts and configs

bc78cd4

Added (again untested) allergy table

4c7e2cb

Improved the structure of the pipeline and added a bunch more tables. Still untested.

f3463f5

Forgot table configs -- likely currently malformed.

542b7cb

Added soon to be deleted microlab table

167acb0

docs update

15815f3

Added partial event configs for all tables.

bda16e8

Revised main script

26a386b

Fixed a variety of lint errors

e900096

Adjusted a tiny thing in the yaml

2f92036

Updated scripts to have help messages and to error if any internal piece errors.

c482a78

Every column in the raw files should apparently be lowercase... also other typos

e80be1f

Fixing a typo in config for diagnosis

9ad4b92

Fixed numerous typos and issues. Makes it through much of the files now in the pre-MEDS stage

9ced80f

Linted

1168641

mmcdermott added 23 commits June 1, 2024 17:03

Corrected more typos

39cf464

Working most of the way through. Some error about vitalsaperiodic and floats vs. ints occurring during the event conversion currently, though

74a8624

Incorporating fixes from #8 -- thanks @prenc!

f979ea4

Merge branch 'multirun' into eICU

afe875b

Merge branch 'multirun' into docstrings

b117b51

Make log dir stage dependent

21dfc19

Made submitit launcher script work

d9501a7

Added singleton sbatch script

6878bf2

Adding inits to make tests pass despite shared 'pre_MEDS.py' name

dced00b

Make it always retype numerical values

7d74d60

typo fix

3ec1436

Undoing recent changes as they don't help

1169cc9

Merged with main

c2d1444

Use diagonal relaxed to combine the event subshards

637e4bd

Merge branch 'eICU' of github.com:mmcdermott/MEDS_polars_functions into eICU

694e166

fixed error in joint script help message for eICU. should apply to MIMIC as well.

eb94a1d

Fixed up sbatch script

0af21c7

Allowing for skipping the unique-by in the merge stage.

5cebbfa

Added a note to eICU example

f48ddb7

Updated scripts and added note to README.md for eICU

e152a17

Merged

b54bc0e

Updated some docstrings

f741555

Merge pull request #7 from mmcdermott/docstrings

1423051

Updated configs and added a resolver to get informative help messages from either the config or the script docstrings.

mmcdermott merged commit fd56b71 into main Jun 11, 2024

mmcdermott deleted the multirun branch June 11, 2024 13:07
