Multirun capabilities and other improvements #10

Merged
merged 53 commits into from
Jun 11, 2024

mmcdermott (Owner) commented on Jun 11, 2024

Summary by CodeRabbit

  • Documentation

    • Enhanced installation instructions and detailed steps for running the MEDS extraction ETL process.
    • Added a README for the eICU-CRD v2.0 dataset extraction example.
  • New Features

    • Introduced a batch script for job execution with defined resource allocations.
    • Added integration tests to verify ETL pipeline correctness.
    • Added new configuration files for various pipeline stages and data processing.
  • Bug Fixes

    • Updated file handling and processing conditions to improve reliability.
  • Refactor

    • Restructured configuration files to define stages and parameters more clearly.
    • Adjusted extraction configuration keys and values for consistency.
  • Chores

    • Updated pre-commit hooks for code cleanup and unused import removal.

…ith the new hydra setup and custom resolvers
@mmcdermott mmcdermott merged commit fd56b71 into main Jun 11, 2024
@mmcdermott mmcdermott deleted the multirun branch June 11, 2024 13:07
coderabbitai bot (Contributor) commented on Jun 11, 2024

Warning

Review failed

The pull request is closed.

Walkthrough

The changes extend the MEDS data extraction and preprocessing pipeline with new configurations, scripts, and instructions for the MIMIC-IV and eICU datasets. Key updates include revised installation commands, parallel processing options, and guidance for running the ETL process locally, in parallel, or over Slurm. New configuration files and scripts define the individual data processing stages.
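The three run modes mentioned above (local, parallel, Slurm) can be illustrated with a minimal sketch that builds the command line for a single stage. It assumes the stage is a Hydra application, that the config exposes a `worker` key to sweep over, and that the joblib and submitit launcher plugins are installed; the script name and the `worker` key are hypothetical and not taken from this PR.

```python
from typing import List


def build_stage_command(script: str, mode: str = "local", n_workers: int = 1) -> List[str]:
    """Return an argv list for one pipeline stage.

    `script` is a hypothetical stage entry point, and the `worker` override
    is an assumed config key, not one confirmed by this PR.
    """
    if mode not in {"local", "parallel", "slurm"}:
        raise ValueError(f"unknown mode: {mode!r}")

    cmd = ["python", script]
    if mode == "local":
        # Single process, no sweep.
        return cmd

    # Hydra's --multirun flag launches one job per value of the swept override.
    cmd += ["--multirun", f"worker=range(0,{n_workers})"]
    # Choose the Hydra launcher plugin: joblib for one machine, submitit for Slurm.
    launcher = "joblib" if mode == "parallel" else "submitit_slurm"
    cmd.append(f"hydra/launcher={launcher}")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Keeping the mode selection in one place means the same stage script is reused unchanged across all three modes.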

Changes

| File(s) | Change Summary |
|---|---|
| .pre-commit-config.yaml | Added arguments --in-place and --remove-all-unused-imports to the autoflake hook. |
| MIMIC-IV_Example/README.md | Updated installation and MEDS extraction ETL process instructions, including local, parallel, and Slurm options. |
| MIMIC-IV_Example/joint_script.sh | No changes to exported entities; script processes MIMIC-IV data through various steps. |
| MIMIC-IV_Example/joint_script_slurm.sh | No changes to exported entities; script processes MIMIC-IV data using submitit Hydra launcher. |
| MIMIC-IV_Example/pre_MEDS.py | Modified fix_static_data function, added conditional checks and file handling logic in main. |
| MIMIC-IV_Example/sbatch_joint_script.sh | Introduced a batch script for running jobs with defined resource allocations and output configurations. |
| README.md | Enhanced MEDS data extraction and preprocessing pipeline instructions, added integration tests, updated configuration files, and provided details on running the pipeline in parallel. |
| configs/extraction.yaml | Restructured to define a pipeline for extracting raw MEDS events, added stages, and updated configurations. |
| configs/pipeline.yaml | Introduced configurations for a MEDS pipeline ETL process, defining global IO settings and Hydra settings for job execution. |
| configs/preprocess.yaml | Restructured pipeline stages, introduced new stages, and updated parameters. |
| eICU_Example/README.md | Provided steps for extracting a MEDS dataset from eICU-CRD v2.0 dataset, outlining installation, data preparation, and ETL process. |
| eICU_Example/configs/event_configs.yaml | Introduced structured configuration for various patient events and data points in eICU datasets. |
| eICU_Example/configs/pre_MEDS.yaml | Introduced configuration settings for data processing pipeline, including Hydra configuration for job naming and logging. |
| eICU_Example/configs/table_preprocessors.yaml | Introduced configurations for medical data tables, specifying offset and pseudotime columns, output data columns, and warning items. |
| eICU_Example/joint_script.sh | No changes to exported entities; script automates processing of eICU data through various steps. |
| eICU_Example/joint_script_slurm.sh | No changes to exported entities; script facilitates processing of eICU data using submitit Hydra launcher. |
| eICU_Example/pre_MEDS.py | Added functions for pre-MEDS data wrangling for eICU datasets, utilizing Hydra for configuration management and logging. |
| tests/test_extraction.py | Modified extraction configuration kwargs, updated directory paths and assertions, adjusted directory names, and error handling in test logic. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Script
    participant ETL_Process
    participant Slurm
    User->>Script: Initiate ETL Process
    Script->>ETL_Process: Start Local/Parallel Processing
    Script->>Slurm: Submit Job (if Slurm)
    Slurm->>ETL_Process: Execute ETL Stages
    ETL_Process->>Script: Process Data
    Script->>User: Completion Notification
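The local and parallel paths of the diagram above can be approximated with a short sketch. It assumes that stages run strictly in order and that shards within a stage are independent of each other; neither assumption is confirmed by this PR. The names `run_pipeline`, `stages`, and `shards` are hypothetical, and Slurm submission is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# A stage takes one shard identifier and returns the processed shard identifier.
Stage = Callable[[str], str]


def run_pipeline(stages: Sequence[Stage], shards: List[str], n_workers: int = 1) -> List[str]:
    """Apply every stage to every shard, one stage at a time.

    Hypothetical illustration only: real ETL stages would read and write files.
    """
    current = list(shards)
    for stage in stages:
        if n_workers > 1:
            # Parallel path: shards of the current stage are processed concurrently.
            with ThreadPoolExecutor(max_workers=n_workers) as pool:
                current = list(pool.map(stage, current))  # map preserves input order
        else:
            # Local path: process shards one after another.
            current = [stage(shard) for shard in current]
    return current
```

Because `pool.map` preserves input order, the local and parallel paths return the same result for pure stage functions, which makes the parallel path easy to check against the serial one.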

Poem

In data's maze, we find our way,
With scripts and configs, night and day.
From MIMIC to eICU's embrace,
Extraction flows with newfound grace.
Parallel paths and Slurm's might,
Data's journey, a wondrous flight.
Through pipelines clear, our goals in sight.
🌟🐇✨

