
Adding details to run a MIMIC-IV example (and some other small fixes) #6

Merged (17 commits) on May 26, 2024


@mmcdermott (Owner) commented May 26, 2024

I'm still validating the outputs here, but the instructions in the pipeline run end to end for a MIMIC-IV example.

Summary by CodeRabbit

  • New Features

    • Added a comprehensive guide for extracting the MEDS dataset from MIMIC-IV.
    • Introduced configurations for various hospital-related events and preprocessing tasks.
    • Implemented new data wrangling scripts for pre-MEDS processing.
  • Enhancements

    • Added a `mimic` entry to the optional dependencies in `pyproject.toml`.
    • Improved logging for event extraction and processing scripts.
  • Bug Fixes

    • Corrected file suffix handling in event sharding scripts.
  • Documentation

    • Updated README with detailed instructions for MEDS extraction.
    • Added docstrings and comments for new functions and configurations.

coderabbitai bot (Contributor) commented May 26, 2024

Walkthrough

This update introduces a comprehensive guide and configurations for extracting a MEDS dataset from MIMIC-IV. It includes new configuration files for event settings and preprocessing tasks, a Python script for data wrangling, and logging enhancements in various scripts for better traceability. Additionally, optional dependencies were added to the pyproject.toml file.

Changes

| File/Path | Change Summary |
| --- | --- |
| `MIMIC-IV_Example/README.md` | Added a detailed guide on extracting a MEDS dataset from MIMIC-IV. |
| `MIMIC-IV_Example/configs/event_configs.yaml` | Introduced configurations for hospital events, including the patient ID column and event details. |
| `MIMIC-IV_Example/configs/pre_MEDS.yaml` | Introduced configuration settings for preprocessing tasks, including directories and Hydra settings. |
| `MIMIC-IV_Example/pre_MEDS.py` | Added a script for pre-MEDS data wrangling, including functions for data loading, joining, and processing. |
| `pyproject.toml` | Added `mimic` to the optional dependencies. |
| `scripts/extraction/convert_to_sharded_events.py` | Added logging for event extraction in the `compute_fn` function. |
| `scripts/extraction/shard_events.py` | Updated imports and function calls, and improved suffix handling in the `scan_with_row_idx` function. |
| `src/MEDS_polars_functions/event_conversion.py` | Added logging statements for event extraction in the `extract_event` and `convert_to_events` functions. |
| `src/MEDS_polars_functions/utils.py` | Added a `get_shard_prefix` function to extract the shard prefix from file paths. |

In the realm of data's endless stream,
A new guide emerges, a dream supreme.
Configs and scripts, all finely tuned,
For MIMIC-IV's secrets to be pruned.
Logs now whisper, in code they sing,
MEDS extraction, a wondrous thing.
🐇✨


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: files that changed from the base of the PR and between 86ff60a and 04626d2.
Files selected for processing (9)
  • MIMIC-IV_Example/README.md (1 hunks)
  • MIMIC-IV_Example/configs/event_configs.yaml (1 hunks)
  • MIMIC-IV_Example/configs/pre_MEDS.yaml (1 hunks)
  • MIMIC-IV_Example/pre_MEDS.py (1 hunks)
  • pyproject.toml (1 hunks)
  • scripts/extraction/convert_to_sharded_events.py (1 hunks)
  • scripts/extraction/shard_events.py (4 hunks)
  • src/MEDS_polars_functions/event_conversion.py (3 hunks)
  • src/MEDS_polars_functions/utils.py (1 hunks)
Files skipped from review due to trivial changes (2)
  • MIMIC-IV_Example/configs/pre_MEDS.yaml
  • pyproject.toml
Additional Context Used
LanguageTool (4)
MIMIC-IV_Example/README.md (4)

Near line 58: Possible missing comma found.
Context: ...original files will be written in a new directory which will be used as the input to the ...
Rule ID: AI_HYDRA_LEO_MISSING_COMMA


Near line 73: Possible missing comma found.
Context: ...ry we'll denote as $MIMICIV_MEDS_DIR. Note this is a different directory than the ...
Rule ID: AI_HYDRA_LEO_MISSING_COMMA


Near line 135: The phrase ‘lots of’ might be wordy and overused. Consider using an alternative.
Context: ...etimeevents5.icu/ingredientevents` Lots of questions remain about how to appropria...
Rule ID: A_LOT_OF


Near line 142: It seems likely that a singular genitive (’s) apostrophe is missing.
Context: ...athtimes between the hosp table and the patients table? 2. How to handle the dob nonsens...
Rule ID: AI_HYDRA_LEO_APOSTROPHE_S_XS

Ruff (8)
MIMIC-IV_Example/pre_MEDS.py (8)

8-8: Module level import not at top of file


9-9: Module level import not at top of file


10-10: Module level import not at top of file


12-12: Module level import not at top of file


13-13: Module level import not at top of file


14-14: Module level import not at top of file


15-15: Module level import not at top of file


17-21: Module level import not at top of file

Markdownlint (1)
MIMIC-IV_Example/README.md (1)

41: Bare URL used

Additional comments not posted (10)
src/MEDS_polars_functions/utils.py (1)

24-45: The implementation of get_shard_prefix is correct and well-documented.
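The review does not reproduce the helper itself. As a rough illustration only, a shard-prefix helper of this kind could take a file path relative to an input directory and strip every extension. The name `shard_prefix_sketch`, its `(base_path, fp)` signature, and the exact behavior below are assumptions, not the `get_shard_prefix` merged in `utils.py`.

```python
from pathlib import PurePosixPath


def shard_prefix_sketch(base_path: str, fp: str) -> str:
    """Hypothetical helper: path of `fp` relative to `base_path`, with all extensions removed.

    Illustrative assumption only; not the `get_shard_prefix` merged in this PR.
    """
    # Express the file path relative to the input directory (raises ValueError if outside it).
    relative = PurePosixPath(fp).relative_to(PurePosixPath(base_path))
    # Keep only the part of the file name before the first dot, dropping e.g. ".csv.gz".
    stem = relative.name.split(".")[0]
    # Re-attach the relative parent directory (a bare "." parent collapses away).
    return str(relative.parent / stem)
```

For example, `shard_prefix_sketch("/data/mimiciv", "/data/mimiciv/hosp/admissions.csv.gz")` yields `"hosp/admissions"`. Splitting on the first dot means a name such as `v2.2.csv` would be cut to `v2`; whether the real helper handles that case is not stated in the review.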

scripts/extraction/convert_to_sharded_events.py (1)

74-74: The added logging statement enhances the visibility of the event extraction process.

MIMIC-IV_Example/configs/event_configs.yaml (1)

1-215: The added event configurations are well-structured and necessary for defining the data extraction process.

MIMIC-IV_Example/pre_MEDS.py (3)

24-35: The implementation of load_raw_mimic_file is correct and efficiently handles large data files using lazy evaluation.
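The idea behind lazy loading can be sketched with the standard library alone. The generator below is a hypothetical `iter_raw_rows`, not the `load_raw_mimic_file` in `pre_MEDS.py` (which, per the review, relies on lazy evaluation over dataframes). It opens a plain or gzip-compressed CSV and yields one row at a time, so rows are only parsed when a caller asks for them.

```python
import csv
import gzip
from collections.abc import Iterator
from pathlib import Path


def iter_raw_rows(fp: Path) -> Iterator[dict[str, str]]:
    """Hypothetical lazy reader: yields CSV rows one at a time from a .csv or .csv.gz file."""
    # Pick the opener from the file name; gzip.open in "rt" mode decompresses on the fly.
    opener = gzip.open if fp.name.endswith(".gz") else open
    with opener(fp, mode="rt", newline="") as f:
        # DictReader parses lines only as rows are requested by the caller.
        yield from csv.DictReader(f)
```

A lazy dataframe scan can additionally push column selection and filters into the read; this generator does not attempt that and only avoids materializing the whole file at once.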


38-44: The implementation of add_discharge_time_by_hadm_id is correct and efficiently handles the joining of dataframes.
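Attaching a discharge time by `hadm_id` amounts to a left join against the admissions table. Below is a plain-Python sketch of that join; the name `attach_dischtime` and the list-of-dicts representation are hypothetical, whereas the merged function works on dataframes according to the review.

```python
def attach_dischtime(events: list[dict], admissions: list[dict]) -> list[dict]:
    """Hypothetical left join: copy each admission's `dischtime` onto events sharing its `hadm_id`."""
    # Build the lookup once so each event is matched with a single dict access.
    dischtime_by_hadm = {adm["hadm_id"]: adm["dischtime"] for adm in admissions}
    # Left-join semantics: events with no matching (or missing) hadm_id get None.
    # New dicts are built, so the input events are not mutated.
    return [{**event, "dischtime": dischtime_by_hadm.get(event.get("hadm_id"))} for event in events]
```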


47-65: The implementation of fix_static_data is correct and effectively handles the transformation of static data.
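The README's open questions (death times recorded in both the `hosp` admissions table and the `patients` table, and the lack of a real date of birth) are what a static-data fix has to settle. MIMIC-IV's `patients` table provides `anchor_age` and `anchor_year` rather than a date of birth, and `admissions` records an in-hospital `deathtime`. The sketch below shows one possible policy, assuming the patient-level `dod` is preferred, the earliest admission `deathtime` is the fallback, and birth year is approximated as `anchor_year - anchor_age`. The function name `fix_static_row` and this policy are hypothetical and may not match `fix_static_data`.

```python
def fix_static_row(patient: dict, admissions: list[dict]) -> dict:
    """Hypothetical static-data fix for one MIMIC-IV patient (the policy is an assumption)."""
    # Collect this patient's non-empty in-hospital death times across admissions.
    # Assumes "YYYY-MM-DD HH:MM:SS" strings, which sort chronologically, so min() works.
    death_times = [
        adm["deathtime"]
        for adm in admissions
        if adm["subject_id"] == patient["subject_id"] and adm.get("deathtime")
    ]
    fallback_deathtime = min(death_times) if death_times else None
    return {
        "subject_id": patient["subject_id"],
        "gender": patient["gender"],
        # Prefer the patient-level date of death; fall back to the earliest admission deathtime.
        "dod": patient.get("dod") or fallback_deathtime,
        # anchor_age is the age in anchor_year, so this birth year may be off by one.
        "year_of_birth": int(patient["anchor_year"]) - int(patient["anchor_age"]),
    }
```

Preferring `dod` means the result can mix date-only and datetime values, which is the same resolution question the README raises for HCPCS events.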

MIMIC-IV_Example/README.md (1)

1-151: The updated documentation in the README.md is comprehensive and provides detailed instructions for the MEDS extraction process.

scripts/extraction/shard_events.py (1)

17-23: The changes in shard_events.py improve the flexibility and efficiency of the file handling process and enhance code reuse through the use of utility functions.

Also applies to: 42-276
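The review says only that suffix handling in `scan_with_row_idx` was improved. A common pitfall in this area is that `pathlib` reports only the last extension (`PurePath("admissions.csv.gz").suffix` is `".gz"`), so a compressed CSV can be mistaken for an unknown format. The sketch below matches whole known suffixes against the file name instead; `detect_suffix` and the supported-suffix list are hypothetical and not taken from `shard_events.py`.

```python
from pathlib import PurePath

# Hypothetical set of supported input formats, matched against the end of the file name.
SUPPORTED_SUFFIXES = (".csv.gz", ".csv", ".parquet")


def detect_suffix(fp: str) -> str:
    """Return the supported (possibly multi-part) suffix of `fp`, or raise ValueError."""
    # Look only at the file name, so dots in directory names (e.g. "v2.2/") are ignored.
    name = PurePath(fp).name
    for suffix in SUPPORTED_SUFFIXES:
        if name.endswith(suffix):
            return suffix
    raise ValueError(f"Unsupported file type: {fp}")
```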

src/MEDS_polars_functions/event_conversion.py (2)

280-280: Enhanced logging provides valuable debug information and clarity on the data processing steps.

Also applies to: 283-283, 298-298, 302-302, 306-306


506-506: The added logging statement enhances traceability of event processing, aiding in debugging and monitoring.

@mmcdermott mmcdermott merged commit e1b5da7 into main May 26, 2024
1 of 2 checks passed
@mmcdermott mmcdermott deleted the MIMIC_IV branch May 26, 2024 13:32
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 4

Review Details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits: files that changed from the base of the PR and between 04626d2 and 7e72489.
Files selected for processing (2)
  • MIMIC-IV_Example/README.md (1 hunks)
  • pyproject.toml (1 hunks)
Files skipped from review as they are similar to previous changes (1)
  • pyproject.toml
Additional Context Used
LanguageTool (3)
MIMIC-IV_Example/README.md (3)

Near line 76: Possible missing comma found.
Context: ...ry we'll denote as $MIMICIV_MEDS_DIR. Note this is a different directory than the ...
Rule ID: AI_HYDRA_LEO_MISSING_COMMA


Near line 138: The phrase ‘lots of’ might be wordy and overused. Consider using an alternative.
Context: ...etimeevents5.icu/ingredientevents` Lots of questions remain about how to appropria...
Rule ID: A_LOT_OF


Near line 140: Possible missing comma found.
Context: ..._. How should those be slotted into the timeline which is otherwise stored at the _datet...
Rule ID: AI_HYDRA_LEO_MISSING_COMMA

Markdownlint (1)
MIMIC-IV_Example/README.md (1)

44: Bare URL used


## Step 3: Run the MEDS extraction ETL

We will assume you want to output the final MEDS dataset into a directory we'll denote as `$MIMICIV_MEDS_DIR`.
Contributor

Consider adding a comma after "Note" for better readability.

- Note this is a different directory than the pre-MEDS directory
+ Note, this is a different directory than the pre-MEDS directory


Suggested change
We will assume you want to output the final MEDS dataset into a directory we'll denote as `$MIMICIV_MEDS_DIR`.
We will assume you want to output the final MEDS dataset into a directory we'll denote as `$MIMICIV_MEDS_DIR`. Note, this is a different directory than the pre-MEDS directory

3. `hosp/services`
4. `icu/datetimeevents`
5. `icu/ingredientevents`

Contributor

Consider using a more formal tone in documentation.

- Lots of questions remain about how to appropriately handle timestamps of the data
+ Several questions remain about how to appropriately handle timestamps of the data


Suggested change
Several questions remain about how to appropriately handle timestamps of the data


## Step 1: Download MIMIC-IV

Download the MIMIC-IV dataset from https://physionet.org/content/mimiciv/2.2/ following the instructions on
Contributor

Consider using a more descriptive hyperlink text instead of a bare URL.

- Download the MIMIC-IV dataset from https://physionet.org/content/mimiciv/2.2/ following the instructions on
+ Download the MIMIC-IV dataset from [PhysioNet MIMIC-IV](https://physionet.org/content/mimiciv/2.2/) following the instructions on


Suggested change
Download the MIMIC-IV dataset from https://physionet.org/content/mimiciv/2.2/ following the instructions on
Download the MIMIC-IV dataset from [PhysioNet MIMIC-IV](https://physionet.org/content/mimiciv/2.2/) following the instructions on

5. `icu/ingredientevents`

Lots of questions remain about how to appropriately handle timestamps of the data -- e.g., things like HCPCS
events are stored at the level of the _date_, not the _datetime_. How should those be slotted into the
Contributor

Consider adding a comma after "timeline" for better readability.

- How should those be slotted into the timeline which is otherwise stored at the _datetime_ resolution?
+ How should those be slotted into the timeline, which is otherwise stored at the _datetime_ resolution?


Suggested change
events are stored at the level of the _date_, not the _datetime_. How should those be slotted into the
events are stored at the level of the _date_, not the _datetime_. How should those be slotted into the timeline, which is otherwise stored at the
