Adding details to run a MIMIC-IV example (and some other small fixes) #6
Walkthrough
This update introduces a comprehensive guide and configurations for extracting a MEDS dataset from MIMIC-IV. It includes new configuration files for event settings and preprocessing tasks, a Python script for data wrangling, and logging enhancements in various scripts for better traceability. Additionally, optional dependencies were added.
Actionable comments posted: 0
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (9)
- MIMIC-IV_Example/README.md (1 hunks)
- MIMIC-IV_Example/configs/event_configs.yaml (1 hunks)
- MIMIC-IV_Example/configs/pre_MEDS.yaml (1 hunks)
- MIMIC-IV_Example/pre_MEDS.py (1 hunks)
- pyproject.toml (1 hunks)
- scripts/extraction/convert_to_sharded_events.py (1 hunks)
- scripts/extraction/shard_events.py (4 hunks)
- src/MEDS_polars_functions/event_conversion.py (3 hunks)
- src/MEDS_polars_functions/utils.py (1 hunks)
Files skipped from review due to trivial changes (2)
- MIMIC-IV_Example/configs/pre_MEDS.yaml
- pyproject.toml
Additional Context Used
LanguageTool (4)
MIMIC-IV_Example/README.md (4)
Near line 58: Possible missing comma found.
Context: ...original files will be written in a new directory which will be used as the input to the ...
Rule ID: AI_HYDRA_LEO_MISSING_COMMA
Near line 73: Possible missing comma found.
Context: ...ry we'll denote as `$MIMICIV_MEDS_DIR`. Note this is a different directory than the ...
Rule ID: AI_HYDRA_LEO_MISSING_COMMA
Near line 135: The phrase ‘lots of’ might be wordy and overused. Consider using an alternative.
Context: ...etimeevents` 5. `icu/ingredientevents` Lots of questions remain about how to appropria...
Rule ID: A_LOT_OF
Near line 142: It seems likely that a singular genitive (’s) apostrophe is missing.
Context: ...athtimes between the hosp table and the patients table? 2. How to handle the dob nonsens...
Rule ID: AI_HYDRA_LEO_APOSTROPHE_S_XS
Ruff (8)
MIMIC-IV_Example/pre_MEDS.py (8)
8-8: Module level import not at top of file
9-9: Module level import not at top of file
10-10: Module level import not at top of file
12-12: Module level import not at top of file
13-13: Module level import not at top of file
14-14: Module level import not at top of file
15-15: Module level import not at top of file
17-21: Module level import not at top of file
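The Ruff findings above correspond to rule E402, which fires when a module-level import follows other statements. The review does not show why `pre_MEDS.py` orders its imports this way, so the following is only a minimal sketch of the two usual remedies; the environment variable and the `config_summary` helper are hypothetical names introduced for illustration.

```python
"""Hypothetical module layout that satisfies Ruff's E402 check."""
# Remedy 1: keep every import at the top, before any other statement.
import os
from pathlib import Path

# Setup that has to run before a later import (hypothetical variable name).
os.environ.setdefault("EXAMPLE_SETTING", "1")

# Remedy 2: when an import really must follow the setup above, suppress the
# rule on that single line instead of disabling it for the whole file.
import json  # noqa: E402


def config_summary(root: Path) -> str:
    """Return a small JSON summary so the module has something to run."""
    return json.dumps({"root": root.as_posix(), "setting": os.environ["EXAMPLE_SETTING"]})
```

If the late imports are not intentional, hoisting them (remedy 1) is the simpler fix; the per-line `noqa` keeps the check active for the rest of the file.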
Markdownlint (1)
MIMIC-IV_Example/README.md (1)
41: Bare URL used
Additional comments not posted (10)
src/MEDS_polars_functions/utils.py (1)
24-45: The implementation of `get_shard_prefix` is correct and well-documented.
scripts/extraction/convert_to_sharded_events.py (1)
74-74: The added logging statement enhances the visibility of the event extraction process.
MIMIC-IV_Example/configs/event_configs.yaml (1)
1-215: The added event configurations are well-structured and necessary for defining the data extraction process.
MIMIC-IV_Example/pre_MEDS.py (3)
24-35: The implementation of `load_raw_mimic_file` is correct and efficiently handles large data files using lazy evaluation.
38-44: The implementation of `add_discharge_time_by_hadm_id` is correct and efficiently handles the joining of dataframes.
47-65: The implementation of `fix_static_data` is correct and effectively handles the transformation of static data.
MIMIC-IV_Example/README.md (1)
1-151: The updated documentation in the README.md is comprehensive and provides detailed instructions for the MEDS extraction process.
scripts/extraction/shard_events.py (1)
17-23: The changes in `shard_events.py` improve the flexibility and efficiency of the file handling process and enhance code reuse through the use of utility functions. Also applies to: 42-276
src/MEDS_polars_functions/event_conversion.py (2)
280-280: Enhanced logging provides valuable debug information and clarity on the data processing steps. Also applies to: 283-283, 298-298, 302-302, 306-306
506-506: The added logging statement enhances traceability of event processing, aiding in debugging and monitoring.
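The review mentions `get_shard_prefix` in `utils.py` without showing its body. As a rough illustration of what a shard-prefix helper might do, the sketch below derives a prefix from an input file's path relative to a base directory. This is an assumption about the behaviour, not the PR's code; the name `shard_prefix_sketch` and the example paths are hypothetical.

```python
from pathlib import Path


def shard_prefix_sketch(base_path: Path, fp: Path) -> str:
    """Hypothetical helper: path of `fp` relative to `base_path`, minus all suffixes."""
    relative = fp.relative_to(base_path)
    # Drop every suffix, so "admissions.csv.gz" becomes "admissions".
    stem = relative.name.split(".")[0]
    return (relative.parent / stem).as_posix()


# Example with made-up paths: a nested, compressed input file.
prefix = shard_prefix_sketch(Path("/data/raw"), Path("/data/raw/hosp/admissions.csv.gz"))
```

Keeping the relative parent directory in the prefix means files with the same name in different subdirectories (for example `hosp/` and `icu/`) would not collide.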
Actionable comments posted: 4
Review Details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- MIMIC-IV_Example/README.md (1 hunks)
- pyproject.toml (1 hunks)
Files skipped from review as they are similar to previous changes (1)
- pyproject.toml
Additional Context Used
LanguageTool (3)
MIMIC-IV_Example/README.md (3)
Near line 76: Possible missing comma found.
Context: ...ry we'll denote as `$MIMICIV_MEDS_DIR`. Note this is a different directory than the ...
Rule ID: AI_HYDRA_LEO_MISSING_COMMA
Near line 138: The phrase ‘lots of’ might be wordy and overused. Consider using an alternative.
Context: ...etimeevents` 5. `icu/ingredientevents` Lots of questions remain about how to appropria...
Rule ID: A_LOT_OF
Near line 140: Possible missing comma found.
Context: ..._. How should those be slotted into the timeline which is otherwise stored at the _datet...
Rule ID: AI_HYDRA_LEO_MISSING_COMMA
Markdownlint (1)
MIMIC-IV_Example/README.md (1)
44: Bare URL used
## Step 3: Run the MEDS extraction ETL

We will assume you want to output the final MEDS dataset into a directory we'll denote as `$MIMICIV_MEDS_DIR`.
Consider adding a comma after "Note" for better readability.
- Note this is a different directory than the pre-MEDS directory
+ Note, this is a different directory than the pre-MEDS directory
3. `hosp/services`
4. `icu/datetimeevents`
5. `icu/ingredientevents`
Consider using a more formal tone in documentation.
- Lots of questions remain about how to appropriately handle timestamps of the data
+ Several questions remain about how to appropriately handle timestamps of the data
## Step 1: Download MIMIC-IV

Download the MIMIC-IV dataset from https://physionet.org/content/mimiciv/2.2/ following the instructions on
Consider using a more descriptive hyperlink text instead of a bare URL.
- Download the MIMIC-IV dataset from https://physionet.org/content/mimiciv/2.2/ following the instructions on
+ Download the MIMIC-IV dataset from [PhysioNet MIMIC-IV](https://physionet.org/content/mimiciv/2.2/) following the instructions on
5. `icu/ingredientevents`

Lots of questions remain about how to appropriately handle timestamps of the data -- e.g., things like HCPCS
events are stored at the level of the _date_, not the _datetime_. How should those be slotted into the
Consider adding a comma after "timeline" for better readability.
- How should those be slotted into the timeline which is otherwise stored at the _datetime_ resolution?
+ How should those be slotted into the timeline, which is otherwise stored at the _datetime_ resolution?
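The README passage under review leaves open how date-level events should be placed on a datetime-resolution timeline. One possible convention, shown here purely as a sketch and not as the PR's decision, is to promote each date to a datetime at a fixed time of day (midnight by default) before sorting. The helper name `promote_date_to_datetime` and the example events are hypothetical.

```python
from datetime import date, datetime, time


def promote_date_to_datetime(d: date, at: time = time.min) -> datetime:
    """Hypothetical convention: place a date-only event at a fixed time of day."""
    return datetime.combine(d, at)


# Made-up events: two with full timestamps, one recorded only by date.
events = [
    ("lab", datetime(2180, 5, 6, 14, 30)),
    ("hcpcs", promote_date_to_datetime(date(2180, 5, 6))),
    ("admit", datetime(2180, 5, 6, 8, 0)),
]

# With the midnight default, the date-only event sorts first on its day.
timeline = sorted(events, key=lambda event: event[1])
```

Passing `at=time.max` instead would place date-only events at the end of their day. Either choice asserts an ordering the source data does not contain, which is presumably why the README flags the question as open.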
I'm still validating the outputs here, but the instructions in the pipeline run end to end for a MIMIC-IV example.
Summary by CodeRabbit
- New Features
- Enhancements
  - `mimic` dependency for extended functionality.
- Bug Fixes
- Documentation