Match MEDS label schema as per #72 #80
Conversation
Walkthrough
The recent update significantly enhances data handling capabilities within the
Changes
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- src/aces/__main__.py (1 hunks)
Additional context used
GitHub Check: codecov/patch
src/aces/__main__.py
[warning] 50-51: src/aces/__main__.py#L50-L51
Added lines #L50 - L51 were not covered by tests
Additional comments not posted (1)
src/aces/__main__.py (1)
50-51: LGTM! Add tests for the new conditional logic.
The added conditional check ensures that the renaming operation is only performed if the `"index_timestamp"` column exists, which enhances robustness. However, the new logic is not covered by tests.
Would you like me to assist in generating the unit tests or open a GitHub issue to track this task?
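A test for the guarded rename could look roughly like the sketch below. It is illustrative only: `rename_if_present` is a hypothetical helper working on a plain dict of columns rather than the project's DataFrame, and the target name `"prediction_time"` is an assumption, since the comment above does not say what the column is renamed to.

```python
def rename_if_present(columns: dict, old: str, new: str) -> dict:
    """Return a copy of `columns` with `old` renamed to `new`, if `old` exists.

    Hypothetical stand-in for the conditional rename discussed above.
    Column order is preserved and the input mapping is never mutated.
    """
    if old not in columns:
        # Nothing to rename: hand back an unchanged copy.
        return dict(columns)
    return {(new if name == old else name): values for name, values in columns.items()}


# One input that has the column and one that does not.
with_index = {"subject_id": [1, 2], "index_timestamp": ["2020-01-01", "2020-06-01"]}
without_index = {"subject_id": [1, 2], "label": [0, 1]}

# "prediction_time" is an assumed target name, used only for illustration.
renamed = rename_if_present(with_index, "index_timestamp", "prediction_time")
untouched = rename_if_present(without_index, "index_timestamp", "prediction_time")
```

Covering both branches (column present, column absent) is what would clear the two uncovered lines flagged by Codecov.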
@justin13601 We'll probably need to extend this to encompass other changes to match the
That PR is now merged, so we can reliably match MEDS v0.3 at this point. It would be worth explicitly adding a `pyproject.toml` dependency on `meds` at version 0.3 (once it is pushed) and using that to validate the output schema as well, either fully or partially (partially if we include an inclusive or exclusive flag on the `prediction_time`).
There is more that we need to change for this; I think the right strategy is to emulate the code I linked in the related issue to this PR. That will handle both column names and pyarrow types.
Actionable comments posted: 3
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- pyproject.toml (1 hunks)
- src/aces/__main__.py (3 hunks)
- src/aces/expand_shards.py (3 hunks)
Additional context used
GitHub Check: codecov/patch
src/aces/__main__.py
[warning] 85-91: src/aces/__main__.py#L85-L91
Added lines #L85 - L91 were not covered by tests
[warning] 93-94: src/aces/__main__.py#L93-L94
Added lines #L93 - L94 were not covered by tests
[warning] 96-100: src/aces/__main__.py#L96-L100
Added lines #L96 - L100 were not covered by tests
[warning] 102-102: src/aces/__main__.py#L102
Added line #L102 was not covered by tests
[warning] 138-142: src/aces/__main__.py#L138-L142
Added lines #L138 - L142 were not covered by tests
Additional comments not posted (2)
pyproject.toml (1)
25-25: Dependency addition looks good.
The addition of the `meds == 0.3` dependency aligns with the comments summary and supports the validation of the MEDS label schema.
src/aces/expand_shards.py (1)
6-6: Good use of `Path` for file handling.
The use of `Path.glob` enhances the robustness and readability of the code for file discovery.
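For readers unfamiliar with the pattern, here is a minimal sketch of `Path.glob`-based file discovery. It is not the code in `expand_shards.py`: the helper `discover_shards`, the `.parquet` suffix, and the directory layout are all assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path


def discover_shards(root: Path, suffix: str = ".parquet") -> list[str]:
    """Return shard names under `root`: relative paths without the suffix, sorted.

    Hypothetical helper; the real discovery logic may differ.
    """
    return sorted(
        p.relative_to(root).with_suffix("").as_posix()
        for p in root.glob(f"**/*{suffix}")  # recursive match on the suffix
        if p.is_file()
    )


# Build a throwaway directory tree to run the helper against.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("train/0.parquet", "train/1.parquet", "held_out/0.parquet", "notes.txt"):
        path = root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    shards = discover_shards(root)
```

Sorting the result keeps the output deterministic, since `Path.glob` makes no ordering guarantee.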
Actionable comments posted: 3
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- src/aces/__main__.py (3 hunks)
- src/aces/expand_shards.py (3 hunks)
- src/aces/predicates.py (2 hunks)
Files skipped from review as they are similar to previous changes (1)
- src/aces/expand_shards.py
Additional context used
GitHub Check: codecov/patch
src/aces/__main__.py
[warning] 85-87: src/aces/__main__.py#L85-L87
Added lines #L85 - L87 were not covered by tests
[warning] 93-98: src/aces/__main__.py#L93-L98
Added lines #L93 - L98 were not covered by tests
[warning] 100-101: src/aces/__main__.py#L100-L101
Added lines #L100 - L101 were not covered by tests
[warning] 103-107: src/aces/__main__.py#L103-L107
Added lines #L103 - L107 were not covered by tests
[warning] 109-109: src/aces/__main__.py#L109
Added line #L109 was not covered by tests
[warning] 112-114: src/aces/__main__.py#L112-L114
Added lines #L112 - L114 were not covered by tests
[warning] 119-119: src/aces/__main__.py#L119
Added line #L119 was not covered by tests
[warning] 121-121: src/aces/__main__.py#L121
Added line #L121 was not covered by tests
[warning] 125-125: src/aces/__main__.py#L125
Added line #L125 was not covered by tests
[warning] 161-167: src/aces/__main__.py#L161-L167
Added lines #L161 - L167 were not covered by tests
Additional comments not posted (4)
src/aces/__main__.py (3)
22-32: Constants for MEDS label types are well-defined.
The constants `MEDS_LABEL_MANDATORY_TYPES` and `MEDS_LABEL_OPTIONAL_TYPES` correctly specify the expected data types for MEDS label DataFrames.
35-125: Function logic for schema validation is robust.
The `get_and_validate_label_schema` function effectively ensures that the DataFrame conforms to the expected MEDS schema.
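The general shape of a mandatory/optional schema check can be sketched as follows. This is not the PR's `get_and_validate_label_schema`: types are plain strings rather than pyarrow types, the names `MANDATORY_TYPES`, `OPTIONAL_TYPES`, and `validate_label_schema` are hypothetical, the column/type pairs are assumptions, and silently dropping unknown columns is one possible policy among several.

```python
# Assumed column -> type-name mappings, for illustration only.
MANDATORY_TYPES = {"subject_id": "int64", "prediction_time": "timestamp[us]"}
OPTIONAL_TYPES = {"boolean_value": "bool", "integer_value": "int64"}


def validate_label_schema(schema: dict[str, str]) -> dict[str, str]:
    """Check a column -> type-name mapping and return it in canonical order.

    Raises ValueError if a mandatory column is missing and TypeError if a
    known column has an unexpected type. Unknown columns are dropped
    (an assumed policy, not necessarily what the PR does).
    """
    missing = [col for col in MANDATORY_TYPES if col not in schema]
    if missing:
        raise ValueError(f"Missing mandatory columns: {missing}")

    validated = {}
    for col, expected in {**MANDATORY_TYPES, **OPTIONAL_TYPES}.items():
        if col not in schema:
            continue  # an optional column that is simply absent
        if schema[col] != expected:
            raise TypeError(f"Column {col!r} has type {schema[col]!r}, expected {expected!r}")
        validated[col] = expected
    return validated


# Columns arrive in arbitrary order, with one column the schema does not know.
ok = validate_label_schema(
    {"boolean_value": "bool", "prediction_time": "timestamp[us]", "subject_id": "int64", "extra": "string"}
)
```

Each `raise` and the `continue` correspond to a separate branch, which is the kind of path the coverage warnings above ask tests to exercise.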
160-167: MEDS data processing logic in `main` is correct.
The function correctly renames columns and validates the schema for MEDS data.
src/aces/predicates.py (1)
Line range hint 294-319: Column renaming in `generate_plain_predicates_from_meds` is consistent with MEDS schema.
The renaming of columns from `"time"` to `"timestamp"` and `"patient_id"` to `"subject_id"` aligns with the schema changes.
Summary by CodeRabbit
New Features
- Added a dependency on the `meds` library, enhancing project capabilities.
- Updated the `expand_shards` function to utilize modern file handling techniques and provide clearer output formats.
Bug Fixes