-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Adds AUMCdb example #203
base: dev
Are you sure you want to change the base?
Conversation
Update AUMC with new meds transform
Release Candidate 0.0.7
Merge new changes meds transforms
WalkthroughThe changes introduce several new configuration files, scripts, and a README for the AUMC Example and MIMIC-IV Example projects. These updates include detailed instructions for dataset extraction, structured schemas for patient data management, and enhancements for data processing workflows. The scripts facilitate the conversion of raw data into structured formats, incorporating error handling and parallel processing capabilities. Additionally, modifications to existing files clarify directory paths and improve user guidance on handling data. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Script
participant DataProcessor
participant Config
User->>Script: Run data processing command
Script->>Config: Load configurations
Script->>DataProcessor: Process raw data
DataProcessor->>DataProcessor: Convert to pre-MEDS format
DataProcessor->>DataProcessor: Shard and merge data
DataProcessor->>Script: Return processed data
Script->>User: Output results
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 12
Outside diff range and nitpick comments (11)
AUMC_Example/configs/event_configs.yaml (6)
3-13
: LGTM, but please fix the trailing spaces.The patient structure is correctly defined. However, please remove the trailing spaces at lines 4 and 10 to adhere to the formatting conventions.
Apply this diff to fix the trailing spaces:
- dob: + dob: code: "DOB" time: col(dateofbirth) gender: code: ["GENDER", "col(gender)"] time: null - dod: + dod: code: "DEATH" time: col(dateofdeath)Tools
yamllint
[error] 4-4: trailing spaces
(trailing-spaces)
[error] 10-10: trailing spaces
(trailing-spaces)
14-39
: LGTM, but please fix the trailing spaces.The admissions structure is correctly defined. However, please remove the trailing spaces at lines 24 and 34 to adhere to the formatting conventions.
Apply this diff to fix the trailing spaces:
code: - "ICU_DISCHARGE" - col(destination) time: col(dischargedattime) weight: code: - "WEIGHT_AT_ADMISSION" - col(weightsource) - col(weightgroup) time: col(admittedattime) - height: + height: code: - "HEIGHT_AT_ADMISSION" - col(heightsource) - col(heightgroup) time: col(admittedattime)Tools
yamllint
[error] 24-24: trailing spaces
(trailing-spaces)
[error] 34-34: trailing spaces
(trailing-spaces)
50-58
: LGTM, but please fix the trailing space.The listitems structure is correctly defined. However, please remove the trailing space at line 58 to adhere to the formatting conventions.
Apply this diff to fix the trailing space:
time: col(measuredattime) - +Tools
yamllint
[error] 58-58: trailing spaces
(trailing-spaces)
68-75
: LGTM, but please fix the trailing space.The procedureorderitems structure is correctly defined. However, please remove the trailing space at line 69 to adhere to the formatting conventions.
Apply this diff to fix the trailing space:
- event: + event: code: - PROCEDURE - col(ordercategoryname) - col(item) time: col(registeredattime)Tools
yamllint
[error] 69-69: trailing spaces
(trailing-spaces)
76-88
: LGTM, but please fix the trailing spaces.The processitems structure is correctly defined. However, please remove the trailing spaces at lines 77 and 84 to adhere to the formatting conventions.
Apply this diff to fix the trailing spaces:
- start: + start: code: - PROCESS - START - col(item) time: col(starttime) - end: + end: code: - PROCESS - END - col(item) time: col(stoptime)Tools
yamllint
[error] 77-77: trailing spaces
(trailing-spaces)
[error] 84-84: trailing spaces
(trailing-spaces)
90-123
: LGTM, but please fix the trailing spaces.The drugitems structure is correctly defined. However, please remove the trailing spaces at lines 90, 91, 99, and 108 to adhere to the formatting conventions.
Apply this diff to fix the trailing spaces:
-drugitems: - start: +drugitems: + start: code: - DRUG - START - col(ordercategory) - col(item) - col(action) time: col(starttime) - rate: + rate: code: - DRUG - RATE - col(ordercategory) - col(item) - col(rateunit) time: col(starttime) numeric_value: col(rate) - dose: + dose: code: - DRUG - DOSE - col(ordercategory) - col(item) - col(doseunit) time: col(starttime) numeric_value: col(dose) end: code: - DRUG - END - col(ordercategory) - col(item) time: col(stoptime)Tools
yamllint
[error] 90-90: trailing spaces
(trailing-spaces)
[error] 91-91: trailing spaces
(trailing-spaces)
[error] 99-99: trailing spaces
(trailing-spaces)
[error] 108-108: trailing spaces
(trailing-spaces)
AUMC_Example/run.sh (1)
73-74
: Reminder: Address the TODO comment.The TODO comment indicates that wget blocks need to be added once testing is validated. Please ensure that this functionality is implemented and the TODO comment is resolved before considering the script as complete.
Do you want me to open a GitHub issue to track this task?
AUMC_Example/joint_script.sh (1)
1-102
: Suggestions for improvement:
Consider using absolute paths or validating the existence of input files and directories to make the script more robust. Using relative paths assumes that the referenced files and directories exist in the expected locations, which could lead to errors if the script is run from a different directory or if the file structure changes.
Ensure that the external scripts and configurations referenced by this script are well-documented and maintained. The script's functionality depends on those files, so any changes or issues there could affect the script's behavior.
Consider providing a default value or a way to automatically determine the optimal number of parallel workers based on available system resources. While passing the number of workers as an argument provides flexibility, it also requires the user to specify an appropriate value. Automatically determining the optimal value could make the script easier to use and more efficient.
AUMC_Example/README.md (2)
115-123
: Create GitHub issues for open questions.The section raises important questions about handling data times and merging death times from different tables. These concerns require further investigation and resolution. Consider creating GitHub issues to track these open questions and facilitate discussion and implementation of appropriate solutions.
Do you want me to open GitHub issues for these questions?
Tools
LanguageTool
[uncategorized] ~116-~116: Possible missing comma found.
Context: ..._. How should those be slotted into the timeline which is otherwise stored at the _datet...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~121-~121: It seems likely that a singular genitive (’s) apostrophe is missing.
Context: ...athtimes between the hosp table and the patients table? 2. How to handle the dob nonsens...(AI_HYDRA_LEO_APOSTROPHE_S_XS)
129-134
: Provide more details or examples.The section mentions the possibility of performing additional pre-MEDS processing but does not provide specific details or examples. To give users a better understanding of the potential future work, consider adding some concrete examples or a high-level description of the types of pre-MEDS processing steps that could be implemented.
AUMC_Example/pre_MEDS.py (1)
55-63
: Consider improving the date of birth estimation logic.The current logic assumes that the patient was born at the midpoint of the year, which may not be accurate. While this assumption is reasonable given the limited information, it's worth exploring if there are better ways to estimate the date of birth for patients with multiple admissions.
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (11)
- AUMC_Example/README.md (1 hunks)
- AUMC_Example/configs/event_configs.yaml (1 hunks)
- AUMC_Example/configs/pre_MEDS.yaml (1 hunks)
- AUMC_Example/configs/table_preprocessors.yaml (1 hunks)
- AUMC_Example/joint_script.sh (1 hunks)
- AUMC_Example/local_parallelism_runner.yaml (1 hunks)
- AUMC_Example/pre_MEDS.py (1 hunks)
- AUMC_Example/run.sh (1 hunks)
- AUMC_Example/slurm_runner.yaml (1 hunks)
- MIMIC-IV_Example/README.md (3 hunks)
- MIMIC-IV_Example/run.sh (1 hunks)
Additional context used
yamllint
AUMC_Example/configs/event_configs.yaml
[error] 4-4: trailing spaces
(trailing-spaces)
[error] 10-10: trailing spaces
(trailing-spaces)
[error] 24-24: trailing spaces
(trailing-spaces)
[error] 34-34: trailing spaces
(trailing-spaces)
[error] 58-58: trailing spaces
(trailing-spaces)
[error] 69-69: trailing spaces
(trailing-spaces)
[error] 77-77: trailing spaces
(trailing-spaces)
[error] 84-84: trailing spaces
(trailing-spaces)
[error] 90-90: trailing spaces
(trailing-spaces)
[error] 91-91: trailing spaces
(trailing-spaces)
[error] 99-99: trailing spaces
(trailing-spaces)
[error] 108-108: trailing spaces
(trailing-spaces)
AUMC_Example/configs/table_preprocessors.yaml
[error] 5-5: trailing spaces
(trailing-spaces)
[error] 10-10: trailing spaces
(trailing-spaces)
[error] 20-20: trailing spaces
(trailing-spaces)
[error] 24-24: trailing spaces
(trailing-spaces)
[error] 38-38: trailing spaces
(trailing-spaces)
[error] 42-42: trailing spaces
(trailing-spaces)
[error] 56-56: trailing spaces
(trailing-spaces)
[error] 60-60: trailing spaces
(trailing-spaces)
[error] 78-78: trailing spaces
(trailing-spaces)
[error] 95-95: trailing spaces
(trailing-spaces)
[error] 112-112: trailing spaces
(trailing-spaces)
[error] 116-116: no new line character at the end of file
(new-line-at-end-of-file)
LanguageTool
AUMC_Example/README.md
[style] ~28-~28: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ... chmod +x pre_MEDS.py cd ..If you want to install locally, use:
bash git clon...(REP_WANT_TO_VB)
[uncategorized] ~65-~65: Possible missing comma found.
Context: ...original files will be written in a new directory which will be used as the input to the ...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~83-~83: Possible missing comma found.
Context: ...ctory we'll denote as$AUMC_MEDS_DIR
. Note this is a different directory than the ...(AI_HYDRA_LEO_MISSING_COMMA)
[style] ~114-~114: The phrase ‘lots of’ might be wordy and overused. Consider using an alternative.
Context: ...including: INSERT TABLES IGNORED HERE Lots of questions remain about how to appropria...(A_LOT_OF)
[uncategorized] ~116-~116: Possible missing comma found.
Context: ..._. How should those be slotted into the timeline which is otherwise stored at the _datet...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~121-~121: It seems likely that a singular genitive (’s) apostrophe is missing.
Context: ...athtimes between the hosp table and the patients table? 2. How to handle the dob nonsens...(AI_HYDRA_LEO_APOSTROPHE_S_XS)
[uncategorized] ~126-~126: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...Notes Note: If you use the slurm system and you launch the hydra submitit jobs from...(COMMA_COMPOUND_SENTENCE)
Markdownlint
AUMC_Example/README.md
109-109: Punctuation: ':'
Trailing punctuation in heading(MD026, no-trailing-punctuation)
3-3: null
Bare URL used(MD034, no-bare-urls)
40-40: null
Bare URL used(MD034, no-bare-urls)
Ruff
AUMC_Example/pre_MEDS.py
11-11:
gzip
imported but unusedRemove unused import:
gzip
(F401)
105-109: f-string used as docstring. Python will interpret this as a joined string, rather than a docstring.
(B021)
Additional comments not posted (41)
AUMC_Example/local_parallelism_runner.yaml (1)
1-3
: Approve the parallel processing configuration.The introduction of the
parallelize
section withn_workers
andlauncher
parameters is a valuable addition to enable parallel processing capabilities. Here are some key points:
The dynamic
n_workers
configuration, retrieved from theN_WORKERS
environment variable, allows for flexibility in adjusting the level of parallelism based on the environment. This is particularly useful when running the application in different environments with varying resource availability.The explicit use of the Joblib library, specified by the
launcher
parameter, is a reliable choice for managing parallel execution. Joblib is a well-established library that provides efficient and easy-to-use tools for parallel processing in Python.By leveraging parallel processing, the application can significantly speed up tasks that are suitable for concurrent execution. This can lead to improved performance and faster completion times for computationally intensive operations.
However, it's important to consider the following:
Ensure that the tasks being parallelized are independent and can be safely executed concurrently without introducing race conditions or data inconsistencies.
Monitor the resource utilization, especially when using a high number of worker processes, to avoid overloading the system and ensure optimal performance.
Consider providing documentation or guidelines on setting the
N_WORKERS
environment variable to help users determine an appropriate value based on their system's capabilities and the specific requirements of the application.Overall, the parallel processing configuration looks good and can provide significant benefits in terms of performance and efficiency.
AUMC_Example/configs/pre_MEDS.yaml (1)
1-11
: Excellent configuration setup!The YAML configuration file demonstrates several best practices:
- Using environment variables for
input_dir
andcohort_dir
allows for flexibility and portability across different environments.- Dynamically generating the job name based on the current timestamp ensures unique identification for each execution instance, which is beneficial for tracking and debugging purposes.
- Specifying structured logging directories enhances the organization and traceability of logs.
- Leveraging Hydra's features for job and run management can simplify the overall application setup.
This configuration provides a clean and structured approach to manage input/output directories and logging while promoting flexibility and ease of monitoring.
AUMC_Example/slurm_runner.yaml (7)
5-12
: LGTM!The configuration parameters for the
shard_events
task seem appropriate. The allocated resources should be sufficient for most sharding operations, and the use of the "short" partition aligns with the expected task duration.
13-21
: LGTM!The configuration parameters for the
split_and_shard_subjects
task seem appropriate. The allocated resources should be sufficient for most splitting and sharding operations on subjects, and the use of the "short" partition aligns with the expected task duration.
22-29
: LGTM!The configuration parameters for the
convert_to_sharded_events
task seem appropriate. The allocated resources should be sufficient for most conversion operations, and the use of the "short" partition aligns with the expected task duration.
30-37
: LGTM!The configuration parameters for the
merge_to_MEDS_cohort
task seem appropriate. The allocated resources, particularly the high memory allocation, should be sufficient for most merging operations, and the use of the "short" partition aligns with the expected task duration.
38-45
: LGTM!The configuration parameters for the
extract_code_metadata
task seem appropriate. The allocated resources should be sufficient for most metadata extraction operations, and the use of the "short" partition aligns with the expected task duration.
46-54
: LGTM!The configuration parameters for the
finalize_MEDS_metadata
task seem appropriate. The allocated resources should be sufficient for most metadata finalization operations, and the use of the "short" partition aligns with the expected task duration. The lower resource allocation compared to other tasks suggests that this task is less computationally intensive.
55-61
: LGTM!The configuration parameters for the
finalize_MEDS_data
task seem appropriate. The allocated resources, particularly the high memory allocation, should be sufficient for most data finalization operations, and the use of the "short" partition aligns with the expected task duration.AUMC_Example/configs/event_configs.yaml (3)
1-2
: LGTM!The patient ID column is correctly defined.
41-49
: LGTM!The numericitems structure is correctly defined.
59-67
: LGTM!The freetextitems structure is correctly defined.
AUMC_Example/configs/table_preprocessors.yaml (7)
1-17
: LGTM!The
admissions
configuration is well-structured and includes all the necessary fields for processing admission data.Tools
yamllint
[error] 5-5: trailing spaces
(trailing-spaces)
[error] 10-10: trailing spaces
(trailing-spaces)
19-35
: Configuration looks good!The
numericitems
configuration includes all the necessary fields and follows the expected structure. Thewarning_items
field raises a valid question about handling theregisteredat
andupdatedat
fields, which should be addressed.Tools
yamllint
[error] 20-20: trailing spaces
(trailing-spaces)
[error] 24-24: trailing spaces
(trailing-spaces)
37-53
: Configuration looks good!The
listitems
configuration follows the same structure asnumericitems
and includes all the necessary fields. Thewarning_items
field raises the same question about handling theregisteredat
andupdatedat
fields, which should be addressed consistently across configurations.Tools
yamllint
[error] 38-38: trailing spaces
(trailing-spaces)
[error] 42-42: trailing spaces
(trailing-spaces)
55-72
: Configuration looks good!The
freetextitems
configuration follows the same structure asnumericitems
andlistitems
, with the addition of acomment
field inoutput_data_cols
. Thewarning_items
field raises the same question about handling theregisteredat
andupdatedat
fields, which should be addressed consistently across configurations.Tools
yamllint
[error] 56-56: trailing spaces
(trailing-spaces)
[error] 60-60: trailing spaces
(trailing-spaces)
74-97
: Configuration looks good, but please address the warning items.The
drugitems
configuration follows the expected structure and includes additional drug-related fields inoutput_data_cols
. However, thewarning_items
field raises two important questions:
- Ignoring several flags may be a mistake and should be carefully reviewed.
- The timing of administered dose recording should be clarified to ensure accurate data processing.
Please address these warning items to ensure the configuration is complete and accurate.
Tools
yamllint
[error] 78-78: trailing spaces
(trailing-spaces)
[error] 95-95: trailing spaces
(trailing-spaces)
99-107
: LGTM!The
procedureorderitems
configuration follows the expected structure and includes all the necessary fields for processing procedure order data.
108-116
: Configuration looks good!The
processitems
configuration follows the expected structure and includes the necessary fields for processing process item data.Tools
yamllint
[error] 112-112: trailing spaces
(trailing-spaces)
[error] 116-116: no new line character at the end of file
(new-line-at-end-of-file)
MIMIC-IV_Example/run.sh (1)
38-42
: LGTM!The validation check for the
do_unzip
flag in the positional arguments is a good addition. It improves the robustness of the script by ensuring that the script is executed with the correct parameters and prevents incorrect usage related to thedo_unzip
flag.AUMC_Example/run.sh (4)
1-4
: LGTM!The shebang line and the
set -e
command are correctly used to ensure the script fails if any internal command fails. This is a good practice to prevent the script from continuing execution if any command encounters an error.
6-22
: LGTM!The
display_help
function provides clear and well-formatted usage instructions for the script. It includes details about the required and optional arguments, making it easy for users to understand how to use the script correctly.
24-25
: LGTM!Unsetting the
SLURM_CPU_BIND
environment variable is a good practice when running the script on a Slurm interactive node with Slurm parallelism. This ensures that the script can run correctly in such environments without encountering issues related to CPU binding.
27-105
: LGTM!The code segment for handling command-line arguments and setting up the environment is implemented correctly. It follows a logical flow and provides clear error messages when required arguments are missing. The handling of the optional
do_unzip
flag is done properly, with appropriate error handling for invalid values.The file paths for configuration files and the pre-MEDS script are set using absolute paths, which is a good practice. Exporting the environment variables separately ensures that any errors during assignment are caught.
The unzipping of
.csv.gz
files is performed conditionally based on thedo_unzip
flag, and appropriate messages are displayed based on the presence of files to unzip. The pre-MEDS conversion is executed using the specified Python script and input/output directories.Setting the
N_WORKERS
environment variable to 1 if it is not already set helps avoid issues with the runners. Finally, the extraction pipeline is run using the specified configuration file.Overall, the code segment is well-structured and follows best practices for handling command-line arguments and setting up the environment.
AUMC_Example/joint_script.sh (4)
1-22
: LGTM!The script header and help function look good:
- Using
set -e
is a good practice to fail fast on any error.- The
display_help
function provides clear usage instructions.- The help message includes all the necessary details about the script's purpose, arguments, and options.
24-41
: LGTM!The argument validation logic looks good:
- Checking for the help flag and displaying the help message is a common pattern.
- Validating the number of arguments ensures that the script is invoked correctly.
- Assigning the arguments to variables improves code readability and maintainability.
42-49
: LGTM!The memory optimization note is helpful:
- It provides guidance for users processing large datasets like AUMCdb.
- The suggested optimizations help keep the memory burden reasonable.
- Including example parameters makes it easier for users to apply the optimizations.
51-102
: LGTM!The data processing stages look good:
- The script follows a logical sequence of steps to process the data.
- Using separate commands or scripts for each stage makes the code modular and maintainable.
- Running stages in parallel with multiple workers can significantly speed up processing.
- Using the Hydra framework for job management simplifies the configuration of parallel stages.
MIMIC-IV_Example/README.md (5)
21-21
: Improved clarity in environment variable description.The updated description for
MIMICIV_PRE_MEDS_DIR
provides a clearer understanding of its purpose, distinguishing it from the directory for raw data.
22-22
: Improved clarity in environment variable description.The updated description for
MIMICIV_MEDS_COHORT_DIR
provides a clearer understanding of its purpose, specifying that it stores the final MEDS data.
49-49
: Improved correctness of metadata file download instructions.The addition of the
cd
command ensures that the metadata files are downloaded to the correct directory (MIMICIV_RAW_DIR
), improving the clarity and correctness of the instructions.
68-68
: Updated command to use the correct environment variable.The modification to the MEDS ETL command ensures that it uses the
MIMICIV_MEDS_COHORT_DIR
environment variable, which aligns with the clarified purpose of storing final MEDS data.
70-72
: Helpful note on memory usage and environment variable settings.The added note provides valuable information to users about potential memory usage issues during the ETL process. The suggestion to reduce the shard size in the configuration file offers a practical way to manage memory better. Additionally, the emphasis on checking environment variable settings helps ensure a smooth ETL process.
AUMC_Example/README.md (4)
7-36
: LGTM!The installation instructions are clear, concise, and cover both PyPI and local cloning methods. The inclusion of conda environment setup and package installation steps ensures a complete setup for users.
Tools
LanguageTool
[style] ~28-~28: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ... chmod +x pre_MEDS.py cd ..If you want to install locally, use:
bash git clon...(REP_WANT_TO_VB)
38-41
: LGTM!The download instructions are clear and provide the necessary information for users to obtain the AUMC dataset. The mention of requiring raw
.csv
files and introducing the$AUMC_RAW_DIR
variable adds clarity to the process.Tools
Markdownlint
40-40: null
Bare URL used(MD034, no-bare-urls)
42-107
: LGTM!The instructions for running the MEDS ETL process are well-structured and informative. The breakdown into data preparation and MEDS extraction sub-steps provides clarity. The mention of different parallelism options and the expected time and resource requirements is helpful for users.
Tools
LanguageTool
[uncategorized] ~65-~65: Possible missing comma found.
Context: ...original files will be written in a new directory which will be used as the input to the ...(AI_HYDRA_LEO_MISSING_COMMA)
[uncategorized] ~83-~83: Possible missing comma found.
Context: ...ctory we'll denote as$AUMC_MEDS_DIR
. Note this is a different directory than the ...(AI_HYDRA_LEO_MISSING_COMMA)
126-127
: LGTM!The note about running
unset SLURM_CPU_BIND
when launching hydra submitit jobs from an interactive slurm node is a helpful tip for users who may encounter errors in that scenario.Tools
LanguageTool
[uncategorized] ~126-~126: Use a comma before ‘and’ if it connects two independent clauses (unless they are closely connected and short).
Context: ...Notes Note: If you use the slurm system and you launch the hydra submitit jobs from...(COMMA_COMPOUND_SENTENCE)
AUMC_Example/pre_MEDS.py (4)
26-36
: LGTM!The function correctly loads the raw AUMCdb file into a Polars DataFrame using
pl.scan_csv
. The largeinfer_schema_length
value andutf8-lossy
encoding ensure that the schema is inferred correctly and encoding errors are handled gracefully.
67-73
: LGTM!The filtering and column selection logic is correct. It keeps only the first admission for each patient and selects the relevant columns for the patient and admission DataFrames.
76-129
: LGTM!The function correctly returns a function that joins a DataFrame to the patient table and adds pseudotimes. It handles various input types, validates the arguments, and selects the relevant columns for the output DataFrame.
Tools
Ruff
105-109: f-string used as docstring. Python will interpret this as a joined string, rather than a docstring.
(B021)
132-207
: LGTM!The main function correctly orchestrates the overall workflow. It loads the preprocessor configurations, processes the patient table, and iterates through the raw cohort files to apply the appropriate preprocessor functions. The output DataFrames are written to Parquet files, and the function handles skipping already processed or unsupported files.
- "starttime" | ||
- "stoptime" | ||
output_data_cols: | ||
- "item" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a new line at the end of the file.
The static analysis tool has identified a missing new line at the end of the file. Please add one to adhere to formatting best practices.
Tools
yamllint
[error] 116-116: no new line character at the end of file
(new-line-at-end-of-file)
- "measuredat" | ||
- "registeredat" | ||
- "updatedat" | ||
pseudotime_col: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove trailing spaces.
The static analysis tool has identified trailing spaces on this line. Please remove them to adhere to formatting best practices.
- pseudotime_col:
+ pseudotime_col:
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
pseudotime_col: | |
pseudotime_col: |
Tools
yamllint
[error] 60-60: trailing spaces
(trailing-spaces)
- "specialty" | ||
|
||
numericitems: | ||
offset_col: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove trailing spaces.
The static analysis tool has identified trailing spaces on this line. Please remove them to adhere to formatting best practices.
- offset_col:
+ offset_col:
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
offset_col: | |
offset_col: |
Tools
yamllint
[error] 20-20: trailing spaces
(trailing-spaces)
offset_col: | ||
- "start" | ||
- "stop" | ||
pseudotime_col: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove trailing spaces.
The static analysis tool has identified trailing spaces on this line. Please remove them to adhere to formatting best practices.
- pseudotime_col:
+ pseudotime_col:
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
pseudotime_col: | |
pseudotime_col: |
Tools
yamllint
[error] 78-78: trailing spaces
(trailing-spaces)
offset_col: | ||
- "start" | ||
- "stop" | ||
pseudotime_col: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove trailing spaces.
The static analysis tool has identified trailing spaces on this line. Please remove them to adhere to formatting best practices.
- pseudotime_col:
+ pseudotime_col:
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
pseudotime_col: | |
pseudotime_col: |
Tools
yamllint
[error] 112-112: trailing spaces
(trailing-spaces)
- "measuredat" | ||
- "registeredat" | ||
- "updatedat" | ||
pseudotime_col: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove trailing spaces.
The static analysis tool has identified trailing spaces on this line. Please remove them to adhere to formatting best practices.
- pseudotime_col:
+ pseudotime_col:
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
pseudotime_col: | |
pseudotime_col: |
Tools
yamllint
[error] 42-42: trailing spaces
(trailing-spaces)
- "How should we deal with `registeredat` and `updatedat`?" | ||
|
||
listitems: | ||
offset_col: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove trailing spaces.
The static analysis tool has identified trailing spaces on this line. Please remove them to adhere to formatting best practices.
- offset_col:
+ offset_col:
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
offset_col: | |
offset_col: |
Tools
yamllint
[error] 38-38: trailing spaces
(trailing-spaces)
Currently, some tables are ignored, including: | ||
|
||
INSERT TABLES IGNORED HERE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update the placeholder for ignored tables.
The section mentions that some tables are currently ignored but does not provide the specific table names. Please update the placeholder INSERT TABLES IGNORED HERE
with the actual names of the ignored tables to provide clarity to users.
|
||
root = rootutils.setup_root(__file__, dotenv=True, pythonpath=True, cwd=True) | ||
|
||
import gzip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused import.
The gzip
module is imported but not used in the script. Please remove the unused import to keep the code clean.
-import gzip
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
import gzip |
Tools
Ruff
11-11:
gzip
imported but unusedRemove unused import:
gzip
(F401)
f"""Takes the {table_name} table and converts it to a form that includes pseudo-timestamps. | ||
|
||
The output of this process is ultimately converted to events via the `{table_name}` key in the | ||
`configs/event_configs.yaml` file. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix the docstring for the returned function.
The docstring for the returned function is an f-string, which may not be interpreted correctly as a docstring by Python. Please use a regular docstring instead.
- f"""Takes the {table_name} table and converts it to a form that includes pseudo-timestamps.
-
- The output of this process is ultimately converted to events via the `{table_name}` key in the
- `configs/event_configs.yaml` file.
- """
+ """Takes the {table_name} table and converts it to a form that includes pseudo-timestamps.
+
+ The output of this process is ultimately converted to events via the `{table_name}` key in the
+ `configs/event_configs.yaml` file.
+ """
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
f"""Takes the {table_name} table and converts it to a form that includes pseudo-timestamps. | |
The output of this process is ultimately converted to events via the `{table_name}` key in the | |
`configs/event_configs.yaml` file. | |
""" | |
"""Takes the {table_name} table and converts it to a form that includes pseudo-timestamps. | |
The output of this process is ultimately converted to events via the `{table_name}` key in the | |
`configs/event_configs.yaml` file. | |
""" |
Tools
Ruff
105-109: f-string used as docstring. Python will interpret this as a joined string, rather than a docstring.
(B021)
Additionally, some fixes for the MIMIC-IV example
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores