Merge branch 'dev' into 141_better_loggers
mmcdermott committed Aug 28, 2024
2 parents 30c6065 + 8f06067 commit eadfdc4
Showing 71 changed files with 2,743 additions and 1,362 deletions.
16 changes: 8 additions & 8 deletions MIMIC-IV_Example/README.md
@@ -76,10 +76,10 @@ This is a step in a few parts:
- the `hosp/diagnoses_icd` table with the `hosp/admissions` table to get the `dischtime` for each
`hadm_id`.
- the `hosp/drgcodes` table with the `hosp/admissions` table to get the `dischtime` for each `hadm_id`.
2. Convert the patient's static data to a more parseable form. This entails:
- Get the patient's DOB in a format that is usable for MEDS, rather than the integral `anchor_year` and
2. Convert the subject's static data to a more parseable form. This entails:
- Get the subject's DOB in a format that is usable for MEDS, rather than the integral `anchor_year` and
`anchor_offset` fields.
- Merge the patient's `dod` with the `deathtime` from the `admissions` table.
- Merge the subject's `dod` with the `deathtime` from the `admissions` table.
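
One way the `dod`/`deathtime` merge described above might look, as a minimal sketch in plain Python rather than the repository's actual pre-MEDS code. The rule used here (prefer the datetime-resolution `deathtime` from `admissions`, otherwise fall back to the date-only `dod` at midnight) is an assumption, and `merge_death_time` is a hypothetical helper name.

```python
from datetime import date, datetime, time
from typing import Optional


def merge_death_time(dod: Optional[date], deathtime: Optional[datetime]) -> Optional[datetime]:
    """Combine a date-only `dod` with a datetime-resolution `deathtime`.

    Assumed rule (not confirmed by the source): prefer `deathtime` when it is
    present, otherwise fall back to `dod` at midnight.
    """
    if deathtime is not None:
        return deathtime
    if dod is not None:
        return datetime.combine(dod, time.min)
    return None
```

Other rules are equally defensible (for example, taking the earlier of the two values); the right choice depends on how the downstream timeline should treat date-only records.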

After these steps, modified files or symlinks to the original files will be written in a new directory which
will be used as the input to the actual MEDS extraction ETL. We'll use `$MIMICIV_PREMEDS_DIR` to denote this
@@ -109,14 +109,14 @@ This is a step in 4 parts:
This step uses the `./scripts/extraction/shard_events.py` script. See `joint_script*.sh` for the expected
format of the command.

2. Extract and form the patient splits and sub-shards. The `./scripts/extraction/split_and_shard_patients.py`
2. Extract and form the subject splits and sub-shards. The `./scripts/extraction/split_and_shard_subjects.py`
script is used for this step. See `joint_script*.sh` for the expected format of the command.

3. Extract patient sub-shards and convert to MEDS events. The
3. Extract subject sub-shards and convert to MEDS events. The
`./scripts/extraction/convert_to_sharded_events.py` script is used for this step. See `joint_script*.sh` for
the expected format of the command.

4. Merge the MEDS events into a single file per patient sub-shard. The
4. Merge the MEDS events into a single file per subject sub-shard. The
`./scripts/extraction/merge_to_MEDS_cohort.py` script is used for this step. See `joint_script*.sh` for the
expected format of the command.
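
To make step 2 more concrete, the sketch below shows one plausible way to assign subjects to splits and then chunk each split into sub-shards. It is not the logic of `split_and_shard_subjects.py`; the function name, the `"split/index"` shard naming, the split names, and the rounding rule are all assumptions made for illustration.

```python
import random
from typing import Dict, List


def split_and_shard(
    subject_ids: List[int],
    split_fracs: Dict[str, float],
    n_subjects_per_shard: int,
    seed: int = 1,
) -> Dict[str, List[int]]:
    """Assign subjects to splits, then chunk each split into sub-shards.

    Returns a mapping from a hypothetical shard name (e.g. "train/0") to the
    subject IDs it contains.
    """
    ids = sorted(subject_ids)
    random.Random(seed).shuffle(ids)  # seeded, so the assignment is reproducible

    shards: Dict[str, List[int]] = {}
    split_names = list(split_fracs)
    start = 0
    for i, name in enumerate(split_names):
        if i == len(split_names) - 1:
            end = len(ids)  # the last split takes the remainder
        else:
            end = start + round(split_fracs[name] * len(ids))
        split_ids = ids[start:end]
        start = end
        for shard_idx, j in enumerate(range(0, len(split_ids), n_subjects_per_shard)):
            shards[f"{name}/{shard_idx}"] = split_ids[j : j + n_subjects_per_shard]
    return shards


# Illustrative call with made-up subject IDs and split fractions.
shards = split_and_shard(list(range(10)), {"train": 0.8, "tuning": 0.1, "held_out": 0.1}, 3)
```

Sorting before the seeded shuffle keeps the result independent of the input order, which matters if several workers need to agree on the same assignment.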

@@ -139,7 +139,7 @@ timeline which is otherwise stored at the _datetime_ resolution?

Other questions:

1. How to handle merging the deathtimes between the hosp table and the patients table?
1. How to handle merging the deathtimes between the hosp table and the subjects table?
2. How to handle the dob nonsense MIMIC has?

## Notes
@@ -153,4 +153,4 @@ may need to run `unset SLURM_CPU_BIND` in your terminal first to avoid errors.

If you wanted, some other processing could also be done here, such as:

1. Converting the patient's dynamically recorded race into a static, most commonly recorded race field.
1. Converting the subject's dynamically recorded race into a static, most commonly recorded race field.
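
The race conversion mentioned above amounts to taking a per-subject mode. A minimal sketch in plain Python follows; `most_common_race` and the `(subject_id, race)` record shape are hypothetical, and the tie-breaking rule (first recorded value wins) is an assumption rather than anything the README specifies.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def most_common_race(records: Iterable[Tuple[int, str]]) -> Dict[int, str]:
    """Map each subject_id to its most frequently recorded race.

    Ties go to the value recorded first, because Counter.most_common keeps
    first-encountered order among equal counts.
    """
    counts: Dict[int, Counter] = {}
    for subject_id, race in records:
        counts.setdefault(subject_id, Counter())[race] += 1
    return {sid: c.most_common(1)[0][0] for sid, c in counts.items()}


# Illustrative, made-up records.
static_race = most_common_race([(1, "WHITE"), (1, "UNKNOWN"), (1, "WHITE"), (2, "ASIAN")])
```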
14 changes: 7 additions & 7 deletions MIMIC-IV_Example/configs/event_configs.yaml
@@ -1,4 +1,4 @@
patient_id_col: subject_id
subject_id_col: subject_id
hosp/admissions:
ed_registration:
code: ED_REGISTRATION
@@ -27,7 +27,7 @@ hosp/admissions:
time: col(dischtime)
time_format: "%Y-%m-%d %H:%M:%S"
hadm_id: hadm_id
# We omit the death event here as it is joined to the data in the patients table in the pre-MEDS step.
# We omit the death event here as it is joined to the data in the subjects table in the pre-MEDS step.

hosp/diagnoses_icd:
diagnosis:
@@ -108,7 +108,7 @@ hosp/omr:
time: col(chartdate)
time_format: "%Y-%m-%d"

hosp/patients:
hosp/subjects:
gender:
code:
- GENDER
@@ -295,18 +295,18 @@ icu/inputevents:
description: ["omop_concept_name", "label"] # List of strings are columns to be collated
itemid: "itemid (omop_source_code)"
parent_codes: "{omop_vocabulary_id}/{omop_concept_code}"
patient_weight:
subject_weight:
code:
- PATIENT_WEIGHT_AT_INFUSION
- SUBJECT_WEIGHT_AT_INFUSION
- KG
time: col(starttime)
time_format: "%Y-%m-%d %H:%M:%S"
numeric_value: patientweight
numeric_value: subjectweight

icu/outputevents:
output:
code:
- PATIENT_FLUID_OUTPUT
- SUBJECT_FLUID_OUTPUT
- col(itemid)
- col(valueuom)
time: col(charttime)
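
To illustrate how an entry like the ones in this config could be read, the sketch below resolves `col(...)` references against a single row and parses the timestamp. It is an assumption-laden illustration, not the MEDS extraction code: `resolve` and `build_event` are hypothetical helpers, the `"//"` separator used to join code parts is assumed, the row values are made up, and the time format is borrowed from the other entries shown in this file.

```python
import re
from datetime import datetime
from typing import Dict, List

# Matches references of the form col(column_name).
COL_PATTERN = re.compile(r"^col\((\w+)\)$")


def resolve(part: str, row: Dict[str, str]) -> str:
    """Return the row's value for `col(name)` parts, or the literal otherwise."""
    match = COL_PATTERN.match(part)
    return row[match.group(1)] if match else part


def build_event(
    code_parts: List[str], time_expr: str, time_format: str, row: Dict[str, str]
) -> Dict[str, object]:
    """Build one event dict from a config entry and a raw row."""
    # Assumption: code parts are joined with "//"; the separator is illustrative.
    code = "//".join(resolve(p, row) for p in code_parts)
    time = datetime.strptime(resolve(time_expr, row), time_format)
    return {"code": code, "time": time}


# Made-up row for illustration only.
row = {"itemid": "226559", "valueuom": "ml", "charttime": "2180-07-23 12:00:00"}
event = build_event(
    ["SUBJECT_FLUID_OUTPUT", "col(itemid)", "col(valueuom)"],
    "col(charttime)",
    "%Y-%m-%d %H:%M:%S",
    row,
)
```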
8 changes: 4 additions & 4 deletions MIMIC-IV_Example/joint_script.sh
@@ -8,7 +8,7 @@ function display_help() {
echo "Usage: $0 <MIMICIV_RAW_DIR> <MIMICIV_PREMEDS_DIR> <MIMICIV_MEDS_DIR> <N_PARALLEL_WORKERS>"
echo
echo "This script processes MIMIC-IV data through several steps, handling raw data conversion,"
echo "sharding events, splitting patients, converting to sharded events, and merging into a MEDS cohort."
echo "sharding events, splitting subjects, converting to sharded events, and merging into a MEDS cohort."
echo
echo "Arguments:"
echo " MIMICIV_RAW_DIR Directory containing raw MIMIC-IV data files."
@@ -88,11 +88,11 @@ MEDS_extract-shard_events \
etl_metadata.dataset_version="2.2" \
event_conversion_config_fp=./MIMIC-IV_Example/configs/event_configs.yaml "$@"

echo "Splitting patients in serial"
MEDS_extract-split_and_shard_patients \
echo "Splitting subjects in serial"
MEDS_extract-split_and_shard_subjects \
input_dir="$MIMICIV_PREMEDS_DIR" \
cohort_dir="$MIMICIV_MEDS_DIR" \
stage="split_and_shard_patients" \
stage="split_and_shard_subjects" \
etl_metadata.dataset_name="MIMIC-IV" \
etl_metadata.dataset_version="2.2" \
event_conversion_config_fp=./MIMIC-IV_Example/configs/event_configs.yaml "$@"
6 changes: 3 additions & 3 deletions MIMIC-IV_Example/joint_script_slurm.sh
@@ -8,7 +8,7 @@ function display_help() {
echo "Usage: $0 <MIMICIV_RAW_DIR> <MIMICIV_PREMEDS_DIR> <MIMICIV_MEDS_DIR> <N_PARALLEL_WORKERS>"
echo
echo "This script processes MIMIC-IV data through several steps, handling raw data conversion,"
echo "sharding events, splitting patients, converting to sharded events, and merging into a MEDS cohort."
echo "sharding events, splitting subjects, converting to sharded events, and merging into a MEDS cohort."
echo "This script uses slurm to process the data in parallel via the 'submitit' Hydra launcher."
echo
echo "Arguments:"
@@ -72,8 +72,8 @@ MEDS_extract-shard_events \
event_conversion_config_fp=./MIMIC-IV_Example/configs/event_configs.yaml \
stage=shard_events

echo "Splitting patients on one worker"
MEDS_extract-split_and_shard_patients \
echo "Splitting subjects on one worker"
MEDS_extract-split_and_shard_subjects \
--multirun \
worker="range(0,1)" \
hydra/launcher=submitit_slurm \