- A project refining the pre-processing code of UniHPF
- NOTE: This repository requires `python>=3.9` and `Java>=8`.
- NOTE: Because of a performance issue related to the `transformers` library, it is recommended to use `transformers==4.29.1`.
`pip install numpy pandas tqdm treelib transformers==4.29.1 pyspark`
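Before running the pipeline, the requirements above can be checked with a small script. The sketch below is a hypothetical helper, not part of this repository, and its function names are illustrative. It treats the Python and Java versions as hard requirements and the `transformers` pin as a recommendation. Note that `java -version` prints to stderr, and Java 8 reports itself using the legacy `1.8` scheme.

```python
import re
import shutil
import subprocess
import sys
from importlib import metadata
from typing import List, Optional, Tuple

MIN_PYTHON = (3, 9)
MIN_JAVA = 8
RECOMMENDED_TRANSFORMERS = "4.29.1"


def parse_java_major(version_text: str) -> Optional[int]:
    """Extract the Java major version from `java -version` output.

    Java 8 and earlier report e.g. "1.8.0_292"; Java 9+ report "11.0.2", "17", etc.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if match is None:
        return None
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))  # legacy "1.x" scheme: 1.8 -> 8
    return major


def java_major_version() -> Optional[int]:
    """Run `java -version` (which writes to stderr) and parse its major version."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr + result.stdout)


def transformers_version() -> Optional[str]:
    """Return the installed transformers version, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


def check_environment() -> Tuple[List[str], List[str]]:
    """Return (errors, warnings): hard requirements vs. the recommended pin."""
    errors: List[str] = []
    warnings: List[str] = []
    if sys.version_info[:2] < MIN_PYTHON:
        errors.append(f"Python >= 3.9 required, found {sys.version.split()[0]}")
    java = java_major_version()
    if java is None:
        errors.append("Java >= 8 required, but no usable `java` was found on PATH")
    elif java < MIN_JAVA:
        errors.append(f"Java >= 8 required, found Java {java}")
    tf = transformers_version()
    if tf != RECOMMENDED_TRANSFORMERS:
        warnings.append(f"transformers=={RECOMMENDED_TRANSFORMERS} recommended, found {tf}")
    return errors, warnings


if __name__ == "__main__":
    errors, warnings = check_environment()
    for msg in errors:
        print("ERROR:", msg)
    for msg in warnings:
        print("WARNING:", msg)
    print("Environment OK" if not errors else "Environment check failed")
```

Splitting the result into errors and warnings keeps a different `transformers` version from blocking a run, since the README only recommends that pin.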
`python main.py --ehr {eicu, mimiciii, mimiciv}`
- It automatically downloads the selected dataset from PhysioNet, which requires credentialed PhysioNet access to that dataset.
- You can also use an already-downloaded copy of the dataset with the `--data {data path}` option.
- A sample implementation of a PyTorch `Dataset` is provided in `sample_dataset.py`.
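To script runs over several datasets, `main.py` can be driven from Python with the `--ehr` and `--data` options described above. This is a minimal sketch under stated assumptions: it assumes the working directory is the repository root and that no other flags are needed; `build_command`, `run_preprocessing`, and the `/path/to/...` paths are hypothetical names chosen for illustration.

```python
import subprocess
import sys
from typing import List, Optional

SUPPORTED_EHRS = ("eicu", "mimiciii", "mimiciv")


def build_command(ehr: str, data: Optional[str] = None) -> List[str]:
    """Build the argument list for `python main.py --ehr ... [--data ...]`."""
    if ehr not in SUPPORTED_EHRS:
        raise ValueError(f"--ehr must be one of {SUPPORTED_EHRS}, got {ehr!r}")
    cmd = [sys.executable, "main.py", "--ehr", ehr]
    if data is not None:
        # Use an already-downloaded copy instead of fetching from PhysioNet.
        cmd += ["--data", data]
    return cmd


def run_preprocessing(ehr: str, data: Optional[str] = None) -> None:
    """Run main.py; assumes the current directory is the repository root."""
    subprocess.run(build_command(ehr, data), check=True)


if __name__ == "__main__":
    # Print the commands only; the paths are hypothetical placeholders.
    for ehr in SUPPORTED_EHRS:
        print(" ".join(build_command(ehr, data=f"/path/to/{ehr}")))
```

Passing an argument list (rather than a shell string) to `subprocess.run` avoids quoting problems with data paths that contain spaces, and `check=True` raises if preprocessing exits with an error.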
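For readers unfamiliar with PyTorch's map-style datasets, the general shape looks like the sketch below. It is not the contents of `sample_dataset.py`: the input layout (in-memory lists of token ids with one integer label per sample) and the names `PaddedSequenceDataset`, `max_len`, and `pad_id` are assumptions for illustration. It subclasses `torch.utils.data.Dataset` when torch is installed and falls back to `object` otherwise, so it also runs standalone.

```python
from typing import Dict, Sequence

try:
    # Subclass PyTorch's Dataset when torch is available.
    from torch.utils.data import Dataset
except ImportError:  # fall back so the sketch runs without torch
    Dataset = object


class PaddedSequenceDataset(Dataset):
    """Map-style dataset: implements __len__ and __getitem__.

    Hypothetical layout: `sequences` holds per-sample token ids and
    `labels` holds one integer label per sample.
    """

    def __init__(self, sequences: Sequence[Sequence[int]], labels: Sequence[int],
                 max_len: int = 8, pad_id: int = 0) -> None:
        if len(sequences) != len(labels):
            raise ValueError("sequences and labels must have the same length")
        self.sequences = sequences
        self.labels = labels
        self.max_len = max_len
        self.pad_id = pad_id

    def __len__(self) -> int:
        return len(self.sequences)

    def __getitem__(self, idx: int) -> Dict[str, object]:
        # Truncate to max_len, then right-pad so every item has the same length.
        ids = list(self.sequences[idx])[: self.max_len]
        n_pad = self.max_len - len(ids)
        return {
            "input_ids": ids + [self.pad_id] * n_pad,
            "attention_mask": [1] * len(ids) + [0] * n_pad,
            "label": self.labels[idx],
        }


if __name__ == "__main__":
    ds = PaddedSequenceDataset([[5, 6, 7], [8, 9, 10, 11, 12]], labels=[0, 1], max_len=4)
    print(len(ds))  # 2
    print(ds[0])  # {'input_ids': [5, 6, 7, 0], 'attention_mask': [1, 1, 1, 0], 'label': 0}
```

Items are returned as plain Python lists; before batching with a `DataLoader`, they would typically be converted with `torch.tensor` inside a custom `collate_fn`.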