Scripts for running all preprocessing steps #21
Hi, I got a "Segmentation fault" error when running `python -m preprocessing.preprocess --cachedir data --num_workers 4`.
First, please make sure that you run the script in a separate conda environment with the packages from https://github.com/USC-Melady/Benchmarking_DL_MIMICIII/blob/master/install.sh installed. Could you provide the full log/output from the script?
Hi Chuizheng, I did follow your instructions, and the log is as follows:
```
## Package Plan ##

  environment location: /home/jid20004/anaconda3

  added / updated specs:

The following packages will be downloaded:

The following NEW packages will be INSTALLED:
  _anaconda_depends  pkgs/main/linux-64::_anaconda_depends-2019.03-py36_0

The following packages will be UPDATED:
  cairo  1.14.12-h7636065_2 --> 1.14.12-h8948797_3

The following packages will be DOWNGRADED:
  anaconda  5.2.0-py36_3 --> custom-py36_1

Proceed ([y]/n)? y
Downloading and Extracting Packages
```
@jrdeng93 Could you do this in a separate conda env? I am not sure if there is a conflicting package in yours:

```
conda create -n mimic4 "python<3.8"
conda activate mimic4
bash install.sh
```

Also, did you notice any high memory consumption when running the script? If so, you can try increasing the size of your swap space (a sketch follows below). There could be other causes related to your platform, hardware, or installed software, and you may want to use some debugging tools to find the reason.
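For reference, a minimal sketch of adding swap on a Linux host; the 32G size and /swapfile path are placeholders, not values from this repo:

```bash
sudo fallocate -l 32G /swapfile   # reserve a backing file (placeholder size)
sudo chmod 600 /swapfile          # swap files must not be world-readable
sudo mkswap /swapfile             # format the file as swap space
sudo swapon /swapfile             # enable it for the current boot
swapon --show                     # confirm the new swap is active
```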
@mengcz13 Hi, I ran the command `python -m preprocessing.preprocess --cachedir data --num_workers 8` twenty-five days ago, and it still has not finished.
We have the database installed on one server with 64GB RAM, and its RAM usage does not exceed 32GB. We run the preprocessing script on another server with 256GB RAM, but I have not monitored the peak usage there. In which step do you get stuck? What is the peak RAM usage?
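To answer the peak-RAM question concretely, one option is a generic sketch like the following, assuming GNU time is installed at /usr/bin/time (not a command from this repo):

```bash
# GNU time's -v flag prints "Maximum resident set size (kbytes)" when the
# command exits; note this reports the largest single process, not the sum
# across all worker processes.
/usr/bin/time -v python -m preprocessing.preprocess --cachedir data --num_workers 8
```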
@mengcz13 Thanks for your reply. I cannot figure out the step at which I get stuck. Here is the log:

```
1_getItemIdList: Select all itemids from TABLE INPUTEVENTS, OUTPUTEVENTS, CHARTEVENTS, LABEVENTS, MICROBIOLOGYEVENTS, PRESCRIPTIONS.
100%|██████████| 808/808 [11:17<00:00, 1.19it/s]
PID USER  PR  NI  VIRT  RES  SHR  S  %CPU  %MEM  TIME+  COMMAND
```
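If the run appears stuck mid-step, live monitoring with standard procps tools can show whether memory is exhausted; a generic sketch, nothing repo-specific here:

```bash
watch -n 5 free -h   # refresh system-wide memory and swap usage every 5 s
top -o %MEM          # sort processes by resident memory to spot the culprit
```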
@mengcz13 GOD! In fact, I ran the script in the VS Code terminal. As I said above, when I pressed Ctrl+C in that terminal, it threw an error.
Seems it is because the RAM is used up. Maybe the database program or something else consumes too much RAM? As a workaround, I can send you the processed data (truncated and aggregated) for training these models.
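Another memory-saving option, not raised in the thread itself but using the script's own flag: fewer workers trade speed for a smaller peak footprint (the value 1 is only an illustration):

```bash
python -m preprocessing.preprocess --cachedir data --num_workers 1
```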
@mengcz13 That's great, if you can send the data to me. How can I contact you? My email address is ligaungops@gamil.com. Thanks a lot.
I've sent an email sharing the processed files.
@mengcz13 Sorry, the address above is wrong! Here is my email: liguangops@gamil.com. Please! Thanks a lot.
@liguang-ops @mengcz13
Hi all,
We have updated the script for running all preprocessing steps based on the constructed MIMIC-III dataset. We have also optimized its speed by adding the necessary indices to the database (a hedged sketch of such indices appears at the end of this post). The script replaces the previous Jupyter notebooks for preprocessing, which are now kept under the `dep_notebooks` branch. To generate all input files used by the models, run `python -m preprocessing.preprocess --cachedir data --num_workers 4`.
The preprocessing should finish within 1 day with `--num_workers 4`.
All input files are stored under `data/` and require 36GB of disk space.
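As for the indices mentioned above, here is a hedged sketch of the kind that speeds up per-admission scans of the large event tables; the index names and columns are hypothetical, not the repo's actual set, and a PostgreSQL build of MIMIC-III is assumed:

```bash
# Hypothetical examples only; the repo's real indices may differ.
psql -d mimic -c "CREATE INDEX IF NOT EXISTS chartevents_hadm_idx ON chartevents (hadm_id);"
psql -d mimic -c "CREATE INDEX IF NOT EXISTS labevents_hadm_idx ON labevents (hadm_id);"
```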