Best Student Team & 4th Place Solution of SIIM-FISABIO-RSNA COVID-19 Detection
Identify and localize COVID-19 abnormalities on chest radiographs
This is a collaboration between BUET and NVIDIA
Name | Affiliation | Country | Position |
---|---|---|---|
Md Awsafur Rahman | Dept. of EEE, BUET | 🇧🇩 | Undergrad Student |
Bishmoy Paul | Dept. of EEE, BUET | 🇧🇩 | Undergrad Student |
Najibul Haque Sarker | Dept. of CSE, BUET | 🇧🇩 | Undergrad Student |
Zaber Ibn Abdul Hakim | Dept. of CSE, BUET | 🇧🇩 | Undergrad Student |
Chris Deotte | NVIDIA | 🇺🇸 | Senior Data Scientist |
Below you can find an outline of how to reproduce our solution.
If you run into any trouble with the setup or code, or have any questions, please contact me at awsaf49@gmail.com.
- GPU : 4x Tesla V100
- GPU Memory : 4x32 GiB
- CUDA Version : 11.0
- Driver Version : 450.119.04
- CPU RAM : 16 GiB
- DISK : 2 TB
- python-gdcm==3.0.9.1
- pydicom==2.1.2
- joblib==1.0.1
- tensorflow==2.4.1
- torch==1.7.0
- torchvision==0.8.1
- numpy==1.19.5
- pandas==1.2.4
- matplotlib==3.4.2
- opencv-python==4.5.2.54
- opencv-python-headless==4.5.2.54
- Pillow==8.2.0
- PyYAML>=5.3.1
- scipy==1.6.3
- tqdm==4.61.1
- tensorboard==2.4.1
- seaborn==0.11.1
- ensemble_boxes==1.0.6
- albumentations==1.0.1
- thop==0.0.31.post2005241907
- Cython==0.29.23
- pycocotools==2.0
- addict==2.4.0
- timm==0.4.12
- efficientnet==1.1.1
External packages and their version numbers are listed in requirements.txt. Install them with:
! pip install -qr requirements.txt
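To confirm that an environment matches the exact pins listed above, a check along the following lines may help. This is only a sketch: the helper names `parse_pin` and `check_pins` are hypothetical and not part of this repository, and the code assumes Python 3.8+ for `importlib.metadata`.

```python
from importlib import metadata  # standard library in Python 3.8+


def parse_pin(line):
    """Split a 'name==version' line into (name, version).

    Returns None for lines that are not exact pins (e.g. 'PyYAML>=5.3.1').
    """
    line = line.strip().lstrip("- ").strip()  # tolerate a leading list bullet
    if "==" not in line:
        return None
    name, version = line.split("==", 1)
    return name.strip(), version.strip()


def check_pins(lines):
    """Return {name: (wanted, installed_or_None)} for each unsatisfied exact pin."""
    mismatches = {}
    for line in lines:
        pin = parse_pin(line)
        if pin is None:
            continue  # skip non-exact specifiers, comments, and blank lines
        name, wanted = pin
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

Calling `check_pins(open("requirements.txt"))` would then return an empty dict when every exact pin is satisfied.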
- Download the competition data and extract it to ./data/siim-covid19-detection
- Download the CheXpert dataset and extract it to ./data/chexpert
- Download the RSNA competition data and extract it to ./data/rsna-pneumonia-detection-challenge
- Download the RICORD dataset and extract it to ./data/ricord
After this, the ./data directory should look something like this:
.
├── data
│   ├── chexpert
│   │   ├── train
│   │   ├── train.csv
│   │   ├── valid
│   │   └── valid.csv
│   ├── ricord
│   │   ├── MIDRC-RICORD
│   │   └── MIDRC-RICORD-meta.csv
│   ├── rsna-pneumonia-detection-challenge
│   │   ├── GCP Credits Request Link - RSNA.txt
│   │   ├── stage_2_detailed_class_info.csv
│   │   ├── stage_2_sample_submission.csv
│   │   ├── stage_2_test_images
│   │   ├── stage_2_train_images
│   │   └── stage_2_train_labels.csv
│   └── siim-covid19-detection
│       ├── sample_submission.csv
│       ├── test
│       ├── train
│       ├── train_image_level.csv
│       └── train_study_level.csv
The complete directory structure is listed in data_structure.txt.
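Before running any scripts, it can be useful to verify that the extracted data matches the layout shown above. The following is a minimal sketch, not part of the repository: the function name `missing_entries` is hypothetical, and the `EXPECTED` list covers only a subset of the entries in the tree.

```python
from pathlib import Path

# Relative paths taken from the directory tree above (a subset, for illustration).
EXPECTED = [
    "chexpert/train.csv",
    "chexpert/valid.csv",
    "ricord/MIDRC-RICORD-meta.csv",
    "rsna-pneumonia-detection-challenge/stage_2_train_images",
    "rsna-pneumonia-detection-challenge/stage_2_train_labels.csv",
    "siim-covid19-detection/train",
    "siim-covid19-detection/test",
    "siim-covid19-detection/train_image_level.csv",
    "siim-covid19-detection/train_study_level.csv",
]


def missing_entries(data_root):
    """Return the expected relative paths that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

For example, `missing_entries("./data")` returns an empty list when all of the listed files and folders are in place.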
After this, run prepare_data.py. It does the following:
- Read training data from RAW_DATA_DIR (specified in SETTINGS.json)
- Run any preprocessing steps
- Save the cleaned data to CLEAN_DATA_DIR (specified in SETTINGS.json)
- --img-size target image size for the cleaned data
- --debug if given 1, it will process only 100 images
! python prepare_data.py
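The way a script such as prepare_data.py might read its directories from SETTINGS.json and honor the debug option can be sketched as follows. This is an illustration under assumptions, not the actual implementation: the helper names `load_settings` and `select_images` are hypothetical, and only the keys RAW_DATA_DIR and CLEAN_DATA_DIR and the 100-image debug limit come from the description above.

```python
import json


def load_settings(path="SETTINGS.json"):
    """Load the settings file and return its contents as a dict."""
    with open(path) as f:
        return json.load(f)


def select_images(paths, debug=0, limit=100):
    """Return all paths in sorted order, or only the first `limit` when debug is 1."""
    paths = sorted(paths)
    return paths[:limit] if debug == 1 else paths
```

With these helpers, `load_settings()["RAW_DATA_DIR"]` gives the input directory and `load_settings()["CLEAN_DATA_DIR"]` the output directory.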
Simply run the train.py script. It does the following:
- Read training data from TRAIN_DATA_CLEAN_PATH (specified in SETTINGS.json)
- Pretrain classification and detection backbones on CheXpert data.
- Fine-tune them on competition data and external data.
- Save the models to MODEL_DIR (specified in SETTINGS.json)
- --settings-path path to SETTINGS.json. Default value uses the correct path.
- --clsbs-path path to json file containing necessary batch sizes for different classification models. Default value uses the correct path.
- --detbs-path path to json file containing necessary batch sizes for different detection models. Default value uses the correct path.
- --debug will process only 100 images
! python train.py
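For orientation, the command-line options listed above could be declared with `argparse` roughly as shown here. This is a sketch only: the flag names come from the list above, but the default file names `clsbs.json` and `detbs.json` are placeholders, and treating `--debug` as an integer is an assumption; the real train.py may differ.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options described for train.py."""
    parser = argparse.ArgumentParser(description="Sketch of the train.py options")
    parser.add_argument("--settings-path", default="SETTINGS.json",
                        help="path to SETTINGS.json")
    # The two defaults below are placeholder file names, not the repository's real ones.
    parser.add_argument("--clsbs-path", default="clsbs.json",
                        help="JSON file with batch sizes for classification models")
    parser.add_argument("--detbs-path", default="detbs.json",
                        help="JSON file with batch sizes for detection models")
    parser.add_argument("--debug", type=int, default=0,
                        help="if 1, process only 100 images")
    return parser
```

`argparse` converts dashes in option names to underscores, so the values are read as `args.settings_path`, `args.clsbs_path`, `args.detbs_path`, and `args.debug`.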
Before proceeding, download these already trained checkpoints and unzip them into the path specified by CHECKPOINT_DIR in SETTINGS.json.
The ./checkpoints directory should then look like this:
.
├── checkpoints
│   ├── 2cls
│   ├── 4cls
│   └── det
To predict on test data, run predict.py. It does the following:
- Read test data from TEST_DATA_CLEAN_PATH (specified in SETTINGS.json)
- Load models from MODEL_DIR (specified in SETTINGS.json) when everything was trained from scratch, or from CHECKPOINT_DIR (specified in SETTINGS.json) when predicting with our previously trained checkpoints.
- Use our models to make predictions on new samples
- Save our predictions to SUBMISSION_DIR (specified in SETTINGS.json)
- --mode if "full", it will use the weights saved in MODEL_DIR (saved after training from scratch); if "fast", it will use the weights saved in CHECKPOINT_DIR (already trained checkpoints)
- --debug if given 1, it will run inference on only the first 100 images
! python predict.py --mode "fast"
or
! python predict.py --mode "full"
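The relationship between `--mode` and the directory the weights are loaded from can be summarized in a few lines. This is a hypothetical sketch of the selection logic described above, not the code in predict.py; the function name `resolve_weights_dir` is an assumption.

```python
def resolve_weights_dir(settings, mode):
    """Pick the weights directory from the settings dict for a given --mode value."""
    key_by_mode = {
        "full": "MODEL_DIR",       # weights produced by training from scratch
        "fast": "CHECKPOINT_DIR",  # previously trained, downloaded checkpoints
    }
    if mode not in key_by_mode:
        raise ValueError(f"unknown mode {mode!r}; expected 'full' or 'fast'")
    return settings[key_by_mode[mode]]
```

Rejecting any other mode with an error avoids silently loading weights from the wrong directory.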
- Weights & Biases for tracking training.
- efficientnet for the EfficientNet models.