1st place solution for RSNA Screening Mammography Breast Cancer Detection competition on Kaggle

Solution write up:

overall pipeline


Please download the trained models and put them in assets/trained/:

# This assumes that the Kaggle API is installed:
kaggle datasets download -d dangnh0611/rsna-breast-cancer-detection-best-ckpts -p assets/trained
unzip -d assets/trained/
rm assets/trained/



  • assets: contains necessary data files and trained models
    • assets/data/: CSV labels for external datasets (BMCD and CMMD), breast ROI box annotations in YOLOv5 format
    • assets/public_pretrains/: publicly available pretrained weights
    • assets/trained/: trained models used for the winning submission
  • datasets/: where to store datasets (competition + external); expected to contain both raw and cleaned versions.
    • datasets/raw/: raw version of the competition data + all external datasets: BMCD, CDD-CESM, CMMD, MiniDDSM, Vindr. For how to correctly structure the datasets, please refer to docs/
  • docker/: Dockerfile
  • docs/: documentation
  • src/: contains most of the source code for this project
    • src/roi_det: for training the breast ROI detection model (YOLOX)
    • src/pytorch-image-models: for training the classification model (Convnext-small)
    • src/submit: code to generate predictions (submission)
    • src/tools: Python and bash scripts to prepare datasets, train, and convert models
    • src/utils: utilities for DICOM processing, etc.
  • SETTINGS.json: defines relative paths for IO

SETTINGS.json defines base paths for IO:

  • RAW_DATA_DIR: Where to store raw dataset, including both competition dataset and external datasets.
  • PROCESSED_DATA_DIR: Where to store processed/cleaned datasets
  • MODEL_CHECKPOINT_DIR: Store intermediate checkpoints during training
  • MODEL_FINAL_SELECTION_DIR: Where to store final (best) models used for submission
  • SUBMISSION_DIR: Where to store final submission/inference results
  • ASSETS_DIR: Stores trained models and manually annotated datasets/files. This must not be changed; it is defined here only for easier lookup.
  • TEMP_DIR: Where to store intermediate results/files
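
As a minimal sketch of how a script might consume these paths: the helper name `load_settings`, and the choice to resolve each relative path against the folder that holds SETTINGS.json, are assumptions made for illustration and not necessarily how the repository's own scripts load the file. Only the key names come from the list above.

```python
import json
from pathlib import Path

# Key names as listed in this README; everything else here is hypothetical.
EXPECTED_KEYS = (
    "RAW_DATA_DIR",
    "PROCESSED_DATA_DIR",
    "MODEL_CHECKPOINT_DIR",
    "MODEL_FINAL_SELECTION_DIR",
    "SUBMISSION_DIR",
    "ASSETS_DIR",
    "TEMP_DIR",
)


def load_settings(settings_path):
    """Read SETTINGS.json and resolve each relative path against its folder."""
    settings_path = Path(settings_path)
    with settings_path.open() as f:
        raw = json.load(f)
    missing = [key for key in EXPECTED_KEYS if key not in raw]
    if missing:
        raise KeyError(f"SETTINGS.json is missing keys: {missing}")
    # Assumption: paths are relative to the directory containing SETTINGS.json.
    root = settings_path.resolve().parent
    return {key: (root / raw[key]).resolve() for key in EXPECTED_KEYS}
```

A call such as `load_settings("SETTINGS.json")["TEMP_DIR"]` would then return an absolute `Path`, which avoids surprises when a script is launched from a different working directory.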


The following machine was used to create the final solution: NVIDIA DGX A100. Most of my experiments can be done using 1-3 A100 GPUs. However, the final results can easily be reproduced using a single A100 GPU (40GB GPU memory).


Refer to docs/ for details on how to correctly set up the datasets.


Reproducing the entire solution takes several stages. I will briefly describe them here to make the rest of this document easier to follow.

  1. Train a YOLOX model on a subset of the competition images for breast ROI detection
    • Convert competition DICOM files to 8-bit PNG images
    • Convert detection labels from YOLOv5 format to COCO format (YOLOX accepts COCO format without any modifications)
    • Train a YOLOX-nano 416x416 model on those images (521 train images, 50 val images)
    • Convert the trained YOLOX model from Torch to a TensorRT engine
  2. Use the trained YOLOX TensorRT engine to crop the breast ROI and save it to disk as 8-bit PNGs
    • Clean and restructure the raw datasets (competition data + external data) in a unified way (standardize the format/structure)
    • DICOM decoding --> ROI detection (YOLOX) --> ROI crop --> normalization --> save to disk
  3. Train Convnext-small models for classification using the saved ROI images
    • Do a 4-fold split on the competition data
    • Train one Convnext-small model per fold (4 models in total)
    • Select the best checkpoint for each fold
    • Convert those models from Torch to TensorRT
  4. Inference on test data (submission)
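
The "ROI crop --> normalization" part of stage 2 can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the image is a plain list of pixel rows, the box is `(x_min, y_min, x_max, y_max)` in pixels, and "normalization" is taken to mean simple min-max scaling to 0..255. The README does not say which normalization is actually used, and both function names are hypothetical.

```python
def crop_roi(image, box):
    """Crop a 2D image (list of rows) to box = (x_min, y_min, x_max, y_max)."""
    height, width = len(image), len(image[0])
    x_min, y_min, x_max, y_max = box
    # Clamp the box to the image so a slightly oversized detection still works.
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    return [row[x_min:x_max] for row in image[y_min:y_max]]


def to_uint8(image):
    """Min-max scale pixel values to 0..255 (assumed normalization)."""
    flat = [value for row in image for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image carries no contrast; map it to all zeros.
        return [[0 for _ in row] for row in image]
    return [[int(round((value - lo) * 255 / (hi - lo))) for value in row]
            for row in image]
```

Cropping before normalizing means the intensity range is computed from the breast region only, so background pixels outside the box do not affect the scaling.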


All the following instructions assume that the datasets (competition + external data) are all set up. There are 3 options to reproduce the solution:

  1. Use trained models

    • No training, just use trained models in assets/trained to make predictions
  2. Do not re-train YOLOX, fully reproduce Convnext-small classification models

    • Skip re-training YOLOX; use (my) trained YOLOX for the later steps
    • Re-train 4x Convnext-small classification models. This part can be 100% reproduced (giving you identical models/training logs/results) without any randomness.
    • This method should give 100% identical scores on CV/LB/PB
  3. Re-train all parts (reproduce from scratch)

    • Does not use any of (my) trained models in any part; all of them are re-trained from scratch
    • This may not give 100% identical results/scores, because YOLOX training cannot be fully reproduced to get EXACTLY the same model as used in the winning submission. More details here
    • Note that the dataset used for training the Convnext-small classification models is generated based on YOLOX's predictions, so changes in YOLOX will cause changes in the Convnext-small classification models --> the Convnext-small classification models will also not be 100% reproducible.
    • In general, however, it should give nearly identical results/scores within a reasonable margin.

5.1. Use trained models to make predictions

5.1.1. Convert trained YOLOX to TensorRT

A YOLOX-nano 416 engine optimized for NVIDIA A100 is provided at assets/trained/yolox_nano_416_roi_trt_a100.pth. However, the recommended way is to convert the Torch checkpoint to TensorRT yourself, so that the engine is optimized for your environment/hardware:

PYTHONPATH=$(pwd)/src/roi_det/YOLOX:$PYTHONPATH python3 src/roi_det/YOLOX/tools/ \
    -expn trained_yolox_nano_416_to_tensorrt \
    -f src/roi_det/YOLOX/exps/projects/rsna/ \
    -c assets/trained/yolox_nano_416_roi_torch.pth \
    --save-path assets/trained/yolox_nano_416_roi_trt.pth \
    -b 1


  • Create a new directory {MODEL_CHECKPOINT_DIR}/yolox_roi_det/trained_yolox_nano_416_to_tensorrt/.
  • The converted YOLOX TensorRT engine will also be saved to ./assets/trained/yolox_nano_416_roi_trt.pth

5.1.2. Convert trained 4 x Convnext-small models to TensorRT

PYTHONPATH=$(pwd)/src/pytorch-image-models/:$PYTHONPATH python3 src/tools/ --mode trained

Behaviors: Save a 4-fold combined TensorRT engine to ./assets/trained/best_ensemble_convnext_small_batch2_fp32.engine.

This takes 5-10 minutes on Kaggle's P100 GPU, but about 1 hour on an A100 GPU (my case).

5.1.3. Submission

PYTHONPATH=$(pwd)/src/pytorch-image-models/:$PYTHONPATH python3 src/submit/ --mode trained --trt


  • Create a temporary directory storing 8-bit PNG images at {TEMP_DIR}/pngs/, which is expected to be removed once inference is done.
  • Save the submission CSV result to {SUBMISSION_DIR}/submission.csv
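
For orientation, here is a rough sketch of how per-image outputs from the 4 fold models could be turned into a submission file. Everything specific in it is an assumption rather than the repository's actual logic: the function name `write_submission`, averaging as the ensemble rule, taking the mean over the images of one breast, and the `prediction_id`/`cancer` column names (please check them against the competition's sample submission).

```python
import csv
from collections import defaultdict


def write_submission(image_rows, csv_path):
    """Write one averaged score per prediction_id.

    image_rows: iterable of (prediction_id, [p_fold0, p_fold1, ...]),
    one entry per image.
    """
    per_id = defaultdict(list)
    for prediction_id, fold_probs in image_rows:
        # Assumed ensemble rule: average the per-fold probabilities of an image.
        per_id[prediction_id].append(sum(fold_probs) / len(fold_probs))
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["prediction_id", "cancer"])  # assumed column names
        for prediction_id in sorted(per_id):
            probs = per_id[prediction_id]
            # Assumed aggregation: mean over all images sharing this id.
            writer.writerow([prediction_id, sum(probs) / len(probs)])
    return csv_path
```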

5.2. Keep trained YOLOX, re-train Convnext-small classification models

5.2.1. Convert trained YOLOX to TensorRT

A YOLOX-nano 416 engine optimized for NVIDIA A100 is provided at assets/trained/yolox_nano_416_roi_trt_a100.pth. However, the recommended way is to convert the Torch checkpoint to TensorRT yourself, so that the engine is optimized for your environment/hardware:

PYTHONPATH=$(pwd)/src/roi_det/YOLOX:$PYTHONPATH python3 src/roi_det/YOLOX/tools/ \
    -expn trained_yolox_nano_416_to_tensorrt \
    -f src/roi_det/YOLOX/exps/projects/rsna/ \
    -c assets/trained/yolox_nano_416_roi_torch.pth \
    --save-path assets/trained/yolox_nano_416_roi_trt.pth \
    -b 1


  • Create a new directory {MODEL_CHECKPOINT_DIR}/yolox_roi_det/trained_yolox_nano_416_to_tensorrt/.
  • The converted YOLOX TensorRT engine will also be saved to ./assets/trained/yolox_nano_416_roi_trt.pth

5.2.2. Prepare datasets to train classification models

python3 src/tools/ --num-workers 8 --roi-yolox-engine-path assets/trained/yolox_nano_416_roi_trt.pth


  • Create a stage1_images directory in each raw dataset directory: {RAW_DATA_DIR}/{dataset_name}/stage1_images for the intermediate stage.
  • Create a new directory {PROCESSED_DATA_DIR}/classification/ containing, for each dataset, 8-bit PNG images in {PROCESSED_DATA_DIR}/classification/{dataset_name}/cleaned_images/ and a cleaned label file {PROCESSED_DATA_DIR}/classification/{dataset_name}/cleaned_label.csv.

5.2.3. Perform 4-fold splitting on competition data

python3 src/tools/

Behaviors: Create a new directory and save CSV files in {PROCESSED_DATA_DIR}/rsna-breast-cancer-detection/cv/v2/
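
This README does not describe how the folds are built, so the following is only a sketch of one common approach for this kind of data: keep all images of a patient in the same fold and spread positive patients evenly. The function name `assign_folds`, the seed, and the grouping/stratification rule are all assumptions, not the behavior of the repository's split script.

```python
import random
from collections import defaultdict


def assign_folds(patient_ids, labels, n_folds=4, seed=42):
    """Return one fold index per image; a patient never spans two folds."""
    # Assumption: a patient counts as positive if any of their images is.
    patient_label = defaultdict(int)
    for pid, label in zip(patient_ids, labels):
        patient_label[pid] = max(patient_label[pid], int(label))

    rng = random.Random(seed)  # fixed seed keeps the split repeatable
    fold_of_patient = {}
    for target in (0, 1):
        group = sorted(p for p, l in patient_label.items() if l == target)
        rng.shuffle(group)
        # Deal patients round-robin so each fold gets a similar positive rate.
        for i, pid in enumerate(group):
            fold_of_patient[pid] = i % n_folds
    return [fold_of_patient[pid] for pid in patient_ids]
```

Grouping by patient matters because several images of the same patient are highly correlated; letting them land in both train and validation folds would inflate the CV score.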

5.2.4. Training 4 x Convnext-small classification models

python3 src/tools/ --mode fully_reproduce

This will save a train script in the current directory, which includes the commands and instructions to train the Convnext-small classification models. To reproduce using a single GPU, simply run

sh ./

This could take 8 days to finish training (around 2 days for each fold).

Or, if you have multiple GPUs and want to speed up training, follow the instructions in the generated train script and run each command in parallel on a different GPU. For more details on the training process, take a look at my write-up, part 4.3 (Training).


  • This assumes that the directory {MODEL_CHECKPOINT_DIR}/timm_classification/ is empty before starting any train commands
  • Save checkpoints/logs to {MODEL_CHECKPOINT_DIR}/timm_classification/, which will contain 6 sub-directories named:
    • fully_reproduce_train_fold_2
    • fully_reproduce_train_fold_3
    • stage1_fully_reproduce_train_fold_0
    • stage1_fully_reproduce_train_fold_1
    • stage2_fully_reproduce_train_fold_0
    • stage2_fully_reproduce_train_fold_1
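
Since training assumes an empty checkpoint directory and is expected to leave exactly these six sub-directories behind, a small pre-flight and post-flight check can save a multi-day run from a silent mix-up. The sketch below is not part of the repository; the helper names are hypothetical, and `checkpoint_dir` stands for `{MODEL_CHECKPOINT_DIR}/timm_classification/`.

```python
from pathlib import Path

# Sub-directory names exactly as listed in this README.
EXPECTED_RUN_DIRS = (
    "fully_reproduce_train_fold_2",
    "fully_reproduce_train_fold_3",
    "stage1_fully_reproduce_train_fold_0",
    "stage1_fully_reproduce_train_fold_1",
    "stage2_fully_reproduce_train_fold_0",
    "stage2_fully_reproduce_train_fold_1",
)


def is_ready_for_training(checkpoint_dir):
    """True if the directory does not exist yet or contains nothing."""
    checkpoint_dir = Path(checkpoint_dir)
    return not checkpoint_dir.exists() or not any(checkpoint_dir.iterdir())


def missing_run_dirs(checkpoint_dir):
    """Names of expected run directories that are absent after training."""
    checkpoint_dir = Path(checkpoint_dir)
    return [name for name in EXPECTED_RUN_DIRS
            if not (checkpoint_dir / name).is_dir()]
```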

5.2.5. Checkpoints selection

python3 src/tools/ --mode fully_reproduce


  • This could overwrite Convnext checkpoint files in {MODEL_FINAL_SELECTION_DIR}/
  • Select the best checkpoint for each of the 4 folds and copy them to {MODEL_FINAL_SELECTION_DIR}/:
    • {MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_0.pth.tar
    • {MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_1.pth.tar
    • {MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_2.pth.tar
    • {MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_3.pth.tar
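
The selection step can be pictured roughly as below. This is a sketch, not the repository's selection script: the function name is hypothetical, and it assumes you already have a validation score per checkpoint (how those scores are read from the training logs, and which metric is used, is not specified here). Only the output file naming follows the list above.

```python
import shutil
from pathlib import Path


def select_best_checkpoints(scores_by_fold, output_dir):
    """Copy the highest-scoring checkpoint of each fold to output_dir.

    scores_by_fold maps fold index -> list of (checkpoint_path, score),
    where a higher score is assumed to be better.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    selected = {}
    for fold, candidates in sorted(scores_by_fold.items()):
        best_path, _ = max(candidates, key=lambda item: item[1])
        target = output_dir / f"best_convnext_fold_{fold}.pth.tar"
        shutil.copyfile(best_path, target)  # overwrites an existing file
        selected[fold] = target
    return selected
```

Note that `shutil.copyfile` silently replaces an existing target, which matches the overwrite warning above.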

5.2.6. Convert selected best Convnext models to TensorRT

PYTHONPATH=$(pwd)/src/pytorch-image-models/:$PYTHONPATH python3 src/tools/ --mode reproduce

Behaviors: Save a 4-fold combined TensorRT engine to {MODEL_FINAL_SELECTION_DIR}/best_ensemble_convnext_small_batch2_fp32.engine.

This takes 5-10 minutes on Kaggle's P100 GPU, but about 1 hour on an A100 GPU (my case).

5.2.7. Submission

PYTHONPATH=$(pwd)/src/pytorch-image-models/:$PYTHONPATH python3 src/submit/ --mode partial_reproduce --trt


  • Create a temporary directory storing 8-bit PNG images at {TEMP_DIR}/pngs/, which is expected to be removed once inference is done.
  • Save the submission CSV result to {SUBMISSION_DIR}/submission.csv

5.3. Re-train all parts from scratch

5.3.1. Prepare dataset for training YOLOX ROI detector

python3 src/tools/ --num-workers 4


  • Copy the manually annotated breast ROI boxes in YOLOv5 format from ./assets/data/roi_det_yolov5_format/ to {PROCESSED_DATA_DIR}/roi_det_yolox/yolov5_format/
  • Decode 571 DICOM files from the competition dataset to 8-bit PNGs, stored at {PROCESSED_DATA_DIR}/roi_det_yolox/yolov5_format/images/
  • Convert from YOLOv5 format to COCO format, stored at {PROCESSED_DATA_DIR}/roi_det_yolox/coco_format/
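
The format conversion in the last step comes down to a change of box parameterization: a YOLOv5 label line stores `class cx cy w h` normalized to the image size, while a COCO annotation stores `[x_min, y_min, width, height]` in pixels. A minimal sketch of that arithmetic follows; the function names are hypothetical and this is not the repository's converter.

```python
def yolo_to_coco_bbox(cx, cy, w, h, img_width, img_height):
    """Normalized center-format box -> COCO [x_min, y_min, width, height]."""
    box_w = w * img_width
    box_h = h * img_height
    x_min = cx * img_width - box_w / 2
    y_min = cy * img_height - box_h / 2
    return [x_min, y_min, box_w, box_h]


def yolo_line_to_coco_annotation(line, image_id, annotation_id,
                                 img_width, img_height):
    """Parse one YOLOv5 label line into a COCO-style annotation dict."""
    class_id, cx, cy, w, h = line.split()
    bbox = yolo_to_coco_bbox(float(cx), float(cy), float(w), float(h),
                             img_width, img_height)
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": int(class_id),
        "bbox": bbox,
        "area": bbox[2] * bbox[3],
        "iscrowd": 0,
    }
```

For example, the line `0 0.5 0.5 0.5 0.25` on a 400x800 image describes a 200x200 pixel box whose top-left corner is at (100, 300).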

5.3.2. Retrain YOLOX for breast ROI detection

sh src/tools/


  • Train YOLOX, saving checkpoints to {MODEL_CHECKPOINT_DIR}/yolox_roi_det/yolox_nano_416_reproduce/
  • (Optional) Perform evaluation on the best checkpoint and print the results
  • Convert the newly trained best checkpoint to TensorRT, stored in {MODEL_CHECKPOINT_DIR}/yolox_roi_det/yolox_nano_416_reproduce/
  • Copy the best Torch checkpoint to {MODEL_FINAL_SELECTION_DIR}/yolox_nano_416_roi_torch.pth
  • Copy the best TensorRT engine converted in the previous step to {MODEL_FINAL_SELECTION_DIR}/yolox_nano_416_roi_trt.pth

5.3.3. Prepare datasets to train classification models

This will use the YOLOX model newly trained in the previous step as the breast ROI extractor.

python3 src/tools/ --num-workers 8


  • Create a stage1_images directory in each raw dataset directory: {RAW_DATA_DIR}/{dataset_name}/stage1_images for the intermediate stage.
  • Create a new directory {PROCESSED_DATA_DIR}/classification/ containing, for each dataset, 8-bit PNG images in {PROCESSED_DATA_DIR}/classification/{dataset_name}/cleaned_images/ and a cleaned label file {PROCESSED_DATA_DIR}/classification/{dataset_name}/cleaned_label.csv.

5.3.4. Perform 4-fold splitting on competition data

python3 src/tools/

Behaviors: Create a new directory and save CSV files in {PROCESSED_DATA_DIR}/rsna-breast-cancer-detection/cv/v2/

5.3.5. Training 4 x Convnext-small classification models

python3 src/tools/ --mode fully_reproduce

This will save a train script in the current directory, which includes the commands and instructions to train the Convnext-small classification models. To reproduce using a single GPU, simply run

sh ./

This could take 8 days to finish training (around 2 days for each fold).

Or, if you have multiple GPUs and want to speed up training, follow the instructions in the generated train script and run each command in parallel on a different GPU. For more details on the training process, take a look at my write-up, part 4.3 (Training).


  • This assumes that the directory {MODEL_CHECKPOINT_DIR}/timm_classification/ is empty before starting any train commands
  • Save checkpoints/logs to {MODEL_CHECKPOINT_DIR}/timm_classification/, which will contain 6 sub-directories named:
    • fully_reproduce_train_fold_2
    • fully_reproduce_train_fold_3
    • stage1_fully_reproduce_train_fold_0
    • stage1_fully_reproduce_train_fold_1
    • stage2_fully_reproduce_train_fold_0
    • stage2_fully_reproduce_train_fold_1

5.3.6. Checkpoints selection

python3 src/tools/ --mode fully_reproduce


  • This could overwrite Convnext checkpoint files in {MODEL_FINAL_SELECTION_DIR}/
  • Select the best checkpoint for each of the 4 folds and copy them to {MODEL_FINAL_SELECTION_DIR}/:
    • {MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_0.pth.tar
    • {MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_1.pth.tar
    • {MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_2.pth.tar
    • {MODEL_FINAL_SELECTION_DIR}/best_convnext_fold_3.pth.tar

5.3.7. Convert selected best Convnext models to TensorRT

PYTHONPATH=$(pwd)/src/pytorch-image-models/:$PYTHONPATH python3 src/tools/ --mode reproduce

Behaviors: Save a 4-fold combined TensorRT engine to {MODEL_FINAL_SELECTION_DIR}/best_ensemble_convnext_small_batch2_fp32.engine.

This takes 5-10 minutes on Kaggle's P100 GPU, but about 1 hour on an A100 GPU (my case).

5.3.8. Submission

PYTHONPATH=$(pwd)/src/pytorch-image-models/:$PYTHONPATH python3 src/submit/ --mode reproduce --trt


  • Create a temporary directory storing 8-bit PNG images at {TEMP_DIR}/pngs/, which is expected to be removed once inference is done.
  • Save the submission CSV result to {SUBMISSION_DIR}/submission.csv