Clinical Explainability Failure (CEF) & Explainability Failure Rate (EFR) – changing the way we validate classification algorithms?
This repository provides an explanation of, and access to, the data used for the experiments in our paper.
We used two chest X-ray abnormality detection algorithms in our experiments:
- Pneumonia Detection & Classification
- CheXNeXt
Total Studies: 611 frontal Chest X-rays
| | Pneumonia Detection & Classification | CheXNeXt |
|---|---|---|
| Cases in the consolidation class | 157 | 157 |
| Cases classified as consolidation by the model | 136 | 221 |
| True Positives | 90 | 120 |
| Clinical Explainability Failures (CEF) | 2 | 16 |
| Explainability Failure Rate (EFR) | 2/90 (2.22%) | 16/120 (13.33%) |
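The EFR values in the table above can be reproduced with a short calculation. This is a minimal sketch (the function name is ours, not from the paper): EFR is the number of clinical explainability failures divided by the number of true positives, expressed as a percentage.

```python
def explainability_failure_rate(cef: int, true_positives: int) -> float:
    """EFR = clinical explainability failures / true positives, as a percentage."""
    return 100.0 * cef / true_positives

# Values from the table above
print(f"{explainability_failure_rate(2, 90):.2f}%")    # Pneumonia Detection & Classification: 2.22%
print(f"{explainability_failure_rate(16, 120):.2f}%")  # CheXNeXt: 13.33%
```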
You can access the data using this link. Data is stored in the following structure:
- pneumonia_detection
  - overlap
  - no_overlap
- chexnext
  - overlap
  - no_overlap
- `overlap`: Cases where the bounding box generated by the algorithm and the one marked by the radiologist overlapped.
- `no_overlap`: Cases with no overlap between the algorithm's and the radiologist's bounding boxes.
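The overlap/no_overlap split can be sketched as a simple axis-aligned bounding-box intersection test. This is an illustrative assumption: the paper's exact overlap criterion is not stated here, and this sketch treats any positive shared area as "overlap".

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixel coordinates

def boxes_overlap(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes share any positive area.

    Note: an assumed criterion for illustration; the paper may use a
    stricter threshold (e.g. a minimum intersection-over-union).
    """
    ix = min(a[2], b[2]) - max(a[0], b[0])  # width of the intersection
    iy = min(a[3], b[3]) - max(a[1], b[1])  # height of the intersection
    return ix > 0 and iy > 0

# Hypothetical example: algorithm box vs radiologist box
algo = (100, 100, 300, 300)
rad = (250, 250, 400, 400)
print("overlap" if boxes_overlap(algo, rad) else "no_overlap")  # overlap
```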
Note: Bounding boxes in black outline are generated by the algorithm. Bounding boxes in white outline are marked by the radiologist.
Please refer to the sample images.
For any issues related to the paper, please write to Vasanth Venugopal. For data access related issues, please write to Rohit Takhar.