Conducting Explainable AI (XAI) on a Disaster Image Classification Model Augmented with Pretrained CNNs
This is the implementation of the paper *Natural disasters detection using explainable deep learning*.
It contains the code needed to build a CNN model for disaster classification and to produce XAI visualizations for selected images.
The Disaster Image Classification and MEDIC datasets have been used.
Clone the project from GitHub:

```bash
$ git clone https://github.com/tariqshaban/disaster-classification-with-xai.git
```
TensorFlow with Keras needs to be installed (preferably with CUDA-enabled GPU acceleration).
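The README does not pin exact dependencies; assuming a pip-based environment, something along these lines should suffice (`lime`, `scikit-learn`, `scikit-image`, and `matplotlib` are assumptions inferred from the explainers, classical models, and plots described below):

```bash
$ pip install tensorflow lime scikit-learn scikit-image matplotlib
```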
No further configuration is required.
Simply run the notebook in any IPython environment (e.g., Jupyter).
The main operations conducted in this repository are as follows:
- Modify the `global variables` section (a sketch of this section follows the list):
    - Generic seed
    - Epochs
    - Learning rate:
        - 0.01
        - 0.001
        - 0.0001
    - Pretrained model (base model):
        - ResNet50
        - InceptionV3
        - VGG19
        - EfficientNetB0
        - EfficientNetB7
        - EfficientNetV2B0
        - EfficientNetV2L
        - ViT-B-32
    - Preprocessing method (matching the selected pretrained model)
    - Optimization algorithm:
        - Root Mean Squared Propagation (RMSProp)
        - Adaptive Moment Estimation (Adam)
- Read and decode the dataset into an array of pairs, each holding an image's true label and its file name.
- Randomly partition the dataset into training, validation, and test sets (70%/20%/10%); see the split sketch after this list.
- Build a CNN model with the following characteristics (see the Keras sketch after this list):
    - Hyperparameters:
        - The specified number of epochs
        - The specified learning rate
        - The specified optimizer
    - Layers:
        - The selected base model
        - Identity layer, since the base model cannot be accessed directly for Grad-CAM (non-ViT models only)
        - GlobalAveragePooling2D layer (non-ViT models only)
        - Multiple dropout layers (rate of 20%)
        - Rescaling layer, as part of the images' preprocessing (ViT models only)
        - Multiple dense layers with a softmax activation function
- Plot the model's performance (see the plotting sketch after this list):
    - Training accuracy
    - Validation accuracy
    - Training loss
    - Validation loss
    - Testing confusion matrix (applicable since the model performs multi-class classification)
- Visualize image samples (a Grad-CAM sketch follows this list):
    - Display the original image
    - Display the image augmented with the LIME explainer
    - Display the image augmented with the Grad-CAM explainer (non-ViT models only)
    - Display the image augmented with the Grad-CAM++ explainer (non-ViT models only)
- Modify the global variables based on the observed results.
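For orientation, a hypothetical sketch of what the `global variables` section controls (names are illustrative, not necessarily the notebook's):

```python
SEED = 42                        # generic seed, for reproducibility
EPOCHS = 20                      # number of training epochs
LEARNING_RATE = 1e-4             # one of 0.01, 0.001, or 0.0001
BASE_MODEL = 'EfficientNetV2B0'  # any of the pretrained models listed above
OPTIMIZER = 'adam'               # 'rmsprop' or 'adam'
```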
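A minimal sketch of the seeded 70%/20%/10% split (function and variable names are illustrative, not the repository's):

```python
import random

def partition(pairs, seed=42):
    """Randomly split (label, file name) pairs into train/validation/test."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    return (shuffled[:int(0.7 * n)],              # 70% training
            shuffled[int(0.7 * n):int(0.9 * n)],  # 20% validation
            shuffled[int(0.9 * n):])              # 10% testing
```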
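A minimal Keras sketch of the architecture described above for a non-ViT base model; this is an illustration under assumed defaults (224x224 inputs, one-hot labels, an illustrative class count), not the repository's exact `build_model`:

```python
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

NUM_CLASSES = 6  # illustrative; set to the dataset's actual number of classes

def build_transfer_model(learning_rate=1e-4):
    # Pretrained CNN backbone without its classification head
    base = tf.keras.applications.EfficientNetV2B0(include_top=False, weights='imagenet')
    inputs = layers.Input(shape=(224, 224, 3))
    x = base(inputs)
    # Identity (linear) layer: gives Grad-CAM a named hook on the backbone's 2D feature map
    x = layers.Activation('linear', name='identity')(x)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.2)(x)  # the notebook uses multiple 20% dropout layers
    outputs = layers.Dense(NUM_CLASSES, activation='softmax')(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer=optimizers.Adam(learning_rate),
                  loss='categorical_crossentropy',  # assumes one-hot encoded labels
                  metrics=['accuracy'])
    return model
```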
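Plotting the four performance curves from a Keras `History` object is straightforward (a generic sketch, not the notebook's exact code):

```python
import matplotlib.pyplot as plt

def plot_history(history):
    """Plot training/validation accuracy and loss returned by model.fit()."""
    for metric in ('accuracy', 'loss'):
        plt.figure()
        plt.plot(history.history[metric], label=f'training {metric}')
        plt.plot(history.history[f'val_{metric}'], label=f'validation {metric}')
        plt.xlabel('epoch')
        plt.ylabel(metric)
        plt.legend()
        plt.show()
```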
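Finally, a minimal Grad-CAM sketch against such a model (the layer name `identity` refers to the hypothetical model sketch above; the notebook's own implementation may differ):

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name='identity', class_index=None):
    """Compute a Grad-CAM heatmap in [0, 1] for one (H, W, 3) image."""
    # Model that exposes both the target feature map and the predictions
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)
    # Grad-CAM channel weights: global average of the gradients
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    heatmap = tf.nn.relu(tf.reduce_sum(conv_out[0] * weights, axis=-1))
    return (heatmap / (tf.reduce_max(heatmap) + 1e-8)).numpy()
```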
Note: Classical machine learning classifiers are included to assess the effectiveness of the deep learning models; however, XAI is not applied to them. These classifiers include:
- Bagging
- Decision tree
- Random forest
- K-nearest neighbors
- SVM
- Linear SVM (with SGD training)
- Logistic regression (with SGD training)
HOG (Histogram of Oriented Gradients) was used as a feature descriptor to extract the edge orientations of the images; the result was then flattened and used to train these models.
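A minimal sketch of this HOG-plus-classifier pipeline, assuming scikit-image for the descriptor and scikit-learn for the classifiers (parameter values are illustrative):

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog
from sklearn.svm import SVC

def hog_features(images):
    """Extract flattened HOG descriptors from equally sized RGB images."""
    return np.array([
        hog(rgb2gray(img), orientations=9,
            pixels_per_cell=(8, 8), cells_per_block=(2, 2))
        for img in images
    ])

# Illustrative usage with an SVM (the best-performing classical model below):
# clf = SVC().fit(hog_features(train_images), train_labels)
# accuracy = clf.score(hog_features(test_images), test_labels)
```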
The following methods should be invoked to build and evaluate the model, as well as to implement XAI techniques:
```python
# Download and filter the dataset
load_dataset()

# Ready the dataset and partition it into training, validation, and testing
prime_dataset()

# Build the model, and optionally plot performance measurements
model = build_model(measure_performance=True)

# Fetch a single image from a specified URL as a nested-list matrix
img = url_to_image('https://www.enr.com/ext/resources/News/2016/September/north_carolina_hurricane_matthew.jpg')

# Conduct XAI methods (LIME, Grad-CAM, and Grad-CAM++) for an image on a predefined model
plot_XAI(img, model)

# Predict the image's class based on a predefined model
predict_image_class(img, model)

# Fetch a single image directly from the dataset as a nested-list matrix
img = path_to_image('05_01_1225.png')

# Conduct XAI methods (LIME, Grad-CAM, and Grad-CAM++) for an image on a predefined model
plot_XAI(img, model)

# Predict the image's class based on a predefined model
predict_image_class(img, model)
```
Machine Learning Model | Disaster Image Classification Dataset (Accuracy) | MEDIC Dataset (Accuracy)
---|---|---
Bagging | 61.83% | 43.22%
Decision Tree | 44.98% | 33.55%
Random Forest | 64.10% | 46.02%
K-Nearest Neighbors | 35.67% | 41.86%
SVM | ✅ 72.52% | ✅ 54.46%
Linear SVM (with SGD training) | 66.08% | 43.66%
Logistic Regression (with SGD training) | 65.49% | 43.27%
Disaster Image Classification Dataset:

Pretrained Model | Optimizer | Accuracy (LR 0.01) | Loss (LR 0.01) | Accuracy (LR 0.001) | Loss (LR 0.001) | Accuracy (LR 0.0001) | Loss (LR 0.0001)
---|---|---|---|---|---|---|---
ResNet50 | RMSProp | 91.86% | 0.4534 | 94.13% | 0.3170 | 94.43% | 0.2185
ResNet50 | Adam | 90.40% | 0.5933 | 94.43% | 0.3208 | 94.65% | 0.1973
InceptionV3 | RMSProp | 85.64% | 0.4963 | 90.76% | 0.3915 | 91.94% | 0.3172
InceptionV3 | Adam | 65.01% | 0.8476 | 90.25% | 0.4804 | 90.98% | 0.3253
VGG19 | RMSProp | 90.98% | 0.5177 | 92.45% | 0.2995 | 93.18% | 0.3274
VGG19 | Adam | 91.13% | 0.5561 | 92.60% | 0.3322 | 92.67% | 0.2999
EfficientNetB0 | RMSProp | 93.99% | 0.5230 | 93.99% | 0.3112 | 94.43% | 0.2093
EfficientNetB0 | Adam | 93.55% | 0.4254 | 93.99% | 0.3295 | 94.21% | 0.2096
EfficientNetB7 | RMSProp | 92.08% | 0.7207 | 92.52% | 0.4363 | 93.04% | 0.3009
EfficientNetB7 | Adam | 91.50% | 0.6972 | 92.45% | 0.4202 | 92.96% | 0.2996
EfficientNetV2B0 | RMSProp | 95.09% | 0.4872 | 94.87% | 0.2668 | 95.16% | 0.1838
EfficientNetV2B0 | Adam | 94.13% | 0.5253 | 94.57% | 0.2977 | 95.16% | ✅ 0.1834
EfficientNetV2L | RMSProp | 91.72% | 0.4893 | 92.45% | 0.3260 | 93.62% | 0.2588
EfficientNetV2L | Adam | 91.50% | 0.5071 | 92.23% | 0.3382 | 93.11% | 0.2658
ViT-B-32 | RMSProp | 93.84% | 1.3044 | 95.01% | 0.5274 | ✅ 95.23% | 0.2551
ViT-B-32 | Adam | 94.21% | 1.1693 | 94.21% | 0.5438 | 95.09% | 0.2557
MEDIC Dataset:

Pretrained Model | Optimizer | Accuracy (LR 0.01) | Loss (LR 0.01) | Accuracy (LR 0.001) | Loss (LR 0.001) | Accuracy (LR 0.0001) | Loss (LR 0.0001)
---|---|---|---|---|---|---|---
ResNet50 | RMSProp | 69.77% | 0.9564 | 74.99% | 0.8523 | 75.17% | 0.7892
ResNet50 | Adam | 71.79% | 0.9097 | 74.57% | 0.8770 | 74.99% | 0.7892
InceptionV3 | RMSProp | 56.90% | 1.2042 | 69.14% | 1.0372 | 71.76% | 0.8921
InceptionV3 | Adam | 66.00% | 1.1068 | 71.29% | 0.9361 | 71.87% | 0.7187
VGG19 | RMSProp | 70.79% | 1.0027 | 73.10% | 0.8454 | 74.99% | 0.7925
VGG19 | Adam | 71.06% | 0.9752 | 72.26% | 0.8487 | 74.41% | 0.7893
EfficientNetB0 | RMSProp | 74.62% | 0.8699 | 75.72% | 0.8556 | 76.82% | 0.7424
EfficientNetB0 | Adam | 73.81% | 0.8858 | 75.64% | 0.8607 | 76.51% | 0.7434
EfficientNetB7 | RMSProp | 71.45% | 0.9280 | 74.04% | 0.9616 | 75.36% | 0.7886
EfficientNetB7 | Adam | 72.84% | 0.9325 | 73.78% | 0.9565 | 75.36% | 0.7891
EfficientNetV2B0 | RMSProp | 72.26% | 0.8460 | 76.46% | 0.8137 | 77.06% | 0.7157
EfficientNetV2B0 | Adam | 75.15% | 0.8510 | 76.38% | 0.7877 | 77.22% | ✅ 0.7140
EfficientNetV2L | RMSProp | 73.68% | 0.8792 | 74.65% | 0.8268 | 75.88% | 0.7582
EfficientNetV2L | Adam | 72.94% | 0.9182 | 75.15% | 0.8221 | 75.93% | 0.7572
ViT-B-32 | RMSProp | 73.78% | 1.2433 | 76.43% | 1.5416 | ✅ 76.93% | 0.8988
ViT-B-32 | Adam | 74.83% | 1.4110 | 76.27% | 1.7522 | 76.85% | 0.9353
Based on the tables, ViT-B-32 (RMSProp) at a learning rate of 0.0001 returned the highest accuracy, while EfficientNetV2B0 (Adam) at a learning rate of 0.0001 returned the lowest loss.
The following images are the result of using ViT-B-32 with the RMSProp optimizer at a learning rate of 0.0001 (on the Disaster Image Classification Dataset).
Note that the model started converging by the 8th epoch, since the pretrained model's weights expedited the learning process.
Regardless of the hyperparameters used, all models generally show a relatively higher error rate when distinguishing between urban fire and wildfire, as well as between infrastructure damage and landslide; this behaviour seems logical given the characteristics these classes share.
The following are the XAI interpretations of random image samples, taken either from the dataset itself or from external sources.
ResNet50 was used for the XAI visualizations instead of the best-performing model (ViT-B-32), since Grad-CAM and Grad-CAM++ require a 2D convolutional feature map, which is only available in the pretrained CNN models.
All of the images were correctly classified to their true labels.
- It appears that some of the provided true labels of the images are incorrect. A fair number of images are not refined; that is, some contain banners or even watermarks that might hinder the model's performance.
Ahmad M. Mustafa, Rand Agha, Lujain Ghazalat, Tariq Sha’ban, Natural disasters detection using explainable deep learning, Intelligent Systems with Applications, 2024, 200430, ISSN 2667-3053.
```bibtex
@article{MUSTAFA2024200430,
  title   = {Natural disasters detection using explainable deep learning},
  journal = {Intelligent Systems with Applications},
  pages   = {200430},
  year    = {2024},
  issn    = {2667-3053},
  doi     = {10.1016/j.iswa.2024.200430},
  url     = {https://www.sciencedirect.com/science/article/pii/S2667305324001042},
  author  = {Ahmad M. Mustafa and Rand Agha and Lujain Ghazalat and Tariq Sha’ban},
}
```