Artificial intelligence for lung disease detection using chest CT scan images

Artificial intelligence has the potential to help detect disease from CT scan images of patients' chests. In this project, we apply two convolutional neural networks (CNNs) to image classification. Two data sets were gathered from Kaggle and GitHub for training the CNNs. First, a two-class model was trained on balanced data (covid vs. normal) to separate healthy cases from covid cases. Second, a network was trained to separate four classes: pneumocystis, covid, streptococcus, and normal. Two common approaches to imbalanced data in image classification are class weight adjustment and oversampling (see the main branch of this project). For the four-class project, oversampling was done together with data augmentation (flip, rotation, and zoom transformations). The models were first run on a local machine for a few epochs and later uploaded to Google Colab to take advantage of the Colab GPU.

Instruction

Gathering data: The X-Ray images were gathered from Kaggle and GitHub. For the two-class project, the data (All_data) was divided into Train, Validation, and Test folders (Dataset_augmented_subfolders). For the four-class project, the data (All_data_4_classes) was augmented and oversampled (Dataset_augmented_4_classes) and then divided into four subfolders, Normal, Covid, Pneumocystis, and Streptococcus (Data_augmented_4_classes_train_test_val).
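The Train/Validation/Test division described above can be sketched as follows. This is a minimal illustration, not the project's actual script: the function name `split_paths`, the 15% validation and test fractions, the seed, and the file names are all assumptions, since the README does not state how the split was made.

```python
import random

def split_paths(paths, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle paths reproducibly and split them into train/validation/test lists."""
    items = sorted(paths)                 # sort first so the result ignores input order
    random.Random(seed).shuffle(items)    # seeded shuffle: same split on every run
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    return {
        "Train": items[n_test + n_val:],
        "Validation": items[n_test:n_test + n_val],
        "Test": items[:n_test],
    }

# Hypothetical file names standing in for the images of one class.
covid_images = [f"covid_{i:03d}.png" for i in range(190)]
splits = split_paths(covid_images)
print({name: len(files) for name, files in splits.items()})
# prints {'Train': 134, 'Validation': 28, 'Test': 28}
```

Splitting each class separately in this way keeps the class proportions the same in all three folders.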

Assembled Deep Net Model Layers: For multiclass classification with a deep neural network, two to three convolution layers are suggested as a starting point, with a softmax activation function on the last layer. Note that categorical_crossentropy is the usual default loss for multiclass classifiers. Convolution layers are used for image processing because dense layers applied directly to the pixels would lose the positional information in the images. In the four-class project, I found that the relu activation function gave higher accuracy. I used the Adam optimizer with a learning rate of 0.001. A common practice is to begin with a small network with few layers and then change the architecture step by step, taking bias and variance into account. My network had high bias, which I addressed by increasing the network size and changing the activation function type.
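To make the softmax output and the categorical cross-entropy loss concrete, here is a small standard-library sketch of what the last layer and the loss compute for one image. The logits are made-up numbers and the helper names are illustrative; in the project itself, Keras performs these computations.

```python
import math

def softmax(logits):
    """Turn raw final-layer scores into probabilities that sum to 1."""
    shift = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(one_hot, probs, eps=1e-12):
    """Loss for one sample: -sum(y_i * log(p_i)) over the classes."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))

classes = ["Normal", "Covid", "Pneumocystis", "Streptococcus"]
logits = [2.0, 0.5, 0.1, -1.0]                   # made-up scores for one image
probs = softmax(logits)

# The loss is small when the true class received a high probability...
loss_if_normal = categorical_crossentropy([1, 0, 0, 0], probs)
# ...and large when the true class received a low probability.
loss_if_strep = categorical_crossentropy([0, 0, 0, 1], probs)

for name, p in zip(classes, probs):
    print(f"{name}: {p:.3f}")
print(f"loss if Normal: {loss_if_normal:.3f}, loss if Streptococcus: {loss_if_strep:.3f}")
```

Because the one-hot label selects a single term, the loss for a sample is the negative log of the probability assigned to the true class.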

Preparing Images: ImageDataGenerator handles the normalization. For the validation and test sets, I applied only the normalization. In the two-class project, the classes have equal numbers of images, so there is no need to balance the dataset. The four-class data, however, is imbalanced, and this has to be addressed to prevent bias toward the larger classes. For the four-class project, I augmented and oversampled all four classes. The Normal and Covid classes were augmented and oversampled from 190 to 1000 images. The Pneumocystis and Streptococcus classes were augmented and oversampled from 21 and 12 images, respectively, to 1000 images.
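The arithmetic behind the oversampling can be sketched as below. This only counts how many augmented images each class needs to reach 1000. It does not perform the augmentation, and the function name is illustrative. It assumes the figure of 190 applies to each of the Normal and Covid classes.

```python
import math

def augmentation_plan(counts, target=1000):
    """For each class, work out how many augmented images are needed to reach target."""
    plan = {}
    for name, n_original in counts.items():
        n_new = max(target - n_original, 0)
        plan[name] = {
            "original": n_original,
            "augmented": n_new,
            # most augmented variants that any single original image must supply
            "max_per_image": math.ceil(n_new / n_original),
        }
    return plan

# Original image counts per class, taken from the text above.
counts = {"Normal": 190, "Covid": 190, "Pneumocystis": 21, "Streptococcus": 12}
plan = augmentation_plan(counts)
for name, row in plan.items():
    print(name, row)
```

With these counts, each Normal or Covid image supplies at most 5 augmented variants, while each Streptococcus image supplies up to 83, so the minority classes depend far more heavily on flips, rotations, and zooms of the same few originals.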

Suggestion

Balancing the data with a generator is one option for dealing with imbalanced data, but it is not always the best.
A weighted objective function is a second option for dealing with imbalanced datasets.
In general, either of the options above can lose many features, which lowers model accuracy.
The results for the two-class and four-class classification projects were promising.
Different learning rates should be tried to see whether they affect the output.
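As an illustration of the weighted-objective option, the sketch below computes one weight per class from the original image counts using the common "balanced" heuristic, total / (n_classes * class_count). The heuristic and the function name are assumptions; the README does not say which weights the project used.

```python
def balanced_class_weights(counts):
    """Weight each class inversely to its frequency: total / (n_classes * count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}

# Original (pre-augmentation) image counts per class, as reported in this README.
counts = {"Normal": 190, "Covid": 190, "Pneumocystis": 21, "Streptococcus": 12}
weights = balanced_class_weights(counts)
for name, w in weights.items():
    print(f"{name}: {w:.3f}")
```

Rare classes receive larger weights, so their errors count for more in the loss. In Keras, such weights can be passed to `fit` through the `class_weight` argument as a dictionary keyed by integer class index, so the class names here would first have to be mapped to the indices the data generator assigns.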

Results

Both classification models reached 80% accuracy. Deep Convolutional Neural Network (CNN) classification results for the four classes are shown below.


Atashnezhad/Lung_Disease_Detection_Deeplearning
