Atashnezhad/Lung_Disease_Detection_Deeplearning

Artificial intelligence for lung disease detection using chest CT scan images

Artificial intelligence has the potential to help detect disease from CT scan images of patients' chests. In this project, we apply two convolutional neural networks (CNNs) to image classification. Two data sets were gathered from Kaggle and GitHub for training. First, a two-class classification model was trained on balanced data (covid vs normal) to differentiate healthy cases from covid cases. Second, a network was trained to separate four classes: pneumocystis, covid, streptococcus, and normal. Two common approaches to imbalanced data in image classification are class weight adjustment and over-sampling (check the main branch of this project). For the four-class classification project, over-sampling was done together with data augmentation (flip, rotation, and zoom transformations). The models were first run on a local machine for a few epochs and later uploaded to Google Colab to benefit from the Colab GPU.

Instruction

Gathering data: The X-ray images were gathered from Kaggle and GitHub. For the two-class classification project, the data (All_data) was divided into three folders, Train, Validation, and Test (Dataset_augmented_subfolders). For the four-class classification project, the data (All_data_4_classes) was augmented and oversampled (Dataset_augmented_4_classes) and then divided into four subfolders, Normal, Covid, Pneumocystis, and Streptococcus (Data_augmented_4_classes_train_test_val).
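The Train/Validation/Test division described above can be sketched as follows. This is only an illustration, not the project's actual script: the split fractions (15% validation, 15% test), the seed, and the file names are assumptions, since the README does not state them.

```python
import random


def split_filenames(filenames, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle file names and split them into Train/Validation/Test lists.

    The fractions and the seed are assumed values chosen for illustration.
    """
    names = sorted(filenames)            # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(names)   # seeded shuffle, independent of global state
    n_test = int(len(names) * test_frac)
    n_val = int(len(names) * val_frac)
    return {
        "Test": names[:n_test],
        "Validation": names[n_test:n_test + n_val],
        "Train": names[n_test + n_val:],
    }


# Hypothetical file names standing in for one class folder of All_data.
covid_files = [f"covid_{i:03d}.png" for i in range(190)]
splits = split_filenames(covid_files)
```

Each list could then be copied into its own folder, for example with `shutil.copy`. Splitting each class separately keeps the class proportions the same across the three folders.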

Assembled Deep Net Model Layers: For multiclass classification with a deep neural network, two to three convolution layers are suggested, with a softmax activation function on the last layer. Note that categorical_crossentropy is the near-default loss for multiclass classifiers. Convolution layers are used for image processing because dense layers applied directly to the image would lose its positional information. In the four-class classification project, I found that the relu activation function resulted in higher accuracy. I used the Adam optimizer with a learning rate of 0.001. It is common practice to begin with a small network with few layers and then change the architecture step by step, taking bias and variance into account. I had high bias in the network, which I addressed by increasing the network size and changing the activation function type.
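To make the softmax and categorical_crossentropy pairing concrete, here is a small worked sketch in plain Python. It is not the project's model code: the logits are made-up numbers, and the class order (Normal, Covid, Pneumocystis, Streptococcus) is an assumption for illustration.

```python
import math


def softmax(logits):
    """Turn raw scores from the last layer into probabilities that sum to 1."""
    m = max(logits)                              # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


# Made-up scores for one image; assumed class order:
# Normal, Covid, Pneumocystis, Streptococcus.
logits = [2.0, 1.0, 0.1, -1.0]
probs = softmax(logits)

# One-hot label saying the true class is the first one.
label = [1, 0, 0, 0]
loss = categorical_crossentropy(label, probs)
```

With a one-hot label, the loss reduces to the negative log of the probability assigned to the true class, so it shrinks toward zero as the network becomes more confident in the correct answer.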

Preparing Images: ImageDataGenerator handles the normalization. Note that for the validation and test sets, I applied only the normalization. In the two-class classification, the classes have equal numbers of images, so there is no need to balance the dataset. For the four-class classification, however, the data is imbalanced, and this must be addressed to prevent bias. I augmented and oversampled all four classes: the Normal and Covid cases from 190 to 1000 images, and the Pneumocystis and Streptococcus cases from 21 and 12 to 1000 images.
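The arithmetic behind the over-sampling can be sketched as below. This is a rough illustration rather than the project's augmentation code: it assumes Normal and Covid have 190 originals each, and it assumes the augmented copies are spread as evenly as possible over the originals.

```python
def augmentation_plan(n_original, target=1000):
    """Return (copies_per_image, images_needing_one_extra) to reach `target`.

    Every original image gets `copies_per_image` augmented variants, and
    `images_needing_one_extra` of them get one more.
    """
    needed = target - n_original
    if needed < 0:
        raise ValueError("class already has more images than the target")
    return divmod(needed, n_original)


# Original image counts per class (190 each for Normal and Covid is assumed).
counts = {"Normal": 190, "Covid": 190, "Pneumocystis": 21, "Streptococcus": 12}
plan = {name: augmentation_plan(n) for name, n in counts.items()}

for name, (copies, extra) in plan.items():
    print(f"{name}: {copies} augmented copies per image, {extra} images get one more")
```

Under these assumptions, the plan shows how heavily the two small classes depend on augmentation: each Streptococcus image would need more than 80 augmented variants to reach 1000.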

Suggestion

  • Balancing data using a generator is one option for dealing with imbalanced data, but it is not always the best.
  • A weighted objective function is a second option for dealing with imbalanced datasets.
  • Generally, using either of the above options results in losing many features, which lowers model accuracy.
  • The results for the two-class and four-class classification projects were promising.
  • Different learning rates should be tried to see how they affect the output.
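The weighted objective function mentioned in the list above is usually driven by per-class weights. A minimal sketch of one common heuristic (inverse class frequency, the same formula scikit-learn uses for its "balanced" mode) is shown below. The counts are the pre-augmentation numbers from the Preparing Images step, with 190 each for Normal and Covid assumed; this is not taken from the project's code.

```python
def balanced_class_weights(counts):
    """Weight each class by total / (n_classes * class_count).

    Rare classes get weights above 1, common classes below 1.
    """
    total = sum(counts.values())
    n_classes = len(counts)
    return {name: total / (n_classes * n) for name, n in counts.items()}


# Original (pre-augmentation) image counts; 190 each for Normal and Covid is assumed.
counts = {"Normal": 190, "Covid": 190, "Pneumocystis": 21, "Streptococcus": 12}
class_weights = balanced_class_weights(counts)

for name, weight in class_weights.items():
    print(f"{name}: {weight:.3f}")
```

In Keras, weights like these can be passed to `model.fit` through its `class_weight` argument, keyed by integer class index rather than by name, so the dictionary above would first need to be mapped to the generator's class indices.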

Results

Both classification models reached 80% accuracy. Deep convolutional neural network (CNN) classification results for the four classes are shown below.