COVID-19 Chest X-Ray Data Analysis

https://github.com/devindatt/covid19-chest_xrays-analysis/blob/master/ezgif.com-crop.gif

Disclaimer:

This analysis is only performed on a SMALL dataset (50 images) and shouldn't be considered true academic research of any kind to draw on any conclusions of its validity. Further research needs to be performed by trained medical practitioner on a much larger dataset (+10,000s) to see a true correlation. Having said that there does seems to be an interesting relationship that could help medical staff for quick diagnosis.

Objective

To apply Data Science processes to train a deep learning model using Keras and TensorFlow to see if we can "predict" COVID-19 from only analyzing chest X-rays of patients
To see if AI can be used to build a quick diagnosis tool for incoming patients

Reasoning for this Analysis

COVID-19 tests are currently hard to come by so we need to rely on other diagnosis measures.
COVID-19 attacks the epithelial cells that line the respiratory tract (source here)
Nearly all hospitals have X-ray imaging machines use X-rays to analyze the health of a patient’s lungs
X-rays and CT scans are used to diagnose pneumonia, lung inflammation, abscesses, and/or enlarged lymph nodes.
Since X-ray analysis requires a radiology expert to interpret scan results which are in short supply can we develop tools to shortcut the path to a reliable diagnosis for medical practitioners

Dataset Details

COVID19 Data:

Created by Dr. Joseph Cohen, a postdoctoral fellow of Sergio Bengio at the University of Montreal.
This is is dataset that consists of scans of chest x-ray images found here for cases of MERS, SARS, ARDS as well as COVID-19
Select only the scans for COVID-19 cases, which only had 25 images
Choosing only Posterior-Anterior view (back-to-front) scans, assuming for now this is the best view for a model to detect the presence of the virus

Non-COVID19 Data:

To balance the dataset I choose non-convid19 cases from the Kaggle chest X-ray Pneumonia dataset
Total images in curated 50 X-ray scans (25 COVID19, 25 Non-COVID19)

Running Code

Assuming you download and keep the default settings, the dataset should be in the 'dataset' folder and the model file is in the main directory, the following command should allow you to run the training model as the only mandatory parameter is the location of the dataset folder.

$ python3 train_covid19_main.py --dataset dataset

The above command will output the resulting train/validation plots in a file with a default name 'plot.png'. If you want to change this name you can simply use the 'plot' parameter and use any name, such as:

$ python3 train_covid19_main.py --dataset dataset --plot plot_filename.png

You can also use short hand parameters and even use your own model instead of the default 'covid19' model:

Paramenters	Description	Defaults
-d, --dataset	path to input dataset (required)	dataset
-p, --plot	path to output loss/accuracy plot	plot.png
-m, --model	path to model file to use for training	covid19.model
-lr, --learning	models learning rate parameter used in training	0.001
-e, --epoch	number of the epochs the model to train on	25
-bs, --batchsize	batch size parameter used during the training	8

Analysis Step:

Initial model settings, which you can change

Learning rate = 0.001
Trainig epochs = 25
Batch size = 8

Gather the images in the dataset directory and initialize arrays one for list of image data and another for the image class (covid or non-covid)
Resize images to 224x224 pixels ignoring aspect ratio
Convert data and labels to NumPy arrays while normalizing pixel intensities to the range
Compute one-hot encoding on the labels to allow easier feature analysis
Split into 80% training and 20% testing datasets
Initialize the training data augmentation object generator
Perform Transfer Learning by loading in the VGG16 model layers but leave off the Fully Connected layer as this will need to be retrained
Create a new Fully Connected layer and connect to the base model
Start training new model but freeze the base parameters so it only trains the FC layers
Make some predications
Print out results
Save trained model

Results:

A - Shows the Training Loss is low (~28%) and high accuracy (95%)
B - Shows the models Precision is High (83%), and model to the ground truth Recall (89%), F1 score (~90%)
C - Shows on the validation testing High accuracy (90%), sensitivity (80%), and specificity (100%)

The green circle shows how the validation results are closely tracking the training results as increasing the epochs, which is what we want to see if there is something the model able to see some type of correlation.

My ethical conclusion here is, you don’t need a degree in medicine to make an impact in the medical field — deep learning practitioners working closely with doctors and medical professionals can solve complex problems, save lives, and make the world a better place.

Fraud associated with COVID19 Outbreak:

Selling fake COVID19 test kits - namely by selling fake COVID-19 test kits
Victims on social media falling for fake COVID-19 home testing kits - finding victims on social media platforms and chat applications

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
dataset		dataset
README.md		README.md
covid19_keras_dataset2.png		covid19_keras_dataset2.png
ezgif.com-crop.gif		ezgif.com-crop.gif
model_helpers.py		model_helpers.py
result_sshot1.png		result_sshot1.png
train_covid19_main.py		train_covid19_main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

COVID-19 Chest X-Ray Data Analysis

Disclaimer:

Objective

Reasoning for this Analysis

Dataset Details

COVID19 Data:

Non-COVID19 Data:

Running Code

Analysis Step:

Results:

Fraud associated with COVID19 Outbreak:

About

Releases

Packages

Languages

devindatt/covid19-chest_xrays-analysis

Folders and files

Latest commit

History

Repository files navigation

COVID-19 Chest X-Ray Data Analysis

Disclaimer:

Objective

Reasoning for this Analysis

Dataset Details

COVID19 Data:

Non-COVID19 Data:

Running Code

Analysis Step:

Results:

Fraud associated with COVID19 Outbreak:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages