https://github.com/devindatt/covid19-chest_xrays-analysis/blob/master/ezgif.com-crop.gif
This analysis was performed on a SMALL dataset (50 images) and should not be considered academic research of any kind, nor used to draw conclusions about its validity. Further research needs to be performed by trained medical practitioners on a much larger dataset (tens of thousands of images) to establish a true correlation. Having said that, there does seem to be an interesting relationship that could help medical staff reach a quick diagnosis.
- To apply Data Science processes to train a deep learning model using Keras and TensorFlow to see if we can "predict" COVID-19 from only analyzing chest X-rays of patients
- To see if AI can be used to build a quick diagnosis tool for incoming patients
- COVID-19 tests are currently hard to come by so we need to rely on other diagnosis measures.
- COVID-19 attacks the epithelial cells that line the respiratory tract (source here)
- Nearly all hospitals have X-ray imaging machines and use X-rays to analyze the health of a patient’s lungs
- X-rays and CT scans are used to diagnose pneumonia, lung inflammation, abscesses, and/or enlarged lymph nodes.
- X-ray analysis requires a radiology expert to interpret scan results, and such experts are in short supply. Can we develop tools that shorten the path to a reliable diagnosis for medical practitioners?
- Created by Dr. Joseph Cohen, a postdoctoral fellow of Yoshua Bengio at the University of Montreal.
- This dataset consists of chest X-ray scans (found here) for cases of MERS, SARS, and ARDS as well as COVID-19
- Selected only the scans for COVID-19 cases, of which there were only 25 images
- Kept only Posterior-Anterior (PA, back-to-front) scans, assuming for now that this is the best view for a model to detect the presence of the virus
- To balance the dataset, I chose non-COVID-19 cases from the Kaggle chest X-ray Pneumonia dataset
- Total images in the curated dataset: 50 X-ray scans (25 COVID-19, 25 non-COVID-19)
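The selection step above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: it assumes the dataset ships a metadata CSV with `filename`, `finding`, and `view` columns, and the helper name `select_covid_pa_rows` is hypothetical.

```python
import csv
import io


def select_covid_pa_rows(metadata_csv, finding="COVID-19", view="PA"):
    """Return the metadata rows matching the given finding and view.

    `metadata_csv` is any open text file object. The column names
    `finding` and `view` are assumptions about the metadata layout.
    """
    reader = csv.DictReader(metadata_csv)
    return [
        row for row in reader
        if row["finding"] == finding and row["view"] == view
    ]


# Tiny in-memory stand-in for the real metadata file (made-up rows).
sample = io.StringIO(
    "filename,finding,view\n"
    "a.jpg,COVID-19,PA\n"
    "b.jpg,SARS,PA\n"
    "c.jpg,COVID-19,AP\n"
    "d.jpg,COVID-19,PA\n"
)

selected = select_covid_pa_rows(sample)
print([row["filename"] for row in selected])  # ['a.jpg', 'd.jpg']
```

With a real metadata file, you would pass `open(path, newline="")` instead of the in-memory sample and copy the selected files into the `dataset` folder.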
Assuming you download the repository and keep the default settings, the dataset is in the 'dataset' folder and the model file is in the main directory. The following command runs the training script; the only mandatory parameter is the location of the dataset folder.
$ python3 train_covid19_main.py --dataset dataset
The above command writes the resulting training/validation plots to a file with the default name 'plot.png'. To change this name, pass the 'plot' parameter with any filename, such as:
$ python3 train_covid19_main.py --dataset dataset --plot plot_filename.png
You can also use shorthand parameters, and even use your own model instead of the default 'covid19' model:
Parameters | Description | Defaults |
---|---|---|
-d, --dataset | path to input dataset (required) | dataset |
-p, --plot | path to output loss/accuracy plot | plot.png |
-m, --model | path to model file to use for training | covid19.model |
-lr, --learning | learning rate used during training | 0.001 |
-e, --epoch | number of epochs to train the model for | 25 |
-bs, --batchsize | batch size parameter used during the training | 8 |
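For reference, the parameter table above could be wired up with `argparse` along these lines. This is a sketch under the assumption that the script uses `argparse`; the function name `build_parser` and the help strings are illustrative, while the flags and defaults are taken from the table.

```python
import argparse


def build_parser():
    """Build a parser mirroring the parameter table (illustrative sketch)."""
    parser = argparse.ArgumentParser(
        description="Train a COVID-19 chest X-ray classifier")
    parser.add_argument("-d", "--dataset", required=True,
                        help="path to input dataset")
    parser.add_argument("-p", "--plot", default="plot.png",
                        help="path to output loss/accuracy plot")
    parser.add_argument("-m", "--model", default="covid19.model",
                        help="path to model file to use for training")
    parser.add_argument("-lr", "--learning", type=float, default=0.001,
                        help="learning rate used during training")
    parser.add_argument("-e", "--epoch", type=int, default=25,
                        help="number of epochs to train for")
    parser.add_argument("-bs", "--batchsize", type=int, default=8,
                        help="batch size used during training")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-d", "dataset", "-lr", "0.0005", "-bs", "16"])
print(args.dataset, args.learning, args.epoch, args.batchsize)
```

Any flag that is not passed falls back to the default listed in the table, so only `--dataset` has to be supplied.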
Initial model settings, which you can change
- Learning rate = 0.001
- Training epochs = 25
- Batch size = 8
- Gather the images in the dataset directory and initialize two lists: one for the image data and another for the image class (COVID or non-COVID)
- Resize images to 224x224 pixels ignoring aspect ratio
- Convert data and labels to NumPy arrays while normalizing pixel intensities to the range [0, 1]
- Compute one-hot encoding on the labels to allow easier feature analysis
- Split into 80% training and 20% testing datasets
- Initialize the training data augmentation object generator
- Perform Transfer Learning by loading in the VGG16 model layers but leave off the Fully Connected layer as this will need to be retrained
- Create a new Fully Connected layer and connect to the base model
- Start training new model but freeze the base parameters so it only trains the FC layers
- Make some predictions
- Print out results
- Save trained model
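The transfer-learning steps in the list above might look roughly like this in Keras. It is a hedged sketch rather than the repository's code: the head layout (pooling size, 64-unit dense layer, dropout rate), the loss, and the function name `build_model` are assumptions chosen for illustration.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import (AveragePooling2D, Dense, Dropout,
                                     Flatten, Input)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam


def build_model(weights="imagenet", learning_rate=1e-3):
    """VGG16 base with a new, trainable fully connected head (sketch)."""
    # Load the VGG16 convolutional layers, leaving off the FC head.
    base = VGG16(weights=weights, include_top=False,
                 input_tensor=Input(shape=(224, 224, 3)))

    # Freeze the base so only the new head is updated during training.
    for layer in base.layers:
        layer.trainable = False

    # New head; these layer sizes are illustrative assumptions.
    head = AveragePooling2D(pool_size=(4, 4))(base.output)
    head = Flatten(name="flatten")(head)
    head = Dense(64, activation="relu")(head)
    head = Dropout(0.5)(head)
    head = Dense(2, activation="softmax")(head)  # COVID vs non-COVID

    model = Model(inputs=base.input, outputs=head)
    model.compile(loss="categorical_crossentropy",
                  optimizer=Adam(learning_rate=learning_rate),
                  metrics=["accuracy"])
    return model
```

Calling `build_model()` downloads the ImageNet weights on first use; passing `weights=None` builds the same architecture with random weights, which is handy for a quick shape check.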
- A - Shows that the training loss is low (~0.28) and the training accuracy is high (95%)
- B - Shows that the model's precision is high (83%), with a recall against the ground truth of 89% and an F1 score of ~90%
- C - Shows that on the validation (test) set the model reaches high accuracy (90%), with sensitivity of 80% and specificity of 100%
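To make the accuracy, sensitivity, and specificity figures concrete, here is a small worked example. The confusion-matrix counts are hypothetical: they assume a 10-image test split (20% of 50) with 5 images per class, and were picked only because they reproduce the reported percentages. They are not taken from the actual run.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity and specificity from confusion counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,   # all correct / all samples
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
    }


# Hypothetical counts: 4 of 5 COVID cases caught, all 5 non-COVID cases cleared.
metrics = binary_metrics(tp=4, fn=1, tn=5, fp=0)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
# accuracy: 0.90
# sensitivity: 0.80
# specificity: 1.00
```

With so few test images, a single misclassified scan shifts sensitivity by 20 percentage points, which is one reason a much larger dataset is needed before drawing conclusions.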
The green circle shows the validation results closely tracking the training results as the number of epochs increases, which is what we would expect to see if the model is picking up on some real correlation.
My ethical conclusion here is, you don’t need a degree in medicine to make an impact in the medical field — deep learning practitioners working closely with doctors and medical professionals can solve complex problems, save lives, and make the world a better place.
- Selling fake COVID-19 test kits
- Finding victims for fake COVID-19 home testing kits on social media platforms and chat applications