This bioinformatics project traces the origins of SARS-CoV-2, the virus that causes COVID-19, by comparing its genetic material with that of coronaviruses found in animals. We use the Levenshtein distance algorithm to count the minimum number of single-residue insertions, deletions, and substitutions needed to turn the amino acid sequence of one viral genome into that of another. This count serves as a proxy for the minimum number of mutations that must have occurred in the viral RNA to transform an animal coronavirus into SARS-CoV-2. The animal whose coronaviruses require the fewest mutations is treated as the most likely original host.
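To make the distance measure concrete, here is a minimal sketch of Levenshtein distance in Python. It is an illustration only, not the code in Project.py; the function name `levenshtein` and the toy peptide fragments are assumptions chosen for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter sequence in the inner loop to save memory
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


# Toy amino acid fragments (made up for illustration, not real viral data):
# the second differs by one substitution (V -> I) and one deletion (S).
print(levenshtein("MFVFLVLLPLVSSQ", "MFIFLVLLPLVSQ"))  # prints 2
```

This version keeps only two rows of the dynamic-programming table, so memory grows with the shorter sequence while running time still grows with the product of the two lengths. For genome-scale protein sequences, that running time is the main practical cost.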
Our research suggests that pangolins or bats are the most likely sources of SARS-CoV-2, with slightly more evidence pointing towards pangolins. For a detailed account of the study, the full report is available in the repository.
Figure 1: An Excel snapshot capturing the minimum number of mutations required for various coronaviruses to match the reference SARS-CoV-2 strain. The dataset includes five samples from each animal species tested, along with MERS and SARS for comparison, and five other SARS-CoV-2 variants.
Figure 2: A bar graph illustrating the comparative analysis of mutation frequencies. The graph compares the number of mutations across different coronaviruses, including samples from each animal species tested, relative to the reference SARS-CoV-2 strain.
FASTAs/
: This directory contains FASTA files, which are formatted sequences of RNA genomes from various animal coronaviruses. All FASTA files have been found and extracted from the National Center for Biotechnology Information (NCBI) database.
Project.py
: This script runs the core analysis. It facilitates the process of RNA translation into amino acids, aligns the genetic sequences, and calculates their differences.
Report.py
: This is the final write-up to conclude the project. The write-up includes a detailed summary of the experimental research, methods, and final conclusions, taking into consideration both the evidence gathered and the general consensus of the scientific community at the time.
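The steps described for the analysis script (reading FASTA records and translating nucleotides into amino acids) can be sketched as follows. This is a hedged illustration, not the contents of Project.py: the helper names `read_fasta` and `translate` are hypothetical, the codon table is the standard genetic code, and translating from position 0 in a single reading frame is a simplifying assumption, since a real coronavirus genome contains several open reading frames.

```python
from itertools import product

# Standard genetic code, listed in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} mapping."""
    records: dict[str, list[str]] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def translate(nucleotides: str) -> str:
    """Translate a nucleotide string codon by codon, starting at position 0
    (hypothetical helper). Accepts RNA (U) or DNA (T) letters; codons that
    contain ambiguous bases become "X"."""
    sequence = nucleotides.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(sequence[i:i + 3], "X")
        for i in range(0, len(sequence) - 2, 3)
    )


# Made-up example record, not taken from the FASTAs/ directory.
example = ">toy_record\nATGTTTGTT\nTTTTAA\n"
for name, genome in read_fasta(example).items():
    print(name, translate(genome))  # prints: toy_record MFVF*
```

The translated strings produced this way are what a distance measure such as Levenshtein distance would then compare.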
- Download or clone the repository: Clone or download this repository to your local machine, then change into the project directory.
git clone https://github.com/AndyAnderson8/SARS-CoV-2-Origin-Study.git
cd SARS-CoV-2-Origin-Study
- Running analysis: Simply launch the Project.py file.
python Project.py