
Independent bioinformatics project on the origin of SARS-CoV-2 utilizing the Levenshtein distance algorithm. Uses real-world genome data of various animal coronaviruses as points of comparison.

AndyAnderson8/SARS-CoV-2-Origin-Study

SARS-CoV-2 Origin Study

Description

This bioinformatics project traces the origins of SARS-CoV-2, the virus that causes COVID-19, by comparing its genetic material with that of coronaviruses found in animals. We use the Levenshtein distance algorithm to count the minimum number of single-residue edits (insertions, deletions, and substitutions) needed to turn the amino acid sequence of one viral genome into that of another. This count serves as a proxy for the minimum number of mutations that must have occurred in the viral RNA to transform an animal coronavirus into SARS-CoV-2, and thus points to the most likely original animal host.
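To make the distance measure concrete, here is a minimal sketch of the Levenshtein distance using the standard two-row dynamic programming formulation. It is an illustration only, not the code in Project.py; the function name `levenshtein` and the short example sequences are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning i characters into an empty prefix takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Hypothetical short peptide fragments, for illustration only.
    print(levenshtein("MFVFLVLLPLVSSQ", "MFIFLLFLTLTSGS"))
```

The running time grows with the product of the two sequence lengths, so a pure-Python loop like this may be slow on full-length protein sequences; a compiled implementation would be a reasonable substitute.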

Results

Our research suggests that pangolins or bats are the most likely sources of SARS-CoV-2, with slightly more evidence pointing towards the former. The full report, with a detailed account of the study, is available in the repository.

Data Visualizations

Minimum Mutation Analysis Table

Figure 1: An Excel snapshot showing the minimum number of mutations required for various coronaviruses to match the reference SARS-CoV-2 strain. The dataset includes five samples from each animal species tested, along with MERS and SARS for comparison, and five other SARS-CoV-2 variants.

Comparative Mutation Frequency Graph

Figure 2: A bar graph comparing the number of mutations across different coronaviruses, including samples from each animal species tested, relative to the reference SARS-CoV-2 strain.

Repository Contents

  • FASTAs/: This directory contains FASTA files holding the RNA genome sequences of various animal coronaviruses. All FASTA files were retrieved from the National Center for Biotechnology Information (NCBI) database.
  • Project.py: This script runs the core analysis. It translates RNA into amino acids, aligns the genetic sequences, and calculates their differences.
  • Report.py: This is the final write-up to conclude the project. The write-up includes a detailed summary of the experimental research, methods, and final conclusions, taking into consideration both the evidence gathered and the general consensus of the scientific community at the time.
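The FASTA parsing and translation steps described above can be sketched as follows. This is a hedged illustration rather than the actual contents of Project.py: the function names `parse_fasta` and `translate` are hypothetical, the standard genetic code is assumed, and the whole sequence is translated in a single reading frame, which simplifies how real coronavirus genomes are expressed.

```python
from itertools import product

# Standard genetic code, with codons ordered U, C, A, G at each position.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without the leading '>') to its joined sequence."""
    records: dict[str, list[str]] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def translate(sequence: str) -> str:
    """Translate a nucleotide sequence codon by codon, starting at position 0."""
    # Sequence databases often store RNA genomes with T in place of U.
    rna = sequence.upper().replace("T", "U")
    protein = []
    for i in range(0, len(rna) - 2, 3):
        # Codons with ambiguous bases (for example N) become "X".
        protein.append(CODON_TABLE.get(rna[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # Made-up record, for illustration only.
    example = ">example record\nATGTTTGTT\nTTTCTT\n"
    for name, seq in parse_fasta(example).items():
        print(name, translate(seq))
```

Trailing bases that do not fill a complete codon are ignored in this sketch, and stop codons are kept as "*" rather than ending translation.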

Installation and Setup

  1. Download or clone the repository to your local machine:

     git clone https://github.com/AndyAnderson8/SARS-CoV-2-Origin-Study.git
     cd SARS-CoV-2-Origin-Study
  2. Run the analysis by launching Project.py:

     python Project.py

License

MIT
