Complexity ANalysis VirAl Sequences

License: MIT

Install Git LFS (Git Large File Storage)

If Git LFS is not installed, install it using the following steps:

mkdir -p gitLFS
cd gitLFS/
wget https://github.com/git-lfs/git-lfs/releases/download/v2.9.0/git-lfs-linux-amd64-v2.9.0.tar.gz
tar -xf git-lfs-linux-amd64-v2.9.0.tar.gz
chmod 755 install.sh
sudo ./install.sh
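
Since the steps above are only needed when Git LFS is missing, the install can be guarded by a check. The following is a minimal sketch, not part of the CANVAS scripts; the helper name `ensure_tool` is hypothetical, and it assumes the tool counts as installed when a `git-lfs` executable is on the PATH.

```shell
# Hypothetical helper (not part of CANVAS): run the given install command
# only when the named tool is not already available on PATH.
ensure_tool() {
    tool="$1"
    shift
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool already installed, skipping"
    else
        # Everything after the tool name is treated as the install command.
        "$@"
    fi
}
```

For example, from inside the gitLFS folder after extracting the archive, `ensure_tool git-lfs sudo ./install.sh` would run the installer only if `git-lfs` is not found.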

Download Project

Get CANVAS project using:

git clone https://github.com/jorgeMFS/canvas.git
cd canvas/

Using Docker

Docker and Docker Compose must be installed on the system for the installation to work (see https://docs.docker.com/engine/install/ubuntu/).

Then follow these instructions:

git clone https://github.com/jorgeMFS/canvas.git
cd canvas
docker-compose build
docker-compose up -d && docker exec -it canvas bash && docker-compose down
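
In the chained command above, `&&` means `docker-compose down` runs only if the interactive shell exits with status 0, so a session that ends after a failed command can leave the container running. The sketch below is one possible variant that always tears the container down. The function name `canvas_session` and the `COMPOSE`/`DOCKER` override variables are assumptions added for illustration; only the `canvas` container name and the three commands come from the instructions above.

```shell
# Hypothetical wrapper (not part of CANVAS). COMPOSE and DOCKER can be
# overridden, e.g. with `echo`, to dry-run the sequence.
COMPOSE="${COMPOSE:-docker-compose}"
DOCKER="${DOCKER:-docker}"

canvas_session() {
    # Start the services in the background; give up if that fails.
    $COMPOSE up -d || return 1
    # Open the interactive shell and remember its exit status.
    status=0
    $DOCKER exec -it canvas bash || status=$?
    # Always stop the services, whatever the shell returned.
    $COMPOSE down
    return "$status"
}
```

Calling `canvas_session` from the canvas folder would then start the container, open the shell, and stop the container when the shell exits.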

Install Tools

Give execute permission to the scripts and install the tools:

chmod +x *.sh
bash Make.sh;

Result Replication

To run the pipeline and write all reports to the reports folder, use the commands below. Database reconstruction and feature recreation are not required for any of the other tasks. However, recreating the feature reports requires running the database reconstruction first.

Cmix vs GeCo3 time-compression Analysis

To obtain the cmix vs GeCo3 time-compression plot for Human Herpesvirus, run:

cd scripts || exit;
python compare_cmix_hhv.py 

Compression Benchmark Analysis

To obtain the Compression Benchmark plots run:

cd python || exit;
python select_best_nc_model.py;

Synthetic Sequence Analysis

To perform the synthetic sequence test run:

cd scripts || exit;
bash Stx_seq_test.sh;

Classification

To perform classification run the following code:

cd python || exit;
python prepare_classification.py; #recreate classification dataset
python classifier.py; #perform classifications

IR Analysis

To perform the complete IR analysis and create the following outputs:

  • boxplots;
  • 2d scatter plots;
  • 3d scatter plots;
  • top taxonomic group lists;
  • occurrence of each genus;

execute this code:

cd python || exit;
python ir_analysis.py; # Performs complete IR analysis

Human Herpesvirus Analysis

To obtain the Human Herpesvirus plot run:

cd scripts || exit;
bash Herpesvirales.sh;

Database reconstruction

If you wish to reconstruct the Viral database, run the following script:

cd scripts || exit;
bash Build_DB.sh;

Create Features for Analysis and Classification

To create the features for analysis and classification (very time consuming, may take several days) run:

cd scripts || exit;
bash Process_features.sh;
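
As noted under Result Replication, recreating the feature reports requires the database reconstruction to have been run first. A minimal sketch of enforcing that order follows; the helper `run_in_order` is hypothetical and not part of the repository. It runs each script in sequence and stops at the first failure, so feature creation never starts on a failed database build.

```shell
# Hypothetical helper (not part of CANVAS): run scripts one after another
# and stop as soon as one of them exits with a non-zero status.
run_in_order() {
    for step in "$@"; do
        echo "Running $step"
        if ! bash "$step"; then
            echo "Stopped: $step failed" >&2
            return 1
        fi
    done
}
```

From the scripts folder, `run_in_order Build_DB.sh Process_features.sh` would rebuild the database and then create the features.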

Benchmark Compression Reports

To recreate the compression reports used for benchmark (very time consuming, may take several days) run:

cd scripts || exit;
bash Compress.sh;
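
Because a run of this length may outlive a terminal or SSH session, one option is to start it with `nohup` in the background and send its output to a log file. This is a sketch under assumptions, not the project's documented workflow: the helper `run_detached`, the variable `DETACHED_PID`, and the log file name in the usage note are all hypothetical.

```shell
# Hypothetical helper (not part of CANVAS): start a script in the background
# so it keeps running after the terminal closes, and log all of its output.
run_detached() {
    script="$1"
    log="$2"
    # nohup makes the job ignore the hangup signal; stdout and stderr go to the log.
    nohup bash "$script" > "$log" 2>&1 &
    DETACHED_PID=$!
    echo "Started $script (PID $DETACHED_PID), logging to $log"
}
```

From the scripts folder, `run_detached Compress.sh compress.log` would start the job, and `tail -f compress.log` can be used to follow its progress.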

Cladograms

The cladograms require a GUI application. The trees must therefore be reproduced outside Docker, on the Ubuntu host system, from the /canvas folder:

chmod +x *.sh
bash so_dependencies.sh #install Ubuntu system dependencies required for the script to run and Anaconda
conda create -n canvas python=3.6
conda activate canvas
bash Make.sh #install python libs
bash Install_programs.sh #install tools using conda

Afterwards, to obtain the cladogram plots run:

cd python || exit;
python phylo_tree.py;

Website

Check out the website of this project: https://asilab.github.io/canvas/

CITE

Please cite the following if you use CANVAS:

Jorge Miguel Silva, Diogo Pratas, Tânia Caetano, Sérgio Matos, The complexity landscape of viral genomes, GigaScience, Volume 11, 2022, giac079, https://doi.org/10.1093/gigascience/giac079

@article{10.1093/gigascience/giac079,
    author = {Silva, Jorge Miguel and Pratas, Diogo and Caetano, Tânia and Matos, Sérgio},
    title = "{The complexity landscape of viral genomes}",
    journal = {GigaScience},
    volume = {11},
    year = {2022},
    month = {08},
    issn = {2047-217X},
    doi = {10.1093/gigascience/giac079},
    url = {https://doi.org/10.1093/gigascience/giac079},
    note = {giac079},
    eprint = {https://academic.oup.com/gigascience/article-pdf/doi/10.1093/gigascience/giac079/45332144/giac079.pdf},
}

Requirements

  • Ubuntu 18.04 or higher
  • Docker and docker-compose
  • Anaconda
  • Python 3.6
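
The operating system requirement can be checked before installing anything. The sketch below is hypothetical and not part of CANVAS: it assumes the minimum refers to the Ubuntu 18.04 release, reads the standard /etc/os-release file, and relies on `sort -V` (available in GNU coreutils) for the version comparison.

```shell
# Hypothetical check (not part of CANVAS): succeed when the os-release file
# describes Ubuntu with VERSION_ID >= the given minimum (default 18.04).
ubuntu_at_least() {
    min="${1:-18.04}"
    file="${2:-/etc/os-release}"
    [ -r "$file" ] || return 1
    # Source the file in a subshell so its variables do not leak out.
    os_id=$(. "$file" && echo "$ID")
    os_ver=$(. "$file" && echo "$VERSION_ID")
    [ "$os_id" = "ubuntu" ] || return 1
    # The smaller of the two versions sorts first; it must be the minimum.
    lowest=$(printf '%s\n%s\n' "$min" "$os_ver" | sort -V | head -n 1)
    [ "$lowest" = "$min" ]
}
```

For example, `ubuntu_at_least 18.04 || echo "Unsupported system"` prints a warning on anything other than Ubuntu 18.04 or newer.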

ISSUES

Please let us know if there are any issues.

LICENSE

CANVAS is released under the MIT license. For more information, see the license file in the repository.
