ECOLE: Learning to call copy number variants on whole exome sequencing data

ECOLE is a deep learning-based tool that calls copy number variants (CNVs) on whole exome sequencing (WES) data using read depth sequences.

The manuscript can be found here: ECOLE: Learning to call copy number variants on whole exome sequencing data

The repository with the necessary data and scripts to reproduce the results in the paper can be found here: ECOLE results reproduction

Deep Learning, Copy Number Variation, Whole Exome Sequencing

Authors

Berk Mandiracioglu, Furkan Ozden, Gun Kaynar, M. Alper Yilmaz, Can Alkan, A. Ercument Cicek

Questions & comments

[firstauthorname].[firstauthorsurname]@gmail.com [lastauthorsurname]@cs.bilkent.edu.tr

Installation

ECOLE is a Python 3 script and is easy to run once the required packages are installed.

Requirements

To handle the requirements easily, you can use the ECOLE_environment.yml file to create a conda environment with all requirements installed:

$ conda env create --name ecole_env -f ECOLE_environment.yml
$ conda activate ecole_env

Note that the provided environment yml file is for Linux systems. For macOS users, the package versions might need to be changed.

Features

ECOLE optionally supports running on a GPU. See the -g, --gpu option under Optional Arguments.

Instructions Manual for ECOLE

Important notice: Please call the ECOLE_call.py script from the scripts directory.

Required Arguments

-m, --model

The pretrained model from the paper to use, one of the following: (1) ecole, (2) ecole-ft-expert, (3) ecole-ft-somatic.

-bs, --batch_size

Batch size to use when calling CNVs on the samples.

-i, --input

Relative or absolute path to the preprocessed WES samples, including read depth data.

-o, --output

Relative or absolute path of the output directory where ECOLE writes its output file.

-c, --cnv

The resolution of the CNV calls, one of the following: (1) exonlevel, (2) merged.

-n, --normalize

Relative or absolute path to the mean and standard deviation statistics used to normalize the read depth values. These statistics were precalculated on the training dataset before pretraining.
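The normalization behind -n can be pictured as a z-score transform. The following is a minimal sketch, assuming the statistics reduce to a single mean and standard deviation. The function name and the numbers are hypothetical, and the actual layout of the stats file and ECOLE's internal preprocessing may differ.

```python
from typing import List, Sequence


def normalize_read_depth(depths: Sequence[float], mean: float, std: float) -> List[float]:
    """Z-score normalize read depth values with precomputed training-set stats."""
    if std <= 0:
        raise ValueError("std must be positive")
    return [(d - mean) / std for d in depths]


# Hypothetical values for illustration only; real stats come from the file passed to -n.
example = normalize_read_depth([10.0, 20.0, 30.0], mean=20.0, std=10.0)
print(example)
```

Reusing the training-set statistics, rather than recomputing them per sample, keeps new inputs on the same scale the pretrained model saw during training.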

Optional Arguments

-g, --gpu

Set to the PCI bus ID of the GPU to use.
You can look up the PCI bus IDs of the GPUs in your system in several ways, for example with the gpustat tool.

-v, --version

Check the version of ECOLE.

-h, --help

See the help page.
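The arguments above can be assembled into a command line programmatically. The following is a minimal sketch that only builds and prints the command. The helper name, the python3 launcher, and all paths in the example are assumptions for illustration; using ecole_stats.txt as the -n file is also an assumption.

```python
import shlex
from typing import List, Optional

ALLOWED_MODELS = {"ecole", "ecole-ft-expert", "ecole-ft-somatic"}
ALLOWED_CNV_LEVELS = {"exonlevel", "merged"}


def build_ecole_call_command(model: str, batch_size: int, input_dir: str,
                             output_dir: str, cnv: str, stats_path: str,
                             gpu: Optional[int] = None) -> List[str]:
    """Build the argument list for ECOLE_call.py from the documented flags."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model: {model}")
    if cnv not in ALLOWED_CNV_LEVELS:
        raise ValueError(f"unknown CNV level: {cnv}")
    cmd = ["python3", "ECOLE_call.py",
           "-m", model, "-bs", str(batch_size),
           "-i", input_dir, "-o", output_dir,
           "-c", cnv, "-n", stats_path]
    if gpu is not None:
        cmd += ["-g", str(gpu)]  # optional GPU selection
    return cmd


# Hypothetical paths, written relative to the scripts directory.
cmd = build_ecole_call_command("ecole", 16, "../processed_samples/",
                               "../ecole_calls_output", "exonlevel",
                               "../ecole_stats.txt", gpu=0)
print(shlex.join(cmd))
# To execute it from the scripts directory, one could use:
# subprocess.run(cmd, cwd="scripts", check=True)
```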

Usage Example

Usage of ECOLE is very simple!

Step-0: Install conda package management

This project uses the conda package manager to create a virtual environment and facilitate reproducibility.
For Linux users:
Please take a look at the Anaconda repo archive page and select the version that you would like to install.
Replace Anaconda3-version.num-Linux-x86_64.sh below with your choice.

$ wget -c https://repo.continuum.io/archive/Anaconda3-version.num-Linux-x86_64.sh
$ bash Anaconda3-version.num-Linux-x86_64.sh

Step-1: Set Up your environment.

It is important to set up the conda environment which includes the necessary dependencies.
Please run the following lines to create and activate the environment:

$ conda env create --name ecole_env -f ECOLE_environment.yml
$ conda activate ecole_env

Step-2: Run the preprocessing script.

The WES samples must be preprocessed to obtain read depth and other metadata before CNV calling.
Please run the following line:

$ source preprocess_samples.sh

Step-3: Run ECOLE on data obtained in Step-2

Here, we demonstrate how to run ECOLE on GPU device 0 and obtain exon-level CNV calls.
Please run the following script:

$ source ecole_call.sh

You can change the arguments within the script to run it on the CPU and/or to obtain merged CNV calls.
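To illustrate the difference between exon-level and merged calls, the sketch below collapses consecutive exon-level calls that share a sample, chromosome, and prediction into one segment. This is only one simple way to produce merged calls, not ECOLE's actual merging procedure, which may differ. The tuple layout, the function name, the coordinates, and the DEL/DUP labels are assumptions for illustration.

```python
from typing import List, Tuple

# (sample, chromosome, start, end, prediction)
Call = Tuple[str, str, int, int, str]


def merge_adjacent_calls(calls: List[Call]) -> List[Call]:
    """Merge runs of consecutive calls with the same sample, chromosome, and prediction."""
    merged: List[Call] = []
    # Sort so that calls on the same sample and chromosome appear in genomic order.
    for sample, chrom, start, end, pred in sorted(calls, key=lambda c: (c[0], c[1], c[2])):
        if merged:
            p_sample, p_chrom, p_start, p_end, p_pred = merged[-1]
            if (p_sample, p_chrom, p_pred) == (sample, chrom, pred):
                # Extend the previous segment instead of starting a new one.
                merged[-1] = (sample, chrom, p_start, max(p_end, end), pred)
                continue
        merged.append((sample, chrom, start, end, pred))
    return merged


# Hypothetical exon-level calls.
exon_calls = [
    ("NA12891", "chr21", 100, 200, "DEL"),
    ("NA12891", "chr21", 300, 400, "DEL"),
    ("NA12891", "chr21", 500, 600, "DUP"),
]
print(merge_adjacent_calls(exon_calls))
```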

Output file of ECOLE

At the end of the CNV calling procedure, ECOLE writes its output file to the directory given with the -o option. In this tutorial it is ./ecole_calls_output.
The output file is in a tab-delimited, BED-like format.
Its columns are, in order: 1. Sample Name, 2. Chromosome, 3. CNV Start Index, 4. CNV End Index, 5. ECOLE Prediction.
The figure example_output.png in the repository shows an example ECOLE output file.
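The five-column layout described above can be read with the standard csv module. The following is a minimal sketch, assuming the file has no header line and that the start and end columns are integers. The function name, the dictionary keys, and the sample rows (including the DEL/DUP labels) are hypothetical.

```python
import csv
import io
from typing import Dict, List

COLUMNS = ["sample", "chromosome", "start", "end", "prediction"]


def read_ecole_calls(handle) -> List[Dict[str, object]]:
    """Parse tab-delimited ECOLE calls into a list of dictionaries."""
    rows: List[Dict[str, object]] = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        row: Dict[str, object] = dict(zip(COLUMNS, fields))
        row["start"] = int(fields[2])
        row["end"] = int(fields[3])
        rows.append(row)
    return rows


# Hypothetical content standing in for a real output file.
sample_text = "NA12891\tchr21\t100\t200\tDEL\nNA12891\tchr21\t500\t600\tDUP\n"
calls = read_ecole_calls(io.StringIO(sample_text))
print(len(calls), calls[0]["chromosome"], calls[0]["end"] - calls[0]["start"])
```

For a real file, pass an open file handle instead of the io.StringIO object, for example one opened with newline="" as the csv module recommends.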

Instructions Manual for Finetuning ECOLE

Important notice: Please call the ECOLE_finetune.py script from the scripts directory.

Required Arguments

-bs, --batch_size

Batch size to use when processing the samples during finetuning.

-i, --input

Relative or absolute path to the preprocessed WES samples, including read depth data.

-o, --output

Relative or absolute path of the output directory where the finetuned model weights are written.

-n, --normalize

Relative or absolute path to the mean and standard deviation statistics used to normalize the read depth values. These statistics were precalculated on the training dataset before pretraining.

-e, --epochs

The number of epochs to run finetuning for.

-lr, --learning_rate

The learning rate to use during finetuning.

-lmp, --load_model_path

The path to the pretrained model weights to load for finetuning.

Optional Arguments

-g, --gpu

Set to the PCI bus ID of the GPU to use.
You can look up the PCI bus IDs of the GPUs in your system in several ways, for example with the gpustat tool.

-v, --version

Check the version of ECOLE.

-h, --help

See the help page.
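The finetuning arguments can be collected in one place and turned into a command line. The following is a minimal sketch that only builds and prints the command. The class name, the python3 launcher, the hyperparameter values, and all paths except the tutorial output directory are assumptions for illustration.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FinetuneArgs:
    batch_size: int
    input_dir: str
    output_dir: str
    stats_path: str
    epochs: int
    learning_rate: float
    load_model_path: str
    gpu: Optional[int] = None  # None leaves out -g

    def to_command(self) -> List[str]:
        """Build the argument list for ECOLE_finetune.py from the documented flags."""
        if self.batch_size < 1 or self.epochs < 1:
            raise ValueError("batch_size and epochs must be at least 1")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        cmd = ["python3", "ECOLE_finetune.py",
               "-bs", str(self.batch_size),
               "-i", self.input_dir, "-o", self.output_dir,
               "-n", self.stats_path,
               "-e", str(self.epochs),
               "-lr", str(self.learning_rate),
               "-lmp", self.load_model_path]
        if self.gpu is not None:
            cmd += ["-g", str(self.gpu)]
        return cmd


# Hypothetical values and paths, written relative to the scripts directory.
args = FinetuneArgs(batch_size=16, input_dir="../processed_finetune_samples/",
                    output_dir="../ecole_finetuned_model_weights",
                    stats_path="../ecole_stats.txt", epochs=5,
                    learning_rate=0.0001, load_model_path="../models/pretrained_weights",
                    gpu=0)
print(shlex.join(args.to_command()))
```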

Finetune Example

We provide an ECOLE finetuning example that uses the WES sample NA12891, restricted to chromosome 21. Step-0 and Step-1 are the same as in the ECOLE call example.

Step-0: Install conda package management

This project uses the conda package manager to create a virtual environment and facilitate reproducibility.
For Linux users:
Please take a look at the Anaconda repo archive page and select the version that you would like to install.
Replace Anaconda3-version.num-Linux-x86_64.sh below with your choice.

$ wget -c https://repo.continuum.io/archive/Anaconda3-version.num-Linux-x86_64.sh
$ bash Anaconda3-version.num-Linux-x86_64.sh

Step-1: Set Up your environment.

It is important to set up the conda environment which includes the necessary dependencies.
Please run the following lines to create and activate the environment:

$ conda env create --name ecole_env -f ECOLE_environment.yml
$ conda activate ecole_env

Step-2: Run the preprocessing script for preparing the samples for finetuning.

The WES samples must be preprocessed to obtain read depth and other metadata before ECOLE finetuning.
ECOLE finetuning requires .bam files and ground truth calls, as provided under /finetune_example_data. Please see finetune_ground_truths.png in the repository for a sample ground truth format.
Please run the following line:

$ source finetune_preprocess_samples.sh

Step-3: Start ECOLE Finetuning on data obtained in Step-2

Here, we demonstrate how to run ECOLE finetuning on GPU device 0.
Please run the following script:

$ source ecole_finetune.sh

You can change the arguments within the script to run it on the CPU.
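For background on what selecting a GPU or the CPU can involve, the sketch below shows one common way a Python program restricts CUDA-based frameworks to a single device. It is not taken from ECOLE's code, and how ECOLE implements -g may differ. The function name is hypothetical; CUDA_DEVICE_ORDER and CUDA_VISIBLE_DEVICES are standard CUDA environment variables.

```python
import os
from typing import Optional


def select_device(gpu_id: Optional[int]) -> None:
    """Restrict CUDA-based frameworks to one GPU (in PCI bus order) or to the CPU."""
    # Enumerate devices in PCI bus order so indices match tools such as nvidia-smi.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # An empty value hides all GPUs, so the framework falls back to the CPU.
    os.environ["CUDA_VISIBLE_DEVICES"] = "" if gpu_id is None else str(gpu_id)


select_device(0)
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

These variables only take effect if they are set before the deep learning framework initializes CUDA, so a call like this belongs at the very start of a script.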

Output file of ECOLE

At the end of ECOLE finetuning, the script saves its model weights file to the directory given with the -o option. In this tutorial it is ./ecole_finetuned_model_weights.

Citations

License

CC BY-NC-SA 2.0
For commercial usage, please contact.
