Genomic Annotations

Genomic Annotations is a project for annotating genomic data with information drawn from several databases. Its primary goal is annotation fast enough to keep pace with AI model training.

Setup and Installation

To set up and run the project, follow these steps:

Clone the repository:

git clone <repo_url>

Install the required packages. Make sure you have Python 3.6 installed. Use the following command to install the dependencies:

pip install -r requirements.txt

Download the necessary annotation files. Each annotation type requires one file per reference genome:
1. 164 cell type regulation annotations:
  - hg37
  - hg38 (created manually from the hg37 file)
2. Classifications:
  - hg37
  - hg38
3. Regulatory regions:
  - hg37
  - hg38
4. Methylation:
  - hg37
  - hg38

Building the annotation databases

Run the command for each database you need:

To build the cell type regulation DB:

python3 build_cell_type_regulation_db.py <inputpath> <outputpath> <hg>

To build the classifications DB:

python3 build_classifications_db.py <inputpath> <outputpath> <hg>

To build the regulatory regions DB:

python3 build_regulatory_regions_db.py <inputpath> <outputpath> <hg>

To build the methylation DB:

python3 build_methylation_db.py <inputpath> <outputpath> <hg>
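The four build commands can also be driven from one Python script. This is a minimal sketch, assuming each builder takes its arguments positionally in the order shown and exits with a non-zero status on failure. The `BUILDERS` keys, the `build_commands` and `build_all` helpers, and the `<name>_hg<hg>.db` output naming are hypothetical, not part of this repository.

```python
import subprocess
import sys
from pathlib import Path

# Builder scripts named in this README, keyed by hypothetical short names.
BUILDERS = {
    "cell_type_regulation": "build_cell_type_regulation_db.py",
    "classifications": "build_classifications_db.py",
    "regulatory_regions": "build_regulatory_regions_db.py",
    "methylation": "build_methylation_db.py",
}


def build_commands(inputs, out_dir, hg):
    """Return one argv list per DB: <script> <inputpath> <outputpath> <hg>."""
    if hg not in (37, 38):
        raise ValueError("hg must be 37 or 38")
    commands = []
    for name, script in BUILDERS.items():
        if name not in inputs:
            continue  # skip annotation types whose input file was not given
        # Hypothetical naming scheme for the output path.
        output = Path(out_dir) / f"{name}_hg{hg}.db"
        commands.append([sys.executable, script, str(inputs[name]), str(output), str(hg)])
    return commands


def build_all(inputs, out_dir, hg):
    """Run each builder in turn; check=True stops at the first failure."""
    for argv in build_commands(inputs, out_dir, hg):
        subprocess.run(argv, check=True)
```

For example, `build_all({"cell_type_regulation": "cell_type_regulation.bed.gz"}, "dbs", 38)` would run only the cell type regulation builder. Using `sys.executable` keeps the builders on the same interpreter as the driver.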

Parameters in all cases:

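inputpath: The local path to the downloaded annotation file for the DB being built (for example, cell_type_regulation.bed.gz for the cell type regulation DB).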
outputpath: The local path to save the DB.
hg: {37, 38}. The desired reference genome. Use 37 for hg37 and 38 for hg38.
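These parameter rules map naturally onto `argparse`. The following is a minimal sketch of how a builder could validate them, assuming positional arguments. `parse_build_args` is a hypothetical helper, not the repository's actual parser.

```python
import argparse


def parse_build_args(argv=None):
    """Hypothetical parser for: <inputpath> <outputpath> <hg>."""
    parser = argparse.ArgumentParser(description="Build an annotation DB (sketch)")
    parser.add_argument("inputpath", help="local path to the downloaded annotation file")
    parser.add_argument("outputpath", help="local path at which to save the DB")
    # choices rejects anything other than 37 or 38 with a usage error (exit code 2)
    parser.add_argument("hg", type=int, choices=[37, 38], help="reference genome: 37 or 38")
    return parser.parse_args(argv)
```

Passing `argv=None` makes `argparse` read `sys.argv`, so the same function works both from the command line and from tests.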

These commands process the files and create the annotation databases at the output path you provide.
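The README does not describe how the builders store their data. One possible approach is to index a BED-style annotation file (such as cell_type_regulation.bed.gz) per chromosome so that overlap queries use binary search rather than a full scan. This sketch assumes standard 0-based, half-open BED coordinates with an optional label in column 4. `build_index`, `load_bed_index` and `query` are hypothetical names.

```python
import bisect
import gzip
from collections import defaultdict


def build_index(lines):
    """Hypothetical index: {chrom: (sorted intervals, starts, max_len)}."""
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        fields = line.rstrip("\n").split("\t")
        label = fields[3] if len(fields) > 3 else ""
        by_chrom[fields[0]].append((int(fields[1]), int(fields[2]), label))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [iv[0] for iv in intervals]
        max_len = max(end - start for start, end, _ in intervals)
        index[chrom] = (intervals, starts, max_len)
    return index


def load_bed_index(path):
    """Read a gzipped BED file, e.g. cell_type_regulation.bed.gz."""
    with gzip.open(path, "rt") as handle:
        return build_index(handle)


def query(index, chrom, start, end):
    """Return intervals overlapping the half-open range [start, end)."""
    if chrom not in index:
        return []
    intervals, starts, max_len = index[chrom]
    # An overlapping interval must start after start - max_len and before end,
    # so two binary searches bound the candidates.
    lo = bisect.bisect_right(starts, start - max_len)
    hi = bisect.bisect_left(starts, end)
    return [iv for iv in intervals[lo:hi] if iv[1] > start]
```

One limitation of this design is that a single very long interval raises `max_len` for its whole chromosome and widens every query window on it.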

Usage

To test the speed of the 164 cell type regulation annotations, run:

python3 runtime_test_cell_type_annotation.py <path> <hg> <numberofsamples=1> <outputformat=flat> <sample>

To test the classifications annotation speed, run:

python3 runtime_test_classifications_annotation.py <path> <hg> <numberofsamples=1> <outputformat=flat> <sample>

To test the regulatory regions annotation speed, run:

python3 runtime_test_regulatory_regions_annotation.py <path> <hg> <numberofsamples=1> <outputformat=flat> <sample>

To test the methylation annotation speed, run:

python3 runtime_test_methylation_annotation.py <path> <hg> <numberofsamples=1> <outputformat=flat> <sample>

Parameters in all cases:

path: The local path to the DB built for the annotation type being tested (for example, the cell type regulation DB).
hg: {37, 38}. The desired reference genome. Use 37 for hg37 and 38 for hg38.
numberofsamples: The number of randomly generated samples on which to measure the runtime. The default is 1.
outputformat: {'flat', 'matrix'}. The output format of the annotation: 'flat' for a one-dimensional feature vector, 'matrix' for a matrix with features for each nucleotide. The default is 'flat'.
sample: (Optional) A specific sample to annotate, given as -s chromosome start_pos end_pos flag.
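A runtime test of this kind can be sketched as follows: draw random fixed-length samples, annotate each one, and report the mean time per sample. This minimal sketch assumes annotations are held in memory as lists of (chrom, start, end) intervals, one list per track. The 'flat' reduction used here (fraction of covered nucleotides per track) is a hypothetical choice, since the README does not say how 'flat' features are derived. All function names are hypothetical.

```python
import random
import time


def random_samples(chrom_sizes, n, length, seed=None):
    """Draw n random fixed-length samples as (chrom, start, end) tuples."""
    rng = random.Random(seed)
    chroms = sorted(chrom_sizes)
    samples = []
    for _ in range(n):
        chrom = rng.choice(chroms)
        # Keep the whole sample inside the chromosome.
        start = rng.randrange(0, chrom_sizes[chrom] - length + 1)
        samples.append((chrom, start, start + length))
    return samples


def annotate_matrix(tracks, chrom, start, end):
    """'matrix' format: one row per track, one 0/1 column per nucleotide."""
    matrix = []
    for intervals in tracks:
        row = [0] * (end - start)
        for c, s, e in intervals:
            if c != chrom:
                continue
            for pos in range(max(s, start), min(e, end)):
                row[pos - start] = 1
        matrix.append(row)
    return matrix


def annotate_flat(tracks, chrom, start, end):
    """'flat' format (hypothetical reduction): covered fraction per track."""
    matrix = annotate_matrix(tracks, chrom, start, end)
    return [sum(row) / len(row) for row in matrix]


def mean_runtime(annotate, tracks, samples):
    """Mean wall-clock seconds per sample for the given annotate function."""
    begin = time.perf_counter()
    for chrom, start, end in samples:
        annotate(tracks, chrom, start, end)
    return (time.perf_counter() - begin) / len(samples)
```

In a real test the chromosome sizes would come from the chosen reference genome. `time.perf_counter` is used because it is a monotonic, high-resolution clock suited to measuring short intervals.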

KerenRozen/genomic_annotations

Folders and files

Latest commit

History

Repository files navigation

Genomic Annotations

Setup and Installation

Building the annotations databases

To build cell type regulation DB:

To build classifications DB:

To build regulation regions DB:

To build methylation DB:

Usage

To test the 164 cell type regulation annotation speed, run:

To test the classifications annotation speed, run:

To test the regulatory regions annotation speed, run:

To test the methylation annotation speed, run:

About

Resources

Stars

Watchers

Forks

Releases 10

Packages 0

Languages

Packages