
Usage: annot_variants.py

Kathleen Keough edited this page May 22, 2018 · 3 revisions

annot_variants.py generates a dataframe that annotates each variant in the specified locus and genome, indicating whether the variant generates allele-specific sgRNA sites for the specified Cas variety or varieties.

Note: This program requires a gens file generated by get_gens_dfs.py.

Usage:

annot_variants.py [-v] <gens_file> <cas> <pams_dir> <ref_genome_fasta> <out> [--guide_len=<S>]

annot_variants.py -C | --cas-list

Expanded Examples:

Generating a single annotation file for SpCas9 and SaCas9:

python3 annot_variants.py \
 INPUT_gens.h5 \
 SpCas9,SaCas9 \
 hg19_pams/ \
 chr1.fa \
 OUTPUT_annot

Where hg19_pams/ is a directory containing the following files:

chr1_SaCas9_pam_sites_for.npy		chr1_SpCas9_pam_sites_for.npy
chr1_SaCas9_pam_sites_rev.npy		chr1_SpCas9_pam_sites_rev.npy

pam_sites.npy files for hg19 and hg38 can be downloaded from here.

List available Cas enzymes:

python3 annot_variants.py --cas-list

Arguments and Options:

Arguments: Details
gens_file Explicit genotypes file generated by get_chr_tables.py. For more info, see this wiki page.
cas One or more cas enzymes to use, comma-separated.
pams_dir Directory containing the PAM site locations for the reference genome. pam_sites.npy files for hg19 and hg38 can be downloaded from here.
ref_genome_fasta Fasta file for reference genome used, e.g. hg38.
out Directory in which to save the output files.
Options: Details
-h or --help Displays information about annot_variants.py.
-C or --cas-list List Cas enzymes available for analysis.
--guide_len Guide length (default=20)
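
When the script is driven from a pipeline rather than typed by hand, the positional arguments and the optional guide length can be assembled in Python. This is a sketch under stated assumptions: `build_annot_command` is a hypothetical helper, not part of annot_variants.py, and it only mirrors the usage line shown on this page.

```python
def build_annot_command(gens_file, cas, pams_dir, ref_genome_fasta, out,
                        guide_len=None, script="annot_variants.py"):
    """Build the argument list for one annot_variants.py run.

    cas may be a comma-separated string or a list of enzyme names;
    a list is joined with commas, as in the SpCas9,SaCas9 example.
    guide_len is appended only when given, so the script's own
    default (20) applies otherwise.
    """
    cas_arg = cas if isinstance(cas, str) else ",".join(cas)
    cmd = ["python3", script, gens_file, cas_arg,
           pams_dir, ref_genome_fasta, out]
    if guide_len is not None:
        cmd.append(f"--guide_len={guide_len}")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with paths.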