
Usage: gen_sgRNAs.py

Kathleen Keough edited this page May 20, 2019 · 14 revisions

gen_sgRNAs.py takes the output of generate_targ_dfs (in preprocessing) and outputs allele-specific sgRNAs. This page describes the options available when running gen_sgRNAs.py.

Note: This program requires an annotation file generated by annot_variants.py.

Usage:

gen_sgRNAs.py [-chvrd] <bcf> <annots_file> <locus> <pams_dir> <ref_fasta> <out> <cas_types> <guide_length> [<gene_vars>] [--crispor=<ref_gen>] [--hom] [--bed] [--max_indel=<S>] [--ref_guides] [--min-score=<S>]

Expanded Examples:

Generating allele-specific sgRNAs for both SpCas9 and SaCas9:

python3 gen_sgRNAs.py\
 INPUT.vcf.gz\
 INPUT_annots.hdf5\
 1:11980181-12013515\
 hg19_pams\
 chr1.fa\
 INPUT_sgrnas\
 SpCas9,SaCas9\
 20

Generating allele-specific sgRNAs as RNA sequences, in order-ready format for Synthego:

python3 gen_sgRNAs.py\
 -r\
 INPUT.vcf.gz\
 INPUT_annots.hdf5\
 1:11980181-12013515\
 hg19_pams\
 chr1.fa\
 OUTPUT_sgrnas\
 SpCas9\
 20
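The -r option returns guides as RNA. Converting a DNA spacer to RNA amounts to replacing thymine (T) with uracil (U). The sketch below illustrates that idea with a hypothetical helper, dna_to_rna, and an invented example spacer; it is not code taken from gen_sgRNAs.py.

```python
def dna_to_rna(guide: str) -> str:
    """Convert a DNA guide sequence to RNA by replacing T with U (case-preserving)."""
    return guide.replace("T", "U").replace("t", "u")


if __name__ == "__main__":
    # Hypothetical 20-nt SpCas9 spacer, used only for illustration.
    print(dna_to_rna("GACGTTACCGGATTAGCTAA"))  # GACGUUACCGGAUUAGCUAA
```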

Generating personalized non-allele-specific sgRNAs:

python3 gen_sgRNAs.py\
 INPUT.vcf.gz\
 INPUT_annots.hdf5\
 1:11980181-12013515\
 hg19_pams\
 chr1.fa\
 OUTPUT_sgrnas\
 SpCas9\
 20\
 --hom

Generating allele-specific sgRNAs while ignoring INDELs larger than 5 bp:

python3 gen_sgRNAs.py\
 INPUT.vcf.gz\
 INPUT_annots.hdf5\
 1:11980181-12013515\
 hg19_pams\
 chr1.fa\
 OUTPUT_sgrnas\
 SpCas9\
 20\
 --max_indel=5
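With --max_indel=<S>, INDELs larger than S bp are ignored. The sketch below shows one way such a filter could work. It assumes (this is not stated on this page) that INDEL size is the length difference between the REF and ALT alleles, and it uses a hypothetical (position, REF, ALT) tuple rather than the script's real variant representation.

```python
from typing import List, Tuple

# Hypothetical minimal variant record: (position, REF allele, ALT allele).
Variant = Tuple[int, str, str]


def indel_size(ref: str, alt: str) -> int:
    """INDEL size as the absolute REF/ALT length difference (0 for SNVs)."""
    return abs(len(ref) - len(alt))


def filter_indels(variants: List[Variant], max_indel: int, guide_length: int) -> List[Variant]:
    """Keep SNVs and INDELs of at most max_indel bp."""
    # Mirrors the documented constraint that max_indel must be smaller than guide_length.
    if max_indel >= guide_length:
        raise ValueError("max_indel must be smaller than guide_length")
    return [v for v in variants if indel_size(v[1], v[2]) <= max_indel]


if __name__ == "__main__":
    variants = [
        (11980200, "A", "G"),         # SNV: kept
        (11980350, "AT", "A"),        # 1 bp deletion: kept
        (11981000, "A", "ATTTTTTT"),  # 7 bp insertion: dropped when max_indel=5
    ]
    for variant in filter_indels(variants, max_indel=5, guide_length=20):
        print(variant)
```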

Generating allele-specific sgRNAs with CRISPOR scores:

On your first run, you will need to create a conda environment:

cd /path/to/ExcisionFinder/scripts/; conda env create -f conda.yml

Then, with that environment activated, run:

python3 gen_sgRNAs.py\
 INPUT.vcf.gz\
 INPUT_annots.hdf5\
 1:11980181-12013515\
 hg19_pams\
 chr1.fa\
 INPUT_sgrnas\
 SpCas9\
 20\
 --crispor\
 hg19

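where hg19 is a directory containing the reference genome FASTA (hg19.fa), its BWA index files (.amb, .ann, .bwt, .pac, .sa), a 2bit file, and a chromosome sizes file: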

hg19.2bit	hg19.fa.amb	hg19.fa.bwt	hg19.fa.sa
hg19.fa		hg19.fa.ann	hg19.fa.pac	hg19.sizes
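Before a --crispor run, it can help to confirm that the reference-genome directory contains every file listed above. The sketch below is a hypothetical pre-flight check, not part of gen_sgRNAs.py; it assumes each file is named after the directory itself (hg19/hg19.fa, hg19/hg19.2bit, and so on), as in the hg19 example.

```python
import os
import sys
from typing import List

# Suffixes taken from the hg19 example listing above.
REQUIRED_SUFFIXES = [".2bit", ".fa", ".fa.amb", ".fa.ann", ".fa.bwt", ".fa.pac", ".fa.sa", ".sizes"]


def missing_crispor_files(ref_dir: str) -> List[str]:
    """Return the expected files that are absent from ref_dir.

    Assumes each file is named after the directory itself, e.g. hg19/hg19.fa.
    """
    name = os.path.basename(os.path.normpath(ref_dir))
    expected = [os.path.join(ref_dir, name + suffix) for suffix in REQUIRED_SUFFIXES]
    return [path for path in expected if not os.path.isfile(path)]


if __name__ == "__main__":
    ref_dir = sys.argv[1] if len(sys.argv) > 1 else "hg19"
    missing = missing_crispor_files(ref_dir)
    if missing:
        print("Missing files:\n" + "\n".join(missing))
    else:
        print(f"{ref_dir} looks complete.")
```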

Arguments and Options:

Arguments: Details
bcf BCF/VCF file with genotypes.
annots_file Annotated variant targetability for allele-specificity.
locus Locus of interest in format chrom:start-stop.
pams_dir Directory containing the PAM locations in the reference genome (e.g. hg19_pams).
ref_fasta FASTA file for the reference genome used, e.g. hg38 (the examples above pass a single chromosome, chr1.fa).
out Directory in which to save the output files.
cas_types Cas types to analyze, comma-separated (e.g. SpCas9,SaCas9).
guide_length Guide length in bp, commonly 20. If the Cas types need different lengths, give one comma-separated value per Cas type.
gene_vars Optional. Gene variants HDF5 file derived from 1000 Genomes data, used to add rsID and allele frequency (AF) data to variants. A pre-generated file can be downloaded here.
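The locus, cas_types, and guide_length arguments follow simple text formats. The sketch below shows how they might be parsed, using hypothetical helpers (parse_locus, pair_guide_lengths) rather than the script's own argument handling. It assumes that a single guide length applies to every Cas type; the 21 nt SaCas9 length in the usage lines is only an illustrative value.

```python
import re
from typing import Dict, Tuple

# Matches the documented chrom:start-stop format, e.g. "1:11980181-12013515".
LOCUS_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<stop>\d+)$")


def parse_locus(locus: str) -> Tuple[str, int, int]:
    """Split a chrom:start-stop string into (chrom, start, stop)."""
    match = LOCUS_RE.match(locus)
    if match is None:
        raise ValueError(f"locus must look like chrom:start-stop, got {locus!r}")
    start, stop = int(match["start"]), int(match["stop"])
    if start > stop:
        raise ValueError("start must not be greater than stop")
    return match["chrom"], start, stop


def pair_guide_lengths(cas_types: str, guide_length: str) -> Dict[str, int]:
    """Map each comma-separated Cas type to its guide length.

    A single length is applied to every Cas type; otherwise one length per type is expected.
    """
    cas_list = cas_types.split(",")
    lengths = [int(x) for x in guide_length.split(",")]
    if len(lengths) == 1:
        lengths *= len(cas_list)
    if len(lengths) != len(cas_list):
        raise ValueError("give one guide length, or one per Cas type")
    return dict(zip(cas_list, lengths))


if __name__ == "__main__":
    print(parse_locus("1:11980181-12013515"))
    print(pair_guide_lengths("SpCas9,SaCas9", "20"))
    print(pair_guide_lengths("SpCas9,SaCas9", "20,21"))
```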
Options: Details
-h or --help Displays information about gen_sgRNAs.py.
-c Do not take the reverse complement of the guide sequence for '-' stranded guides (when the PAM is on the 5' end). Default behavior is to take the reverse complement.
-v Run in verbose mode; useful for tracking progress or debugging.
--hom Use 'homozygous' mode: find all CRISPR sites (non-allele-specific), personalized by taking the individual's variants into account.
--crispor=<ref_gen> Add CRISPOR specificity scores to the output guides (Haeussler et al., Genome Biology 2016). For more information, see Using CRISPOR with gen_sgRNAs.py. <ref_gen> is the name of the directory containing the complete reference genome, which can be downloaded from UCSC (see wiki). This directory is required if you specify --crispor.
--bed Design sgRNAs for multiple regions specified in a BED file.
--max_indel=<S> Maximum INDEL size (bp) to consider; larger INDELs are ignored. Must be smaller than guide_length [default: 5].
-r Return sgRNA sequences as RNA rather than DNA. Useful for ordering guides.
-d Return a dummy sgRNA sequence for alleles that have no PAM, e.g. when a variant makes or breaks a PAM. By default, when a position has an allele with no PAM, that allele's sgRNA is returned as '-'. With -d, a poly-G/C sequence is returned instead, in case a dummy sgRNA sequence is needed for downstream analysis.
-C or --cas-list List available Cas types and exit.
--ref_guides Design guides for reference genome, ignoring variants in region.
--min-score=<S> Minimum predicted CRISPOR specificity score for sgRNAs to be returned.
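Two of the options above describe simple sequence and filtering logic: by default, '-' strand guides are reverse-complemented (-c turns this off), and --min-score drops guides below a CRISPOR specificity threshold. The sketch below illustrates both with hypothetical helpers (reverse_complement, orient_guide, filter_by_score). It assumes guides are plain strings and scores are plain floats, and it is not the script's actual implementation.

```python
from typing import List, Tuple

# Maps each DNA base to its complement (upper- and lower-case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orient_guide(seq: str, strand: str, keep_as_is: bool = False) -> str:
    """Reverse-complement '-' strand guides unless keep_as_is (the analogue of -c) is set."""
    if strand == "-" and not keep_as_is:
        return reverse_complement(seq)
    return seq


def filter_by_score(guides: List[Tuple[str, float]], min_score: float) -> List[Tuple[str, float]]:
    """Keep (sequence, score) pairs whose specificity score is at least min_score."""
    return [(seq, score) for seq, score in guides if score >= min_score]


if __name__ == "__main__":
    print(orient_guide("AACCGT", "-"))                   # ACGGTT
    print(orient_guide("AACCGT", "-", keep_as_is=True))  # AACCGT, as with -c
    print(filter_by_score([("GUIDE_A", 85.0), ("GUIDE_B", 40.0)], min_score=50))
```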