Usage: gen_sgRNAs.py

gen_sgRNAs.py takes the output of generate_targ_dfs (in preprocessing) and designs allele-specific sgRNAs. This page describes the options available when using gen_sgRNAs.py.

Note: This program requires an annotation file generated by annot_variants.py.

Usage:

gen_sgRNAs.py [-chvrd] <bcf> <annots_file> <locus> <pams_dir> <ref_fasta> <out> <cas_types> <guide_length> [<gene_vars>] [--crispor=<ref_gen>] [--hom] [--bed] [--max_indel=<S>] [--ref_guides] [--min-score=<S>]

Expanded Examples:

Generating allele-specific sgRNAs for both SpCas9 and SaCas9:

python3 gen_sgRNAs.py\
 INPUT.vcf.gz\
 INPUT_annots.hdf5\
 1:11980181-12013515\
 hg19_pams\
 chr1.fa\
 INPUT_sgrnas\
 SpCas9,SaCas9\
 20

Generating allele-specific guides in order-ready format for Synthego (as RNA sequences):

python3 gen_sgRNAs.py\
 -r\
 INPUT.vcf.gz\
 INPUT_annots.hdf5\
 1:11980181-12013515\
 hg19_pams\
 chr1.fa\
 OUTPUT_sgrnas\
 SpCas9\
 20

Generating personalized non-allele-specific sgRNAs:

python3 gen_sgRNAs.py\
 INPUT.vcf.gz\
 INPUT_annots.hdf5\
 1:11980181-12013515\
 hg19_pams\
 chr1.fa\
 OUTPUT_sgrnas\
 SpCas9\
 20\
 --hom

Generating allele-specific sgRNAs while ignoring INDELs larger than 5 bp:

python3 gen_sgRNAs.py\
 INPUT.vcf.gz\
 INPUT_annots.hdf5\
 1:11980181-12013515\
 hg19_pams\
 chr1.fa\
 OUTPUT_sgrnas\
 SpCas9\
 20\
 --max_indel=5

Generating allele-specific sgRNAs with CRISPOR scores:

On your first run, you will need to create a conda environment:

cd /path/to/ExcisionFinder/scripts/; conda env create -f conda.yml

python3 gen_sgRNAs.py\
 INPUT.vcf.gz\
 INPUT_annots.hdf5\
 1:11980181-12013515\
 hg19_pams\
 chr1.fa\
 INPUT_sgrnas\
 SpCas9\
 20\
 --crispor\
 hg19

where hg19 is a folder with the following files:

hg19.2bit	hg19.fa.amb	hg19.fa.bwt	hg19.fa.sa
hg19.fa		hg19.fa.ann	hg19.fa.pac	hg19.sizes
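
Before a long run, it can help to confirm that the genome folder is complete. The following is a minimal sketch, not part of gen_sgRNAs.py: the helper name `missing_crispor_files` is hypothetical, and it assumes the files are named after the folder (as in the hg19 listing above).

```python
from pathlib import Path

# Suffixes copied from the hg19 file listing above.
EXPECTED_SUFFIXES = (
    ".2bit", ".fa", ".fa.amb", ".fa.ann",
    ".fa.bwt", ".fa.pac", ".fa.sa", ".sizes",
)


def missing_crispor_files(genome_dir):
    """Return the expected file names that are absent from genome_dir.

    Hypothetical helper: assumes each file is named <folder name><suffix>.
    """
    genome_dir = Path(genome_dir)
    name = genome_dir.resolve().name  # e.g. "hg19"
    expected = [name + suffix for suffix in EXPECTED_SUFFIXES]
    return [f for f in expected if not (genome_dir / f).is_file()]


if __name__ == "__main__":
    missing = missing_crispor_files("hg19")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Genome folder looks complete.")
```

An empty list means every file from the listing was found.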

Arguments and Options:

Arguments:	Details
`bcf`	BCF/VCF file with genotypes.
`annots_file`	Annotated variant targetability for allele-specificity.
`locus`	Locus of interest in format chrom:start-stop.
`pams_dir`	Directory containing the PAM locations for the reference genome.
`ref_fasta`	FASTA file for the reference genome used, e.g. hg38.
`out`	Directory in which to save the output files.
`cas_types`	Cas types you would like to analyze, comma-separated (e.g. SpCas9,SaCas9).
`guide_length`	Guide length, commonly 20 bp; comma-separated if it differs between Cas types.
`gene_vars`	Optional. HDF5 file of gene variants derived from 1000 Genomes data, formatted to add rsID and allele frequency (AF) data to variants. A pre-generated file is available for download.
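
To illustrate the argument formats in the table above, here is a minimal sketch of how `locus`, `cas_types`, and `guide_length` values could be checked before launching the script. It is not the script's own parsing code: the function names are hypothetical, and applying a single guide length to every Cas type is an assumption based on the `guide_length` description.

```python
import re

# chrom:start-stop, e.g. 1:11980181-12013515
LOCUS_RE = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<stop>\d+)$")


def parse_locus(locus):
    """Split a 'chrom:start-stop' string into (chrom, start, stop)."""
    match = LOCUS_RE.match(locus)
    if match is None:
        raise ValueError(f"locus must look like chrom:start-stop, got {locus!r}")
    start, stop = int(match["start"]), int(match["stop"])
    if start > stop:
        raise ValueError(f"start ({start}) is greater than stop ({stop})")
    return match["chrom"], start, stop


def pair_guide_lengths(cas_types, guide_length):
    """Map each comma-separated Cas type to its guide length.

    Assumption: one length applies to all Cas types when only one is given.
    """
    cas_list = cas_types.split(",")
    lengths = [int(value) for value in guide_length.split(",")]
    if len(lengths) == 1:
        lengths = lengths * len(cas_list)
    if len(lengths) != len(cas_list):
        raise ValueError("give one guide length, or one per Cas type")
    return dict(zip(cas_list, lengths))


if __name__ == "__main__":
    print(parse_locus("1:11980181-12013515"))
    print(pair_guide_lengths("SpCas9,SaCas9", "20"))
```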

Options:	Details
`-h` or `--help`	Displays information about `gen_sgRNAs.py`.
`-c`	Do not take the reverse complement of the guide sequence for '-' stranded guides (when the PAM is on the 5' end). Default behavior is to take the reverse complement.
`-v`	Run in verbose mode; useful for tracking progress or debugging.
`--hom`	Use 'homozygous' mode, which finds all CRISPR sites (non-allele-specific) while taking the individual's variants into account, giving personalized guides.
`--crispor=<ref_gen>`	Add CRISPOR specificity scores to the output guides (Haeussler et al., Genome Biology 2016). For more information, see Using CRISPOR with gen_sgRNAs.py. `<ref_gen>` is the name of the directory containing the complete reference genome, which can be downloaded from UCSC (see wiki). `<ref_gen>` is required if you specify `--crispor`.
`--bed`	Design sgRNAs for multiple regions specified in a BED file.
`--max_indel=<S>`	Maximum size for INDELs. Must be smaller than `guide_length` [default: 5].
`-r`	Returns sgRNA sequences as RNA, rather than DNA sequences. Useful for ordering guides.
`-d`	Return a dummy sgRNA for alleles that have no PAM, e.g. when a variant makes or breaks a PAM. By default, when a position has an allele with no PAM, that allele's sgRNA is returned as a '-'. With `-d`, a poly-G/C sequence is returned instead, in case a dummy sgRNA sequence is needed for downstream analysis.
`-C --cas-list`	Lists available Cas types and exits.
`--ref_guides`	Design guides for the reference genome, ignoring variants in the region.
`--min-score=<S>`	Minimum predicted CRISPOR specificity score an sgRNA must have to be included in the output.
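
The sequence transformations behind `-c` and `-r`, and the `--max_indel` constraint, can be illustrated with a short sketch. This is not the code used by gen_sgRNAs.py; the function names are hypothetical and only show the standard operations the option descriptions refer to.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Reverse complement of a DNA sequence; -c skips this step for '-' stranded guides."""
    return dna.translate(_COMPLEMENT)[::-1]


def to_rna(dna):
    """Write a DNA guide as RNA (T becomes U), as -r does for ordering guides."""
    return dna.replace("T", "U").replace("t", "u")


def check_max_indel(max_indel, guide_length):
    """Enforce the documented rule that --max_indel is smaller than guide_length."""
    if max_indel >= guide_length:
        raise ValueError("max_indel must be smaller than guide_length")
    return max_indel


if __name__ == "__main__":
    guide = "ATGCCGTA"  # made-up sequence, for illustration only
    print(reverse_complement(guide))
    print(to_rna(guide))
    print(check_max_indel(5, 20))
```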

AlleleAnalyzer. Keough et al. 2019, Genome Biology.
