Usage: gen_sgRNAs.py
Kathleen Keough edited this page May 20, 2019
gen_sgRNAs.py takes the output of generate_targ_dfs (in preprocessing) and outputs allele-specific sgRNAs. This page explains the options available when using gen_sgRNAs.py.
Note: This program requires an annotation file generated by annot_variants.py.
gen_sgRNAs.py [-chvrd] <bcf> <annots_file> <locus> <pams_dir> <ref_fasta> <out> <cas_types> <guide_length> [<gene_vars>] [--crispor=<ref_gen>] [--hom] [--bed] [--max_indel=<S>] [--ref_guides] [--min-score=<S>]
python3 gen_sgRNAs.py \
INPUT.vcf.gz \
INPUT_annots.hdf5 \
1:11980181-12013515 \
hg19_pams \
chr1.fa \
INPUT_sgrnas \
SpCas9,SaCas9 \
20
python3 gen_sgRNAs.py \
-r \
INPUT.vcf.gz \
INPUT_annots.hdf5 \
1:11980181-12013515 \
hg19_pams \
chr1.fa \
OUTPUT_sgrnas \
SpCas9 \
20
python3 gen_sgRNAs.py \
INPUT.vcf.gz \
INPUT_annots.hdf5 \
1:11980181-12013515 \
hg19_pams \
chr1.fa \
OUTPUT_sgrnas \
SpCas9 \
20 \
--hom
python3 gen_sgRNAs.py \
INPUT.vcf.gz \
INPUT_annots.hdf5 \
1:11980181-12013515 \
hg19_pams \
chr1.fa \
OUTPUT_sgrnas \
SpCas9 \
20 \
--max_indel=5
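When many loci or samples need to be processed, the invocations above can be assembled programmatically. The following is a minimal sketch, not part of AlleleAnalyzer: `build_gen_sgrnas_cmd` is a hypothetical helper that only mirrors the positional order of the usage line and places switches before the positionals, as the `-r` example does.

```python
import shlex


def build_gen_sgrnas_cmd(bcf, annots_file, locus, pams_dir, ref_fasta, out,
                         cas_types, guide_length, flags=(), max_indel=None):
    """Assemble an argv list in the order given by the usage line.

    Hypothetical helper for illustration; it does not run or validate anything.
    """
    cmd = ["python3", "gen_sgRNAs.py"]
    cmd += list(flags)  # switches such as "-r" or "--hom"
    cmd += [bcf, annots_file, locus, pams_dir, ref_fasta, out,
            ",".join(cas_types),  # cas types are passed comma-separated
            str(guide_length)]
    if max_indel is not None:
        cmd.append(f"--max_indel={max_indel}")
    return cmd


cmd = build_gen_sgrnas_cmd(
    "INPUT.vcf.gz", "INPUT_annots.hdf5", "1:11980181-12013515",
    "hg19_pams", "chr1.fa", "OUTPUT_sgrnas", ["SpCas9"], 20,
    flags=["-r"],
)
# Print a copy-pasteable shell command; pass `cmd` to subprocess.run to execute it.
print(shlex.join(cmd))
```

Keeping the command as a list avoids shell quoting problems and sidesteps line-continuation mistakes in multi-line commands.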
On your first run, you will need to create a conda environment:
cd /path/to/ExcisionFinder/scripts/; conda env create -f conda.yml
python3 gen_sgRNAs.py \
INPUT.vcf.gz \
INPUT_annots.hdf5 \
1:11980181-12013515 \
hg19_pams \
chr1.fa \
INPUT_sgrnas \
SpCas9 \
20 \
--crispor \
hg19
where hg19 is a folder with the following files:
hg19.2bit hg19.fa.amb hg19.fa.bwt hg19.fa.sa
hg19.fa hg19.fa.ann hg19.fa.pac hg19.sizes
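Before a long run with `--crispor`, it can help to confirm that the genome folder is complete. The sketch below is a hypothetical pre-flight check, not something gen_sgRNAs.py provides. It assumes that every file is named after the folder (as in the `hg19` listing above) and that the eight files shown are the full set required.

```python
import tempfile
from pathlib import Path

# Suffixes taken from the hg19 listing above; assumed to be the complete set.
REQUIRED_SUFFIXES = (".2bit", ".fa", ".fa.amb", ".fa.ann",
                     ".fa.bwt", ".fa.pac", ".fa.sa", ".sizes")


def missing_crispor_files(genome_dir):
    """Return the expected file names that are absent from genome_dir."""
    genome_dir = Path(genome_dir)
    name = genome_dir.name  # e.g. "hg19"
    return [name + suffix for suffix in REQUIRED_SUFFIXES
            if not (genome_dir / (name + suffix)).is_file()]


# Demo on a throwaway folder that holds only two of the eight files.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "hg19"
    demo.mkdir()
    (demo / "hg19.2bit").touch()
    (demo / "hg19.fa").touch()
    missing = missing_crispor_files(demo)

print(missing)
```

An empty list means all of the listed files are present.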
| Arguments | Details |
|---|---|
| bcf | BCF/VCF file with genotypes. |
| annots_file | Annotated variant targetability for allele-specificity. |
| locus | Locus of interest in format chrom:start-stop. |
| pams_dir | Directory containing the PAM locations in the reference genome. |
| ref_fasta | Fasta file for reference genome used, e.g. hg38. |
| out | Directory in which to save the output files. |
| cas_types | Cas types you would like to analyze, comma-separated (e.g. SpCas9,SaCas9). |
| guide_length | Guide length, commonly 20 bp, comma-separated if different for different cas types. |
| gene_vars | Optional. Gene variants HDF5 file originating from 1000 Genomes Data, formatted in order to add rsID and allele frequency (AF) data to variants. Pre-generated, download here. |
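The formats of `locus`, `cas_types`, and `guide_length` can be checked before launching a run. This is a minimal sketch with hypothetical helper names; it is not how gen_sgRNAs.py parses its arguments. It assumes that a single guide length applies to every cas type, as in the `SpCas9,SaCas9` example with `20`.

```python
import re

# chrom:start-stop, e.g. 1:11980181-12013515
LOCUS_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<stop>\d+)$")


def parse_locus(locus):
    """Split a chrom:start-stop string into (chrom, start, stop)."""
    match = LOCUS_RE.match(locus)
    if match is None:
        raise ValueError(f"locus must look like chrom:start-stop, got {locus!r}")
    start, stop = int(match["start"]), int(match["stop"])
    if start > stop:
        raise ValueError(f"start {start} is greater than stop {stop}")
    return match["chrom"], start, stop


def pair_guide_lengths(cas_types, guide_length):
    """Map each comma-separated cas type to its guide length."""
    cas = cas_types.split(",")
    lengths = [int(x) for x in guide_length.split(",")]
    if len(lengths) == 1:
        lengths = lengths * len(cas)  # assumption: one length is shared by all
    if len(lengths) != len(cas):
        raise ValueError("give one guide length, or one per cas type")
    return dict(zip(cas, lengths))


print(parse_locus("1:11980181-12013515"))
print(pair_guide_lengths("SpCas9,SaCas9", "20"))
```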
| Options | Details |
|---|---|
| -h or --help | Displays information about gen_sgRNAs.py. |
| -c | Do not take the reverse complement of the guide sequence for '-' stranded guides (when the PAM is on the 5' end). Default behavior is to take the reverse complement. |
| -v | Run in verbose mode; useful for tracking progress or debugging. |
| --hom | Use 'homozygous' mode, which finds all CRISPR sites (non-allele-specific) in a personalized way by taking individual variants into account. |
| --crispor=<ref_gen> | Add CRISPOR specificity scores to outputted guides. From Haeussler et al. Genome Biology 2016. For more information, see Using CRISPOR with gen_sgRNAs.py. <ref_gen> is the directory name of the complete reference genome, which can be downloaded from UCSC (see wiki). It is required if you specify --crispor. |
| --bed | Design sgRNAs for multiple regions specified in a BED file. |
| --max_indel=<S> | Maximum size for INDELs. Must be smaller than guide_length [default: 5]. |
| -r | Returns sgRNA sequences as RNA, rather than DNA sequences. Useful for ordering guides. |
| -d | Return a dummy sgRNA for alleles with no PAM, e.g. when a variant makes or breaks a PAM. By default, when a position has an allele with no PAM, that allele's sgRNA is returned as a '-'. With -d a poly-G/C sequence is returned, in case a dummy sgRNA sequence is needed for downstream analysis. |
| -C --cas-list | Lists available cas types and exits. |
| --ref_guides | Design guides for reference genome, ignoring variants in region. |
| --min-score=<S> | Minimum predicted CRISPOR specificity score to allow in sgRNAs. |
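The sequence transformations behind `-c` and `-r`, and the `--max_indel` constraint, can be illustrated in a few lines. This is a sketch of the general operations only, with hypothetical function names; it is not the code gen_sgRNAs.py uses, and it assumes sequences contain only A, C, G, and T.

```python
# Complement table for DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (the default for '-' stranded guides)."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_rna(seq):
    """Write a DNA guide as RNA by replacing T with U (what -r describes)."""
    return seq.replace("T", "U").replace("t", "u")


def check_max_indel(max_indel, guide_length):
    """Raise if max_indel is not smaller than guide_length."""
    if max_indel >= guide_length:
        raise ValueError(
            f"max_indel ({max_indel}) must be smaller than "
            f"guide_length ({guide_length})"
        )


print(reverse_complement("ACCGT"))  # ACGGT
print(to_rna("GATTACA"))            # GAUUACA
check_max_indel(5, 20)              # passes silently with the default of 5
```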
AlleleAnalyzer. Keough et al. 2019, Genome Biology.