
Usage: get_gens_dfs.py

Kathleen Keough edited this page May 22, 2018 · 6 revisions

get_gens_dfs.py generates a table (tsv file) listing all variants in a defined interval for a specified individual, based on an input VCF file. It reformats the VCF genotypes so that they are easier to process later when designing sgRNAs.
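To make the idea of "reformatting genotypes" concrete, here is a minimal Python sketch of how one VCF data line could be flattened into a record for a single sample. It is not the code used by get_gens_dfs.py: the function name `genotype_record`, the output keys, and the example variant are all hypothetical. The only things assumed are the standard VCF column order and the `GT` field.

```python
# Illustrative sketch only; NOT the implementation used by get_gens_dfs.py.
# genotype_record, its output keys, and the example line are hypothetical.
import re


def genotype_record(vcf_line, sample_index=0):
    """Flatten one VCF data line into a dict for a single sample."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, _variant_id, ref, alt = fields[:5]

    # Column 9 (index 8) names the per-sample keys; samples start at index 9.
    format_keys = fields[8].split(":")
    sample_values = fields[9 + sample_index].split(":")
    gt = dict(zip(format_keys, sample_values))["GT"]

    # Alleles are separated by "/" (unphased) or "|" (phased).
    alleles = re.split(r"[/|]", gt)
    called = "." not in alleles

    return {
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "genotype": gt,
        # Heterozygous: fully called, with at least two distinct alleles.
        "heterozygous": called and len(set(alleles)) > 1,
    }


# Made-up variant, used only to exercise the function.
line = "1\t11980200\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0|1:30"
record = genotype_record(line)
print(record)
```

The `heterozygous` flag is included because the `-f` option described under Options implies that homozygous variants are left out of the output by default.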

Usage:

get_gens_dfs.py <vcf_file> <locus> <out> [-fv] [--bed] [--chrom]

Expanded Examples:

Producing a gens file for one locus:

python3 get_gens_df.py \
  INPUT.vcf.gz \
  1:11980181-12013515 \
  OUT_GENS

Producing a gens file from several loci:

python3 get_gens_df.py \
  INPUT.vcf.gz \
  loci.bed \
  OUT_multi_loci_gens \
  --bed

where the loci.bed file is formatted like so:

1	11976269	12018380	MFN2
7	76298036	76308038	HSPB1
11	61940001	61963675	BEST1
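Before running the script on many loci, it can help to confirm that the BED file parses as four tab-separated columns. The following is a small sketch of such a check. `Locus` and `read_loci` are hypothetical helpers, not part of the script, and skipping `#` comment lines is an assumption.

```python
# Illustrative sketch; Locus and read_loci are hypothetical helper names.
import csv
from typing import Iterable, List, NamedTuple


class Locus(NamedTuple):
    chrom: str
    start: int
    stop: int
    name: str


def read_loci(lines: Iterable[str]) -> List[Locus]:
    """Parse tab-separated BED lines with CHROM, START, STOP, and ID columns."""
    loci = []
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines and comment lines (assumed to start with "#").
        if not row or row[0].startswith("#"):
            continue
        chrom, start, stop, name = row[:4]
        loci.append(Locus(chrom, int(start), int(stop), name))
    return loci


# The three example loci from above.
bed_text = (
    "1\t11976269\t12018380\tMFN2\n"
    "7\t76298036\t76308038\tHSPB1\n"
    "11\t61940001\t61963675\tBEST1\n"
)
loci = read_loci(bed_text.splitlines())
for locus in loci:
    print(locus.name, locus.chrom, locus.start, locus.stop)
```

For a real file, pass an open file handle instead, for example `read_loci(open("loci.bed"))`. A row with fewer than four columns or non-numeric coordinates raises a `ValueError`, which flags formatting problems early.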

Arguments and Options:

Arguments:	Details
vcf_file	BCF/VCF file with genotypes. Files should be bgzip-compressed (using bcftools or bgzip) and indexed (using bcftools or tabix).
locus	Locus from which to pull variants, in the format chromosome:start-stop, or a BED file if --bed is specified.
out	Name of the output file, including the directory in which to save it. The output is an .h5 file. Do not include the extension.
Options:	Details
-f	Keep homozygous variants in the output file.
-v	Verbose mode.
--bed	Indicates that a BED file is being used in place of a locus. BED files are expected to include the CHROM, START, STOP, and ID columns.
--chrom	Run on the entire chromosome.
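The locus argument uses the form chromosome:start-stop. As a rough illustration of that format, the sketch below splits such a string into its parts and rejects malformed input. `parse_locus` is a hypothetical helper and may not match how the script itself parses the argument.

```python
# Illustrative sketch; parse_locus is a hypothetical helper, not the script's parser.
from typing import Tuple


def parse_locus(locus: str) -> Tuple[str, int, int]:
    """Split 'chromosome:start-stop' into (chromosome, start, stop)."""
    # Split on the last ":" so the chromosome name is everything before it.
    chrom, colon, span = locus.rpartition(":")
    start_text, dash, stop_text = span.partition("-")
    if not (chrom and colon and dash):
        raise ValueError(f"expected chromosome:start-stop, got {locus!r}")

    # int() raises ValueError if either coordinate is missing or non-numeric.
    start, stop = int(start_text), int(stop_text)
    if start > stop:
        raise ValueError(f"start is greater than stop in {locus!r}")
    return chrom, start, stop


chrom, start, stop = parse_locus("1:11980181-12013515")
print(chrom, start, stop)
```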