find sites

Brent Pedersen edited this page Jan 5, 2022 · 1 revision

Create a new set of sites from a population VCF. The sites distributed with somalier usually work well, but this command can be used to create, for example, a set of sites that is optimal for a specific ancestry or a new genome build.

Usage

  somalier find-sites [options] vcf

Arguments:
  vcf              population VCF to use to find sites

Options:
  -x, --exclude=EXCLUDE      optional exclude files
  -i, --include=INCLUDE      optional include file. only consider variants that fall in ranges within this file
  --gnotate-exclude=GNOTATE_EXCLUDE
                             sites in slivar gnotation (zip) format to exclude
  --snp-dist=SNP_DIST        minimum distance between autosomal SNPs to avoid linkage (default: 10000)
  --min-AN=MIN_AN            minimum number of alleles (AN) at the site. (must be less than twice number of samples in the cohort) (default: 115_000)
  --min-AF=MIN_AF            minimum allele frequency for a site (default: 0.15)
  -h, --help                 Show this help
this will write output sites to: ./sites.vcf.gz
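
To make the effect of the thresholds concrete, here is a minimal Python sketch of how `--min-AF`, `--min-AN`, and `--snp-dist` could interact. It is an illustration under stated assumptions, not somalier's actual implementation: the `Site` record and `select_sites` function are hypothetical names, the greedy left-to-right thinning is a simplification, and the include/exclude files and any special handling of non-autosomal chromosomes are ignored. Since a diploid cohort of N samples has at most 2N alleles at a site, a `min_an` above 2N would reject every site.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical stand-in for one VCF record (not a somalier type)."""
    chrom: str
    pos: int
    af: float  # allele frequency, e.g. from INFO/AF
    an: int    # total allele number, e.g. from INFO/AN


def select_sites(sites, min_af=0.15, min_an=115_000, snp_dist=10_000):
    """Filter by AF and AN, then thin so kept sites on the same
    chromosome are at least snp_dist bases apart.

    Assumption: a simple greedy pass in position order; the real tool
    may rank or choose between nearby candidates differently.
    """
    kept = []
    last_pos = {}  # chrom -> position of the most recently kept site
    for s in sorted(sites, key=lambda s: (s.chrom, s.pos)):
        if s.af < min_af or s.an < min_an:
            continue  # fails the frequency or allele-number threshold
        prev = last_pos.get(s.chrom)
        if prev is not None and s.pos - prev < snp_dist:
            continue  # too close to the previous kept site
        kept.append(s)
        last_pos[s.chrom] = s.pos
    return kept


# Made-up candidate records, for illustration only.
candidates = [
    Site("chr1", 1_000, 0.30, 120_000),
    Site("chr1", 5_000, 0.45, 120_000),   # within 10 kb of the first site
    Site("chr1", 20_000, 0.10, 120_000),  # AF below 0.15
    Site("chr1", 30_000, 0.20, 100_000),  # AN below 115_000
    Site("chr1", 40_000, 0.25, 130_000),
    Site("chr2", 1_500, 0.50, 116_000),
]

for s in select_sites(candidates):
    print(s.chrom, s.pos)
```

With the default thresholds, the sketch keeps the sites at chr1:1000, chr1:40000, and chr2:1500. Raising `snp_dist` or `min_af` trades the number of sites for independence and informativeness.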