Haplotype-based computational genetic mapping (a.k.a. HBCGM)
Haplomap is the successor to HBCGM, which was last developed in 2010. Haplomap has been adopted as a replacement for the original HBCGM.
Citation:
Zhuoqing Fang, Gary Peltz, An Automated Multi-Modal Graph-Based Pipeline for Mouse Genetic Discovery, Bioinformatics, 2022, btac356, https://doi.org/10.1093/bioinformatics/btac356
See what's new in the CHANGELOG.
Works on both Linux and macOS.
Haplomap:
- CMake
- GCC >= 4.8
- clang >= 11.0.3 (only tested with 11.x version)
- C++11
- GSL
For Variant Calling, you need:
- GATK 4.x
- SAMtools
- BCFtools
- BEDtools
- BWA
For running the pipeline, you need:
- Snakemake
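Before installing or running anything, it can help to confirm which of the tools listed above are already on your PATH. The following is a minimal sketch, not part of Haplomap: `check_tools` is a hypothetical helper name, and it only tests whether a command can be found, not whether its version meets the requirements.

```shell
# Hypothetical helper (not shipped with Haplomap):
# print every requested command that is not on PATH,
# and return non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools cmake gsl-config` could be run before building, and `check_tools gatk samtools bcftools bedtools bwa snakemake` before variant calling.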
conda install -c bioconda haplomap
- Install GSL first, e.g.:
Ubuntu
sudo apt-get install libgsl-dev
macOS
brew install gsl
or compile GSL from source (make sure that the GSL include and lib paths are exported)
./configure --prefix=${HOME}/program/gsl
make && make install
# you may need to add this line to your .bashrc
export LD_LIBRARY_PATH="${HOME}/program/gsl/lib:$LD_LIBRARY_PATH"
- build and install to path
cd ${haplomap_repo}
mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/path/to/directory/bin ..
make
See the haplomap subfolder for more detail: Run haplomap standalone
See the variant calling workflows using GATK, BCFtools, and svtools.
e.g.
# modify the file path in haplomap and run with 12 cores
snakemake -s workflows/bcftools.call.smk --configfile config.yaml \
-k -p -j 12
The Mouse Phenome Database (MPD) has more than 10K datasets. Configure the files below, then run the pipeline.
MPD trait ID file, one ID per line:
26720-m
26720-f
9940-f
...
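A malformed line in the ID file is easy to miss, so a quick format check before launching the pipeline may save a failed run. The sketch below assumes every ID has the `<number>-<m or f>` shape of the examples above; that pattern is inferred from those examples only, and `check_trait_ids` is a hypothetical helper name. Adjust the pattern if your IDs look different.

```shell
# Hypothetical helper (not shipped with Haplomap).
# Assumption: each line looks like "<number>-<m|f>", e.g. 26720-m.
# Prints "line_number:content" for every line that does not match
# and returns non-zero if any such line exists.
check_trait_ids() {
    # grep -v selects non-matching lines; it exits 0 when it finds any.
    if grep -vnE '^[0-9]+-[mf]$' "$1"; then
        return 1
    fi
    return 0
}
```

Blank lines and lines with trailing whitespace are reported as well, since they do not match the pattern.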
In the config file, only edit the HBCGM section.
HBCGM:
# working directory
WORKSPACE: "/data/bases/fangzq/MPD/results_drug_diet"
# path to haplomap
BIN: "/home/fangzq/github/HBCGM/build/bin"
# MPD id file, one id per line
TRAIT_IDS: "/data/bases/fangzq/MPD/drug-diet.ids.txt"
# if true, use individual animal data. Default: false (use strain means).
USE_RAWDATA: false
# strains metadata: map strain abbrev to full name, jax ids, etc.
# see docs folder to view examples
STRAIN_ANNO: "/data/bases/shared/haplomap/PELTZ_20210609/strains.metadata.csv"
# filtered VCF files after variant calling step
VCF_DIR: "/data/bases/shared/haplomap/PELTZ_20210609/VCFs"
# Ensembl-vep output after variant calling step
VEP_DIR: "/data/bases/shared/haplomap/PELTZ_20210609/VEP"
## Optional files
# genetic relationship file from PLINK output
GENETIC_REL: "/data/bases/shared/haplomap/PELTZ_20210609/mouse54_grm.rel"
# gene expression file
GENE_EXPRS: "/data/bases/shared/haplomap/PELTZ_20210609/mus.compact.exprs.txt"
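Because the HBCGM section is mostly absolute paths, a pre-flight check that each path exists can catch typos before Snakemake starts. The following is a rough sketch under stated assumptions: `check_config_paths` is a hypothetical helper, and it does not parse YAML properly. It only picks up lines of the form `KEY: "/absolute/path"` with an upper-case key and a double-quoted value, as in the example config above.

```shell
# Hypothetical helper (not shipped with Haplomap).
# Assumption: paths appear as  UPPER_CASE_KEY: "/absolute/path"
# Prints "not found: <path>" for each path that does not exist
# and returns non-zero if any path is missing.
check_config_paths() {
    status=0
    # Extract the double-quoted absolute path from each matching line.
    paths=$(sed -n 's|^[[:space:]]*[A-Z_]*:[[:space:]]*"\(/[^"]*\)".*$|\1|p' "$1")
    while IFS= read -r path; do
        [ -n "$path" ] || continue
        if [ ! -e "$path" ]; then
            echo "not found: $path"
            status=1
        fi
    done <<EOF
$paths
EOF
    return "$status"
}
```

Note that this flags every missing path, including an output directory such as WORKSPACE that may legitimately not exist yet, so treat the report as a hint rather than an error.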
conda env create -n hbcgm -f environment.yaml
conda activate hbcgm
# modify the file path in haplomap and run with 24 cores
snakemake -s workflows/haplomap.smk \
--configfile workflows/config.yaml \
-k -p -j 24
e.g. Sherlock Slurm
- edit slurm.submit.sh, change the file path to HBCGM/workflows
- edit workflows/slurm_config.yaml, specify the resources you need
- submit: sbatch slurm.submit.sh
For an explanation of the output, see: Run haplomap standalone
Email:
- Zhuoqing Fang: fangzq@stanford.edu
- Gary Peltz: gpeltz@stanford.edu
Copyright (C) 2019-2022 Stanford University, Zhuoqing Fang and Gary Peltz.
Authors: Zhuoqing Fang and Gary Peltz.
The original HBCGM (the maximal haplotype construction method) was developed by Dr. David Dill and Dr. Gary Peltz at Stanford.
HBCGM/Haplomap is patented to Dr. Gary Peltz.
This program is licensed under a license that restricts commercial use. Please see the attached LICENSE file for details.