PanGenie is a software package written by Jana Ebler for genotyping samples from sequencing reads (FASTQ files). It uses a k-mer-based approach: given a reference pangenome whose paths are represented as haplotypes in a VCF, it determines which of the variants in that VCF are present in your sample of interest.
This stub repository contains the scripts I used to generate a PanGenie-compatible VCF representation of the HPRC draft human pangenome. This work was performed in August 2022. To replicate it, clone this repository, ensure that vcfbub and bcftools are installed, and place the following files in the repository folder:
June 2022 working draft pangenome in gfa format (MC-Cactus)
- note: decompress this file with the command gzip -d hprc-jun1-mc-chm13-full.gfa.gz before running the script
June 2022 working draft pangenome in vcf format (MC-Cactus)
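Before running anything, it can help to confirm that the prerequisites listed above are in place. The following is a minimal preflight sketch and is not part of this repository: the function names check_tools and check_files are hypothetical, and the VCF path is left as a variable because its filename depends on the file you downloaded.

```shell
#!/usr/bin/env bash
# Preflight sketch (illustrative only, not shipped with this repository).

# check_tools: report every named command that is not on PATH.
# Returns 0 if all are found, 1 otherwise.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# check_files: report every named file that is absent or empty.
# Returns 0 if all exist and are non-empty, 1 otherwise.
check_files() {
    local f missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty file: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage (VCF is a placeholder for the pangenome VCF you downloaded):
#   VCF=/path/to/your/downloaded.vcf.gz
#   check_tools vcfbub bcftools && check_files hprc-jun1-mc-chm13-full.gfa "$VCF"
```

If either check fails, the messages on stderr name what is missing, so the problem can be fixed before the script starts any long-running work.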
Then run:
chmod +x create_panel_vcf.sh
./create_panel_vcf.sh
The resulting PanGenie-compatible pangenome in VCF format can be found at this Zenodo repository.