GitHub - vgteam/vg_wdl: Workflow Description Language (WDL) scripts for common vg workflows

vg_wdl

Eric T Dawson, Mike Lin and Charles Markello, Jean Monlong, Adam Novak MIT License, 2023

Workflow Description Language (WDL) scripts for vg workflows.

Workflows
Usage
Testing locally
Citation
Contributing, Help, Bugs and Requests

Workflows

Giraffe-DeepVariant workflows. Either the full Giraffe-DeepVariant workflow, or parts of it are available:
- Giraffe-DeepVariant workflow to perform the full workflow: starting from FASTQs/CRAM, align reads to a pangenome and run DeepVariant.
- Giraffe workflow to map reads and produce BAMs ready to use by DeepVariant.
- Giraffe-DeepVariant from GAF workflow to project reads aligned to a pangenome (GAF), prepare them and run DeepVariant.
Happy workflow to evaluate small variants against a truthset using hap.py/vcfeval.
GAF to sorted GAM workflow to convert a GAF into a sorted and indexed GAM. E.g. to use with the sequenceTubeMap.
Giraffe SV workflow to map short reads to a pangenome and genotype SVs with vg.
Haplotype Sampling workflow to create a personalized pangenome using haplotype sampling
Map-call workflow to map reads and call small variants vg, DeepVariant and GATK (legacy?).
Map-call Pedigree workflow to map reads and call variants in a pedigree with vg (legacy?).

See also the Going further section for more details on some aspects and HOW-TOs:

Path list
Read realignment
Using the HPRC pangenomes
Reference prefix removal
CRAM input
Single-end reads
Unmapped reads
Reads chunking

Giraffe-DeepVariant workflow

The full workflow to go from sequencing reads (FASTQs, CRAM) to small variant calls (VCF).

workflow file: workflows/giraffe_and_deepvariant.wdl
Dockstore page
If you use this workflow, please cite the HPRC preprint.

Parameters (semi-auto-generated from the parameter_meta section):

INPUT_READ_FILE_1: Input sample 1st read pair fastq.gz
INPUT_READ_FILE_2: Input sample 2nd read pair fastq.gz
INPUT_CRAM_FILE: Input CRAM file
CRAM_REF: Genome fasta file associated with the CRAM file
CRAM_REF_INDEX: Index of the fasta file associated with the CRAM file
GBZ_FILE: Path to .gbz index file
DIST_FILE: Path to .dist index file
MIN_FILE: Path to .min index file
SAMPLE_NAME: The sample name
OUTPUT_GAF: Should a GAF file with the aligned reads be saved? Default is 'true'.
OUTPUT_SINGLE_BAM: Should a single merged BAM file be saved? If yes, unmapped reads will be inluded and 'calling bams' (one per contig) won't be outputed. Default is 'true'.
PAIRED_READS: Are the reads paired? Default is 'true'.
READS_PER_CHUNK: Number of reads contained in each mapping chunk. Default 20 000 000.
PATH_LIST_FILE: (OPTIONAL) Text file where each line is a path name in the GBZ index, to use instead of CONTIGS. If neither is given, paths are extracted from the GBZ and subset to chromosome-looking paths.
CONTIGS: (OPTIONAL) Desired reference genome contigs, which are all paths in the GBZ index.
REFERENCE_PREFIX: Remove this off the beginning of path names in surjected BAM (set to match prefix in PATH_LIST_FILE)
REFERENCE_FILE: (OPTIONAL) If specified, use this FASTA reference instead of extracting it from the graph. Required if the graph does not contain all bases of the reference.
REFERENCE_INDEX_FILE: (OPTIONAL) If specified, use this .fai index instead of indexing the reference file.
REFERENCE_DICT_FILE: (OPTIONAL) If specified, use this pre-computed .dict file of sequence lengths. Required if REFERENCE_INDEX_FILE is set
LEFTALIGN_BAM: Whether or not to left-align reads in the BAM. Default is 'true'.
REALIGN_INDELS: Whether or not to realign reads near indels. Default is 'true'.
REALIGNMENT_EXPANSION_BASES: Number of bases to expand indel realignment targets by on either side, to free up read tails in slippery regions. Default is 160.
MIN_MAPQ: Minimum MAPQ of reads to use for calling. 4 is the lowest at which a mapping is more likely to be right than wrong. Default is 1
MAX_FRAGMENT_LENGTH: Maximum distance at which to mark paired reads properly paired. Default is 3000.
GIRAFFE_OPTIONS: (OPTIONAL) extra command line options for Giraffe mapper
TRUTH_VCF: (OPTIONAL) Path to .vcf.gz to compare against
TRUTH_VCF_INDEX: (OPTIONAL) Path to Tabix index for TRUTH_VCF
EVALUATION_REGIONS_BED: (OPTIONAL) BED to restrict comparison against TRUTH_VCF to
DV_MODEL_META: (OPTIONAL) .meta file for a custom DeepVariant calling model
DV_MODEL_INDEX: (OPTIONAL) .index file for a custom DeepVariant calling model
DV_MODEL_DATA: (OPTIONAL) .data-00000-of-00001 file for a custom DeepVariant calling model
DV_KEEP_LEGACY_AC: Should DV use the legacy allele counter behavior? Default is 'true'.
DV_NORM_READS: Should DV normalize reads itself? Default is 'false'.
OTHER_MAKEEXAMPLES_ARG: Additional arguments for the make_examples step of DeepVariant
SPLIT_READ_CORES: Number of cores to use when splitting the reads into chunks. Default is 8.
MAP_CORES: Number of cores to use when mapping the reads. Default is 16.
MAP_MEM: Memory, in GB, to use when mapping the reads. Default is 120.
CALL_CORES: Number of cores to use when calling variants. Default is 8.
CALL_MEM: Memory, in GB, to use when calling variants. Default is 50.

Related topics: read realignment, reference prefix removal, CRAM input, reads chunking, path list, single-end reads, unmapped reads, HPRC pangenomes.

Test locally with:

miniwdl run --as-me workflows/giraffe_and_deepvariant.wdl -i params/giraffe_and_deepvariant.json
miniwdl run --as-me workflows/giraffe_and_deepvariant.wdl -i params/giraffe_and_deepvariant_single_end.json
miniwdl run --as-me workflows/giraffe_and_deepvariant.wdl -i params/giraffe_and_deepvariant_cram.json

Giraffe workflow

Core VG Giraffe mapping, usable for DeepVariant. Reads are mapped to a pangenome with vg giraffe and pre-processed (e.g. indel realignment).

workflow file: workflows/giraffe.wdl
Dockstore page
If you use this workflow, please cite the HPRC preprint.

Parameters (semi-auto-generated from the parameter_meta section):

INPUT_READ_FILE_1: Input sample 1st read pair fastq.gz
INPUT_READ_FILE_2: Input sample 2nd read pair fastq.gz
INPUT_CRAM_FILE: Input CRAM file
CRAM_REF: Genome fasta file associated with the CRAM file
CRAM_REF_INDEX: Index of the fasta file associated with the CRAM file
GBZ_FILE: Path to .gbz index file
DIST_FILE: Path to .dist index file
MIN_FILE: Path to .min index file
SAMPLE_NAME: The sample name
OUTPUT_SINGLE_BAM: Should a single merged BAM file be saved? If yes, unmapped reads will be inluded and 'calling bams' (one per contig) won't be outputed. Default is 'true'.
OUTPUT_CALLING_BAMS: Should individual contig BAMs be saved? Default is 'false'.
OUTPUT_GAF: Should a GAF file with the aligned reads be saved? Default is 'false'.
PAIRED_READS: Are the reads paired? Default is 'true'.
READS_PER_CHUNK: Number of reads contained in each mapping chunk. Default 20 000 000.
PATH_LIST_FILE: (OPTIONAL) Text file where each line is a path name in the GBZ index, to use instead of CONTIGS. If neither is given, paths are extracted from the GBZ and subset to chromosome-looking paths.
CONTIGS: (OPTIONAL) Desired reference genome contigs, which are all paths in the GBZ index.
REFERENCE_PREFIX: Remove this off the beginning of path names in surjected BAM (set to match prefix in PATH_LIST_FILE)
REFERENCE_FILE: (OPTIONAL) If specified, use this FASTA reference instead of extracting it from the graph. Required if the graph does not contain all bases of the reference.
REFERENCE_INDEX_FILE: (OPTIONAL) If specified, use this .fai index instead of indexing the reference file.
REFERENCE_DICT_FILE: (OPTIONAL) If specified, use this pre-computed .dict file of sequence lengths. Required if REFERENCE_INDEX_FILE i
LEFTALIGN_BAM: Whether or not to left-align reads in the BAM. Default is 'true'.
REALIGN_INDELS: Whether or not to realign reads near indels. Default is 'true'.
REALIGNMENT_EXPANSION_BASES: Number of bases to expand indel realignment targets by on either side, to free up read tails in slippery regions. Default is 160.
MAX_FRAGMENT_LENGTH: Maximum distance at which to mark paired reads properly paired. Default is 3000.
GIRAFFE_OPTIONS: (OPTIONAL) extra command line options for Giraffe mapper
SPLIT_READ_CORES: Number of cores to use when splitting the reads into chunks. Default is 8.
MAP_CORES: Number of cores to use when mapping the reads. Default is 16.
MAP_MEM: Memory, in GB, to use when mapping the reads. Default is 120.
HAPLOTYPE_SAMPLING: Whether or not to use haplotype sampling before running giraffe. Default is 'true'
IN_DIPLOID:Whether or not to use diploid sampling while doing haplotype sampling. Has to use with Haplotype_sampling=true. Default is 'true'
HAPL_FILE: (OPTIONAL) Path to .hapl file used in haplotype sampling
R_INDEX_FILE: (OPTIONAL) Path to .ri file used in haplotype sampling
IN_KFF_FILE: (OPTIONAL) Path to .kff file used in haplotype sampling
IN_HAPLOTYPE_NUMBER: Number of generated synthetic haplotypes used in haplotype sampling. (Default: 4)

Related topics: read realignment, reference prefix removal, CRAM input, reads chunking, path list, single-end reads, unmapped reads, HPRC pangenomes, Haplotype Sampling.

Test locally with:

miniwdl run --as-me workflows/giraffe.wdl -i params/giraffe.json
miniwdl run --as-me workflows/giraffe.wdl -i params/giraffe.singleended.json
miniwdl run --as-me workflows/giraffe.wdl -i params/giraffe.singleended.cram.json
miniwdl run --as-me workflows/giraffe.wdl -i params/haplotype_sampling_and_giraffe.json

Giraffe-DeepVariant from GAF workflow

Surject a GAF and prepare the BAMs (e.g. fix names, indel realign), and call small variants with DeepVariant.

workflow file: workflows/giraffe_and_deepvariant_fromGAF.wdl
Dockstore page
If you use this workflow, please cite the HPRC preprint.

Parameters (semi-auto-generated from the parameter_meta section):

INPUT_GAF: Input gzipped GAF file
GBZ_FILE: Path to .gbz index file
SAMPLE_NAME: The sample name
OUTPUT_SINGLE_BAM: Should a single merged BAM file be saved? If yes, unmapped reads will be inluded and 'calling bams' (one per contig) won't be outputed. Default is 'true'.
PAIRED_READS: Are the reads paired? Default is 'true'.
PATH_LIST_FILE: (OPTIONAL) Text file where each line is a path name in the GBZ index, to use instead of CONTIGS. If neither is given, paths are extracted from the GBZ and subset to chromosome-looking paths.
CONTIGS: (OPTIONAL) Desired reference genome contigs, which are all paths in the GBZ index.
REFERENCE_PREFIX: Remove this off the beginning of path names in surjected BAM (set to match prefix in PATH_LIST_FILE)
REFERENCE_FILE: (OPTIONAL) If specified, use this FASTA reference instead of extracting it from the graph. Required if the graph does not contain all bases of the reference.
REFERENCE_INDEX_FILE: (OPTIONAL) If specified, use this .fai index instead of indexing the reference file.
REFERENCE_DICT_FILE: (OPTIONAL) If specified, use this pre-computed .dict file of sequence lengths. Required if REFERENCE_INDEX_FILE is set
LEFTALIGN_BAM: Whether or not to left-align reads in the BAM. Default is 'true'.
REALIGN_INDELS: Whether or not to realign reads near indels. Default is 'true'.
REALIGNMENT_EXPANSION_BASES: Number of bases to expand indel realignment targets by on either side, to free up read tails in slippery regions. Default is 160.
MIN_MAPQ: Minimum MAPQ of reads to use for calling. 4 is the lowest at which a mapping is more likely to be right than wrong. Default is 1
MAX_FRAGMENT_LENGTH: Maximum distance at which to mark paired reads properly paired. Default is 3000.
DV_MODEL_META: (OPTIONAL) .meta file for a custom DeepVariant calling model
DV_MODEL_INDEX: (OPTIONAL) .index file for a custom DeepVariant calling model
DV_MODEL_DATA: (OPTIONAL) .data-00000-of-00001 file for a custom DeepVariant calling model
DV_KEEP_LEGACY_AC: Should DV use the legacy allele counter behavior? Default is 'true'.
DV_NORM_READS: Should DV normalize reads itself? Default is 'fasle'.
OTHER_MAKEEXAMPLES_ARG: Additional arguments for the make_examples step of DeepVariant
VG_CORES: Number of cores to use when projecting the reads. Default is 16.
VG_MEM: Memory, in GB, to use when projecting the reads. Default is 120.
CALL_CORES: Number of cores to use when calling variants. Default is 8.
CALL_MEM: Memory, in GB, to use when calling variants. Default is 50.

Related topics: read realignment, reference prefix removal, path list, single-end reads, unmapped reads, HPRC pangenomes.

Test locally with:

miniwdl run --as-me workflows/giraffe_and_deepvariant_fromGAF.wdl -i params/giraffe_and_deepvariant_gaf.json
miniwdl run --as-me workflows/giraffe_and_deepvariant_fromGAF.wdl -i params/giraffe_and_deepvariant_gaf_single_end.json

Happy workflow

Evaluation of the small variant calls using hap.py.

workflow file: workflows/happy_evaluation.wdl
Dockstore page

Parameters (semi-auto-generated from the parameter_meta section):

VCF: bgzipped VCF with variant calls
VCF_INDEX: (Optional) If specified, use this tabix index for the VCF instead of indexing it
TRUTH_VCF: bgzipped VCF with truthset
TRUTH_VCF_INDEX: (Optional) If specified, use this index for the truth VCF instead of indexing it
REFERENCE_FILE: Use this FASTA reference.
REFERENCE_INDEX_FILE: (Optional) If specified, use this .fai index instead of indexing the reference file.
EVALUATION_REGIONS_BED: (Optional) BED to restrict comparison against TRUTH_VCF to
REFERENCE_PREFIX: (Optional) Remove this off the beginning of sequence names in the VCF

Test locally with:

miniwdl run --as-me workflows/happy_evaluation.wdl -i params/happy_evaluation.json

GAF to sorted GAM workflow

Currently, only GAM file can be sorted and indexed, for example to extract and subgraph and visualize, or use with the sequenceTubeMap. This workflow converts reads aligned to a pangenome in a GAF file to a sorted and indexed GAM file.

workflow file: workflows/sort_graph_aligned_reads.wdl
Dockstore page

Parameters (semi-auto-generated from the parameter_meta section):

GAF_FILE: GAF file to convert and sort.
GBZ_FILE: the GBZ index of the graph
SAMPLE_NAME: (Optional) a sample name

Giraffe SV workflow

Workflow for mapping short reads and genotyping the structural variants in a pangenome.

workflow file
parameter file
Dockstore page
If you use this workflow, please cite the Giraffe-SV article.

Haplotype Sampling workflow

Workflow for creating a personalized pangenome with haplotype sampling.

Parameters (semi-auto-generated from the parameter_meta section):

IN_GBZ_FILE: Path to .gbz index file
INPUT_READ_FILE_FIRST: Input sample 1st read pair fastq.gz
INPUT_READ_FILE_SECOND: Input sample 2st read pair fastq.gz
HAPL_FILE: Path to .hapl file
IN_DIST_FILE: Path to .dist file
R_INDEX_FILE: Path to .ri file
KFF_FILE: Path to .kff file
IN_OUTPUT_NAME_PREFIX: Name of the output file (Default: haplotype_sampled_graph)
IN_KMER_LENGTH: Size of kmer using for sampling (Up to 31) (Default: 29)
CORES: Number of cores to use with commands. (Default: 16)
WINDOW_LENGTH: Window length used for building the minimizer index. (Default: 11)
SUBCHAIN_LENGTH: Target length (in bp) for subchains. (Default: 10000)
HAPLOTYPE_NUMBER: Number of generated synthetic haplotypes. (Default: 4)
PRESENT_DISCOUNT: Multiplicative factor for discounting scores for present kmers. (Default: 0.9)
HET_ADJUST: Additive term for adjusting scores for heterozygous kmers. (Default: 0.05)
ABSENT_SCORE: Score for absent kmers. (Default: 0.8)
INCLUDE_REFERENCE: Include reference paths and generic paths from the full graph in the sampled graph. (Default: true)
DIPLOID: Activate diploid sampling. (Default: true)

Test locally with:

miniwdl run --as-me workflows/haplotype_sampling.wdl -i params/haplotype_sampling.json

Map-call workflow

Map-call Pedigree workflow

workflow file
parameter file
If you use this workflow, please cite the Pedigree-VG article.

Going further

See below more information about: read realignment, reference prefix removal, CRAM input, reads chunking, path list, single-end reads, unmapped reads, HPRC pangenomes.

Read realignment

Once the reads are projected to a linear reference, we've noticed that realigning the reads can improve the variant calling with DeepVariant. This helps mostly for the small insertions-deletions (indels).

The full realignment process involves:

Leftaligning the reads with freebayes' bamleftalign.
- Can be enabled/disabled with the LEFTALIGN_BAM parameter
Identifying regions to realign further with GATK RealignerTargetCreator.
Expand those regions with bedtools.
- Number of bases to expand controlled by the REALIGNMENT_EXPANSION_BASES parameter.
Realigning the reads in those regions with ABRA2.

The last 3 steps can be enabled/disabled with the REALIGN_INDELS parameter.

Although it produces the best variant calls, these extra steps increase the computational resources (and cost) of the workflow. For a lighter run, switch off those two realignment steps and use DeepVariant's integrated realigner instead with:

LEFTALIGN_BAM=false
REALIGN_INDELS=false
DV_NORM_READS=true

Reference prefix removal

The names of contigs/paths/haplotypes in pangenomes sometimes contains a prefix that we'd want to remove. In the HPRC pangenomes, for example, the chromosomal contigs from GRCh38 are named GRCh38.chr1, etc. In practice, we want to remove this prefix from the variant calls (VCFs), or reads aligned to that reference (BAMs).

This is controlled by the REFERENCE_PREFIX parameters in the workflows. Setting REFERENCE_PREFIX="GRCh38." for example will ensure the VCFs/BAMs have chr1, etc. for contig names.

Because the pangenome uses them, the prefix must still be present when specifying the paths to project the reads too though. Hence, the CONTIGS and PATH_LIST_FILE must use the prefix.

However, provided reference FASTAs or dictionary must not have the prefix. These could be FASTA or .dict files from the "official" reference genome or pre-computed for them, hence no prefix. So, no prefix in REFERENCE_FILE, REFERENCE_INDEX_FILE, REFERENCE_DICT_FILE.

CRAM input

When the input is a CRAM file (INPUT_CRAM_FILE) instead of a pair of FASTQ files (INPUT_READ_FILE_1/INPUT_READ_FILE_2), the user must also provide the appropriate reference FASTA to work with that CRAM file with CRAM_REF, and its index with CRAM_REF_INDEX.

The CRAM file will be converted back to a pair of FASTQs, so it costs a little bit more to analyze CRAMs than FASTQs currently.

Reads chunking

Sequencing reads are chunked to parallelize read mapping. The amount of chunking is controlled by the READS_PER_CHUNK parameter which specify how many reads each chunk should have. For a WGS experiment, we use chunks of about 20M reads.

Path list

We might not always want to project the reads alignments to all the paths in the pangenome. For example, we might only care about alignment to chromosomes and not alternate contigs. Or there might be multiple sets of paths like in the CHM13-based HPRC pangenome which contains both reference paths for CHM13 and GRCh38. In that case, we can specify a list of paths to project the reads to using one of the following.

PATH_LIST_FILE is a file which lists the paths names, one per line. For the HPRC pangenomes it looks like:

GRCh38.chr1
GRCh38.chr2
GRCh38.chr3
...etc

Otherwise, paths can be listed in the CONTIGS parameter as a list (WDL array).

Single-end reads

Workflows expect paired-end reads, but some workflows can also analyze single-end reads.

To use single-end reads:

If providing FASTQs, only provide INPUT_READ_FILE_1 (no INPUT_READ_FILE_2).
Use PAIRED_READS=false

Unmapped reads

If including unmapped reads in the BAMs is important, make sure to switch on OUTPUT_SINGLE_BAM=true in the Giraffe-DeepVariant workflow and Giraffe workflow.

HPRC pangenomes

We recommend using the filtered CHM13-based pangenome (freeze 1). It contains both the CHM13 and GRCh38 reference paths.

Use the following indexes for the pangenome:

GBZ_FILE: hprc-v1.0-mc-chm13-minaf.0.1.gbz
- GBZ with the pangenome and haplotypes.
MIN_FILE: hprc-v1.0-mc-chm13-minaf.0.1.min
- Minimizer index.
DIST_FILE: hprc-v1.0-mc-chm13-minaf.0.1.dist
- Distance index.

To project reads and call variants relative to the GRCh38 reference:

REFERENCE_PREFIX="GRCh38."
PATH_LIST_FILE containing GRCh38.chr1, GRCh38.chr2, etc. File available at GRCh38.path_list.txt
REFERENCE_FILE: hg38.fa
REFERENCE_INDEX_FILE: hg38.fa.fai. Optional, the workflow will create it if necessary (for a small extra cost/time).
REFERENCE_DICT_FILE: hg38.dict. Optional, the workflow will create it if necessary (for a small extra cost/time).

To project reads and call variants relative to the CHM13 reference:

REFERENCE_PREFIX="CHM13."
PATH_LIST_FILE containing CHM13.chr1, CHM13.chr2, etc. File available at CHM13.path_list.txt
REFERENCE_FILE: chm13v2.0.plus_hs38d1_analysis_set.compact_decoys.fa
REFERENCE_INDEX_FILE: chm13v2.0.plus_hs38d1_analysis_set.compact_decoys.fa.fai. Optional, the workflow will create it if necessary (for a small extra cost/time).
REFERENCE_DICT_FILE: chm13v2.0.plus_hs38d1_analysis_set.compact_decoys.dict. Optional, the workflow will create it if necessary (for a small extra cost/time).

For earlier versions of DeepVariant (<1.5), models were retrained using reads aligned to the HPRC pangenomes. The corresponding model files were deposited at: https://s3-us-west-2.amazonaws.com/human-pangenomics/index.html?prefix=publications/PANGENOME_2022/DeepVariant/models/DEEPVARIANT_MC_Y1/. They can be passed to the workflows using the DV_MODEL_META, DV_MODEL_INDEX, and DV_MODEL_DATA. Note that it is not necessary to use custom models in the latest version of the workflows as DeepVariant v1.5 includes default models suited for analyzing reads mapped to pangenomes (and projected back to a linear reference).

Usage

Dockstore

The workflows that were deposited on Dockstore can be launched using its command line or on platform like Terra.

Using miniwdl

Install miniwdl, for example, with pip:

pip3 install miniwdl

Clone this repo somewhere with git clone https://github.com/vgteam/vg_wdl.git

Run a workflow using:

miniwdl run /path/to/vg_wdl/workflows/WORKFLOW.wdl -i your-inputs.json

To modify the input parameters, edit the input .json with the necessary changes.

Using Cromwell

Cromwell can be run WDL workflows with:

java -jar $CROMWELL_JAR run workflow.wdl -i inputs.json

Where CROMWELL_JAR points at the Cromwell jar downloaded their release page, for example set with CROMWELL_JAR=/path/to/cromwell-<whatever>.jar in your shell.

To run one of the workflows in this repo, clone the repo somewhere with git clone https://github.com/vgteam/vg_wdl.git and run the desired workflow .wdl file:

java -jar $CROMWELL_JAR run /path/to/vg_wdl/workflows/WORKFLOW.wdl -i inputs.json

Docker Containers

WDL needs the runtime Docker image to be present online (e.g. Dockerhub). Cromwell/miniwdl will pull those images automatically. VG images are available at quay.io and can be pulled with:

docker pull quay.io/vgteam/vg:v1.44.0

Specific versions can be specified like above for version v1.44.0.

Testing locally

To test the workflow locally, e.g. on the small simulated dataset, you can run it with Cromwell or miniwdl (see Usage). So, from the root of this repo, run something like:

java -jar $CROMWELL_JAR run workflows/WORKFLOW.wdl -i params/INPUTS.json
## or
miniwdl run --as-me workflows/WORKFLOW.wdl -i params/INPUTS.json

Miniwdl might be slightly more useful when developing/testing a WDL because is catches errors in WDL syntax faster, and is a bit more explicit about them.

Citation

Cite HPRC

If you use the Giraffe-DeepVariant workflows, please cite the HPRC preprint:

Liao, Asri, Ebler, et al. A Draft Human Pangenome Reference. preprint, bioRxiv 2022; doi: https://doi.org/10.1101/2022.07.09.499321

Cite Giraffe-SV

If you use the SV genotyping workflow with vg giraffe, please cite this article:

Sirén, Monlong, Chang, Novak, Eizenga, et al. Pangenomics Enables Genotyping of Known Structural Variants in 5202 Diverse Genomes. Science, vol. 374, no. 6574, Dec. 2021; doi: https://doi.org/10.1126/science.abg8871.

Cite Pedigree-VG

If you use the pedigree-based workflow for rare variant discovery, please cite this article:

Markello et al. A Complete Pedigree-Based Graph Workflow for Rare Candidate Variant Analysis. Genome Research, Apr. 2022; doi: https://doi.org/10.1101/gr.276387.121.

Contributing, Help, Bugs and Requests

Please open an Issue on GitHub for help, bug reports, or feature requests. When doing so, please remember that vg_wdl is open-source software made by a community of developers. Please be considerate and support a positive environment.

Name		Name	Last commit message	Last commit date
Latest commit History 508 Commits
params		params
scripts		scripts
tasks		tasks
tests		tests
workflows		workflows
.dockstore.yml		.dockstore.yml
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

vg_wdl

Workflows

Giraffe-DeepVariant workflow

Giraffe workflow

Giraffe-DeepVariant from GAF workflow

Happy workflow

GAF to sorted GAM workflow

Giraffe SV workflow

Haplotype Sampling workflow

Map-call workflow

Map-call Pedigree workflow

Going further

Read realignment

Reference prefix removal

CRAM input

Reads chunking

Path list

Single-end reads

Unmapped reads

HPRC pangenomes

Usage

Dockstore

Using miniwdl

Using Cromwell

Docker Containers

Testing locally

Citation

Cite HPRC

Cite Giraffe-SV

Cite Pedigree-VG

Contributing, Help, Bugs and Requests

About

Releases 1

Packages

Contributors 6

Languages

License

vgteam/vg_wdl

Folders and files

Latest commit

History

Repository files navigation

vg_wdl

Workflows

Giraffe-DeepVariant workflow

Giraffe workflow

Giraffe-DeepVariant from GAF workflow

Happy workflow

GAF to sorted GAM workflow

Giraffe SV workflow

Haplotype Sampling workflow

Map-call workflow

Map-call Pedigree workflow

Going further

Read realignment

Reference prefix removal

CRAM input

Reads chunking

Path list

Single-end reads

Unmapped reads

HPRC pangenomes

Usage

Dockstore

Using miniwdl

Using Cromwell

Docker Containers

Testing locally

Citation

Cite HPRC

Cite Giraffe-SV

Cite Pedigree-VG

Contributing, Help, Bugs and Requests

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 6

Languages

Packages