A modular, flexible, and reproducible Snakemake workflow to perform k-mers-based GWAS.
kGWASflow is a Snakemake pipeline for performing k-mers-based genome-wide association studies (GWAS), based on the method developed by Voichek et al. (2020). It wraps the kmersGWAS method in an easy-to-use, accessible workflow. It performs several pre-GWAS analyses, including read trimming, quality control, and k-mer counting. The pipeline also contains post-GWAS analyses, such as mapping k-mers to a reference genome, finding and mapping the source reads of k-mers, assembling source reads into contigs, and mapping them to a reference genome. kGWASflow is also highly customizable and offers users multiple options to choose from depending on their needs.
More information and explanations on how to install, configure and run kGWASflow are provided in the kGWASflow Wiki.
kGWASflow is published in G3 Genes|Genomes|Genetics:
- Corut, A. K., & Wallace, J. G. (2024). kGWASflow: a modular, flexible, and reproducible Snakemake workflow for k-mers-based GWAS. G3: Genes, Genomes, Genetics, 14(1), jkad246. https://doi.org/10.1101/2023.07.10.548365
To use this workflow, you need conda to be installed (to install conda, please follow the instructions here).
# Create a new conda environment with kgwasflow
# and its dependencies
conda create -c bioconda -n kgwasflow kgwasflow
# Activate kGWASflow conda environment
conda activate kgwasflow
# test kGWASflow
kgwasflow --help
Alternatively, kGWASflow can be installed by cloning the GitHub repository.
# Clone this repository to your local machine
git clone https://github.com/akcorut/kGWASflow.git
# Change into the kGWASflow directory
cd kGWASflow
After cloning the GitHub repo, you can install Snakemake and the other dependencies using conda as below:
# This assumes conda is installed
conda env create -f environment.yaml
# Activate kGWASflow conda environment
conda activate kGWASflow
Finally, you can install kGWASflow using the setup script as below:
# Install kgwasflow
python setup.py install
# Test kgwasflow
kgwasflow --help
The other options on how to deploy this workflow can be found in the Snakemake Workflow Catalog.
To configure kGWASflow, you first need to initialize a new kGWASflow working directory by following the steps below:
# Activating the conda environment
conda activate kgwasflow
# Initializing a new kgwasflow working dir
kgwasflow init --work-dir path/to/your/work_dir
or
# Activating the conda environment
conda activate kgwasflow
# Change into your preferred working directory
cd path/to/your/work_dir
# Initializing a new kgwasflow working dir
kgwasflow init
This command will initialize a new kGWASflow working directory with the default configuration files. Below is the directory structure of the working directory:
path/to/your/work_dir
├── config
│ ├── config.yaml
│ ├── phenos.tsv
│ └── samples.tsv
└── test
Below are the configuration files generated by the kgwasflow init command:
- config/config.yaml is a YAML file containing the workflow configuration.
- config/samples.tsv is a TSV file containing the sample information.
- config/phenos.tsv is a TSV file containing the phenotype information.
For more information about each configuration file, please see the kGWASflow Wiki.
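Before editing the configuration files, it can help to confirm that initialization produced the expected layout. The following is a minimal sketch and not part of kGWASflow: the function name check_workdir is hypothetical, and the file list is taken only from the directory tree shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of kGWASflow): verify that a working
# directory contains the configuration files created by `kgwasflow init`.
# Prints each missing file to stderr and returns non-zero if any is absent.
check_workdir() {
    work_dir="${1:-.}"   # default to the current directory
    missing=0
    for f in config/config.yaml config/samples.tsv config/phenos.tsv; do
        if [ ! -f "$work_dir/$f" ]; then
            echo "missing: $work_dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_workdir path/to/your/work_dir && echo "config files found"` reports success only when all three files are present.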
After initializing the working directory (kgwasflow init) and modifying the configuration files, kGWASflow can be run as below:
# Activating the conda environment
conda activate kgwasflow
# Change into your preferred working directory
cd path/to/your/work_dir
# Run kgwasflow
kgwasflow run -t 16
Below are some example runs of kGWASflow:
Run examples:
1. Run kGWASflow with the default config file, default arguments and 16 threads:
kgwasflow run -t 16 --snake-default
2. Run kGWASflow with a custom config file and default settings:
kgwasflow run -t 16 -c path/to/custom_config.yaml
3. Run kGWASflow with user defined working directory:
kgwasflow run -t 16 --work-dir path/to/work_dir
4. Run kGWASflow in dryrun mode to see what tasks would be executed:
kgwasflow run -t 16 -n
5. Run kGWASflow using mamba as the conda frontend:
kgwasflow run -t 16 --conda-frontend mamba
6. Run kGWASflow and generate an HTML report:
kgwasflow run -t 16 --generate-report
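The dry-run and regular run examples above can be combined so that the real run starts only if the dry run succeeds. This is a minimal sketch, assuming that kgwasflow exits with a non-zero status when the dry run fails; the wrapper name run_after_dryrun is hypothetical, and only the run, -t, and -n options shown above are used.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of kGWASflow): do a dry run first and
# start the real run only if the dry run succeeds.
# Assumption: `kgwasflow run -n` returns a non-zero exit status on failure.
run_after_dryrun() {
    threads="${1:-16}"   # number of threads, default 16 as in the examples
    kgwasflow run -t "$threads" -n || {
        echo "dry run failed; not starting the workflow" >&2
        return 1
    }
    kgwasflow run -t "$threads"
}
```

Calling `run_after_dryrun 16` from inside the working directory would then list the planned tasks before executing them.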
Information about how to use kGWASflow with Snakemake commands can be found in the Snakemake Workflow Catalog.
To test kGWASflow using an E. coli dataset (Earle et al. 2016, Rahman et al. 2018), first activate the conda environment:
# Activating the conda environment
conda activate kgwasflow
After activating the kGWASflow environment, you can perform a test run as explained below:
Test examples:
1. Run the kGWASflow test in dryrun mode to see what tasks would be executed:
kgwasflow test -t 16 -n
2. Run the kGWASflow test using the test config file with 16 threads:
kgwasflow test -t 16
3. Run the kGWASflow test and define the test working directory:
kgwasflow test -t 16 --work-dir path/to/test_work_dir
kGWASflow was developed by Adnan Kivanc Corut.
For Issues: https://github.com/akcorut/kGWASflow/issues
Contributions to the development of kGWASflow are welcome! Create Pull Requests to fix bugs or recommend new features!
If you use kGWASflow in your research, please cite using the DOI: https://doi.org/10.1101/2023.07.10.548365 and the original method paper by Voichek et al. (2020):
- Corut, A. K., & Wallace, J. G. (2024). kGWASflow: a modular, flexible, and reproducible Snakemake workflow for k-mers-based GWAS. G3: Genes, Genomes, Genetics, 14(1), jkad246. https://doi.org/10.1101/2023.07.10.548365
- Voichek, Y., & Weigel, D. (2020). Identifying genetic variants underlying phenotypic variation in plants without complete genomes. Nature Genetics, 52, 534–540. https://doi.org/10.1038/s41588-020-0612-7
kGWASflow is licensed under the MIT license.