Tutorial

Marius Wöste edited this page Jan 14, 2021 · 14 revisions

wg-blimp step-by-step guide

This tutorial describes, step by step, how to use wg-blimp.

Obtaining test data

We provide a small toy dataset, subsampled from a dataset of blood and sperm methylomes. The following download contains read data in .fastq format and the reference sequence in .fasta format: https://uni-muenster.sciebo.de/s/7vpqRSEATYcvlnP. Please note that the results created by this test run are not meant for downstream analysis and should only be seen as a feature demonstration.

After extraction you should see a folder fastq containing read data and the file chr22.fasta containing the reference. There are 4 test samples in total: blood1, blood2, sperm1 and sperm2.
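Before moving on, it can help to confirm that the extracted files are where the later commands expect them. The following is a minimal sketch and not part of wg-blimp: the function name check_test_data is made up for illustration, and it only looks for the fastq folder and chr22.fasta mentioned above, not for the individual read files.

```shell
# Hypothetical helper (not part of wg-blimp): verify the extracted test data layout.
# Usage: check_test_data [directory]   (defaults to the current directory)
check_test_data() {
  dir="${1:-.}"
  status=0
  # The tutorial expects a folder "fastq" holding the read data ...
  if [ ! -d "$dir/fastq" ]; then
    echo "missing folder: $dir/fastq" >&2
    status=1
  fi
  # ... and the reference sequence "chr22.fasta" next to it.
  if [ ! -f "$dir/chr22.fasta" ]; then
    echo "missing file: $dir/chr22.fasta" >&2
    status=1
  fi
  return "$status"
}

if check_test_data .; then
  echo "test data found"
else
  echo "test data incomplete"
fi
```

Run it from the directory into which you extracted the download; the same directory is used as working directory in the rest of this tutorial.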

Installation

The easiest way to install wg-blimp is to use Bioconda. If you do not already have a conda installation set up, please follow the instructions at https://docs.conda.io/projects/conda/en/latest/user-guide/install/.

Once conda is available you need to include the Bioconda channel using the commands:

conda config --add channels defaults
conda config --add channels bioconda
conda config --add channels conda-forge

For more details or troubleshooting installing Bioconda, see https://bioconda.github.io/.

Once Bioconda is available, you may install wg-blimp using the following command:

conda create -n wg-blimp wg-blimp r-base==4.0.3

This will create a new conda environment containing the wg-blimp installation. The installation process will require ~3GB of space and might take a while because all tools used by wg-blimp need to be downloaded and installed. Creating a fresh environment is highly advised to prevent incompatibilities causing errors. Please note that pinning r-base to a specific version here will drastically speed up conda dependency solving.
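If you are unsure whether the environment was created, you can look for it in the output of conda env list. The snippet below is a small sketch under the assumption that the environment name appears in the first column of that output; the helper name has_conda_env is made up for illustration.

```shell
# Hypothetical helper: succeeds if the given environment name appears in the
# first column of `conda env list` output read from standard input.
has_conda_env() {
  awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Only query conda if it is actually on the PATH.
if command -v conda >/dev/null 2>&1; then
  if conda env list | has_conda_env wg-blimp; then
    echo "environment wg-blimp exists"
  else
    echo "environment wg-blimp not found"
  fi
fi
```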

Running wg-blimp

Before you can use wg-blimp you need to activate the conda environment that was created in the previously described installation step:

conda activate wg-blimp

With the environment active, you can check the installation by running:

wg-blimp --help

If everything was set up correctly, you should see a help message similar to:

Usage: wg-blimp [OPTIONS] COMMAND [ARGS]...

Options:
  --help  Show this message and exit.

Commands:
  create-config              Create a config YAML file for running the...
  delete-all-output          Remove all files generated by the pipeline.
  run-shiny                  Start shiny GUI using configuration files for...
  run-snakemake              Run the Snakemake pipeline from command line.
  run-snakemake-from-config  Run the snakemake pipeline using a config file.

Whenever you don't know which command or parameters to use, you may run wg-blimp --help to look up the correct syntax. As displayed by the help message, there are multiple ways to run wg-blimp: you can either directly invoke the full workflow with a single command, or first create a configuration file and then run the workflow from that file.

Invoking the whole workflow

You can use the command

wg-blimp run-snakemake --help

to get detailed information about the syntax for running the whole workflow. Make sure your current working directory contains the downloaded fastq/ directory and the chr22.fasta reference.

Before actually invoking a computationally heavy workflow, it is usually recommended to perform a dry run to see if everything is set up correctly:

wg-blimp run-snakemake --cores=8 fastq/ chr22.fasta blood1,blood2 sperm1,sperm2 results --dry-run

If you are satisfied with the steps executed, you may use

wg-blimp run-snakemake --cores=8 fastq/ chr22.fasta blood1,blood2 sperm1,sperm2 results 

to actually run the whole pipeline. If everything runs without errors, a folder results containing all analysis data will show up. This folder contains the annotated DMR lists as well as methylation reports and QC data. Please note that a configuration file results/config.yaml is generated automatically so you can see which parameters were used.
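The dry run and the real run can also be chained so that the pipeline only starts when the dry run succeeds. This is a sketch, not a wg-blimp feature: the function name run_checked and the WG_BLIMP variable (an override for the executable name) are assumptions made for illustration, and it relies on the dry run exiting with a non-zero status when something is wrong.

```shell
# Sketch: dry run first, real run only if the dry run succeeds.
# WG_BLIMP is an assumed override for the executable name (useful for testing).
WG_BLIMP="${WG_BLIMP:-wg-blimp}"

run_checked() {
  # "$@" holds the arguments shared by both invocations.
  "$WG_BLIMP" run-snakemake "$@" --dry-run || {
    echo "dry run failed, not starting the pipeline" >&2
    return 1
  }
  "$WG_BLIMP" run-snakemake "$@"
}

# Only start if wg-blimp is available, i.e. the conda environment is active.
if command -v "$WG_BLIMP" >/dev/null 2>&1; then
  run_checked --cores=8 fastq/ chr22.fasta blood1,blood2 sperm1,sperm2 results
fi
```

The arguments are the same as in the two commands above; only --dry-run is appended for the first invocation.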

Using configuration files

When dealing with analysis pipelines it is often useful to inspect and manually change analysis parameters if necessary. wg-blimp provides commands to first create a configuration file, and then run the analysis workflow from the configuration file. To create a configuration file, you may use the command:

wg-blimp create-config --cores-per-job=4 fastq/ chr22.fasta blood1,blood2 sperm1,sperm2 results-from-config wg-blimp-config.yaml

This syntax is very similar to the wg-blimp run-snakemake command, but instead of running the whole workflow, only a file wg-blimp-config.yaml will be created. In this file, you may change parameters as you wish, and run the workflow later on (see README for details on available parameters). Once you are satisfied with your configuration file, you can use

wg-blimp run-snakemake-from-config --cores=8 wg-blimp-config.yaml 

to invoke the actual analysis.
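Because the configuration file is edited by hand, keeping an untouched copy makes it easy to see later what was changed. The following sketch uses only cp and diff; the helper names and the .orig suffix are assumptions made for illustration and are not something wg-blimp creates or reads.

```shell
# Hypothetical helper: keep an untouched copy of a config file, created once.
backup_config() {
  # Never overwrite an existing backup, so it always reflects the generated file.
  [ -f "$1.orig" ] || cp "$1" "$1.orig"
}

# Hypothetical helper: show manual edits as a unified diff (empty if unchanged).
config_changes() {
  diff -u "$1.orig" "$1"
}

if [ -f wg-blimp-config.yaml ]; then
  backup_config wg-blimp-config.yaml
  # ... edit wg-blimp-config.yaml here ...
  config_changes wg-blimp-config.yaml || true   # diff exits with 1 when files differ
fi
```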

Accessing the Shiny interface

Once Snakemake finishes execution, you may use wg-blimp's user interface to inspect the analysis results. To start a Shiny web server, you can use the command:

wg-blimp run-shiny results/config.yaml

Once the server is running, you can access the interface by opening http://localhost:9898 in any web browser that Shiny supports. Please note that the port Shiny listens on can be configured; for details, run wg-blimp run-shiny --help.

Where to go from here

Once you have finished this tutorial, you may take a closer look at the repository's README, which contains more in-depth explanations of the pipeline parameters. If you encounter any errors or have feature requests, feel free to send an email to mar.w@wwu.de or open an issue here on GitHub!