
DAnIEL: a user-friendly web server for fungal ITS amplicon sequencing data

DAnIEL

A running instance of DAnIEL (Describing, Analyzing and Integrating fungal Ecology to effectively study the systems of Life) is available at https://sbi.hki-jena.de/daniel.

Features

  • Analysis of paired-end ITS amplicon sequencing data in any web browser
  • Statistics and machine learning to compare groups of samples
  • Correlation networks
  • Integration of existing cohorts from NCBI SRA

Citation

Daniel Loos, Lu Zhang, Christine Beemelmanns, Oliver Kurzai, and Gianni Panagiotou. DAnIEL: a user-friendly web server for fungal ITS amplicon sequencing data. Frontiers in Microbiology, doi: 10.3389/fmicb.2021.720513

Getting Started

  1. Set environment variables pointing to this repository and to the database directory (the database is not included in the repository):
export DANIEL_DIR=/my-path
export DANIEL_REPO_DIR=$DANIEL_DIR/repo
export DANIEL_USERDAT_DIR=$DANIEL_DIR/userdat
export DANIEL_DB_DIR=$DANIEL_DIR/db
  2. Clone the repository and build the web server from source:
git clone https://github.com/bioinformatics-leibniz-hki/DAnIEL.git "$DANIEL_REPO_DIR"
cd "$DANIEL_REPO_DIR"
git lfs pull
COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker-compose build --parallel

Alternatively, pull the prebuilt Docker images from Docker Hub:

docker pull bioinformaticsleibnizhki/daniel_backend
docker pull bioinformaticsleibnizhki/daniel_frontend

This method requires changing the image names in docker-compose.yml to the images pulled from Docker Hub (e.g. replace image: daniel_backend with image: bioinformaticsleibnizhki/daniel_backend:v1.0).
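The image-name change can also be scripted. The sketch below is a hypothetical helper (use_hub_images is not part of the repository) that rewrites untagged image: daniel_backend and image: daniel_frontend lines in a compose file. The default tag v1.0 is taken from the example above and is an assumption; pass whichever tag you pulled. Quoted or already tagged image names are left untouched.

```shell
#!/bin/sh
# Hypothetical helper: point a docker-compose.yml at the Docker Hub images.
# Usage: use_hub_images path/to/docker-compose.yml [tag]
use_hub_images() {
  compose_file=$1
  tag=${2:-v1.0}  # assumed default, matching the example in the text
  # Match lines like "    image: daniel_backend" (any indentation), keep the
  # indentation, and prefix the Docker Hub namespace plus the tag.
  sed -E "s#^([[:space:]]*image:[[:space:]]*)daniel_(backend|frontend)[[:space:]]*\$#\1bioinformaticsleibnizhki/daniel_\2:${tag}#" \
    "$compose_file" > "$compose_file.tmp" && mv "$compose_file.tmp" "$compose_file"
}
```

Writing to a temporary file and moving it back avoids sed -i, whose syntax differs between GNU and BSD sed. Running the helper twice is harmless, because rewritten lines no longer match the pattern.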

  3. Create an msmtp configuration file at back_end/msmtprc to enable email notifications. This file must be mounted into the back end container at /etc/msmtprc at runtime; the notify_mail function in worker.sh uses it to email users once their pipeline has finished.

  4. Deploy a database directory at $DANIEL_DB_DIR, e.g. by downloading and extracting the DAnIEL DB v1.0 archive:

wget "https://zenodo.org/record/4073125/files/daniel_db_v1.0.tar.gz?download=1" \
	-O daniel_db_v1.0.tar.gz
tar -xzf daniel_db_v1.0.tar.gz
mv db "$DANIEL_DB_DIR"
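A template for the msmtp configuration file described above could be generated as follows. This is a sketch under assumptions: the function name write_msmtprc is hypothetical, and the SMTP host, port, addresses, password, and the CA bundle path /etc/ssl/certs/ca-certificates.crt (common on Debian-based images) are placeholders to replace with your own mail server settings. msmtp uses the account named default when no account is chosen explicitly.

```shell
#!/bin/sh
# Hypothetical helper: write a placeholder msmtp configuration.
# Usage: write_msmtprc [target-path]   (defaults to back_end/msmtprc)
write_msmtprc() {
  target=${1:-back_end/msmtprc}
  mkdir -p "$(dirname "$target")"
  # Quoted 'EOF' prevents the shell from expanding anything in the template.
  cat > "$target" <<'EOF'
# Settings shared by all accounts
defaults
auth           on
tls            on
tls_trust_file /etc/ssl/certs/ca-certificates.crt

# msmtp falls back to the account named "default"
account        default
host           smtp.example.org
port           587
from           daniel@example.org
user           daniel@example.org
password       CHANGE_ME
EOF
}
```

Because the file holds an SMTP password, keep it out of version control. In docker-compose.yml, the runtime mount could be declared under the back end service as a volume entry such as ./back_end/msmtprc:/etc/msmtprc:ro; check whether the shipped compose file already defines one.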

Use docker-compose to start the DAnIEL web server containers:

wget https://raw.githubusercontent.com/bioinformatics-leibniz-hki/DAnIEL/main/docker-compose.yml
docker-compose up -d

The front end can be accessed at http://localhost.
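The containers may need a moment before the front end responds. The sketch below polls the address with curl until it returns a successful HTTP status. The function name wait_for_daniel, the 30 attempts, the 5-second request timeout, and the 2-second interval are arbitrary choices for illustration, not part of DAnIEL.

```shell
#!/bin/sh
# Hypothetical helper: wait until the DAnIEL front end answers over HTTP.
# Usage: wait_for_daniel [url] [attempts]
wait_for_daniel() {
  url=${1:-http://localhost}
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: treat HTTP errors as failures, -s: no progress bar, -S: show errors
    if curl -fsS -o /dev/null --max-time 5 "$url"; then
      echo "DAnIEL front end is up at $url"
      return 0
    fi
    i=$((i + 1))
    # Pause between attempts, but not after the last one
    if [ "$i" -lt "$attempts" ]; then sleep 2; fi
  done
  echo "No response from $url after $attempts attempts" >&2
  return 1
}
```

For example, docker-compose up -d && wait_for_daniel starts the containers and returns once the front end is reachable, or fails after about a minute.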

Development

The software package is divided into the following components:

  • front_end - Interactive website to upload data and to visualize the results
  • back_end - The analysis workflow called from the front end
  • danielLib - R package containing functions required by both the front end and the back end

The front end creates a directory for each project containing all files needed to start the analysis workflow. It is written in R Shiny. The inputs of the reactive UI elements are merged into the file project.json. When the start pipeline button is clicked, the project ID is appended to a queue file in the user directory.
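The queue hand-off between front end and back end can be illustrated with a small shell sketch. It is not DAnIEL's code: the queue file name queue, the functions enqueue_project and dequeue_project, and the first-in-first-out order are assumptions; the text above only states that the project ID is appended to a queue file in the user directory.

```shell
#!/bin/sh
# Hypothetical file-based project queue. The file name "queue" is an
# assumption; override it by setting QUEUE_FILE before sourcing this.
QUEUE_FILE=${QUEUE_FILE:-$DANIEL_USERDAT_DIR/queue}

# Front end side: append a project ID, one per line.
enqueue_project() {
  printf '%s\n' "$1" >> "$QUEUE_FILE"
}

# Worker side: print the oldest project ID and remove it from the queue.
# Returns 1 if the queue file is missing or empty.
dequeue_project() {
  [ -s "$QUEUE_FILE" ] || return 1
  head -n 1 "$QUEUE_FILE"
  tail -n +2 "$QUEUE_FILE" > "$QUEUE_FILE.tmp" && mv "$QUEUE_FILE.tmp" "$QUEUE_FILE"
}
```

The sketch does no locking, so an append that happens while the worker rewrites the file could be lost. On Linux, wrapping both operations with flock(1) from util-linux would be one way to serialize access.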

The back end runs the analysis workflow. It is written in Snakemake. HTML reports are generated with R Markdown and stored in the directory back_end/reports.

Helper scripts, e.g. to create BibTeX files and the SQLite database, are stored in back_end/helper.

To add a new feature, create a new Snakemake rule in the directory back_end/rules and add its result file as a target in the file targets.snakefile.py. A Conda environment can be defined for each rule in the directory back_end/envs. To visualize the results, create a new Shiny module in the directory front_end/modules and add it to the app files front_end/server.R and front_end/ui.R.
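The back end steps for a new feature can be scaffolded with a short script. This is a sketch under assumptions: the function name scaffold_feature, the .smk and .yaml file extensions, the input and output paths, the placeholder shell command, and the package list are all hypothetical. Check the existing files in back_end/rules and back_end/envs for the conventions the repository actually uses. How targets.snakefile.py collects its targets is not described here, so the script only prints a reminder instead of editing that file.

```shell
#!/bin/sh
# Hypothetical scaffold for a new back end feature.
# Usage: scaffold_feature <name> [back_end-dir]
scaffold_feature() {
  name=$1
  backend=${2:-back_end}
  mkdir -p "$backend/rules" "$backend/envs"

  # Conda environment for the rule (package list is a placeholder).
  cat > "$backend/envs/$name.yaml" <<EOF
channels:
  - conda-forge
  - bioconda
dependencies:
  - r-base
EOF

  # Snakemake rule. Snakemake resolves the conda: path relative to the
  # file that contains the rule, hence "../envs/...".
  cat > "$backend/rules/$name.smk" <<EOF
rule $name:
    input:
        "input/$name.tsv"
    output:
        "$name/result.tsv"
    conda:
        "../envs/$name.yaml"
    shell:
        "cp {input} {output}"
EOF

  echo "Now add \"$name/result.tsv\" as a target in targets.snakefile.py"
}
```

The heredocs are unquoted so that $name expands, while Snakemake's {input} and {output} placeholders pass through unchanged because they contain no $.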

Techstack

General tools used:

  • Docker - app containerization
  • Conda - management of environments and software packages
  • R Shiny - front end UI
  • tidyverse - data manipulation and visualization
  • Snakemake - management of the back end workflow

Tools used to perform the bioinformatics analysis:

  • Cutadapt - quality control of raw reads
  • FastQC - assess quality of read files
  • MultiQC - merging QC results from multiple samples
  • PIPITS - OTU profiling pipeline
  • DADA2 - ASV profiling pipeline
  • FastSpar - Correlation analysis aware of sparsity
  • BAnOCC - Correlation analysis aware of compositionality
  • caret - Machine learning
  • vegan - Diversity analysis

Author

Daniel Loos

Systems Biology and Bioinformatics

Leibniz Institute for Natural Product Research and Infection Biology

Hans Knöll Institute (HKI)