The bracken_plot application allows for quick and easy visualization of merged Bracken data with stacked bar plots. This repository contains a how-to guide, example files, and the app.R source code.
If you want more control over the plot style and parameters, you can download and run the plotting function locally.
Bracken is a companion program to Kraken that allows for estimation of relative abundance at any taxonomic level. For information regarding installation of Bracken and Kraken, see their GitHub pages:
https://github.com/DerrickWood/kraken2
https://github.com/jenniferlu717/Bracken
I recommend creating a conda environment and installing both from Bioconda:
https://anaconda.org/bioconda/kraken2
https://anaconda.org/bioconda/bracken
Here is an example script for running Kraken and Bracken on paired-end reads:
name=sample_A
kdb=/home/refdbs/kraken/Standard_DB
fq=/workdir/fastq
export OMP_NUM_THREADS=8
source /home/miniconda3/bin/activate
conda activate kraken2
mkdir -p ${name}
cd ${name}
kraken2 \
--gzip-compressed \
--paired \
--report ${name}.report.txt \
--db $kdb \
--threads $OMP_NUM_THREADS \
--output ${name}.out.txt \
${fq}/${name}_R1.fastq.gz ${fq}/${name}_R2.fastq.gz
conda activate bracken
levels=P,C,O,F,G,S,S1
for level in $(echo $levels | sed "s/,/ /g"); do
bracken \
-d $kdb \
-i ${name}.report.txt \
-o ${name}.bracken_${level}.txt \
-r 75 \
-l ${level}
done
Once you have Bracken reports for each sample at the desired taxonomic levels, reports can be combined by level using combine_bracken_outputs.py:
source /home/miniconda3/bin/activate
conda activate bracken
levels=P,C,O,F,G,S,S1
for level in $(echo $levels | sed "s/,/ /g"); do
combine_bracken_outputs.py \
--files ./*/*.bracken_${level}.txt \
--names sample_A,sample_B,sample_C \
--output ./merged_bracken_${level}.txt
done
Note that glob expansion returns files in alphanumeric order, so the sample identifiers supplied to the --names option must be listed in that same order or the columns of the merged file will be mislabeled.
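One way to avoid mislabeled columns is to build the --names list from the report paths themselves. The sketch below is a hypothetical helper, not part of Bracken; it assumes each report is named <sample>.bracken_<level>.txt, as in the example script above.

```shell
# Hypothetical helper (not part of Bracken): derive the --names list from the
# report paths so the labels always follow the order of the files.
# Assumes each file is named <sample>.bracken_<level>.txt.
names_from_files() {
  names=""
  for f in "$@"; do
    base=${f##*/}                # drop directories: sample_A.bracken_S.txt
    sample=${base%%.bracken_*}   # drop the suffix:  sample_A
    names="${names:+$names,}$sample"
  done
  printf '%s\n' "$names"
}

# Example with literal paths; in practice, pass the same glob used for --files.
names_from_files ./sample_A/sample_A.bracken_S.txt ./sample_B/sample_B.bracken_S.txt
# prints: sample_A,sample_B
```

Inside the merge loop this could be used as --names "$(names_from_files ./*/*.bracken_${level}.txt)", since the glob expands to the same order both times.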
Upload your merged Bracken file and click "Create Plot". To plot an example, you can download the Bracken output files from this repository. The app automatically detects the taxonomic level and draws a stacked bar plot showing the relative abundance of each taxon. Often there are many taxa with near-zero abundances, and plotting all of them results in ambiguous labeling. If this is the case, use the "Maximum number of taxa to plot" field to subsample the dataset. Subsampling reduces the number of taxa plotted to the n taxa with the greatest median relative abundance across samples. The relative abundances of all taxa not in the subset are summed and plotted as "other". Once a plot is rendered, click "Get PDF" to download a PDF version.
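To make the subsampling rule concrete, here is a minimal sketch of the same selection done on the command line. It is not the app.R implementation. The function name top_n_taxa and the input layout (a tab-separated table with a header row, taxon names in column 1, and one relative abundance column per sample) are assumptions for illustration; a merged Bracken file has additional columns and would need to be trimmed to this layout first.

```shell
# Hypothetical helper, not the app.R implementation.
# Usage: top_n_taxa N FILE
# FILE (assumed layout): tab-separated, one header row, column 1 = taxon name,
# remaining columns = relative abundance per sample.
# Steps: (1) prefix each row with its median across samples,
#        (2) sort rows by that median, largest first,
#        (3) print the first N rows and sum the rest into one "other" row.
top_n_taxa() {
  n=$1
  file=$2
  tab=$(printf '\t')

  head -n 1 "$file"   # pass the header through unchanged

  awk -F'\t' 'NR > 1 {
    k = NF - 1
    for (i = 2; i <= NF; i++) v[i - 1] = $i + 0
    # insertion sort of the k sample values
    for (i = 2; i <= k; i++) {
      x = v[i]
      j = i - 1
      while (j >= 1 && v[j] > x) { v[j + 1] = v[j]; j-- }
      v[j + 1] = x
    }
    med = (k % 2) ? v[(k + 1) / 2] : (v[k / 2] + v[k / 2 + 1]) / 2
    printf "%.10f\t%s\n", med, $0
  }' "$file" |
  sort -t "$tab" -k1,1nr |
  awk -F'\t' -v OFS='\t' -v n="$n" '
    NR <= n {
      line = $2
      for (i = 3; i <= NF; i++) line = line OFS $i
      print line
      next
    }
    {
      rest = 1
      nf = NF
      for (i = 3; i <= NF; i++) other[i] += $i
    }
    END {
      if (rest) {
        line = "other"
        for (i = 3; i <= nf; i++) line = line OFS other[i]
        print line
      }
    }'
}
```

For example, top_n_taxa 10 abundances.tsv would keep the ten taxa with the highest median abundance and collapse everything else into a single "other" row (abundances.tsv is a placeholder file name).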
Custom color palettes can be added as a string of comma-separated hexadecimal values without spaces or # characters. Colors are recycled in cases where the number of taxa exceeds the number of colors in a palette. If subsampling taxa, make sure that custom palettes do not contain the color used for the "other" label (gray 808080 by default). Some example palettes:
5c2751,ef798a,f7a9a8,00798c,6457a6,9dacff,76e5fc,a30000,ff7700,f5b841
05a8aa,b8d5b8,d7b49e,dc602e,bc412b,791e94,2f4858,293f14,386c0b,550527
99d5c9,6c969d,645e9d,392b58,2d0320,f9c784,fcaf58,ff8c42,cc2936,ebbab9
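If you build palettes by hand, a quick check before pasting them into the app can catch formatting slips. The sketch below is a hypothetical helper, not part of bracken_plot. It assumes every color is written as a 6-digit hex code and that the "other" color is the default 808080 unless a different one is passed as the second argument.

```shell
# Hypothetical helper (not part of bracken_plot): check a palette string.
# Usage: check_palette PALETTE [OTHER_COLOR]
# Assumes 6-digit hex codes; OTHER_COLOR defaults to 808080.
check_palette() {
  palette=$1
  other=${2:-808080}

  # The whole string must be hex codes separated by single commas,
  # with no spaces and no # characters.
  if ! printf '%s\n' "$palette" | grep -Eq '^[0-9A-Fa-f]{6}(,[0-9A-Fa-f]{6})*$'; then
    echo "invalid: expected comma-separated 6-digit hex values without spaces or #" >&2
    return 1
  fi

  # The palette must not reuse the color reserved for "other"
  # (compared case-insensitively, one color per line).
  if printf '%s\n' "$palette" | tr ',' '\n' | grep -iqx "$other"; then
    echo "invalid: palette contains the color used for \"other\" ($other)" >&2
    return 1
  fi

  echo "ok"
}

check_palette 5c2751,ef798a,f7a9a8,00798c,6457a6,9dacff,76e5fc,a30000,ff7700,f5b841
# prints: ok
```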
Coolors.co is great for manually picking your own palettes. fbparis's palette tool attempts to maximize the perceived distinctness between colors and is a good option if a large number of colors is desired.
Feel free to open an issue if you experience errors or would like to see specific features implemented in future updates. If you want to create and manipulate Bracken relative abundance plots as vector images, please download and run the plotting function provided in the R Markdown document.
Note: bracken_plot is currently hosted on shinyapps.io under a free account, which means the app is restricted to 25 active hours per month.