GitHub - cgwyx/BGDMdocker: workflow based on Docker to analyze and visualize pan-genomes and biosynthetic gene clusters of bacteria

The command line and tip of BGDMdocker

1.Guide for BGDMdocker workflow usage：

installing latest Docker-CE (Ubuntu, Debian, Raspbian, Fedora, Centos, Redhat, Suse, Oracle,

Linux et al., all applicable):

$ curl -sSL https://get.docker.com/ | bash -x

or:

$ wget -qO- https://get.docker.com/ | bash -x

adding user to the "docker" group with something like:

$ sudo usermod -aG docker manager

Type the following commands at your shell prompt. If this outputs the Docker version, your

installation was successful:

$ docker version

On your host (with Docker), type the following command lines to build a BGDMdocker

workflow:

$ git clone https://github.com/cgwyx/BGDMdocker.git

Or: download “BGDMdocker-master.zip” file

$ unzip BGDMdocker-master.zip

Build workflow Images:

$ cd ./BGDMdocker

$ docker build -t BGDMdocker:latest .

Or:pull Images of BGDMdocker from DockerHub,such as :

$ docker pull cgwyx/BGDMdocker

Running Container and Prokka to genome annotation

Copy the following commands to run the analysis for genome annotation of Ba_xx strains (for

boldface text, please enter your data): (if you have your own genome sequences, you need this

step to generate “*.gbff” annotation files):

$ docker run —rm -v home:home BGDMdocker \

　prokka --kingdom Bacteria --gcode 11 --genus Bacillus \

　--species Amyloliquefaciens \

　--strain Ba_xx --locustag Ba_xx --prefix Ba_xx --rfam \

　--rawproduct --outdir /home/manager/PRJNA291327 \

　/home/manager/Ba_xx.fasta

“Ba_xx.fasta” is the genome sequence, “PRJNA291327” is the output folder of results(must be in

your host).

Running panX analysis pan-genome in Container of BGDMdocker in Command line

interaction patterns.

PanX starts with a set of annotated sequences files, *.gbff (.gbk) (e.g., NCBI RefSeq or

GenBank),and these data should also reside in a folder within“ /pan-genome-analysis/data/ ”

in Container,we will refer to this folder as run directory below. The name of the run directory is

used as a species name in down-stream analysis and visualization.Therefore,you need to enter the

Container to run the relevant commands,and commit the Container to save image of visualization

at last,Copy the following commands to run the analysis of the pan-genome of 44 B.

amyloliquefaciens strains from the command-line interface of Container (for boldface text, please

enter your data). For detailed parameters see here.

$ cd /pan-genome-analysis

$ cp -r /home/manager/B_amy /pan-genome-analysis/data/

$ ./panX.py -fn./data/B_amy -sl B_amy-RefSeq.txt -t 4

“/home/manager/B_amy” is your loclhost annotated sequences files of *.gbff (GenBank files)

and B_amy-RefSeq.txt (accession list for strains), should copy to Container and reside in

“/pan-genome-analysis/data/B_amy” folder, The result will also be output to the this folder in

Container.

Visualization of the pan-genome of 44 B. amyloliquefaciens strains (run in Container):

$ python link-to-server.py B_amy

$ add-new-pages-repo.sh B_amy

$ gulp

On you host ,open http://localhost:8000/B_amy with a web browser to access the visualization of

the pan-genome immediately.

Create a new Image from Container for saving changes of visualization data (running in host):

$ docker commit

Running Container and antiSMASH to search for gene clusters:

Copy the following commands to run the analysis of biosynthetic gene clusters of Y2 strain from

the command-line interface of Container (Y2.gbff) (for boldface text, please enter your data):

$ docker run —rm -v home:home BGDMdocker:latest \

　run_antismash.py /home/manager/input/Y2.gbff \

--outputfolder /home/manager/output/Y2_out \

--dbgclusterblast ./generic_modules/clusterblast \

　 --pfamdir ./generic_modules/fullhmmer --input-type nucl \

--knownclusterblast --clusterblast --subclusterblast --asf\

--inclusive --full-hmmer --smcogs --verbose --borderpredict

*.gbff (GenBank files) should reside in “input” folder,“Y2 _out” is the output folder for

results(must be in your host).

Building workflow using standalone Dockerfile (recommendation):

In order to meet the needs of different users, we also provide a standalone Dockerfile for Prokka,

panX, and antiSMASH. You can build images and run Container separately.

Building Image and run Container of Prokka standalone:

$ git clone https://github.com/cgwyx/prokka_conda_docker.git

Or: download “prokka_conda_docker-master.zip” file

$ unzip prokka_conda_docker-master.zip

$ cd ./prokka_conda_docker-master

$ sudo docker build -t conda:prokka .

Or: pull Image from DockerHub:

docker pull cgwyx/prokka_conda_docker

Run a Container from the image and copy the following commands to run the analysis of the

genome annotation of Ba_xx strains (for boldface text, please replace with your own data if

applicable):

$ docker run —rm -v home:home prokka:latest \

　prokka --kingdom Bacteria --gcode 11 --genus Bacillus \

--species Amyloliquefaciens \

--strain Ba_xx --locustag Ba_xx --prefix Ba_xx --rfam \

--rawproduct --outdir /home/manager/PRJNA291327 \

/home/manager/Ba_xx.fasta

“Ba_xx.fasta” is the sequence of the genome; “PRJNA291327” is the output folder of the

results,thye are all must be in your host.

Building Image and run Container of panX standalone:

$ git clone https://github.com/cgwyx/panx_conda_docker.git

Or: download “.zip” file

$ unzip panx_conda_docker-master.zip

$ cd ./panx_conda_docker-master

$ sudo docker build -t conda:panx .

Or:pull Image from dockerhub:

$ docker pull cgwyx/panx_conda_docker

$docker run -it --rm -v /home:/home cgwyx/panx_conda_docker:latest

Copy the following commands to run the analysis of the pan-genome of 44 B. amyloliquefaciens

strains from the command-line interface of Container (for boldface text, please replace with your

own data if applicable) ,you need to enter the container to run the relevant commands:

First your must copy data from your host to “/pan-genome-analysis/data/”,then run panX in

Container;

$ cp -r /home/manager/B_amy /pan-genome-analysis/data/

$ ./panX.py -fn ./data/B_amy -sl B_amy-RefSeq.txt -t 4

*.gbff (GenBank files) and B_amy-RefSeq.txt (accession list for strains) should be in the

“./data/B_amy” folder; output results will be also in “./data/B_amy” folder.

Visualize the pan-genome of 44 B. amyloliquefaciens strains (run in Container):

$ python link-to-server.py B_amy

$ add-new-pages-repo.sh B_amy

$ gulp

Open http://localhost:8000/B_amy with a web browser to access the visualization of the

pan-genome immediately.

Create a new Image for saving changes in Container data of visualization(running in host):

$ docker commit

Building Image and run Container of antiSMASH4 with database standalone:

$ git clone https://github.com/cgwyx/antismash4_db.git

Or: download “antismash4_db-master.zip” file

$ unzip antismash4_db-master.zip

$ cd ./antismash4_lite_docker-master

$ sudo docker build -t conda:antismash4_db .

Or: pull Image from dockerhub:

$ docker pull cgwyx/antismash4_db.

Run a Container from the image Copy the following commands to run the analysis of biosynthetic

gene clusters of Y2 strain from the command-line interface of Container (Y2.gbff) (for boldface

text, please replace with your own data if applicable):

$ docker run -it --rm -v home:home cgwyx/antismash4_db:latest \

　run_antismash.py /home/manager/input/Y2.gbff \

　--outputfolder /home/manager/output/Y2_out \

--dbgclusterblast ./generic_modules/clusterblast \

　--pfamdir ./generic_modules/fullhmmer --input-type nucl \

--knownclusterblast --clusterblast --subclusterblast --asf\

　--inclusive --full-hmmer --smcogs --verbose --borderpredict

*.gbff (GenBank files) reside in “input”folder; “Y2 _out” is the output folder for the results(must

be in your host).

Tip: How can I download all “genomic.gbff.gz” of a specified species from the

RefSeq or GenBank databases? Replace boldface text with your species if applicable:

Installing script on your host

$ wget ftp://ftp.ncbi.nlm.nih.gov/entrez/entrezdirect/versions/current/edirect.zip

$ unzip -u -q edirect.zip

$ export PATH=$PATH:$HOME/edirect

$ ./edirect/setup.sh

Download “*genomic.gbff.gz” of all strains of Bacillus amyloliquefaciens from GenBank

$ esearch -db assembly -query "Bacillus amyloliquefaciens [ORGN]" | efetch -format docsum | xtract -pattern "DocumentSummary" -element FtpPath_GenBank | sed 's/$//*genomic.gbff.gz/' |xargs wget -c -nd;sleep 3s;

Download “*genomic.gbff.gz” of all Bacillus amyloliquefaciens strains from RefSeq

$ esearch -db assembly -query "Bacillus amyloliquefaciens[ORGN]" | efetch -format docsum | xtract -pattern "DocumentSummary" -element FtpPath_RefSeq | sed 's/$//*genomic.gbff.gz/' |xargs wget -c -nd;sleep 3s;

Visualizing results (local host)

For visualizing the pan-genome of 44 B. amyloliquefaciens strains on your loclhost like our

website using Docker (Docker must be installed):

Access the web download page at http://bapgd.hygenomics.com/pangenome/homeand download

the file “B_amly_44_strans_pan_genome_panx_vis.tar”,store in a home directory of your host.

Copy the following commands to visualize the pan-genome of 44 B. amyloliquefaciens strains on

the local host (Docker must be installed):

$ docker load < B_amly_44_strans_pan_genome_panx_vis.tar

$ docker run -d --rm -p 8000:8000 busybox_nodejs:nodejs_v7.3.0

Open http:// localhost:8000/bamf_gbk44 with a web browser to access the visualization of

pan-genome of 44 B. amyloliquefaciens strains immediately.

For visualizing biosynthetic gene clusters of 44 B. amyloliquefaciens strains:

Access the web download page at http://pangenome.zggskj.com/home and download data of

genecluster of all strains,and you may download the standalone strain genecluster at home page

also.Extract it into any directory then into the strain folder. Use the browser to open “index.html”

to visualize the clusters of the strains.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
Dockerfile		Dockerfile
Fig.1 Schematic of BGDMdocker workflow.png		Fig.1 Schematic of BGDMdocker workflow.png
Fig.2 phylogenetic of 44 B.amy.png		Fig.2 phylogenetic of 44 B.amy.png
Fig.3 Screenshot of Gene cluster table an Sequence algnment.png		Fig.3 Screenshot of Gene cluster table an Sequence algnment.png
Fig.4 Screenshot of Phylogenetic tree of strains and gene.png		Fig.4 Screenshot of Phylogenetic tree of strains and gene.png
Fig.6 Screenshot of type cluster of Y2.png		Fig.6 Screenshot of type cluster of Y2.png
License.txt		License.txt
README.docx		README.docx
README.md		README.md
Table 1 Summary statistics of pan-genome of 44 B. amyloliquefaciens strains.docx		Table 1 Summary statistics of pan-genome of 44 B. amyloliquefaciens strains.docx
Table 2 Summary statistics of biosynthetic gene clusters of 44 B. amyloliquefaciens strains.docx		Table 2 Summary statistics of biosynthetic gene clusters of 44 B. amyloliquefaciens strains.docx
Table 3 Biosynthetic gene clusters of Y2(NC_017912) strain.docx		Table 3 Biosynthetic gene clusters of Y2(NC_017912) strain.docx
Table 4. Function of BGDMdocker workflow compared with others pan-genome tools.docx		Table 4. Function of BGDMdocker workflow compared with others pan-genome tools.docx
instance.cfg		instance.cfg
panx-docker-add-new-pages-repo.sh		panx-docker-add-new-pages-repo.sh
panx-docker-link-to-server.py		panx-docker-link-to-server.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

License

cgwyx/BGDMdocker

Folders and files

Latest commit

History

Repository files navigation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages