HaploKit/StrainXpress

 
 


StrainXpress

Description

StrainXpress is a de novo assembly method based on the overlap-layout-consensus (OLC) paradigm. It can quickly and accurately assemble high-complexity metagenome sequencing data at strain resolution.

Installation and dependencies

Please note that StrainXpress is built for Linux-based systems and Python 3 only. StrainXpress relies on several dependencies, which are installed in the steps below.

To install StrainXpress, it is recommended to first install the dependencies through Conda:

conda create -n strainxpress
conda activate strainxpress
conda install -c bioconda python=3.6 scipy pandas minimap2

Next, clone the repository into the directory where you want to install StrainXpress, and compile the code:

git clone https://github.com/kangxiongbin/StrainXpress.git
cd StrainXpress
sh install.sh
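Before running the assembler, it can help to confirm that the Conda-installed dependencies are visible from the active environment. The following is a minimal sketch, not part of StrainXpress: the function name `missing_dependencies` is hypothetical, and the default lists only mirror the packages named in the `conda install` command above.

```python
import importlib.util
import shutil


def missing_dependencies(modules=("scipy", "pandas"), tools=("minimap2",)):
    """Return the names of Python modules and executables that cannot be found.

    Hypothetical helper; the defaults only mirror the conda install command.
    """
    missing = []
    for name in modules:
        # find_spec returns None when a top-level module is not importable
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    for name in tools:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed dependencies were found.")
```

Run it inside the activated `strainxpress` environment; anything it reports as missing can be reinstalled with Conda.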

Examples

Illumina MiSeq

python ../scripts/strainxpress.py -fq all_reads.fq

The input file must be in interleaved FASTQ format, with the two reads of each pair on consecutive records, as shown below:

@S0R0/1
TATAAGTAAGGCGTTGCGAGCGGGTCGTAAAATATTTTTGATCCGT
+
EEEEEGEDJHJ3JHKJMMMLLLKNGOOLLNLOOOMJONLOOIOLMO
@S0R0/2
TTGATTATCATGCCGGAAGTGCTGCTCTTGTTCTCTGAAAGAGAAT
+
EEEGEHHHJHFJJJJBML2MMLNLLONNLNLOLJONOLNONNNMNF
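A quick sanity check of the input can catch ordering problems before a long assembly run. The sketch below is an illustration only and is not part of StrainXpress. It assumes, based on the example above, that mates are named with `/1` and `/2` suffixes and appear back to back. The function names `read_records` and `count_interleaved_pairs` are hypothetical.

```python
import io
from itertools import islice


def read_records(handle):
    """Yield (name, sequence, quality) for each 4-line FASTQ record."""
    # Skip blank lines so a trailing newline does not look like a record
    lines = (line.rstrip("\r\n") for line in handle if line.strip())
    while True:
        rec = list(islice(lines, 4))
        if not rec:
            return
        if len(rec) < 4 or not rec[0].startswith("@") or not rec[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record near {rec[0]!r}")
        if len(rec[1]) != len(rec[3]):
            raise ValueError(f"sequence/quality length mismatch in {rec[0]!r}")
        yield rec[0][1:].split()[0], rec[1], rec[3]


def count_interleaved_pairs(handle):
    """Return the number of read pairs, or raise ValueError if not interleaved."""
    records = read_records(handle)
    pairs = 0
    for name1, _, _ in records:
        mate = next(records, None)  # the mate must be the very next record
        if mate is None:
            raise ValueError(f"read {name1!r} has no mate")
        name2 = mate[0]
        # Assumed naming convention: same base name, suffixes /1 and /2
        if not (name1.endswith("/1") and name2.endswith("/2")
                and name1[:-2] == name2[:-2]):
            raise ValueError(f"reads {name1!r} and {name2!r} are not a /1, /2 pair")
        pairs += 1
    return pairs


if __name__ == "__main__":
    demo = io.StringIO("@S0R0/1\nACGT\n+\nIIII\n@S0R0/2\nTTGA\n+\nIIII\n")
    print(count_interleaved_pairs(demo), "pair(s) look correctly interleaved")
```

For a real file, pass an open handle instead of the in-memory demo, for example `open("all_reads.fq")`.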

For large data sets, we recommend using the fast clustering method:

python ../scripts/strainxpress.py -fq all_reads.fq -fast

- The assembled contigs are written to the stageb folder: final_contigs.fasta
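Once the run finishes, a short script can summarize the resulting contigs. This is a minimal sketch, not a StrainXpress feature: it assumes the output is a standard FASTA file, and the helper names `contig_lengths` and `n50` are hypothetical.

```python
import io


def contig_lengths(handle):
    """Return the length of every sequence in a FASTA file, in file order."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Return the N50: the contig length at which half the total bases are covered."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # no contigs


if __name__ == "__main__":
    demo = io.StringIO(">contig_1\nACGTAC\n>contig_2\nAAA\n>contig_3\nA\n")
    lengths = contig_lengths(demo)
    print("contigs:", len(lengths), "total bases:", sum(lengths), "N50:", n50(lengths))
```

To apply it to a real run, replace the in-memory demo with an open handle to final_contigs.fasta in the stageb folder.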

Possible issues during installation (optional)

If the system's g++ version does not meet the requirements, you can try installing a suitable compiler through Conda:

conda install -c conda-forge gxx_linux-64=7.3.0
# replace the /path/to/ with your own path
ln -s /path/to/miniconda3/envs/strainxpress/bin/x86_64-conda_cos6-linux-gnu-g++ /path/to/miniconda3/envs/strainxpress/bin/g++
ln -s /path/to/miniconda3/envs/strainxpress/bin/x86_64-conda_cos6-linux-gnu-gcc /path/to/miniconda3/envs/strainxpress/bin/gcc

If the Boost library is not installed, you can try installing it through Conda:

conda install -c conda-forge boost
# set environment variables
export LD_LIBRARY_PATH=/path/to/miniconda3/envs/strainxpress/lib/:$LD_LIBRARY_PATH
export CPATH=/path/to/miniconda3/envs/strainxpress/include/:$CPATH

If compilation fails with an error such as /path/to/miniconda3/envs/strainxpress/x86_64-conda_cos6-linux-gnu/bin/ld: cannot find -lboost_timer or cannot find -lgomp, the linker cannot find the Boost or libgomp library. You can try the following to fix it:

ln -s /path/to/miniconda3/envs/strainxpress/lib/libboost_* /path/to/miniconda3/envs/strainxpress/x86_64-conda_cos6-linux-gnu/lib/.
ln -s /path/to/miniconda3/envs/strainxpress/lib/libgomp* /path/to/miniconda3/envs/strainxpress/x86_64-conda_cos6-linux-gnu/lib/.
# then recompile and install
sh install.sh
