Analysis accompanying "HINGE: Long-Read Assembly Achieves Optimal Repeat Resolution" http://biorxiv.org/content/early/2016/07/05/062117
This repository provides an analysis pipeline that reproduces the main results in the paper step-by-step.
The following software needs to be installed:
build-essential
libhdf5-dev
libboost-all-dev
cmake-3.2
g++-4.9
gcc-4.9
python
python-pip
Most of these can be installed with apt-get. On Ubuntu, CMake 3.2 can be installed from ppa:george-edison55/cmake-3.x, and gcc/g++-4.9 from ppa:ubuntu-toolchain-r/test.
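Before running the build, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch, not part of the repository. It assumes Python 3.3 or newer (for shutil.which), and the executable names are guesses based on the package list above.

```python
import shutil

# Assumed executable names, inferred from the package list above.
# They are not taken from the repository's build scripts.
REQUIRED_TOOLS = ["cmake", "gcc-4.9", "g++-4.9", "python", "pip"]


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```

Library packages such as libhdf5-dev and libboost-all-dev do not provide executables, so this check does not cover them.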
git clone https://github.com/govinda-kamath/HINGE-analyses.git
cd HINGE-analyses
git submodule foreach --recursive git submodule update --init
git submodule update --init --recursive
./build.sh
source setup.sh
# Optionally you can create a python virtual environment and then install the requirements
pip install -r requirements.txt
The Python packages installed by the last command are the following.
- numpy
- ujson
- cython
- networkx
- matplotlib
- biopython
- bcbio-gff
- bcbio-nextgen
- colormap
- easydev
- forceatlas2
- jupyter
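To verify that the packages listed above were installed, one can try importing each of them. The following is a minimal sketch, not part of the repository. The mapping from pip package name to import name is an assumption (for example, biopython is assumed to import as Bio and bcbio-gff as BCBio) and may need adjusting.

```python
import importlib

# Assumed mapping from pip package name to importable module name.
# These import names are not stated in the README and may differ.
PACKAGE_MODULES = {
    "numpy": "numpy",
    "ujson": "ujson",
    "cython": "Cython",
    "networkx": "networkx",
    "matplotlib": "matplotlib",
    "biopython": "Bio",
    "bcbio-gff": "BCBio",
    "bcbio-nextgen": "bcbio",
    "colormap": "colormap",
    "easydev": "easydev",
    "forceatlas2": "forceatlas2",
    "jupyter": "jupyter",
}


def find_missing(package_modules):
    """Return the sorted package names whose module fails to import."""
    missing = []
    for package, module in sorted(package_modules.items()):
        try:
            importlib.import_module(module)
        except Exception:
            # A package that raises on import is treated as unusable.
            missing.append(package)
    return missing


if __name__ == "__main__":
    missing = find_missing(PACKAGE_MODULES)
    if missing:
        print("Could not import: " + ", ".join(missing))
    else:
        print("All listed packages were imported successfully.")
```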
matplotlib may need system libraries that pip does not provide. On Ubuntu, the build dependencies of the python-matplotlib package can be installed with sudo apt-get build-dep python-matplotlib
Alternatively, any of these packages can be installed individually with sudo pip install <package>. When installing forceatlas2, make sure that the code is Cython-compiled, which gives a 10x improvement in speed. One explicit way to ensure this is to download the source directly from PyPI and build it with its setup.py.
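One rough way to tell whether forceatlas2 was Cython-compiled is to look at the file the module was loaded from. The following is a heuristic sketch, not part of the repository. It assumes that forceatlas2 is importable under that name and that a compiled build is loaded from a .so or .pyd extension file, while a pure-Python install is loaded from a .py or .pyc file.

```python
import importlib

# File suffixes of compiled extension modules (Linux/macOS and Windows).
COMPILED_SUFFIXES = (".so", ".pyd")


def looks_compiled(module_file):
    """Heuristic: True if the module file is a compiled extension."""
    return module_file is not None and module_file.endswith(COMPILED_SUFFIXES)


if __name__ == "__main__":
    try:
        # Assumed import name; adjust if the package imports differently.
        module = importlib.import_module("forceatlas2")
    except ImportError:
        print("forceatlas2 is not installed.")
    else:
        module_file = getattr(module, "__file__", None)
        if looks_compiled(module_file):
            print("forceatlas2 appears to be compiled: %s" % module_file)
        else:
            print("forceatlas2 appears to be pure Python: %s" % module_file)
```

If the check reports a pure-Python install, rebuilding from the PyPI source as described above is the step to retry.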
Both ascp and Aspera Connect are also needed to speed up the downloads.
The results of Figure 2 in the paper can be reproduced using this notebook.
Here is a tutorial on one way to set up an IPython/Jupyter notebook on a remote server.