Ra (short for RNA Assembler) is a C++ implementation of an overlap-layout-consensus transcriptome assembler. It was developed as part of my master's thesis at FER.
- g++ (4.6.3 or higher)
- GNU Make
- doxygen (optional)
*note: tested on Linux and OS X (currently not working under clang)
To build the Ra project, run the following commands from your terminal:
git clone https://github.com/mariokostelac/ra.git ra
cd ra/
make
Running the 'make' command creates the bin folder, where all executables are stored.
Currently supported modules are:
- ra - Module is the main static library and is used by the other modules (it doesn't provide any executable).
- ra_overlap - Module is used for finding all overlaps between the input single-end reads. It also removes contained and transitive overlaps.
- ra_layout - Module is used to create a string graph from the input overlaps. It then simplifies the graph with trimming and bubble popping. At the end it extracts the longest contigs from every graph component. As the current overlapper is exact, it also extracts whole transcripts (transcripts.layout.fasta) so that the consensus phase can be avoided for now.
- ra_consensus - Module is used to build consensus sequences. It uses CPPPOA and outputs transcripts.
- ra_correct - Module is optional and is used to correct reads. If used, it should be called before ra_overlap.
- to_afg - Module is used for converting read sets from FASTA/FASTQ to afg format. It is necessary to convert reads because all other modules use the afg format.
- overlap2dot - Module is used for converting overlap files to dot graphs. Cool stuff!
- zoom - Module is used for "zooming" in on a part of the overlap graph. Actually just a simple DFS with a depth limit.
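
The depth-limited DFS mentioned for zoom can be sketched roughly as follows. This is not the code from the zoom module, only a minimal illustration of the technique. The adjacency-list type Graph and the function zoom_neighbourhood are hypothetical names made up for this example, and reads are assumed to be identified by indices 0..n-1.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical adjacency list: graph[v] holds the reads that overlap read v.
using Graph = std::vector<std::vector<std::size_t>>;

// Returns every node reachable from `start` in at most `max_depth` steps.
// Iterative DFS; each stack entry carries the depth budget still left.
std::set<std::size_t> zoom_neighbourhood(const Graph& graph,
                                         std::size_t start,
                                         int max_depth) {
    assert(start < graph.size());
    assert(max_depth >= 0);

    // Largest remaining budget with which each node was expanded so far
    // (-1 means the node has not been reached yet).
    std::vector<int> best_left(graph.size(), -1);
    std::vector<std::pair<std::size_t, int>> stack;
    std::set<std::size_t> seen;

    stack.push_back(std::make_pair(start, max_depth));
    while (!stack.empty()) {
        const std::pair<std::size_t, int> top = stack.back();
        stack.pop_back();
        const std::size_t node = top.first;
        const int left = top.second;

        // Expand a node again only if we arrive with a larger budget;
        // otherwise a long path reaching it first could hide nodes that
        // are still within the limit via a shorter path.
        if (left <= best_left[node]) continue;
        best_left[node] = left;
        seen.insert(node);

        if (left == 0) continue;  // depth limit reached, do not descend
        for (std::size_t next : graph[node]) {
            stack.push_back(std::make_pair(next, left - 1));
        }
    }
    return seen;
}
```

For a chain of overlaps 0 - 1 - 2 - 3, calling zoom_neighbourhood(graph, 0, 2) returns the reads 0, 1 and 2, which is the kind of small subgraph one would then want to render as a dot graph.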
Convert reads:
./bin/to_afg -i examples/ERR430949.fastq --fastq > ERR430949.afg
Correct reads:
./bin/ra_correct -i ERR430949.afg -k 25 -c 2 > ERR430949_c.afg
Overlap phase:
./bin/ra_overlap -i ERR430949_c.afg -m 25 -t 10 --reads-out ERR430949_u.afg > ERR430949_ovl.afg
Layout phase (OUTDATED):
./bin/ra_layout -i ERR430949_u.afg -j ERR430949_ovl.afg -t 10 > ERR430949_con.afg
Consensus phase (optional for now):
./bin/ra_consensus -i ERR430949_u.afg -j ERR430949_con.afg > transcripts.fasta
*note: The first ra_correct and ra_overlap runs on each set of reads build and cache the enhanced suffix arrays (files with .cra, .nra and .rra extensions are created), so they are slower. Later runs with the same input reuse the cached files and are faster.