CRAM is a tool developed to analyze data produced from shotgun metagenome sequencing. CRAM was created for people who are familiar with UNIX and command-line interfaces and enables scientists to modify their analysis approach to fit their experiment.
The goals of CRAM are to:
- Provide a fully open source solution for metagenome analysis.
- Create a codebase that promotes contribution by the community.
- Allow for analysis of large datasets on commodity hardware.
The default pipeline consists of an assemble, annotate and quantify approach to metagenome analysis. The creation of quantitative data allows for comparison between samples across other variables such as time and space. If this does not suit you, you can easily craft your own approach.
Oh yeah, CRAM and the tools that it is built from are 100% free and open source because science with black boxes is not science.
CRAM takes the following approach to metagenome assembly, annotation and analysis:
- Quality control (Trimming)
- De novo assembly (Velvet)
- Open Reading Frame prediction (Prodigal)
- ORF annotation (PHMMER/BLAST) using Subsystems (SEED)
- ORF coverage detection (SMALT) leading to quantitative measurement of metabolic potential.
- Quantitative measurement of community composition by comparison of 16S rRNA genes to the Ribosomal Database Project (RDP) database.
The end result is a matrix containing subsystems and their coverage in the metagenome. Samples can be standardized (by dividing by the total number of reads) and then compared. Combined with metadata, this can lead to an understanding of the relationship between a sample's metabolic potential and environmental factors.
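The standardization step can be sketched in a few lines of Python. This is only an illustration and not CRAM's own code: the function name standardize, the dictionary layout, the sample names and the numbers are all made up for the example.

```python
def standardize(coverage, total_reads):
    """Divide every subsystem coverage value by the sample's total read count.

    coverage:    {sample: {subsystem: coverage}}  (hypothetical layout)
    total_reads: {sample: number of reads in that sample}
    """
    standardized = {}
    for sample, subsystems in coverage.items():
        n_reads = float(total_reads[sample])
        if n_reads <= 0:
            raise ValueError("sample %r has no reads" % sample)
        standardized[sample] = dict(
            (name, value / n_reads) for name, value in subsystems.items()
        )
    return standardized


# Made-up numbers, purely for illustration.
coverage = {
    "sample_a": {"Nitrogen Metabolism": 200.0, "Photosynthesis": 50.0},
    "sample_b": {"Nitrogen Metabolism": 100.0, "Photosynthesis": 100.0},
}
total_reads = {"sample_a": 1000000, "sample_b": 500000}

result = standardize(coverage, total_reads)
```

In this toy case the raw Nitrogen Metabolism coverage of sample_a is twice that of sample_b, but sample_a also has twice as many reads, so the standardized values come out equal. That is the reason to standardize before comparing samples.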
- You need to have a Python version between 2.7 and 3.0. To check your version of Python, type:
$ python --version
- Download and extract from here.
- cd into the CRAM directory and type:
$ python setup.py install
You should see a bunch of output and "installation complete". This means that CRAM is installed. You will also need to download the tools and databases used by CRAM. This can be accomplished by typing
$ make
in the same directory. This step takes a while as the databases are quite large.
- CRAM should now be installed. Check by typing
$ metacram
You should see:
** MetaCRAM ** Pipelines: Type metacram <name> <directory> to create a new project. simple illumina
The metacram command is used to create metagenome projects. To create a project:
$ metacram <name of pipeline> <directory for project>
There are two pipelines that come with CRAM out of the box: illumina and simple. The illumina pipeline is for paired-end reads. The simple pipeline is for single-end reads generated by any platform, as long as the reads are in fasta, fastq or qseq format.
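If you are not sure which of these formats a file is in, looking at its first line is usually enough. The sketch below is a rough heuristic and not part of CRAM; the function name guess_read_format is made up. It assumes that fasta records start with ">", fastq records start with "@", and qseq lines have 11 tab-separated fields.

```python
def guess_read_format(first_line):
    """Guess the read format from the first line of a file (heuristic only)."""
    line = first_line.rstrip("\r\n")
    if line.startswith(">"):
        return "fasta"
    if line.startswith("@"):
        return "fastq"
    # Assumption: a qseq record is a single line with 11 tab-separated fields.
    if len(line.split("\t")) == 11:
        return "qseq"
    return "unknown"
```

To use it on a real file, pass the first line, for example guess_read_format(open("data/my_raw_reads.fastq").readline()).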
Once you have created the project directory with the metacram command, cd into that directory and make a directory called data/.
$ cd metagenome_projects
$ metacram simple new_project
$ cd new_project
$ mkdir data
$ cp my_raw_reads.fastq data/
For paired-end reads, your reads should be called *_left.qseq and *_right.qseq. (The extension doesn't matter as long as it's fastq or qseq.) Only the left and right parts of the name matter: they tell CRAM how the reads are oriented.
Now that your reads are in the data/ directory, invoke the pipeline by typing
$ python simple.py # for the simple pipeline
Or
$ python illumina.py # for the illumina pipeline
You should see things beginning to happen (directories being made, reads trimmed, assemblies assembled). If, at any point, the pipeline crashes or you stop it, you can resume it by invoking the script again. CRAM will pick up where it left off.
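Before starting the illumina pipeline, it can be worth checking that every left file in data/ has a matching right file. The script below is a small sketch of such a check and is not shipped with CRAM. The function name find_unpaired and the exact pattern are assumptions based on the *_left / *_right naming convention described above.

```python
import os
import re

# Assumption: mates are marked by "_left" or "_right" directly before the extension.
PAIR_PATTERN = re.compile(r"^(?P<stem>.+)_(?P<side>left|right)\.[^.]+$")


def find_unpaired(filenames):
    """Return the sorted name stems that have only one of the two mates."""
    sides = {}
    for name in filenames:
        match = PAIR_PATTERN.match(name)
        if match:
            sides.setdefault(match.group("stem"), set()).add(match.group("side"))
    complete = set(["left", "right"])
    return sorted(stem for stem, found in sides.items() if found != complete)


if __name__ == "__main__":
    data_dir = "data"  # run this from inside the project directory
    if os.path.isdir(data_dir):
        for stem in find_unpaired(os.listdir(data_dir)):
            print("missing mate for: %s" % stem)
```

Files that do not follow the left/right convention are ignored, so the script stays quiet for single-end projects.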
BSD (see LICENSE.txt)