hyattpd edited this page Aug 3, 2014 · 31 revisions

Prodigal is a protein-coding gene prediction algorithm for bacterial and archaeal genomes. The acronym stands for PROkaryotic DYnamic Programming Genefinding ALgorithm. Dictionary.com provides several definitions of the word "prodigal". The one the authors wish to invoke is:

3. lavishly abundant; profuse: nature's prodigal resources.

and not the more common religious context of the Prodigal Son (extravagance or wastefulness).

History

Prodigal was developed jointly by Oak Ridge National Laboratory and the University of Tennessee-Knoxville in 2007 under the auspices of the Department of Energy Joint Genome Institute. The first paper was published in BMC Bioinformatics in 2010. Since then, Prodigal has become the most popular microbial gene prediction algorithm in the world. As of August 2014, the publication had been cited more than 600 times. Prodigal has been downloaded thousands of times and is in use in over 50 countries. The National Center for Biotechnology Information includes Prodigal gene predictions on its FTP site for all bacterial and archaeal genomes.

What does Prodigal do?

The following are key features of Prodigal:

  • Predicts protein-coding genes: Prodigal provides fast, accurate protein-coding gene predictions in GFF, GenBank, or Sequin table format.
  • Handles draft genomes and metagenomes: Prodigal runs smoothly on finished genomes, draft genomes, and metagenomes.
  • Runs quickly: Prodigal analyzes the E. coli K-12 genome in 10 seconds on a modern MacBook Pro.
  • Runs unsupervised: Prodigal is an unsupervised machine learning algorithm. It does not need to be provided with any training data, and instead automatically learns the properties of the genome from the sequence itself, including genetic code (v3.0.0+), RBS motif usage, start codon usage, and coding statistics.
  • Handles gaps, scaffolds, and partial genes: The user can specify how Prodigal should deal with gaps, with numerous options for allowing or forbidding genes to run into or span gaps.
  • Translation initiation site prediction: Prodigal is highly accurate at predicting the correct translation initiation site for genes, and can output information about every potential start site in the genome, including confidence score, RBS motif, and much more.
  • Outputs detailed summary statistics for each genome (v3.0.0+): The statistics include contig length, gene length, GC content, GC skew, RBS motifs used, and start and stop codon usage.
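
To make two of the summary statistics named above concrete, here is a minimal Python sketch that computes GC content and GC skew for a single sequence. It is not Prodigal's code: the function name `gc_stats` is hypothetical, the skew formula (G - C) / (G + C) is the common textbook definition and is only assumed to match what Prodigal reports, and ignoring ambiguous bases such as N is a choice made for this example.

```python
from collections import Counter


def gc_stats(sequence: str) -> dict:
    """Return length, GC content, and GC skew for one nucleotide sequence.

    Illustrative only; hypothetical helper, not part of Prodigal.
    """
    counts = Counter(sequence.upper())
    g, c = counts["G"], counts["C"]
    # Assumption: only unambiguous bases count toward the denominator,
    # so N and other ambiguity codes are ignored.
    acgt = g + c + counts["A"] + counts["T"]
    gc_content = (g + c) / acgt if acgt else 0.0
    # Common definition of GC skew: (G - C) / (G + C).
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    return {"length": len(sequence), "gc_content": gc_content, "gc_skew": gc_skew}


if __name__ == "__main__":
    print(gc_stats("ATGGGCGCTTAA"))
```

Whole-genome reports often compute skew over sliding windows rather than once per contig; the per-sequence version here is kept deliberately simple.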

What doesn't Prodigal do?

  • Predict RNA genes: For the time being, Prodigal does not predict RNA genes, although we haven't ruled out adding this capability in a future version.
  • Handle genes with introns: Genes with introns are rare enough in bacterial and archaeal genomes that Prodigal does not try to find them.
  • Functionally annotate genes: Prodigal does not provide functional annotations for the genes it predicts.
  • Predict viral genes: Prodigal has not been tested by the authors on viruses and contains no special rules or routines to handle viral genomes, although the anonymous mode is likely to work in such cases.

License

Prodigal is open source and freely available under the GPL.