layout | permalink | title |
---|---|---|
page |
/quick_start/ |
Quick start |
This quick start guide will introduce you to adp-multi (and ADP if you don't know it yet) and points out differences to the original Haskell-ADP implementation.
The examples referenced below are related to RNA secondary structure prediction because this is where the ADP method is mainly used at the moment.
Welcome stranger! It's always nice to have new company around. Before you continue reading the next sections, please have a look at the ADP online course or the tutorial-style paper from its inventors.
When you're confident with the idea of ADP and if you're a bioinformatics guy, then you should ask yourself how pseudoknots or the RNA-RNA interaction problem can be modelled by context-free grammars. They can't! That's what multiple context-free grammars are for, and adp-multi extends ADP by supporting the class of languages that MCFG generate. Have a look at the slides of a talk I gave at the "10. Herbstseminar der Bioinformatik" (2012) called "Predicting Pseudoknots Without Hacking in C". Note that the adp-multi syntax in the slides is outdated.
If you are already familiar with the
Haskell implementation of regular ADP, then
have a look at
tests/ADP/Tests/NestedExample.hs
where adp-multi is used with only one-dimensional parsers/nonterminals.
This is equal to the working of regular ADP and should be familiar
to you. The additional syntax overhead (>>>
etc.) will be useful for higher dimensions.
Contrary to Haskell-ADP, adp-multi doesn't use different combinators for specifying the
yield sizes. Instead, a restricted yield size analysis is performed before running
the actual dynamic programming algorithm. Restricted, because it can't detect cycles
and would loop indefinitely if not told to stop. The special combinators yieldSize1
and yieldSize2
are used
for that case and have the effect that the minimum and maximum yield size of a nonterminal is
specified manually so that the automated yield size analysis is skipped and doesn't go into a loop.
If you haven't done so, now is a good time to learn what MCFGs are. When you're done, you can continue here.
For everything higher than one dimension you have to use a custom rewriting function for each grammar rule (instead of id
).
As explained in the MCFG page there are two representations of multiple context-free grammars:
the original functional-style representation by
Seki et al. (1991)
and the newer inlined-style one by Wild (2010).
As the functional-style representation fits better for implementation via parser combinators, this style
was chosen for adp-multi.
Have a look at
tests/ADP/Tests/RGExample.hs
and scroll to the bottom where the grammar is. You have to use >>>
to combine a rule with its rewriting function.
At the moment there is no type-safety for rewriting functions, so be careful with the list
argument and return value (it will fail at run-time though). If you want more information on that,
have a look at the syntax page.
As it is always good to learn by example, please have a look at the Tests folder. There you will find:
Copy language:
CopyExample.hs
a grammar for the copy language L = { ww | w in {a,b}^* }CopyTwoTrackExample.hs
a two-track grammar for the tupled copy language L = { (w,w) | w in {a,b}^* }
RNA secondary structure prediction:
-
NestedExample.hs
a grammar for nested RNA secondary structures (1-dim nonterminals) -
RGExample.hs
a grammar for RNA sec. structures with canonical simple recursive pseudoknots (only the first canonization rule is used) -
RGExampleDim2.hs
same asRGExample.hs
but with 1-dim nonterminals encoded as 2-dim nonterminals (for testing purposes) -
RGExampleStar.hs
same asRGExample.hs
but with$(\mathcal{A}^*)^i$ used in signature instead of$\mathcal{A}$ and$(\mathcal{A},\mathcal{A})$ -
OneStructureExample.hs
a grammar for RNA 1-structures (4 types of pseudoknots) -
ZeroStructureTwoBackbonesExample.hs
a grammar for RNA 0-structures over two backbones (RNA-RNA interaction problem) -
Nussinov.lhs
benchmark grammar using original Haskell-ADP, equal toNestedExample.hs
Sequence/tree alignment:
AlignmentExample.hs
Needleman-Wunsch global alignment of two sequencesTreeAlignExample.hs
Jiang-Wang-Zhang alignment of two trees encoded as sequences (terms)
Auxiliary files:
Suite.hs
a small test suiteMain.hs
the entry point of the test executable, an easy way to try out things
Although quite easily possible, higher dimensions haven't been implemented yet as a more generic approach instead of implementing code for every dimension (which of course isn't possible) hasn't been found yet. At least for the domain of RNA secondary structure prediction, having more than two dimensions isn't needed for the current state of research. See also "Algebraic and Combinatorial Properties of Common RNA Pseudoknot Classes with Applications" by Markus E. Nebel and Frank Weinberg (2012).