memerna is a library and set of programs for RNA folding. It includes algorithms for folding, suboptimal folding, and the partition function. memerna is fast, combining efficient algorithms with a highly optimized implementation.
memerna was tested to build and run on Ubuntu 22.04 LTS with up-to-date packages, and the following instructions assume Ubuntu 22.04 LTS. However, memerna is mainly developed on Arch Linux, and is likely to work on any modern Linux distribution provided the toolchain is new enough.
Install the following packages:
sudo apt install build-essential cmake git libboost-dev libmpfr-dev
On Ubuntu 22.04 LTS, the default toolchain is too old, so the following packages are also required (note that this is a PPA, so use at your own risk):
sudo add-apt-repository ppa:ubuntu-toolchain-r/ppa
sudo apt install gcc-12 g++-12
Clone the repo:
git clone https://github.com/Edgeworth/memerna.git
cd memerna
Set up git submodules to pull in external dependencies:
git submodule update --init --recursive
memerna can be compiled directly with cmake. The following commands will build it on Ubuntu 22.04 LTS:
CC=gcc-12 CXX=g++-12 cmake -B build -D CMAKE_BUILD_TYPE=Release .
cmake --build build
Alternatively, memerna can be compiled using the rnapy Python helper script, if you have Python 3.11+ and poetry installed. Builds are output into the given prefix directory, which defaults to $HOME/bin/memerna.
poetry run python -m rnapy.run build --kind release
Optionally, you may set CPM_SOURCE_CACHE to a directory to cache external dependencies. This can be useful if you are building memerna with many separate configurations.
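For example (the cache path here is just an illustration, not a required location):

```shell
# Reuse downloaded external dependencies across separate build configurations.
export CPM_SOURCE_CACHE="$HOME/.cache/cpm"
```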
Compiler | Version | Supported |
---|---|---|
GCC | <= 11 | ❌ |
GCC | 12 | ✅ |
GCC | 13 | ✅ |
Clang | <= 15 | ❌ |
Clang | 16 | ✅ |
Note that Clang 14 and 15 will work with a sufficiently modern C++ standard library (but not GCC 11's libstdc++, or libc++ 14 or 15).
memerna has some build options that can be set with cmake (or passed via the rnapy helper build script) that may be useful for some users. See the cmake configuration for a full explanation.
- USE_MPFR: use MPFR for arbitrary precision floating point
- ENERGY_PRECISION: the number of decimal places to use for energy calculations
- FLOAT_PRECISION: the number of significant digits to use for floats
ENERGY_PRECISION defaults to 2, meaning energies are specified to two decimal places. If you only need one decimal place, set it to 1 for better performance.
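As a sketch, these options can be combined with the cmake invocation shown earlier (the specific values here are illustrative, not recommendations):

```shell
# Configure with MPFR enabled and energies to one decimal place.
CC=gcc-12 CXX=g++-12 cmake -B build -D CMAKE_BUILD_TYPE=Release \
  -D USE_MPFR=ON -D ENERGY_PRECISION=1 .
cmake --build build
```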
memerna supports minimum free energy (MFE) folding.
./fold GCGACCGGGGCUGGCUUGGUAA
There are several algorithms for MFE folding, which can be specified like so:
./fold --dp-alg sparse-opt GCGACCGGGGCUGGCUUGGUAA
Some of the algorithms are listed here:
Algorithm | Expected time | Expected memory | Example runtime (1000 nt) |
---|---|---|---|
sparse-opt | O(N^2) | O(N^2) | 0.18 seconds |
opt | O(N^3) | O(N^2) | 1.58 seconds |
debug | O(N^3) | O(N^2) | 3.87 seconds |
The sparse-opt algorithm is the default, and is the fastest algorithm.
memerna supports suboptimal folding. For example:
./subopt --ctd-output --subopt-delta 6 GCGACCGGGGCUGGCUUGGUAA
./subopt --subopt-delta 6 GCGACCGGGGCUGGCUUGGUAA
./subopt --subopt-strucs 7 GCGACCGGGGCUGGCUUGGUAA
./subopt --subopt-time-secs 2.5 GCGACCGGGGCUGGCUUGGUAA
There are several algorithms for suboptimal folding, which can be specified like so:
./subopt --subopt-alg iterative --ctd-output --subopt-delta 6 GCGACCGGGGCUGGCUUGGUAA
Some of the algorithms are listed here (where k is the number of structures produced). The example runtime is based on a 100 nt sequence and 1000000 structures.
Algorithm | Expected time | Expected memory | Example runtime |
---|---|---|---|
iterative | O(N^2 + k) | O(N^3) | 12.78 seconds |
persistent | O(N^2 + kN) | O(N^3 + k) | 2.71 seconds |
The iterative algorithm is faster and uses less memory for longer sequences. It is theoretically possible to implement it with O(N^2) memory at the cost of worse time performance, but this is not currently implemented.
memerna supports computing the partition function. For example:
./partition GCGACCGGGGCUGGCUUGGUAA
There are several algorithms for the partition function, which can be specified like so:
./partition --pfn-alg opt GCGACCGGGGCUGGCUUGGUAA
Some of the algorithms are listed here:
Algorithm | Expected time | Expected memory | Example runtime (1000 nt) |
---|---|---|---|
opt | O(N^3) | O(N^2) | 9.23 seconds |
debug | O(N^3) | O(N^2) | 34.43 seconds |
Run the tests:
./run_tests
To check includes with include-what-you-use (IWYU):
poetry run python -m rnapy.run build --iwyu --no-build
Then, from the cmake build directory:
make -j$(nproc) 2> /tmp/iwyu.out
Then apply the suggested fixes:
iwyu-fix-includes --nocomments --blank_lines --nosafe_headers < /tmp/iwyu.out
General fuzzing:
./fuzz --mfe --mfe-table --subopt --pfn --random-models --energy-model t04 \
--backends base,baseopt 1 200
Exhaustive fuzzing:
./fuzz --mfe --mfe-table --subopt --pfn --energy-model t04 \
--backends base,baseopt --enumerate 1 10
Fuzzing for t22:
./fuzz --mfe --mfe-table --subopt --pfn --random-models --random-pf \
--energy-model t22 --backends stack 1 200
The below command runs fuzzing in parallel using GNU parallel with 16 jobs.
seq 16 | parallel -j 16 -n0 -u './fuzz --mfe --mfe-table --subopt --pfn \
--random-models --energy-model t04 --backends base,baseopt 1 200'
poetry run python -m rnapy.run build --kind relwithdebinfo --rnastructure --energy-precision 1
# Just MFE:
./fuzz -rd $MRNA/extern/rnastructure_bridge/data_tables/ --mfe \
--mfe-rnastructure --mfe-table --energy-model t04 --backends base,baseopt 1 200
Note that afl-fast seems to cause broken behaviour recently, compared to afl-lto. For afl-fuzz, prefer a seeded energy model over random energy models, since it is more stable and makes any findings easier to debug.
poetry run python -m rnapy.run afl-fuzz --help
For example, try this command line:
poetry run python -m rnapy.run afl-fuzz --kind release --compiler afl-lto \
--num-procs 16 --mfe --mfe-table --seed 123 --max-len 200 \
--energy-model t22 --backends stack
To fuzz everything:
poetry run python -m rnapy.run afl-fuzz --kind release --compiler afl-lto \
--num-procs 16 --energy-precision 1 --mfe --mfe-rnastructure --mfe-table \
--subopt --max-len 200 --backends base,baseopt
Fuzz memerna only, with faster settings:
poetry run python -m rnapy.run afl-fuzz --kind release --compiler afl-lto \
--num-procs 16 --mfe --mfe-table --subopt --subopt-strucs 100 \
--subopt-delta 0.2 --seed 123 --max-len 200 --backends base,baseopt
Checking progress:
afl-whatsup -s $PREFIX/memerna-afl/*/afl
Reproducing a crash:
cat ./afl/default/crashes/<crash> | ./fuzz --afl ...
Minimising test cases:
poetry run python -m rnapy.run afl-fuzz-min <crash-file>
Optionally, it can be useful to set the following variables, for example in a .env file (used by rnapy):
MRNA=<set to memerna source directory>
MEMEVAULT=${MRNA}/rnapy/data/memevault.db
MRNA_DIST=${HOME}/bin/memerna/relwithdebinfo-default-64-rnastructure
MRNA01_DIST=${HOME}/...
RNASTRUCTURE=${HOME}/...
SPARSEMFEFOLD=${HOME}/...
SPARSERNAFOLD=${HOME}/...
VIENNARNA=${HOME}/...
LINEARFOLD=${HOME}/...
Turner 1999 model (not implemented)
Turner 2004 model (t04):
- Adds coaxial stacking
- Lonely pairs are "soft disallowed" - only consider pairs where at least one of the adjacent two pairs could be made.
- Updates parameters for terminal mismatches, hairpin, bulge, internal, and multiloops.
- Special stacks of length > 2 base pairs are not handled.
- Internal loops are limited to size 30 in most implementations in memerna.
Update in 2012 model (t12):
- GU penalty removed from AU/GU penalty
- GU penalty removed from special hairpins, if closed by GU (none like this)
- Removed 0.45 portion of GU penalty from internal loop GU penalty.
- Stacking parameters changed (for GU stacks, not WC)
- See "Testing the nearest neighbor model for Canonical RNA base pairs" paper
Update in 2022 model (t22):
- AU penalty removed as well
- AU penalty removed from special hairpins, if closed by AU
- Lonely pairs implemented fully correctly.
- Removed 0.5 (not 0.45) portion of AU penalty from internal loop AU penalty.
- Stacking parameters changed
- Sequence dependent parameters for terminal base pairs based on the penultimate pair (penultimate_stacking.data)
- Bulge loops of size 1 are continuous and don't have penultimate stacking applied inside (same as AU/GU penalty rules).
- Coaxial stacks are not treated as continuous for penultimate stacking (same as AU/GU penalty rules).
- See "Nearest neighbor rules for RNA helix folding thermodynamics: improved end effects"
- bulge_initiation.data
- dangle3.data
- dangle5.data
- hairpin.data: special hairpins. AU/GU penalties NOT baked in.
- hairpin_initiation.data
- internal_1x1.data: 1x1 special internal loops. AU/GU penalties NOT baked in. N.B. internal loops are never treated as continuous.
- internal_1x2.data: 2x1 special internal loops. AU/GU penalties NOT baked in.
- internal_2x2.data: 2x2 special internal loops. AU/GU penalties NOT baked in.
- internal_2x3_mismatch.data: 2x3 internal loop terminal mismatch parameters. AU/GU penalties NOT baked in.
- internal_initiation.data
- internal_other_mismatch.data: non-2x3 internal loop terminal mismatch parameters. AU/GU penalties NOT baked in.
- misc.data
- stacking.data
- terminal.data
In all cases where an ordering of bases (base_t) is used (e.g. in data tables), it is ACGU.
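As a small illustration of the ACGU ordering (Python is used here only for the sketch, and the flattened row-major layout is an assumption for demonstration, not necessarily memerna's exact table layout):

```python
# ACGU ordering: A=0, C=1, G=2, U=3.
BASES = "ACGU"
IDX = {b: i for i, b in enumerate(BASES)}

def table_index(b5: str, b3: str) -> int:
    """Row-major index into a hypothetical 4x4 pair table using ACGU order."""
    return IDX[b5] * 4 + IDX[b3]

print(table_index("G", "U"))  # 2*4 + 3 = 11
```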
For any commercial applications of this software, please contact the author for a license.