soedinglab/bipartite_motif_finder

BMF: Thermodynamic model for de novo bipartite RNA motif discovery

BMF (Bipartite Motif Finder) is an open-source tool for finding co-occurrences of sequence motifs in genomic sequences.

BMF is also available as a webserver:

Publication

Sohrabi-Jahromi S. and Söding J. Thermodynamic model reveals most RNA-binding proteins prefer simple and repetitive motifs, bioRxiv 2021.

Notebooks used to generate the analyses in the manuscript are available at soedinglab/bmf-paper.

Documentation

A more comprehensive BMF user guide is available in our GitHub Wiki. For questions please open an issue on GitHub.

Installation

Requirements

  • python>=3.6
  • numpy
  • cython

Installing requirements with Conda:

Create a new conda environment with python, numpy, and cython:

conda create -n bmf python=3.6 numpy cython
conda activate bmf

Installing requirements on Ubuntu without Conda:

sudo apt-get update
sudo apt-get install python3.6 python3-pip
pip3 install numpy cython

Installing requirements on MacOS with brew:

brew install python3
pip install numpy cython

BMF installation:

  1. Optional: BMF can be compiled as a faster version for processors that support the AVX2 instruction set extension. You can check whether AVX2 is supported by executing cat /proc/cpuinfo | grep avx2 on Linux, or sysctl -a | grep machdep.cpu.leaf7_features | grep AVX2 on MacOS. If your processor supports AVX2, set the following environment variable before installing with pip in the next step, so that the faster version of BMF is compiled:
export USE_AVX=1
  2. Install BMF with pip:
pip install https://github.com/soedinglab/bipartite_motif_finder/releases/download/v1.0.0a/bmf_tool-1.0.0.tar.gz
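The optional AVX2 check from step 1 can be automated. The following is a minimal sketch, not part of BMF itself: has_avx2 is a hypothetical helper name, and it simply wraps the two check commands given above. Unsetting USE_AVX when AVX2 is not found is an assumption made here to keep the default build; the README only documents export USE_AVX=1.

```shell
# Hypothetical helper (not shipped with BMF): succeeds if the CPU reports AVX2.
has_avx2() {
  case "$(uname -s)" in
    Linux)
      # Same check as in step 1, made quiet with -q.
      grep -q avx2 /proc/cpuinfo 2>/dev/null
      ;;
    Darwin)
      sysctl -a 2>/dev/null | grep machdep.cpu.leaf7_features | grep -q AVX2
      ;;
    *)
      # Unknown platform: assume no AVX2 support.
      return 1
      ;;
  esac
}

if has_avx2; then
  export USE_AVX=1
  echo "AVX2 detected: USE_AVX=1 exported"
else
  # Assumption: leaving USE_AVX unset selects the default build.
  unset USE_AVX
  echo "AVX2 not detected: the default version will be built"
fi
```

Run the pip install command from step 2 in the same shell session afterwards, so that the exported variable is visible during compilation.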

See BMF help page:

bmf --help

Usage

Please refer to our GitHub Wiki for a more detailed description of BMF and all its input parameters. In the following we provide an example workflow.

bmf [-h] [--BGsequences BGSEQUENCES | --predict]
       [--input_type {fasta,fastq,seq}]
       [--model_parameters MODEL_PARAMETERS] [--motif_length MOTIF_LENGTH]
       [--no_tries NO_TRIES] [--output_prefix OUTPUT_PREFIX]
       [--var_thr VAR_THR] [--batch_size BATCH_SIZE]
       [--max_iterations MAX_ITERATIONS] [--no_cores NO_CORES]
       sequences

Example workflow

You can find the fasta files needed to run this example in the data directory. Here we run BMF with one random parameter initialization. You can increase --no_tries to run BMF several times, each with new initial parameter values. In that case, the solution with the best likelihood is used to plot the BMF logo and to predict binding to new sequences.

Motif discovery

You can use bmf in training mode for de novo motif discovery. By default, BMF runs for a maximum of 1000 iterations.

bmf positives_AAA_CCC.fasta --BGsequences negatives_AAA_CCC.fasta --input_type fasta --output_prefix AAA_CCC --motif_length 3 --no_tries 1

Getting sequence logo

You can use bmf_logo to plot the best likelihood motif model generated by BMF. Specify the output_prefix from the previous step to allow bmf_logo to find all associated parameter files. Here we use AAA_CCC to specify the outputs from the previous run:

bmf_logo AAA_CCC --motif_length 3

Predicting binding to new sequences

You can use the trained BMF model parameters to predict binding scores for new sequences. To specify --model_parameters, use the output_prefix from the first step (here AAA_CCC).

bmf test_sequences.fasta --predict --input_type fasta --model_parameters AAA_CCC --output_prefix predict_test_sequences
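The three steps of the example workflow can be chained in one script so that the same output prefix and motif length are used throughout. This is a sketch under stated assumptions: the run wrapper, the run_workflow function, and the DRY_RUN variable are hypothetical names introduced here, not part of BMF. By default the script only prints the commands; setting DRY_RUN=0 executes them, which requires BMF to be installed and the example fasta files to be in the working directory.

```shell
#!/usr/bin/env bash
set -eu

PREFIX="AAA_CCC"     # output prefix shared by all three steps
MOTIF_LENGTH=3       # must match between bmf and bmf_logo

# Hypothetical wrapper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

run_workflow() {
  # 1. Motif discovery (training mode)
  run bmf positives_AAA_CCC.fasta --BGsequences negatives_AAA_CCC.fasta \
      --input_type fasta --output_prefix "$PREFIX" \
      --motif_length "$MOTIF_LENGTH" --no_tries 1

  # 2. Sequence logo of the best likelihood model
  run bmf_logo "$PREFIX" --motif_length "$MOTIF_LENGTH"

  # 3. Binding prediction for new sequences
  run bmf test_sequences.fasta --predict --input_type fasta \
      --model_parameters "$PREFIX" --output_prefix predict_test_sequences
}

run_workflow
```

Keeping the prefix in a single variable avoids a mismatch between the --output_prefix used for training and the value later passed to bmf_logo and --model_parameters.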