BMF (Bipartite Motif Finder) is an open-source tool for finding co-occurrences of sequence motifs in genomic sequences.
BMF is also available as a webserver:
- Link: bmf.soedinglab.org
- Web server repository: soedinglab/bmf-webserver
Notebooks used to generate the analyses in the manuscript are available at soedinglab/bmf-paper.
A more comprehensive BMF user guide is available in our GitHub Wiki. For questions please open an issue on GitHub.
python>=3.6
numpy
cython
Create a new conda environment with python, numpy, and cython:
conda create -n bmf python=3.6 numpy cython
conda activate bmf
Alternatively, on Debian/Ubuntu, install the requirements with apt and pip:
sudo apt-get update
sudo apt-get install python3.6 python3-pip
pip3 install numpy cython
On macOS, install the requirements with Homebrew and pip:
brew install python3
pip install numpy cython
- Optional: BMF can also be compiled as a faster version for processors that support the AVX2 extension. You can check whether AVX2 is supported by executing
cat /proc/cpuinfo | grep avx2
on Linux, or
sysctl -a | grep machdep.cpu.leaf7_features | grep AVX2
on macOS. If your processor supports AVX2, set the following environment variable before installing so that the faster version of BMF is compiled:
export USE_AVX=1
- Install BMF with pip:
pip install https://github.com/soedinglab/bipartite_motif_finder/releases/download/v1.0.0a/bmf_tool-1.0.0.tar.gz
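The AVX2 check described above can also be scripted. The following is a minimal Python sketch, not part of BMF: the helper names cpuinfo_has_avx2 and detect_avx2 are hypothetical, and the sketch assumes the same sources as the shell commands (/proc/cpuinfo on Linux, the machdep.cpu.leaf7_features sysctl key on macOS).

```python
import platform
import subprocess
from pathlib import Path


def cpuinfo_has_avx2(cpuinfo_text: str) -> bool:
    """Return True if a 'flags' line of /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags" and "avx2" in value.split():
            return True
    return False


def detect_avx2() -> bool:
    """Best-effort AVX2 detection; returns False when support cannot be confirmed."""
    system = platform.system()
    try:
        if system == "Linux":
            return cpuinfo_has_avx2(Path("/proc/cpuinfo").read_text())
        if system == "Darwin":
            # Same key as in the sysctl command above; it is absent on
            # non-Intel Macs, in which case stdout stays empty.
            result = subprocess.run(
                ["sysctl", "-n", "machdep.cpu.leaf7_features"],
                stdout=subprocess.PIPE,
                stderr=subprocess.DEVNULL,
                universal_newlines=True,
            )
            return "AVX2" in result.stdout.split()
    except OSError:
        pass
    return False
```

If detect_avx2() returns True, exporting USE_AVX=1 before the pip install step is the relevant option; otherwise the default build applies.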
See BMF help page:
bmf --help
Please refer to our GitHub Wiki for a more detailed description of BMF and all its input parameters. In the following we provide an example workflow.
bmf [-h] [--BGsequences BGSEQUENCES | --predict]
[--input_type {fasta,fastq,seq}]
[--model_parameters MODEL_PARAMETERS] [--motif_length MOTIF_LENGTH]
[--no_tries NO_TRIES] [--output_prefix OUTPUT_PREFIX]
[--var_thr VAR_THR] [--batch_size BATCH_SIZE]
[--max_iterations MAX_ITERATIONS] [--no_cores NO_CORES]
sequences
You can find the FASTA files needed to run this example in the data directory. Here we run BMF with one random parameter initialization. You can increase --no_tries to run BMF several times, each with new initial parameter values. In that case, the solution with the best likelihood is used to plot the BMF logo and to predict binding to new sequences.
You can use bmf in training mode for de novo motif discovery. By default, BMF runs for a maximum of 1000 iterations.
bmf positives_AAA_CCC.fasta --BGsequences negatives_AAA_CCC.fasta --input_type fasta --output_prefix AAA_CCC --motif_length 3 --no_tries 1
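When training is launched from a script, it can help to assemble the command from named values. The sketch below is a hypothetical helper (build_train_command is not part of BMF) that only uses flags listed in the usage synopsis above and rejects input types outside fasta, fastq, and seq.

```python
import shlex
from typing import List

# Input types listed in the bmf usage synopsis.
VALID_INPUT_TYPES = ("fasta", "fastq", "seq")


def build_train_command(
    sequences: str,
    bg_sequences: str,
    output_prefix: str,
    motif_length: int,
    no_tries: int,
    input_type: str,
) -> List[str]:
    """Assemble the argument list for a bmf training run (hypothetical helper)."""
    if input_type not in VALID_INPUT_TYPES:
        raise ValueError("input_type must be one of %s" % (VALID_INPUT_TYPES,))
    if motif_length < 1 or no_tries < 1:
        raise ValueError("motif_length and no_tries must be positive integers")
    return [
        "bmf", sequences,
        "--BGsequences", bg_sequences,
        "--input_type", input_type,
        "--output_prefix", output_prefix,
        "--motif_length", str(motif_length),
        "--no_tries", str(no_tries),
    ]


command = build_train_command(
    sequences="positives_AAA_CCC.fasta",
    bg_sequences="negatives_AAA_CCC.fasta",
    output_prefix="AAA_CCC",
    motif_length=3,
    no_tries=1,
    input_type="fasta",
)
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in command))
```

The printed line should match the training command shown above. To execute it from Python, the list can be passed to subprocess.run(command, check=True) once BMF is installed.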
You can use bmf_logo to plot the best likelihood motif model generated by BMF. Specify the output_prefix from the previous step to allow bmf_logo to find all associated parameter files. Here we use AAA_CCC to specify the outputs from the previous run:
bmf_logo AAA_CCC --motif_length 3
You can use the trained BMF model parameters to predict binding scores for new sequences. To specify --model_parameters, use the output_prefix from the first step (here AAA_CCC).
bmf test_sequences.fasta --predict --input_type fasta --model_parameters AAA_CCC --output_prefix predict_test_sequences
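The three steps of this example (training, logo plotting, prediction) can be chained in a small driver script. The following is a sketch only: run_workflow is a hypothetical helper, the command lines are copied verbatim from the example above, and it assumes the FASTA files are in the current working directory. With dry_run=True nothing is executed, so the commands can be inspected first.

```python
import shlex
import shutil
import subprocess
from typing import List

# Command lines copied from the example workflow above.
WORKFLOW = [
    "bmf positives_AAA_CCC.fasta --BGsequences negatives_AAA_CCC.fasta "
    "--input_type fasta --output_prefix AAA_CCC --motif_length 3 --no_tries 1",
    "bmf_logo AAA_CCC --motif_length 3",
    "bmf test_sequences.fasta --predict --input_type fasta "
    "--model_parameters AAA_CCC --output_prefix predict_test_sequences",
]


def run_workflow(commands: List[str], dry_run: bool = True) -> List[List[str]]:
    """Split each command line into arguments and optionally run them in order."""
    parsed = [shlex.split(line) for line in commands]
    for args in parsed:
        if dry_run:
            print("would run:", " ".join(args))
            continue
        if shutil.which(args[0]) is None:
            raise FileNotFoundError("%s not found on PATH; is BMF installed?" % args[0])
        # check=True stops the workflow at the first failing step, so
        # prediction never runs against a missing or partial model.
        subprocess.run(args, check=True)
    return parsed


steps = run_workflow(WORKFLOW, dry_run=True)
```

Calling run_workflow(WORKFLOW, dry_run=False) would execute the steps sequentially. The order matters because bmf_logo and the prediction step both read the parameter files written under the AAA_CCC prefix by the training step.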