EpistasisLab/EBIC.jl

EBIC.jl

EBIC is a next-generation biclustering algorithm based on artificial intelligence. EBIC is probably the first algorithm capable of discovering the most challenging patterns (i.e. row-constant, column-constant, shift, scale, shift-scale and trend-preserving) in complex and noisy data with average accuracy of over 90%. It is also one of the very few parallel biclustering algorithms that use at least one graphics processing unit (GPU) and is ready for big-data challenges.
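The pattern types listed above can be illustrated with small hand-made matrices. The following is only a sketch using the standard textbook definitions of these patterns; the variable names and values are made up for illustration and are not part of EBIC.jl.

```julia
# Illustrative 3x4 biclusters, one per pattern type (all values are invented).
base   = [1.0, 4.0, 2.0, 8.0]   # one value per column
shifts = [0.0, 3.0, -1.0]       # one additive offset per row
scales = [1.0, 2.0, 0.5]        # one positive multiplicative factor per row

row_constant    = repeat([5.0, 7.0, 9.0], 1, 4)  # each row holds a single value
column_constant = repeat(base', 3, 1)            # each column holds a single value
shift_pattern   = shifts .+ base'                # row i = base + shifts[i]
scale_pattern   = scales .* base'                # row i = base * scales[i]
shift_scale     = scales .* base' .+ shifts      # row i = base * scales[i] + shifts[i]

# Trend-preserving: rows are not related by a shift or scale,
# but the columns are ranked in the same order in every row.
trend_preserving = [ 1.0  4.0  2.0   8.0;
                     0.5  9.0  3.0   9.5;
                    10.0 30.0 11.0 100.0]

# True when every row orders its columns the same way as the first row.
same_trend(m) = all(sortperm(m[i, :]) == sortperm(m[1, :]) for i in axes(m, 1))
```

With positive scale factors, the shift, scale and shift-scale matrices also pass `same_trend`, which is why trend-preserving is the most general of these patterns.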

This repository contains a new version of EBIC, rewritten in Julia and improved.

Requirements

  • Julia 1.6 or higher
  • CUDA-capable GPU with compute capability 5.0 (Maxwell) or higher

Setting up project

  1. Clone the project.

  2. Start Julia in the repository root.

julia --project

  3. Enter the pkg mode (hit ]) and install all dependencies.

(EBIC) pkg> instantiate

  4. Start a quick test on data/example_input.csv to make sure everything works.

julia> using EBIC
julia> run_ebic("data/example_input.csv")
Progress: 100%|████████████████████| Time: 0:00:23 (11.68 ms/it)
Dict{String, Any} with 4 entries:
  "biclusters"     => [Dict("rows"=>[31, 32, 33, 
  "num_iterations" => 732
  "algorithm_time" => 33.9547

Alternatively, provide the ground truth as a second argument to get biclustering metrics right away:

julia> using EBIC
julia> run_ebic("data/unibic/narrow_bic/narrow_100_10/narrow_100_10_data1.txt",
    "data/unibic/narrow_bic/narrow_100_10/narrow_100_10_data1_hiddenBics.txt")
Progress: 100%|████████████████████| Time: 0:00:27 (13.65 ms/it)
Dict{String, Any} with 6 entries:
  "recovery"       => 1.0
  "relevance"      => 1.0
  "ce"             => 1.0
  "biclusters"     => [Dict("rows"=>[201, 202, 203, 204, 2
  "num_iterations" => 758
  "algorithm_time" => 27.3088

For more information, see the built-in help: ?run_ebic.
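Since run_ebic returns a plain dictionary, the result can be post-processed with ordinary Julia code. The sketch below works on a mock dictionary shaped like the output above rather than on a real run. The "cols" key and the use of 1-based indices are assumptions (the printed output is truncated after "rows"), and `submatrix` is a hypothetical helper, not a function exported by EBIC.jl.

```julia
# Mock of the result shape shown above; all values are invented for illustration.
# ASSUMPTION: each bicluster also carries a "cols" entry and indices are 1-based.
result = Dict{String, Any}(
    "biclusters"     => [Dict("rows" => [1, 2, 3], "cols" => [2, 4])],
    "num_iterations" => 10,
    "algorithm_time" => 0.5,
)

# Stand-in 4x5 input matrix (filled column by column with 1.0 to 20.0).
data = reshape(collect(1.0:20.0), 4, 5)

# Hypothetical helper: cut the bicluster's rows and columns out of the input.
submatrix(data, bic) = data[bic["rows"], bic["cols"]]

for (k, bic) in enumerate(result["biclusters"])
    sub = submatrix(data, bic)
    println("bicluster ", k, ": ", size(sub, 1), " rows x ", size(sub, 2), " cols")
end
```

Check the actual keys with `keys(first(result["biclusters"]))` on a real result before relying on this layout.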

Getting more test data

We provide three ready-to-use datasets which can be fetched from a remote DVC repository on Google Drive:

  • Unibic (69MB)
  • RecBic Maintext (211MB)
  • RecBic Sup (7.8GB)

DVC needs to be installed on your system to download the data.

pip install "dvc[gdrive]"

Run dvc pull to fetch all three datasets. To download a particular one, use dvc pull <path_to_dataset_dvc>, e.g., dvc pull data/unibic.dvc.

When using DVC for the first time in a repository, you must authenticate with your Google account by following the instructions given by DVC.

Ready-to-use benchmarks

The results of all benchmarks for a specific seed are available through DVC (dvc pull data/results.dvc).

To test all three datasets at once run the following:

julia> using EBIC
julia> benchmark_all()
┌ Info: #############################
│ TEST GROUP: 'narrow_100_10'###################################
┌ Info: #############################
│ Test case  : narrow_100_10_data1.txt
└ Groundtruth: narrow_100_10_data1_hiddenBics.txt
Progress: 100%|████████████████████| Time: 0:00:59
┌ Info: Metrics:
│ Prelic relevance   : 1.0
│ Prelic recovery    : 1.0
└ Clustering error   : 1.0

The above is the same as running the three functions one after another:

benchmark_unibic()
benchmark_recbic_maintext()
benchmark_recbic_sup()

By default, the test results are saved in the results/EBIC.jl folder in the repository root directory. A different result path can be specified as an argument (e.g., benchmark_unibic(out_dir = "new_results")).
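To help interpret the Prelic relevance and recovery values printed in the benchmark log, here is a minimal sketch of the commonly used Prelic match score, which compares the row sets of found and ground-truth biclusters with the Jaccard index. This is an illustration of the general definition only; the function names are hypothetical and EBIC.jl's own implementation may differ in detail.

```julia
# Jaccard index of two index collections (assumes they are not both empty).
jaccard(a, b) = length(intersect(a, b)) / length(union(a, b))

# Prelic-style match score: for each bicluster in `from`, take its best
# Jaccard match in `to`, then average over `from`. Only row sets are compared.
match_score(from, to) =
    sum(maximum(jaccard(f, t) for t in to) for f in from) / length(from)

# Relevance: how well the found biclusters are reflected in the ground truth.
relevance(found, truth) = match_score(found, truth)
# Recovery: how well the ground-truth biclusters are recovered by the found ones.
recovery(found, truth) = match_score(truth, found)

# Toy example: one hidden bicluster, two reported ones (row indices only).
truth = [[1, 2, 3, 4]]
found = [[1, 2, 3, 4], [3, 4, 5, 6]]

println("relevance = ", relevance(found, truth))
println("recovery  = ", recovery(found, truth))
```

In this toy case the hidden bicluster is found exactly, so recovery is 1.0, while the extra, partially overlapping bicluster lowers relevance to about 0.67.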

Cite us

@inproceedings{10.1145/3449726.3463197,
    author = {Renc, Pawe\l{} and Orzechowski, Patryk and Byrski, Aleksander and W\k{a}s, Jaros\l{}aw and Moore, Jason H.},
    title = {EBIC.JL: An Efficient Implementation of Evolutionary Biclustering Algorithm in Julia},
    year = {2021},
    isbn = {9781450383516},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3449726.3463197},
    doi = {10.1145/3449726.3463197},
    booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference Companion},
    pages = {1540–1548},
    numpages = {9},
    keywords = {data mining, biclustering, parallel algorithms, evolutionary computation, machine learning},
    location = {Lille, France},
    series = {GECCO '21}
}

@inproceedings{10.1145/3449726.3462739,
    author = {Renc, Pawe\l{} and Orzechowski, Patryk and Byrski, Aleksander and W\k{a}s, Jaros\l{}aw and Moore, Jason H.},
    title = {Rapid Prototyping of Evolution-Driven Biclustering Methods in Julia},
    year = {2021},
    isbn = {9781450383516},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3449726.3462739},
    doi = {10.1145/3449726.3462739},
    booktitle = {Proceedings of the Genetic and Evolutionary Computation Conference Companion},
    pages = {61–62},
    numpages = {2},
    keywords = {biclustering, evolutionary computation, parallel algorithms, data mining, machine learning},
    location = {Lille, France},
    series = {GECCO '21}
}