Scalable inference of gene regulatory networks using Apache Spark and XGBoost

leqi0001/GRNBoost

GRNBoost

Introduction

GRNBoost is a library built on top of Apache Spark that implements a scalable strategy for gene regulatory network (GRN) inference.

Inferring a GRN from gene expression data is computationally expensive, and the cost grows as advances in high-throughput gene profiling technology produce ever larger datasets.

GRNBoost was inspired by GENIE3, a popular algorithm for GRN inference. GENIE3 breaks the inference problem into a set of nonlinear regressions using tree-based ensembles (Random Forest): for each gene in the dataset, it builds a predictive model of that gene's expression profile as a function of the expression profiles of a collection of candidate regulatory genes (transcription factors). The regression models act as a feature selection mechanism: they yield the most predictive regulators for each target gene as candidate links in the resulting gene regulatory network.
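The per-target decomposition can be sketched in a few lines of Scala. This is only an illustration under stated assumptions: the names `Link`, `absCorrelation`, and `rankRegulators` are hypothetical and not part of GRNBoost, and the importance score is a simple stand-in (absolute Pearson correlation) rather than the tree-ensemble feature importances that GENIE3 relies on.

```scala
// Hypothetical sketch: names and the scoring function are illustrative only.
object PerTargetSketch {
  // One candidate regulatory link: regulator -> target, with an importance score.
  final case class Link(regulator: String, target: String, importance: Double)

  // Stand-in importance: absolute Pearson correlation between two expression profiles.
  // GENIE3 would use feature importances from a tree ensemble here instead.
  def absCorrelation(x: Vector[Double], y: Vector[Double]): Double = {
    require(x.length == y.length && x.nonEmpty, "profiles must be non-empty and of equal length")
    val n = x.length
    val mx = x.sum / n
    val my = y.sum / n
    val cov = x.zip(y).map { case (a, b) => (a - mx) * (b - my) }.sum
    val sx = math.sqrt(x.map(a => (a - mx) * (a - mx)).sum)
    val sy = math.sqrt(y.map(b => (b - my) * (b - my)).sum)
    // A constant profile carries no information, so score it as 0.
    if (sx == 0.0 || sy == 0.0) 0.0 else math.abs(cov / (sx * sy))
  }

  // Score every candidate regulator against one target gene and
  // return the candidate links sorted from most to least important.
  def rankRegulators(
      target: String,
      expression: Map[String, Vector[Double]],
      regulators: Seq[String]): Seq[Link] = {
    val y = expression(target)
    regulators
      .filter(_ != target) // a gene is not a candidate regulator of itself
      .map(r => Link(r, target, absCorrelation(expression(r), y)))
      .sortBy(-_.importance)
  }
}
```

Calling `PerTargetSketch.rankRegulators` once per gene in the dataset yields one ranked regulator list per target; swapping `absCorrelation` for a tree-based importance measure would bring the sketch closer to the approach described above.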

GRNBoost adopts GENIE3's algorithmic blueprint and aims to improve its runtime performance and the data sizes it can handle. It does this by reframing GENIE3's multiple regression approach as an Apache Spark MapReduce-style pipeline, and by replacing the Random Forest regression with XGBoost, a gradient boosting implementation considered state of the art among tree-based machine learning algorithms at the time of writing.
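The MapReduce-style shape of such a pipeline can be illustrated with plain Scala collections standing in for Spark. This is a minimal sketch, not GRNBoost's actual code: `PipelineSketch`, `Link`, `Regress`, and `inferNetwork` are hypothetical names, and the per-target regression is passed in as a function so that any model could be plugged in.

```scala
// Hypothetical sketch of the map/reduce shape; plain collections stand in for Spark.
object PipelineSketch {
  // One candidate regulatory link: regulator -> target, with an importance score.
  final case class Link(regulator: String, target: String, importance: Double)

  // A per-target regression step: given a target gene, return its scored candidate regulators.
  type Regress = String => Seq[Link]

  // "Map" phase: run one independent regression per target gene.
  // "Reduce" phase: merge all per-target results into one ranked edge list
  // and keep only the topN strongest links.
  def inferNetwork(targets: Seq[String], regress: Regress, topN: Int): Seq[Link] =
    targets
      .flatMap(regress)
      .sortBy(-_.importance)
      .take(topN)
}
```

Because each target gene is processed independently, the `flatMap` step is the natural place to distribute work. On Spark, the collection of targets would presumably be a distributed dataset, with the regulator expression data made available to every worker; how GRNBoost arranges this in detail is not described here.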

Getting Started

License

GRNBoost is available via the 3-Clause BSD license.

References

GRNBoost was developed at the Laboratory of Computational Biology (Stein Aerts) as an optional component for the SCENIC workflow.

Sara Aibar, Carmen Bravo González-Blas, Thomas Moerman, Vân Anh Huynh-Thu, Hana Imrichova, Gert Hulselmans, Florian Rambow, Jean-Christophe Marine, Pierre Geurts, Jan Aerts, Joost van den Oord, Zeynep Kalender Atak, Jasper Wouters & Stein Aerts. SCENIC: single-cell regulatory network inference and clustering. Nature Methods (2017). doi:10.1038/nmeth.4463

