pbdXGB

Author: See section below.

In pbdXGB, we write programs in the "Single Program/Multiple Data" (SPMD) style, typically executed via pbdMPI and MPI. Each process (MPI rank) runs the same copy of the program as every other process, but operates on its own data. The results are collected, compared, and distributed to every process via MPI allreduce-like calls.
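The SPMD idea can be sketched in plain R without MPI. The snippet below is only an illustration: `block_rows` is a hypothetical helper that mimics the idea behind `pbdMPI::get.jid()` (it is not its actual algorithm), and the "ranks" are simulated with a list rather than real processes.

```r
# Hypothetical helper: split row indices 1..n into `size` contiguous blocks,
# one block per simulated rank. Ranks with no rows get an empty vector.
block_rows <- function(n, size) {
  ids <- seq_len(size) - 1L                 # rank ids 0..(size - 1)
  owner <- sort(rep_len(ids, n))            # contiguous runs of rank ids
  unname(split(seq_len(n), factor(owner, levels = ids)))
}

n <- 10
size <- 3                                   # pretend there are 3 MPI ranks
rows <- block_rows(n, size)

x <- seq_len(n) * 1.5                       # toy data known to every rank

# Each simulated rank works only on the rows it owns
local_sums <- vapply(rows, function(r) sum(x[r]), numeric(1))

# An allreduce-like sum would hand every rank this same global value
global_sum <- sum(local_sums)
```

In a real run each rank would hold only its own element of `rows`, and the final sum would be computed by an MPI reduction instead of `sum(local_sums)`.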

Usage

Below is a basic program similar to the original "xgboost" example, but written in the SPMD style using pbdMPI and MPI:

suppressMessages(library(pbdMPI, quietly = TRUE))
suppressMessages(library(pbdXGB, quietly = TRUE))
init()

### Commonly owned all training data
data(agaricus.train, package = 'xgboost')
train.all <- agaricus.train
train.mat.all <- as.matrix(train.all$data)
label.all <- train.all$label

### Locally owned distributed training data
train.rows <- get.jid(nrow(train.mat.all))
train.mat <- train.mat.all[train.rows, ]
label <- label.all[train.rows]

### Train the model from distributed training data
mdl <- xgboost(data = train.mat, label = label,
               max.depth = 2, eta = 1, nthread = 1,
               nrounds = 10, objective = 'binary:logistic',
               verbose = 0)
comm.print(mdl$evaluation_log$train_error, all.rank = TRUE)

### Train with xgboost::xgboost() on all training data
if (comm.rank() == 0){
  mdl.all <- xgboost::xgboost(data = train.mat.all, label = label.all,
                              max.depth = 2, eta = 1, nthread = 2,
                              nrounds = 10, objective = 'binary:logistic',
                              verbose = 0)
  print(mdl.all$evaluation_log$train_error)
}


### Commonly owned all testing data
data(agaricus.test, package = 'xgboost')
test.all <- agaricus.test
test.mat.all <- as.matrix(test.all$data)

### Locally owned distributed testing data
test.rows <- get.jid(nrow(test.mat.all))
test.mat <- test.mat.all[test.rows, ]

### Predict the distributed testing data
pmdl <- predict(mdl, test.mat)
comm.print(pmdl[1:5], all.rank = TRUE)  # First five only

### Predict all testing data
if(comm.rank() == 0){
  pmdl.all <- predict(mdl.all, test.mat.all)

  tmp.rows <- get.jid(nrow(test.mat.all), all = TRUE)
  print(lapply(tmp.rows, function(x) pmdl.all[x[1:5]]))
}

finalize()

Save the code in a file, say mpi_xgb.r, and run it with 2 processes via:

mpirun -np 2 Rscript mpi_xgb.r
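Because each rank predicts only its own test rows, a global test error has to be assembled from per-rank counts. The following is a minimal base-R sketch of that arithmetic with made-up labels and probabilities; the ranks are simulated with lists, and in an MPI run the two sums would presumably come from an allreduce with a sum operation rather than from `sum()` over a list.

```r
# Toy per-rank results: labels and predicted probabilities held by each rank.
# All values here are invented for illustration.
local_labels <- list(c(0, 1, 1, 0), c(1, 0), c(1, 1, 0))
local_probs  <- list(c(0.2, 0.9, 0.4, 0.1), c(0.8, 0.7), c(0.6, 0.3, 0.2))

# Each rank counts its own misclassifications at a 0.5 threshold
local_wrong <- mapply(function(y, p) sum((p > 0.5) != (y == 1)),
                      local_labels, local_probs)
local_n <- lengths(local_labels)

# Global error: total wrong over total rows (two sum-reductions)
global_error <- sum(local_wrong) / sum(local_n)

# Averaging per-rank rates differs when ranks hold unequal row counts
naive_error <- mean(local_wrong / local_n)
```

Reducing counts rather than rates keeps the result independent of how rows happen to be split across ranks.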

Installation

pbdXGB requires:

- R version 3.0.0 or higher
- A system installation of MPI:
  - SUN HPC 8.2.1 (OpenMPI) for Solaris
  - OpenMPI for Linux
  - OpenMPI for Mac OS X
  - MS-MPI for Windows
- pbdR/R packages:
  - pbdMPI
  - xgboost

Authors

pbdXGB is authored and maintained by the pbdR core team:

Wei-Chen Chen
Drew Schmidt

With additional contributions from:

Tianqi Chen (xgboost implementation)
Tong He (xgboost implementation)
xgboost R authors and contributors (some functions are modified from xgboost package)
XGBoost contributors (base XGBoost implementation)
The R Core team (some functions are modified from R)
