BigOptim is an R package implementing the Stochastic Average Gradient (SAG) [1] optimization method. For strongly convex problems, SAG matches the convergence rate of full (batch) gradient descent while keeping the per-iteration cost of stochastic gradient descent, which makes it well suited to training machine learning models with convex cost functions.
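Concretely, SAG keeps the most recently evaluated gradient of each of the n terms of the finite sum in memory and steps along their average; in the notation of [1]:

```latex
% SAG iteration from [1]: only the sampled gradient y_{i_k} is refreshed,
% so each step costs a single gradient evaluation, as in SGD.
x^{k+1} = x^k - \frac{\alpha_k}{n} \sum_{i=1}^{n} y_i^k,
\qquad
y_i^k =
\begin{cases}
  \nabla f_i(x^k) & \text{if } i = i_k, \\
  y_i^{k-1}       & \text{otherwise.}
\end{cases}
```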
BigOptim can be installed from GitHub with devtools:

```r
install.packages("devtools")
devtools::install_github("hadley/devtools")  ## Optional: development version of devtools
devtools::install_github("IshmaelBelghazi/bigoptim")
```
The following example fits an L2-regularized logistic regression model to the covertype data set using the line-search variant of SAG:

```r
library(bigoptim)
## Loading the data set
data(covtype.libsvm)
## Normalizing the columns and adding an intercept
X <- cbind(rep(1, NROW(covtype.libsvm$X)), scale(covtype.libsvm$X))
y <- covtype.libsvm$y
y[y == 2] <- -1  ## Recoding the labels to {-1, 1}
## Setting the seed
set.seed(0)
## Setting up the problem
maxiter <- NROW(X) * 10  ## 10 passes through the data set
lambda <- 1 / NROW(X)
sag_ls_fit <- sag_fit(X = X, y = y, lambda = lambda,
                      maxiter = maxiter,
                      tol = 1e-04,
                      family = "binomial",
                      fit_alg = "linesearch",
                      standardize = FALSE)
## Getting the weights
weights <- coef(sag_ls_fit)
## Getting the cost
cost <- get_cost(sag_ls_fit)
```
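As a quick sanity check, the fitted weights can be used to compute training accuracy directly. This is a minimal sketch using only base R (not part of the package API); in the {-1, 1} label coding above, the sign of the linear predictor gives the predicted class:

```r
## Hypothetical accuracy check: the sign of X %*% weights is the
## predicted class in {-1, 1}.
predictions <- sign(X %*% as.numeric(weights))
accuracy <- mean(predictions == y)
print(accuracy)
```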
demo("monitoring_training")
The benchmarks below were run on an Intel i7-4710HQ with 16 GB of RAM, using Intel MKL and the Intel compilers. They can be reproduced with:

```r
demo("run_times")
```
Logistic Regression on Covertype – 581,012 sample points, 55 variables

| | constant | linesearch | adaptive | glmnet |
|---|---|---|---|---|
| Cost at optimum | 0.513603 | 0.513497 | 0.513676 | 0.513693 |
| Gradient L2 norm at optimum | 0.001361 | 0.001120 | 0.007713 | 0.001806 |
| Approximate gradient L2 norm at optimum | 0.001794 | 0.000146 | 0.000214 | NA |
| Time (seconds) | 1.930 | 2.392 | 8.057 | 8.749 |
Logistic Regression on RCV1_train – 20,242 sample points, 47,237 variables

| | constant | linesearch | adaptive | glmnet |
|---|---|---|---|---|
| Cost at optimum | 0.046339 | 0.046339 | 0.046339 | 0.046342 |
| Gradient L2 norm at optimum | 3.892572e-07 | 4.858723e-07 | 6.668943e-10 | 7.592185e-06 |
| Approximate gradient L2 norm at optimum | 3.318267e-07 | 4.800463e-07 | 2.647663e-10 | NA |
| Time (seconds) | 0.814 | 0.872 | 1.368 | 4.372 |
[1] Mark Schmidt, Nicolas Le Roux, and Francis Bach. Minimizing Finite Sums with the Stochastic Average Gradient. arXiv:1309.2388 [cs, math, stat], September 2013.