
BigOptim – Large-Scale Finite-Sum Cost Function Optimization for R

Build status badge: https://travis-ci.org/IshmaelBelghazi/bigoptim.svg
Coverage badge: https://coveralls.io/repos/IshmaelBelghazi/bigoptim/badge.svg?branch=master&service=github

Description

BigOptim is an R package that implements the Stochastic Average Gradient (SAG) [1] optimization method. For strongly convex problems, SAG achieves the convergence rate of batch gradient descent while keeping the per-iteration cost of stochastic gradient descent. This allows efficient training of machine learning models with convex cost functions.
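
The core idea is simple: SAG keeps one stored gradient per sample and, at each iteration, refreshes a single entry while stepping along the average of all stored gradients. The R function below is only an illustrative sketch of that update for L2-regularized logistic regression; the name sag_sketch, its arguments, and the plain constant step size are assumptions made for exposition, not the package's implementation (which runs in compiled code).

## Illustrative sketch of the SAG update (not the package's implementation)
sag_sketch <- function(X, y, lambda, stepsize, iters) {
  n <- nrow(X); p <- ncol(X)
  w <- numeric(p)
  G <- matrix(0, n, p)   ## table of last-seen per-sample gradients
  g_sum <- numeric(p)    ## running sum of the stored gradients
  for (t in seq_len(iters)) {
    i <- sample.int(n, 1)
    xi <- X[i, ]
    ## gradient of log(1 + exp(-y[i] * xi' w)) + (lambda / 2) * ||w||^2 at sample i
    g_new <- -y[i] * xi / (1 + exp(y[i] * sum(xi * w))) + lambda * w
    g_sum <- g_sum - G[i, ] + g_new  ## swap sample i's stored gradient into the sum
    G[i, ] <- g_new
    w <- w - stepsize * g_sum / n    ## step along the average stored gradient
  }
  w
}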

Setup

install.packages("devtools")
devtools::install_github("hadley/devtools")  ## Optional
devtools::install_github("IshmaelBelghazi/bigoptim")

Example: Fit with Linesearch

## Loading Data set
data(covtype.libsvm)
## Normalizing Columns and adding intercept
X <- cbind(rep(1, NROW(covtype.libsvm$X)), scale(covtype.libsvm$X))
y <- covtype.libsvm$y
y[y == 2] <- -1
## Setting seed
set.seed(0)
## Setting up problem
maxiter <- NROW(X) * 10  ## 10 passes through the data set
lambda <- 1/NROW(X) 
sag_ls_fit <- sag_fit(X=X, y=y, lambda=lambda,
                      maxiter=maxiter, 
                      tol=1e-04, 
                      family="binomial", 
                      fit_alg="linesearch",
                      standardize=FALSE)
## Getting weights
weights <- coef(sag_ls_fit)
## Getting cost
cost <- get_cost(sag_ls_fit)
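
As a quick sanity check (not part of the package's documented API), the fitted weights can be used directly for prediction in base R. This assumes the binomial fit parameterizes the linear predictor X %*% weights and that the labels are coded as -1/+1, as in the example above.

## Hedged follow-up: training accuracy from the fitted weights (base R only)
pred <- sign(X %*% weights)  ## predicted -1/+1 labels
mean(pred == y)              ## fraction of correctly classified samples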

Example: Demo – Monitoring gradient norm

demo("monitoring_training")

(Figure: gradient L2 norm during training on covtype – misc/readme/grad_norm_covtype.png)
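
For reference, the quantity the demo tracks can also be computed directly from a fitted model with base R. The snippet below is a hedged sketch, not the demo's own code: it evaluates the full-data gradient of the L2-regularized logistic loss at the fitted weights (labels coded as -1/+1, as above) and reports its L2 norm.

## Hedged sketch: full-data gradient L2 norm at the fitted weights
margins <- as.vector(X %*% weights) * y
grad <- -crossprod(X, y / (1 + exp(margins))) / NROW(X) + lambda * weights
sqrt(sum(grad^2))  ## gradient L2 norm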

Runtime comparison

Run on an Intel i7-4710HQ with 16 GB of RAM, using Intel MKL and the Intel compilers.

demo("run_times")

Dense dataset: Logistic regression on covertype

Logistic Regression on Covertype – 581012 sample points, 55 variables

                                          constant    linesearch  adaptive    glmnet
Cost at optimum                           0.513603    0.513497    0.513676    0.513693
Gradient L2 norm at optimum               0.001361    0.001120    0.007713    0.001806
Approximate gradient L2 norm at optimum   0.001794    0.000146    0.000214    NA
Time (seconds)                            1.930       2.392       8.057       8.749

Sparse dataset: Logistic regression on rcv1_train

Logistic Regression on RCV1_train – 20242 sample points, 47237 variables

                                          constant      linesearch    adaptive      glmnet
Cost at optimum                           0.046339      0.046339      0.046339      0.046342
Gradient L2 norm at optimum               3.892572e-07  4.858723e-07  6.668943e-10  7.592185e-06
Approximate gradient L2 norm at optimum   3.318267e-07  4.800463e-07  2.647663e-10  NA
Time (seconds)                            0.814         0.872        1.368         4.372

References

[1] Mark Schmidt, Nicolas Le Roux, and Francis Bach. Minimizing Finite Sums with the Stochastic Average Gradient. arXiv:1309.2388 [cs, math, stat], September 2013.
