
Online-Multiple-Kernel-Learning

Authors: Lu Jing, Wu Yue, Steven Hoi

Contact: chhoi@ntu.edu.sg, jing.lu.2014@phdis.smu.edu.sg

This is a package for solving large scale online multiple kernel learning tasks. The current version is written in C++ and contains a group of online multiple kernel learning algorithms for binary classification, all of which are widely used in online kernel learning research. This is part of my PhD thesis.

This is a follow-up to our JMLR research paper "Large Scale Online Kernel Learning", which deals with single kernel learning tasks. For details about the single kernel learning algorithms, please refer to https://github.com/LIBOL/KOL


The algorithms in this package include (a short sketch of the Hedge combination step shared by most of them follows the list):

  1. kernel-dd: the deterministic online multiple kernel learning algorithm, no budget, Hedge combination.

  2. kernel-ddave: the deterministic online multiple kernel learning algorithm, no budget, simple uniform combination.

  3. kernel-sd: the stochastic-update online multiple kernel learning algorithm, no budget, Hedge combination. http://ink.library.smu.edu.sg/cgi/viewcontent.cgi?article=3294&context=sis_research

  4. kernel-single: single kernel online learning with the Perceptron. http://cseweb.ucsd.edu/~yfreund/papers/LargeMarginsUsingPerceptron.pdf

  5. kernel-ddpa, kernel-sdpa: the deterministic and the stochastic-update online multiple kernel learning algorithms, Hedge combination, with each component classifier updated in PA (passive-aggressive) mode. http://jmlr.csail.mit.edu/papers/volume7/crammer06a/crammer06a.pdf

  6. kernel-bogd, kernel-bogd_sd: the deterministic and the stochastic-update online multiple kernel learning algorithms, Hedge combination, with each component classifier updated in BOGD mode. http://arxiv.org/ftp/arxiv/papers/1206/1206.4633.pdf

  7. kernel-rbp, kernel-rbp_sd: the deterministic and the stochastic-update online multiple kernel learning algorithms, Hedge combination, with each component classifier updated in RBP mode. http://air.unimi.it/bitstream/2434/26350/1/J29.pdf

  8. kernel-bpas, kernel-bpas_sd: the deterministic and the stochastic-update online multiple kernel learning algorithms, Hedge combination, with each component classifier updated in BPAS mode. http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2010_WangV10.pdf

  9. kernel-forgetron, kernel-forgetron_sd: the deterministic and the stochastic-update online multiple kernel learning algorithms, Hedge combination, with each component classifier updated in Forgetron mode. http://papers.nips.cc/paper/2806-the-forgetron-a-kernel-based-perceptron-on-a-fixed-budget.pdf

  10. kernel-fogd: the proposed multiple kernel Fourier online gradient descent algorithm, Hedge combination. http://jmlr.org/papers/v17/14-148.html

  11. kernel-nogd: the proposed multiple kernel Nystrom online gradient descent algorithm. http://jmlr.org/papers/v17/14-148.html

  12. kernel-spasd: the sparse passive-aggressive algorithm for multiple kernel learning. http://jingonline.weebly.com/uploads/5/3/7/3/53733905/spa.pdf
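For reference, here is a minimal sketch of the Hedge-style combination shared by most of the algorithms above: each kernel maintains its own component classifier, the weights of mistaken classifiers are discounted by the factor gamma, and the combined prediction is a weighted vote. The structure, the variable names, and the comments below are illustrative assumptions, not the exact code in this package.

```cpp
#include <cstddef>
#include <vector>

// Sketch of the Hedge combination over m kernel-based component classifiers.
// Names and details are illustrative only.
struct HedgeCombiner {
    std::vector<double> w;   // one Hedge weight per kernel
    double gamma;            // discount factor (the -gamma parameter)

    HedgeCombiner(int num_kernels, double gamma_ = 0.99)
        : w(num_kernels, 1.0 / num_kernels), gamma(gamma_) {}

    // Combined prediction from the component outputs f[i] = f_i(x).
    int predict(const std::vector<double>& f) const {
        double s = 0.0;
        for (std::size_t i = 0; i < w.size(); ++i)
            s += w[i] * (f[i] >= 0.0 ? 1.0 : -1.0);
        return s >= 0.0 ? +1 : -1;
    }

    // After observing the true label y in {-1,+1}: discount the weight of
    // every component that predicted wrongly, then renormalize.
    // Each component classifier is additionally updated by its own rule
    // (Perceptron, PA, BOGD, RBP, BPAS, Forgetron, ...); in the *_sd
    // variants only a randomly sampled subset of components is updated.
    void hedge_update(const std::vector<double>& f, int y) {
        double sum = 0.0;
        for (std::size_t i = 0; i < w.size(); ++i) {
            if ((f[i] >= 0.0 ? +1 : -1) != y) w[i] *= gamma;
            sum += w[i];
        }
        for (double& wi : w) wi /= sum;
    }
};
```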


If you need to use this code package, please cite our paper as:

Lu J, Hoi S C H, Wang J, et al. Large scale online kernel learning[J]. Journal of Machine Learning Research, 2016, 17(47): 1.

or bib:

@article{lu2016large,
  title     = {Large scale online kernel learning},
  author    = {Lu, Jing and Hoi, Steven CH and Wang, Jialei and Zhao, Peilin and Liu, Zhi-Yong},
  journal   = {Journal of Machine Learning Research},
  volume    = {17},
  number    = {47},
  pages     = {1},
  year      = {2016},
  publisher = {Journal of Machine Learning Research/Microtome Publishing}
}


Installation

The installation is the same as for our KOL project; refer to https://github.com/LIBOL/KOL for details. After building, we obtain an executable file KOL that is used from the command line.


Prepare for the input data

We use the LIBSVM dataset format, an efficient sparse data representation, as input. Each instance in the dataset is represented by a row of numbers terminated by "\n". For example:

+1 5:1 16:1 20:1 37:1 40:1 63:1 68:1 73:1 74:1 76:1 82:1 93:1

-1 2:1 6:1 18:1 19:1 39:1 40:1 52:1 61:1 71:1 72:1 74:1 76:1 80:1 95:1

In the above dataset, there are 2 instances stored in two rows. Each row begins with the class label of the instance. In binary classification the label takes one of two values: {+1, -1}. Note that some dataset files are labeled with {0, 1}, which is not accepted by our toolbox; they have to be preprocessed and transformed to the {-1, +1} format. Following the label, the feature values appear in the form feature_index:feature_value. This is a sparse feature representation: if a feature index does not appear, its value is zero.
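To make the format concrete, here is a minimal parsing sketch (an illustration only; the toolbox ships its own loader):

```cpp
#include <cstdlib>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Parse one LIBSVM-format line, e.g. "+1 5:1 16:1 20:1", into a label in
// {-1,+1} and a sparse list of (feature_index, feature_value) pairs.
void parse_libsvm_line(const std::string& line, int& label,
                       std::vector<std::pair<int, double>>& features) {
    std::istringstream ss(line);
    ss >> label;                          // leading +1 or -1
    features.clear();
    std::string token;
    while (ss >> token) {                 // remaining tokens are index:value
        std::size_t colon = token.find(':');
        int index = std::atoi(token.substr(0, colon).c_str());
        double value = std::atof(token.substr(colon + 1).c_str());
        features.emplace_back(index, value);  // absent indices are zero
    }
}
```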

Our toolbox follows the standard online learning setting and loads the dataset sequentially, so there is no memory limitation for large scale datasets. Users are not required to specify the feature dimension of the dataset before training, since the algorithm automatically adapts as the feature dimension grows.
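As a sketch of this sequential setting (reusing the hypothetical parse_libsvm_line above), a loader only needs to keep one instance in memory at a time, and the feature dimension is simply the largest index seen so far:

```cpp
#include <algorithm>
#include <fstream>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Stream a LIBSVM-format file instance by instance; no dataset-sized buffer
// is needed and the feature dimension is discovered while reading.
int main(int argc, char** argv) {
    if (argc < 2) return 1;
    std::ifstream in(argv[1]);
    std::string line;
    int max_dim = 0, count = 0;
    while (std::getline(in, line)) {
        int label;
        std::vector<std::pair<int, double>> x;
        parse_libsvm_line(line, label, x);   // the sketch above
        for (const auto& p : x) max_dim = std::max(max_dim, p.first);
        // ... perform one online learning update on (x, label) here ...
        ++count;
    }
    std::cout << "#instances: " << count << ", feature dim: " << max_dim << "\n";
    return 0;
}
```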


Command Line

After compiling the toolbox code and obtaining the executable file "KOL", we can run the algorithms from the command line:

KOL -i training_dataset -opt algorithm_name [parameter setting]

KOL is the name of the executable file obtained from compiling the code. -i training_dataset is a required input indicating the training dataset. -opt algorithm_name is another required input indicating the algorithm to run. The parameter setting is optional and varies among algorithms; if not specified, the algorithm uses its default settings.
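For example, a hypothetical run of the budget PA variant with a non-default budget size might look like the following (the parameter flags are listed in the Parameter Setting section below; the values here are only an illustration):

KOL -i a9a_train -opt kernel-bpas -B 200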


A quick example:

We may download the a9a dataset and perform online multiple kernel learning with the deterministic algorithm kernel-dd, whose component classifiers are perceptrons. We try the following command line:

KOL -i a9a_train -opt kernel-dd

The output is as follows:

0 10000 20000 30000

#Training Instances:32561

Learn errRate: 0.19065754

Learning time: 513.57000732 s

The line of counts (0 10000 20000 ...) indicates the number of training samples processed so far, which gives an intuitive impression of the processing speed; this is a necessary output when the training time is extremely long. The output also includes the training error rate, the training time cost (including loading time), and, when testing is performed, the test time (including loading time).


Parameter Setting:

-lambda (0.1) regularizer for BOGD

-eta (0.1) learning rate

-alpha (1) alpha for SPA

-beta (3) beta for SPA

-gamma (0.99) discount factor for Hedge

-delta (0.001) smoothing parameter for the stochastic update

-k (20) NOGD dimension

-B (100) budget size
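As an illustrative (not recommended) combination of these flags, a stochastic-update BOGD run might be invoked as:

KOL -i a9a_train -opt kernel-bogd_sd -B 100 -eta 0.2 -gamma 0.95 -delta 0.001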


Related links:

Our C++ toolbox for online single kernel learning: https://github.com/LIBOL/KOL

Steven Hoi's home page: http://stevenhoi.org/

LU Jing's home page: http://jingonline.weebly.com/

LIBOL: http://libol.stevenhoi.org/

LIBSOL: http://libsol.stevenhoi.org/

Eigen: http://eigen.tuxfamily.org/index.php?title=Main_Page

LIBSVM: https://www.csie.ntu.edu.tw/~cjlin/libsvm/

Journal of Machine Learning Research: http://jmlr.org/papers/v17/14-148.html

A follow-up work to our proposed algorithm in NIPS: https://papers.nips.cc/paper/6560-dual-space-gradient-descent-for-online-learning.pdf