Homepage: http://github.com/whym/tinyclassifier
Contact: http://whym.org
Tuning a machine-learning based system is an art: you have to go through trial and error again and again. In such development cycles you may want to avoid a language like C++, where much of your time can be wasted on fixing compilation errors.
Scripting languages are good for rapid prototyping, but they are not good at CPU-intensive computation, which should be optimized and/or parallelized by compilers.
TinyClassifier tries to fill the gap between a high-performance inner implementation and the application programs that use it. With TinyClassifier you do not need pipes or temporary files, nor do you need to encode/decode feature vectors into/from strings. Instead, you get transparent access to the data structures and class libraries implemented in C++ from scripting languages such as Python or Ruby.
TinyClassifier is a fast and flexible machine learning library that provides:
- A small, self-contained machine learning package with minimal dependencies on external libraries
- A reasonably efficient and readable implementation as C++ header libraries
- Language bindings to Ruby, Perl, Python, etc. via SWIG
On the other hand, TinyClassifier is not for people who:
- Want the best accuracy and efficiency available in machine learning.
- Can already implement anything productively in C++.
- Averaged perceptron for binary classification
Both a non-kernelized and a kernelized [1] version are implemented (currently with a polynomial kernel only; see the sketch after this list).
[1] The implementation of 'PKPerceptron' is based on LingPipe's explanation of the kernel averaged perceptron and on descriptions in the thesis of Dr. Hal Daumé III. See below for further information.
- Maximum entropy classifier trained with the stochastic gradient descent algorithm
- Complementary naive Bayes classifier
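To make the list above more concrete, here is a minimal plain-Python sketch of the kind of algorithm 'PKPerceptron' implements: a kernel averaged perceptron with a polynomial kernel. This is not TinyClassifier's code; the function names and the kernel parameters ('degree', 'bias') are illustrative assumptions, and the sketch is only meant to show the idea behind the classifier.

def poly_kernel(x, y, degree=2, bias=1):
    # Polynomial kernel: (x . y + bias) ** degree
    return (sum(a * b for a, b in zip(x, y)) + bias) ** degree

def train(vecs, labels, iterations=10):
    # Kernel perceptron with averaging; returns the averaged dual coefficients.
    n = len(vecs)
    alpha = [0] * n      # mistake counts (dual weights), one per training example
    summed = [0.0] * n   # running sum of alpha, used for averaging
    steps = 0
    for _ in range(iterations):
        for i in range(n):
            score = sum(alpha[j] * labels[j] * poly_kernel(vecs[j], vecs[i])
                        for j in range(n))
            if labels[i] * score <= 0:  # misclassified: add this example to the model
                alpha[i] += 1
            for j in range(n):          # accumulate the current hypothesis at every step
                summed[j] += alpha[j]
            steps += 1
    return [s / steps for s in summed]

def predict(avg_alpha, vecs, labels, x):
    # Score of x under the averaged hypothesis; its sign is the predicted class.
    return sum(a * y * poly_kernel(v, x)
               for a, y, v in zip(avg_alpha, labels, vecs))

TinyClassifier's actual implementation lives in the C++ headers under 'include/'; the SWIG bindings expose it to scripting languages, for example as 'IntPKPerceptron' in the examples further below.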
The following software is required.
- gcc and g++ 4.3 (possibly gcc 4.x)
- swig 1.3.35 (possibly swig 1.3.x)
- make
- makedepend
A development environment needs to be prepared for each language you want to use with TinyClassifier. [2] Currently, language bindings are maintained for the languages below.
- Python
- Perl 5
- Ruby
- Java
[2] For every language in which you want to use the TinyClassifier library, you need to prepare a development environment; normally this means setting up the runtime, compiler, and API files appropriately. Note that these are sometimes packaged separately; for example, you may need to install something like 'libruby' or 'sun-jdk-*' to get the API files.
- (Optional) Type 'make' at the top directory of tinyclassifier and the tests will run.
- Copy the header files in 'include/' to an appropriate directory included in CPATH.
- Recommended compile options
To let g++ auto-vectorize the inner loops on an SSE2-capable processor:
CXXFLAGS="-ftree-vectorizer-verbose=1 -msse2 -ftree-vectorize -O3"
- Type 'make -C swig' in the top directory of tinyclassifier.
- (For Ruby) Go to the directory 'swig/ruby' and type 'make install'.
(Alternatively) Copy 'swig/ruby/TinyClassifier.so' to somewhere included in RUBYLIB.
For some language bindings, you might have to manually install library files.
# Details to be written
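As a quick sanity check after installing the Python binding (named 'TinyClassifier', as in the examples below), the following should print the module object rather than raise an error:

import TinyClassifier   # raises ImportError if the module is not on your PYTHONPATH
print TinyClassifier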
Python
from TinyClassifier import *

# Prepare examples
SAMPLES = sorted([[[-2, +1, -1], +1],
                  [[-1, +2, +1], +1],
                  [[-1, -1, -1], -1],
                  [[+1, +1, -1], +1],
                  [[-1, +1, -1], +1],
                  [[+1, -2, -1], -1],
                  [[+1, -1, +1], -1]], key=lambda x: x[0])
vecs = [x[0] for x in SAMPLES]  # Obtain feature vectors
labs = [x[1] for x in SAMPLES]  # Obtain labels
print vecs
p = IntPKPerceptron(len(vecs[0]), 10)  # Construct a perceptron that stops after 10 iterations
p.train(IntVectorVector(vecs),         # Give the training examples to the perceptron
        IntVector(labs))
for (i, k) in enumerate(vecs):         # Print the prediction for each training example (closed-set evaluation)
    pred = p.predict(k)
    print "%d: %f" % (SAMPLES[i][1], pred)
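Continuing the session above, the trained object can also score vectors that were not in the training data. The vector below is made up for illustration; as in the loop above, 'predict' returns a raw score, and for a perceptron the sign of that score is normally read as the predicted class.

print p.predict([+2, -1, +1])  # hypothetical unseen vector: positive score => +1, negative => -1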
Ruby
require 'TinyClassifier'
include TinyClassifier

# Prepare examples
SAMPLES = { [-2, +1, -1] => +1,
            [-1, +2, +1] => +1,
            [-1, -1, -1] => -1,
            [+1, +1, -1] => +1,
            [-1, +1, -1] => +1,
            [+1, -2, -1] => -1,
            [+1, -1, +1] => -1 }
vecs   = SAMPLES.keys.sort           # Obtain feature vectors
labels = SAMPLES.values_at(*vecs)    # Obtain labels
p = IntPKPerceptron.new(vecs[0].length, 10)  # Construct a perceptron that stops after 10 iterations
p.train(IntVectorVector.new(vecs),           # Give the training examples to the perceptron
        IntVector.new(labels))
vecs.each do |k|                     # Print the prediction for each training example (closed-set evaluation)
  pred = p.predict(k)
  puts "#{SAMPLES[k]}: #{pred}"
end
See the tests included in the package for further examples. Tests are located at 'test', 'swig/ruby/test', etc.
[3] http://portal.acm.org/citation.cfm?id=1755875
[4] http://portal.acm.org/citation.cfm?id=1390247