GitHub - gwf/NODElib: The Neural Optimization Development Engine Library

gwf / NODElib Public
Notifications You must be signed in to change notification settings
Fork 2
Star 7
The Neural Optimization Development Engine Library
Notifications
Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
etc		etc
examples		examples
html		html
include		include
lib		lib
man		man
src		src
test		test
GNUmakefile		GNUmakefile
Makefile		Makefile
README		README
configure		configure
Repository files navigation

                             NODElib Documentation

   By Gary William Flake

NAME

   nodelib.h - the Neural Optimization Development Engine library

SYNOPSIS

   NODElib (which stands for Neural Optimization Development Engine
   library) is a programming library for rapidly developing powerful
   neural network simulations. Very few assumptions have been made
   regarding the system in which the code is to be used; thus, NODElib is
   suitable as a back-end engine for most applications.

   But wait, there's more.

   NODElib is general enough that you can probably find many other uses
   for it. The code is extremely modular, compact, and robust. It is
   written in an object oriented manner. All of the library code, example
   and test program source, documentation, and supporting text is only on
   the order of about 20,000 lines, which means that NODElib is extremely
   compact. This is important from the point of view of comprehending the
   code, and for the memory requirements of the library.

   To use NODElib, you need only use the include directive below, and
   compile your program with a '-lnode -lm'.

   #include <nodelib.h>

DESCRIPTION

Quick Start

   To get up and running with NODElib as fast as possible, consult the
   examples directory, which contains several source code examples for
   using NODElib in a variety of ways. In general, simple tasks will
   require simple programs, and complex tasks will require slightly
   complex programs.

   The Makefile in the examples directory can also be used as a template
   for your own Makefile.

   Keep in mind that the examples attempt to show a wide range of NODElib
   functionality; thus, you may do well to take an example program and
   strip out the functionality that you do not need (e.g., advance
   optimization or data source hooks, etc.).

   The next section gives a bird's eye view of NODElib, with the aim of
   giving you a general feel for the entire contents of NODElib.

Overview

   In general, NODElib is divided into several modules that can be used
   more or less independently of each other. These modules can be roughly
   categorized in one of four groups: core routines (required by all
   other modules), basic data types, data set handling routines and
   interfaces, and numerical routines. A brief introduction to each
   category is provided below.

   Core Routines - The core routines handle memory management and basic
   exception handling. The key idea is that localizing these two tasks
   simplify debugging and allow for error messages (for all of NODElib)
   to be directed to a single source that can be easily selected by a
   user.

     * MISC - miscellaneous but useful routines. This module includes
       multidimensional array allocation and deallocation routines, a
       command-line parsing function, and random number generators with
       uniform and Gaussian distributions.
     * ULOG - standardized I/O routines with many features. Because of
       this module, there is not a single plain printf() in all of
       NODElib. All screen I/O is passed through the ULOG package, which
       means that you can arbitrarily set how verbose the messages are
       and where they should go (i.e., log files, error message window,
       etc.).
     * XALLOC - smarter memory allocation. This module provides memory
       allocation with automatic error checking. It also keeps track of
       the memory used by each allocated pointer and the total used for a
       process. It supports debugging with checks for valid frees and
       gives a summary of outstanding pointers.

   Basic Data Types - The basic data types are used as building blocks by
   the other modules. All of the basic data types in NODElib are generic
   in the sense that they accept a void pointer and a set of user hooks
   so that they can operate over any other data type.

     * ARRAY - a generic array type that grows as needed. Use this for
       stacks, queues, or linear lists of any type. The bounds will grow
       transparently, so you never need to be concerned about hard coded
       limits.
     * HASH - generic but expandable hash tables. This module defines a
       hash table for objects that can be expressed as a void pointer.
       The table size dynamically doubles in size if a load factor is
       exceeded, but without recomputation of the hash keys. Very nifty.
     * LIST generic doubly linked lists. This module defines a linked
       list for objects that can be expressed as a void pointer. Just the
       basics are provided.
     * SCAN - a simple reentrant text scanner. These routines only
       distinguish delimiter characters from comment characters and white
       space characters. In other words, these scanners are powerful
       enough to scan languages such as the Bourne-shell or lisp, but
       could not handle C language comments. However, we can scan
       character arrays and files identically.
     * SERIES - data stream handling. This modules allows one to access
       data in flat-files in a unified manner. For time series analysis,
       a one-dimensional stream of data is efficiently stored in memory
       but can still be easily accessed in terms of delayed coordinates
       as if it actually consists of multi-dimensional vectors. This
       module can also be used for numerical pattern classification as
       well, making it a suitable data type for neural networks to
       operate on.

   Data Set Interfaces - Most of the numerical methods in NODElib are
   defined with respect to an abstract ``dataset''. In some applications,
   you may want a dataset to be a chunk of memory in your C code, or a
   flat memory-mapped binary file, or perhaps a delayed coordinate
   embedding of ASCII time series data. NODElib can handle all of these
   cases because it has an abstract DATASET type that defines the
   interface for how data from a dataset should be accessed.

   The details for how NODElib implements DATASETs may not be too
   interesting to you; however, you may wish to scan the list below to
   get a feel for the types of DATASETs that are implemented. The first
   two modules define the DATASET and DATASET_METHOD types; the latter
   type defines what methods are required to implement the DATASET
   interface. All other modules in this list are specific implementation
   of DATASET_METHODs.

     * DATASET - A generic dataset type with custom methods. The data
       type and routines contained in this module represent a generic
       interface that is applicable to a broad class data sources.
       Practically any data source can be massaged into a DATASET, which
       means that all of NODElib can use the types.
     * DSMETHOD - methods for various DATASET instance types. The
       DATASET_METHODs defined in this module can be used to transform
       other types into DATASETs. Methods currently in this module
       include ones for C matrices, the SERIES type, among others
     * DSDBLPTR - DATASET_METHOD for double pointer matrix types. These
       routines define methods for accessing matrices that are
       dynamically allocated as pointers to pointers of doubles. With
       this module, a DATASET can consist of a single matrix or two
       matrices.
     * DSFIFO - DATASET_METHOD for fixed-size FIFO.. A FIFO (first in,
       first out) allows you to hold the most current training patterns
       in a fixed size set. Whenever a new pattern is added, the oldest
       pattern is overwritten. This is handy for incremental learning.
     * DSFILE - DATASET_METHOD for flat binary files. These routines
       define methods for accessing files. With this module, a DATASET
       can consist of a large binary file. Patterns are loaded on a
       need-only basis, so memory is preserved.
     * DSISUBSET - DATASET_METHOD subset type. Given an existing DATASET,
       one can define a new subset of the first data set. The actual
       subset is determined by a user specified vector of indices. This
       is useful for incremental techniques applied to huge amounts of
       data.
     * DSMATRIX - DATASET_METHOD for matrix types. These routines define
       methods for accessing matrices. With this module, a DATASET can
       consist of a single matrix or two matrices.
     * DSSUBSET - DATASET_METHOD subset type. Given an existing DATASET,
       one can define a new subset of the first data set. The actual
       subset is determined by a user specifed range. This is useful for
       incremental techniques applied to huge amounts of data.
     * DSUNION - DATASET_METHOD union of other DATASETs. Given multiple
       existing DATASETs, one can define a new DATASET that is the union
       of original ones. This is useful for when multiple data sources
       have to be conceptually merged (e.g., training on multiple files
       as if theory were one).

   Numerical Routines - The numerical routines are the heart of NODElib.
   If you are reading this, then you probably downloaded NODElib
   specifically for the neural network or support vector machine code. In
   this case, jump directly to those packages. Otherwise, you may wish to
   skim the contents below.

     * DENSE - basic density estimation routines. Given a DATASET, the
       routines in this package allow you to construct a simple density
       estimator with the kernel function of your choice.
     * ERRFUNC - error functions for optimization routines. Error
       functions currently in this module include the standard Quadratic,
       the logistic error function, and Huber's error function.
     * KMEANS - perform k-means clustering on a DATASET. The routines in
       this module will compute clusters of a DATASET with the k-means
       clustering algorithm. The resulting clusters are returned in
       another DATASET.
     * NN - a generic neural network package. The neural network package
       allows for generic feedforward neural networks to be connected
       with arbitrary activation functions, net input functions,
       connections types, error functions, and learning algorithms.
     * OPTIMIZE - numerical optimization routines. This module provides a
       generic and uniform interface to optimization routines that can be
       used on a wide variety of problems. This package currently
       includes multi-dimensional search algorithms (steepest descent,
       conjugate gradient, and quasi-Newton) and line search routines
       (cubic interpolation, golden section, and a hybrid of the first
       two).
     * SVD - singular value decomposition and friends. Included here is a
       basic SVD call and some basic applications of the SVD, such as
       pseudo matrix inversion and principal component analysis.
     * SVM - support vector machines and the SMORCH algorithm.. This
       package defines support vector machines (SVMs) for both
       classification and regression problems. The SVMs can use a wide
       variety of kernel functions. Optimization of the SVMs is performed
       by a variation of John Platt's sequential minimal optimization
       (SMO) algorithm. This version of SMO is generalized for
       regression, uses kernel caching, and incorporates several
       heuristics; for these reasons, we refer to the optimization
       algorithm as SMORCH. SMORCH has been shown to be over an order
       magnitude faster than SMO, QP, and decomposition.

EXAMPLES

   NODElib contains several example programs in the examples subdirectory
   of the source distribution. The examples do not exhaustively show how
   to use every feature of NODElib but, in general, they should help a
   new user quickly come up to speed with the library.

   A partial list of example programs includes:
     * aa.c - Train a NN auto-associator. Can use a variety of
       optimization routines. Data is taken from an ASCII file.
     * classify.c - Train a NN classifier. Can use a variety of
       optimization routines. Data is taken from an ASCII file.
     * cluster.c - An example of using the k-means code.
     * cnls.c - Shows how to build an exotic NN architecture. In this
       case, a CNLS network is built with sublayers, a Euclidean net
       input function, a pairwise product net input function, and other
       NN features. This also shows how to build a SERIES DATASET on the
       fly.
     * dwjacob.c - This program tests the J-prop features NODElib. After
       training an auto-associator, a function of the Jacobian matrix is
       differentiated numerically and analytically, so that the results
       can be compared.
     * hopfield.c - This convoluted example shows how to build a Hopfield
       network with NODElib. I don't recommend using NODElib in this way,
       but only wish to show that that static feedback networks are
       possible.
     * hwnn.c - Hello World for NN. This is the simplest example of how
       to use an NN. However, this example does use short-circuit links,
       so it is slightly advanced.
     * optother.c - This example shows how to use NODElib's fancy
       command-line parsing functions.
     * rbf.c - Constructs an RBFN on logistic map data. The training is
       ``single step'' in that it uses the k-means and a SVD to optimize
       the weights.
     * search.c - Shows how to use NODElib's line search routines to find
       the minimum of a scalar function.
     * smorch.c - This is a more advanced version of John Platt's
       sequencial minimal optimization (SMO) code, called SMORCH. It uses
       caching, many heuristics, and is generalize for regression. See
       the SVM documentation for a complete description, as this is the
       one example that is meant to stand on its own.
     * smlp.c - Similar to rbf.c, but uses an SMLP.
     * xor.c - The fruit-fly of neural networks.
     * xorhess.c - Trains a NN on XOR data, analytically and numerically
       calculates the Hessian matrix, and compares the results.

TEST PROGRAMS

   NODElib has a basic set of test programs in the test directory. These
   test programs aren't particularly thorough; they are just there for
   testing minor new features. Nevertheless, you may find some of the
   useful, as they tend to exercise little known features of the library.

REFERENCES

   While NODElib is lacking formal user and developer manuals (other than
   this online documentation), there exists a few publications that
   describe some of the more subtle features of NODElib. In no particular
   order:
     * G. W. Flake and S. Lawrence. Efficient SVM Regression Training
       with SMO. Submitted to Machine Learning, 2000.
     * G. W. Flake. The Calculus of Jacobian Adaptation. Submitted to
       Neural Computation, 2000.
     * G. W. Flake and B. A. Pearlmuter. Differentiating Functions of the
       Jacobian with Respect to the Weights. In S. A. Solla, T. K. Leen,
       and K.-R. Müller, editors, Advances in Neural Information
       Processing Systems, volume 12. The MIT Press, 2000.
     * G. W. Flake. Square Unit Augmented, Radially Extended, Multilayer
       Perceptrons. In G. Orr, K.-R. Müller, and R. Caruana, editors,
       Tricks of the Trade: How to Make Algorithms Really Work, LNCS
       State-of-the-Art-Surveys. Springer-Verlag, 1998.

DOWNLOAD

   The unofficial home page of NODElib is located at:
   http://flakentstein.net/nodelib/html.

   The latest version of NODElib can be downloaded from:
   http://flakentstein.net/lib/nodelib.tgz.

   The author's home page is: http://flakenstein.net/.

COPYING

   Copyright (c) 1992-2005 by Gary William Flake.

   NODElib is free software; you can redistribute it and/or modify it
   under the terms of the GNU General Public License as published by the
   Free Software Foundation; either version 2 of the License, or (at your
   option) any later version.

   This program is distributed in the hope that it will be useful, but
   WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
   General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

AUTHOR

   Gary William Flake (gary.flake@usa.net).