-
Notifications
You must be signed in to change notification settings - Fork 2
/
README
315 lines (270 loc) · 16 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
NODElib Documentation
By Gary William Flake
NAME
nodelib.h - the Neural Optimization Development Engine library
SYNOPSIS
NODElib (which stands for Neural Optimization Development Engine
library) is a programming library for rapidly developing powerful
neural network simulations. Very few assumptions have been made
regarding the system in which the code is to be used; thus, NODElib is
suitable as a back-end engine for most applications.
But wait, there's more.
NODElib is general enough that you can probably find many other uses
for it. The code is extremely modular, compact, and robust. It is
written in an object oriented manner. All of the library code, example
and test program source, documentation, and supporting text is only on
the order of about 20,000 lines, which means that NODElib is extremely
compact. This is important from the point of view of comprehending the
code, and for the memory requirements of the library.
To use NODElib, you need only use the include directive below, and
compile your program with a '-lnode -lm'.
#include <nodelib.h>
DESCRIPTION
Quick Start
To get up and running with NODElib as fast as possible, consult the
examples directory, which contains several source code examples for
using NODElib in a variety of ways. In general, simple tasks will
require simple programs, and complex tasks will require slightly
complex programs.
The Makefile in the examples directory can also be used as a template
for your own Makefile.
Keep in mind that the examples attempt to show a wide range of NODElib
functionality; thus, you may do well to take an example program and
strip out the functionality that you do not need (e.g., advance
optimization or data source hooks, etc.).
The next section gives a bird's eye view of NODElib, with the aim of
giving you a general feel for the entire contents of NODElib.
Overview
In general, NODElib is divided into several modules that can be used
more or less independently of each other. These modules can be roughly
categorized in one of four groups: core routines (required by all
other modules), basic data types, data set handling routines and
interfaces, and numerical routines. A brief introduction to each
category is provided below.
Core Routines - The core routines handle memory management and basic
exception handling. The key idea is that localizing these two tasks
simplify debugging and allow for error messages (for all of NODElib)
to be directed to a single source that can be easily selected by a
user.
* MISC - miscellaneous but useful routines. This module includes
multidimensional array allocation and deallocation routines, a
command-line parsing function, and random number generators with
uniform and Gaussian distributions.
* ULOG - standardized I/O routines with many features. Because of
this module, there is not a single plain printf() in all of
NODElib. All screen I/O is passed through the ULOG package, which
means that you can arbitrarily set how verbose the messages are
and where they should go (i.e., log files, error message window,
etc.).
* XALLOC - smarter memory allocation. This module provides memory
allocation with automatic error checking. It also keeps track of
the memory used by each allocated pointer and the total used for a
process. It supports debugging with checks for valid frees and
gives a summary of outstanding pointers.
Basic Data Types - The basic data types are used as building blocks by
the other modules. All of the basic data types in NODElib are generic
in the sense that they accept a void pointer and a set of user hooks
so that they can operate over any other data type.
* ARRAY - a generic array type that grows as needed. Use this for
stacks, queues, or linear lists of any type. The bounds will grow
transparently, so you never need to be concerned about hard coded
limits.
* HASH - generic but expandable hash tables. This module defines a
hash table for objects that can be expressed as a void pointer.
The table size dynamically doubles in size if a load factor is
exceeded, but without recomputation of the hash keys. Very nifty.
* LIST generic doubly linked lists. This module defines a linked
list for objects that can be expressed as a void pointer. Just the
basics are provided.
* SCAN - a simple reentrant text scanner. These routines only
distinguish delimiter characters from comment characters and white
space characters. In other words, these scanners are powerful
enough to scan languages such as the Bourne-shell or lisp, but
could not handle C language comments. However, we can scan
character arrays and files identically.
* SERIES - data stream handling. This modules allows one to access
data in flat-files in a unified manner. For time series analysis,
a one-dimensional stream of data is efficiently stored in memory
but can still be easily accessed in terms of delayed coordinates
as if it actually consists of multi-dimensional vectors. This
module can also be used for numerical pattern classification as
well, making it a suitable data type for neural networks to
operate on.
Data Set Interfaces - Most of the numerical methods in NODElib are
defined with respect to an abstract ``dataset''. In some applications,
you may want a dataset to be a chunk of memory in your C code, or a
flat memory-mapped binary file, or perhaps a delayed coordinate
embedding of ASCII time series data. NODElib can handle all of these
cases because it has an abstract DATASET type that defines the
interface for how data from a dataset should be accessed.
The details for how NODElib implements DATASETs may not be too
interesting to you; however, you may wish to scan the list below to
get a feel for the types of DATASETs that are implemented. The first
two modules define the DATASET and DATASET_METHOD types; the latter
type defines what methods are required to implement the DATASET
interface. All other modules in this list are specific implementation
of DATASET_METHODs.
* DATASET - A generic dataset type with custom methods. The data
type and routines contained in this module represent a generic
interface that is applicable to a broad class data sources.
Practically any data source can be massaged into a DATASET, which
means that all of NODElib can use the types.
* DSMETHOD - methods for various DATASET instance types. The
DATASET_METHODs defined in this module can be used to transform
other types into DATASETs. Methods currently in this module
include ones for C matrices, the SERIES type, among others
* DSDBLPTR - DATASET_METHOD for double pointer matrix types. These
routines define methods for accessing matrices that are
dynamically allocated as pointers to pointers of doubles. With
this module, a DATASET can consist of a single matrix or two
matrices.
* DSFIFO - DATASET_METHOD for fixed-size FIFO.. A FIFO (first in,
first out) allows you to hold the most current training patterns
in a fixed size set. Whenever a new pattern is added, the oldest
pattern is overwritten. This is handy for incremental learning.
* DSFILE - DATASET_METHOD for flat binary files. These routines
define methods for accessing files. With this module, a DATASET
can consist of a large binary file. Patterns are loaded on a
need-only basis, so memory is preserved.
* DSISUBSET - DATASET_METHOD subset type. Given an existing DATASET,
one can define a new subset of the first data set. The actual
subset is determined by a user specified vector of indices. This
is useful for incremental techniques applied to huge amounts of
data.
* DSMATRIX - DATASET_METHOD for matrix types. These routines define
methods for accessing matrices. With this module, a DATASET can
consist of a single matrix or two matrices.
* DSSUBSET - DATASET_METHOD subset type. Given an existing DATASET,
one can define a new subset of the first data set. The actual
subset is determined by a user specifed range. This is useful for
incremental techniques applied to huge amounts of data.
* DSUNION - DATASET_METHOD union of other DATASETs. Given multiple
existing DATASETs, one can define a new DATASET that is the union
of original ones. This is useful for when multiple data sources
have to be conceptually merged (e.g., training on multiple files
as if theory were one).
Numerical Routines - The numerical routines are the heart of NODElib.
If you are reading this, then you probably downloaded NODElib
specifically for the neural network or support vector machine code. In
this case, jump directly to those packages. Otherwise, you may wish to
skim the contents below.
* DENSE - basic density estimation routines. Given a DATASET, the
routines in this package allow you to construct a simple density
estimator with the kernel function of your choice.
* ERRFUNC - error functions for optimization routines. Error
functions currently in this module include the standard Quadratic,
the logistic error function, and Huber's error function.
* KMEANS - perform k-means clustering on a DATASET. The routines in
this module will compute clusters of a DATASET with the k-means
clustering algorithm. The resulting clusters are returned in
another DATASET.
* NN - a generic neural network package. The neural network package
allows for generic feedforward neural networks to be connected
with arbitrary activation functions, net input functions,
connections types, error functions, and learning algorithms.
* OPTIMIZE - numerical optimization routines. This module provides a
generic and uniform interface to optimization routines that can be
used on a wide variety of problems. This package currently
includes multi-dimensional search algorithms (steepest descent,
conjugate gradient, and quasi-Newton) and line search routines
(cubic interpolation, golden section, and a hybrid of the first
two).
* SVD - singular value decomposition and friends. Included here is a
basic SVD call and some basic applications of the SVD, such as
pseudo matrix inversion and principal component analysis.
* SVM - support vector machines and the SMORCH algorithm.. This
package defines support vector machines (SVMs) for both
classification and regression problems. The SVMs can use a wide
variety of kernel functions. Optimization of the SVMs is performed
by a variation of John Platt's sequential minimal optimization
(SMO) algorithm. This version of SMO is generalized for
regression, uses kernel caching, and incorporates several
heuristics; for these reasons, we refer to the optimization
algorithm as SMORCH. SMORCH has been shown to be over an order
magnitude faster than SMO, QP, and decomposition.
EXAMPLES
NODElib contains several example programs in the examples subdirectory
of the source distribution. The examples do not exhaustively show how
to use every feature of NODElib but, in general, they should help a
new user quickly come up to speed with the library.
A partial list of example programs includes:
* aa.c - Train a NN auto-associator. Can use a variety of
optimization routines. Data is taken from an ASCII file.
* classify.c - Train a NN classifier. Can use a variety of
optimization routines. Data is taken from an ASCII file.
* cluster.c - An example of using the k-means code.
* cnls.c - Shows how to build an exotic NN architecture. In this
case, a CNLS network is built with sublayers, a Euclidean net
input function, a pairwise product net input function, and other
NN features. This also shows how to build a SERIES DATASET on the
fly.
* dwjacob.c - This program tests the J-prop features NODElib. After
training an auto-associator, a function of the Jacobian matrix is
differentiated numerically and analytically, so that the results
can be compared.
* hopfield.c - This convoluted example shows how to build a Hopfield
network with NODElib. I don't recommend using NODElib in this way,
but only wish to show that that static feedback networks are
possible.
* hwnn.c - Hello World for NN. This is the simplest example of how
to use an NN. However, this example does use short-circuit links,
so it is slightly advanced.
* optother.c - This example shows how to use NODElib's fancy
command-line parsing functions.
* rbf.c - Constructs an RBFN on logistic map data. The training is
``single step'' in that it uses the k-means and a SVD to optimize
the weights.
* search.c - Shows how to use NODElib's line search routines to find
the minimum of a scalar function.
* smorch.c - This is a more advanced version of John Platt's
sequencial minimal optimization (SMO) code, called SMORCH. It uses
caching, many heuristics, and is generalize for regression. See
the SVM documentation for a complete description, as this is the
one example that is meant to stand on its own.
* smlp.c - Similar to rbf.c, but uses an SMLP.
* xor.c - The fruit-fly of neural networks.
* xorhess.c - Trains a NN on XOR data, analytically and numerically
calculates the Hessian matrix, and compares the results.
TEST PROGRAMS
NODElib has a basic set of test programs in the test directory. These
test programs aren't particularly thorough; they are just there for
testing minor new features. Nevertheless, you may find some of the
useful, as they tend to exercise little known features of the library.
REFERENCES
While NODElib is lacking formal user and developer manuals (other than
this online documentation), there exists a few publications that
describe some of the more subtle features of NODElib. In no particular
order:
* G. W. Flake and S. Lawrence. Efficient SVM Regression Training
with SMO. Submitted to Machine Learning, 2000.
* G. W. Flake. The Calculus of Jacobian Adaptation. Submitted to
Neural Computation, 2000.
* G. W. Flake and B. A. Pearlmuter. Differentiating Functions of the
Jacobian with Respect to the Weights. In S. A. Solla, T. K. Leen,
and K.-R. Müller, editors, Advances in Neural Information
Processing Systems, volume 12. The MIT Press, 2000.
* G. W. Flake. Square Unit Augmented, Radially Extended, Multilayer
Perceptrons. In G. Orr, K.-R. Müller, and R. Caruana, editors,
Tricks of the Trade: How to Make Algorithms Really Work, LNCS
State-of-the-Art-Surveys. Springer-Verlag, 1998.
DOWNLOAD
The unofficial home page of NODElib is located at:
http://flakentstein.net/nodelib/html.
The latest version of NODElib can be downloaded from:
http://flakentstein.net/lib/nodelib.tgz.
The author's home page is: http://flakenstein.net/.
COPYING
Copyright (c) 1992-2005 by Gary William Flake.
NODElib is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2 of the License, or (at your
option) any later version.
This program is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
AUTHOR
Gary William Flake (gary.flake@usa.net).