BSA is a Bayesian neural network classifier for time series data with genetic network tuning. BSA will train on existing time series data and iterate towards a locally optimal generalization network to predict forward time series value probabilities.
An input time series s(t) is preprocessed into a set of network training samples t(t) such that, for all t:
- The inputs for each training sample are the L samples of s(t) in (s(t-L), s(t)).
- The outputs for each training sample are the posterior classification probabilities for N discrete classes that characterize the percentage change in s(t+D) relative to s(t) for some forward distance D. The sum of the N class probabilities is 1.0.
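The windowing step described above can be sketched as follows. This is a minimal illustration, not BSA's own preprocessing code: the class name `WindowSampler`, the equal-width class bins over an assumed range of ±`maxPct` percent, and the one-hot encoding of the target probabilities are all assumptions made for the example. It also assumes s(t) is never zero and that the window holds the L most recent values ending at s(t).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the BSA code base. */
final class WindowSampler {

    /** One training sample: L inputs and a target distribution over N classes. */
    static final class Sample {
        final double[] inputs;
        final double[] targets;

        Sample(double[] inputs, double[] targets) {
            this.inputs = inputs;
            this.targets = targets;
        }
    }

    /**
     * Maps a percentage change to a class index in [0, n).
     * Assumption: n equal-width bins spanning [-maxPct, +maxPct];
     * changes outside that range are clamped to the first or last bin.
     */
    static int classify(double pctChange, int n, double maxPct) {
        double width = 2.0 * maxPct / n;
        int idx = (int) Math.floor((pctChange + maxPct) / width);
        return Math.max(0, Math.min(n - 1, idx));
    }

    /** Builds one sample for every t that has L values of history and a value at t+D. */
    static List<Sample> build(double[] s, int l, int n, int d, double maxPct) {
        List<Sample> samples = new ArrayList<>();
        for (int t = l - 1; t + d < s.length; t++) {
            double[] inputs = new double[l];
            // Window of the L most recent values, ending at s(t).
            System.arraycopy(s, t - l + 1, inputs, 0, l);
            // Percentage change of s(t+D) relative to s(t).
            double pct = 100.0 * (s[t + d] - s[t]) / s[t];
            double[] targets = new double[n];
            targets[classify(pct, n, maxPct)] = 1.0; // one-hot, so the targets sum to 1.0
            samples.add(new Sample(inputs, targets));
        }
        return samples;
    }
}
```

For example, `WindowSampler.build(series, 3, 4, 1, 10.0)` would produce samples with 3 inputs each and 4 classes covering changes from -10% to +10% one step ahead.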
A fully connected neural network is instantiated with L input nodes, M internal nodes, N output nodes, and random weights. This network is trained using the prepared training samples t(t) for all t. The network supports a variety of non-linear activation functions for forward propagation, and it uses back propagation for error propagation. In addition, a variety of preprocessing techniques can be used on the input data prior to the creation of training samples.
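To make the L-M-N topology concrete, here is a minimal forward-propagation sketch. It is not BSA's network implementation: the class name `TinyNetwork`, the tanh hidden activation, the softmax output layer, and the weight range are assumptions chosen for illustration. Back propagation is omitted to keep the example short.

```java
import java.util.Random;

/** Hypothetical sketch of an L-M-N fully connected network; not BSA's implementation. */
final class TinyNetwork {
    final double[][] w1; // M x (L + 1); the last column holds the bias
    final double[][] w2; // N x (M + 1); the last column holds the bias

    TinyNetwork(int l, int m, int n, long seed) {
        Random rnd = new Random(seed);
        w1 = randomMatrix(m, l + 1, rnd);
        w2 = randomMatrix(n, m + 1, rnd);
    }

    private static double[][] randomMatrix(int rows, int cols, Random rnd) {
        double[][] w = new double[rows][cols];
        for (double[] row : w) {
            for (int j = 0; j < cols; j++) {
                row[j] = rnd.nextDouble() - 0.5; // random weights in [-0.5, 0.5)
            }
        }
        return w;
    }

    /** Weighted sum of the inputs plus bias for each node in a layer. */
    private static double[] affine(double[][] w, double[] in) {
        double[] out = new double[w.length];
        for (int i = 0; i < w.length; i++) {
            double sum = w[i][in.length]; // bias term
            for (int j = 0; j < in.length; j++) {
                sum += w[i][j] * in[j];
            }
            out[i] = sum;
        }
        return out;
    }

    /** Forward propagation: tanh hidden layer, then a softmax output layer. */
    double[] forward(double[] inputs) {
        double[] hidden = affine(w1, inputs);
        for (int i = 0; i < hidden.length; i++) {
            hidden[i] = Math.tanh(hidden[i]);
        }
        double[] z = affine(w2, hidden);
        double max = Double.NEGATIVE_INFINITY;
        for (double v : z) {
            max = Math.max(max, v);
        }
        double total = 0.0;
        for (int i = 0; i < z.length; i++) {
            z[i] = Math.exp(z[i] - max); // subtract the max for numerical stability
            total += z[i];
        }
        for (int i = 0; i < z.length; i++) {
            z[i] /= total; // normalize so the N outputs sum to 1.0
        }
        return z;
    }
}
```

Softmax is used here only because it guarantees that the N outputs form a probability distribution, which matches the requirement that the class probabilities sum to 1.0.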
The variables L, M, N, and D are optimized using a genetic evolution algorithm in which the genome is the binary representation of these values. Phenotype fitness is measured as the network's success rate in classifying known inputs from the training samples t(t). Phenotype evolution uses genetic crossover, mutation, and elitism, and each successive generation includes the most successful phenotypes from the previous generation competing against a new generation of genetic successors. Evolution can be configured to end when the percentage change in success rate drops below a threshold value.
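The genome encoding and one generation step might look roughly like the sketch below. It is an illustration under stated assumptions, not BSA's genetic code: the class name `GenomeSketch`, the 8-bits-per-variable layout, single-point crossover, parent selection from the fitter half, and the form of the stopping test are all hypothetical. The fitness function is passed in; in BSA it would be the classification success rate of the trained network.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch of the genome and one generation step; not BSA's own code. */
final class GenomeSketch {

    /** Packs L, M, N, D (assumed to fit in 8 bits each) into one 32-bit genome. */
    static int encode(int l, int m, int n, int d) {
        return (l << 24) | (m << 16) | (n << 8) | d;
    }

    /** Unpacks a genome into {L, M, N, D}. */
    static int[] decode(int genome) {
        return new int[] {
            (genome >>> 24) & 0xFF, (genome >>> 16) & 0xFF,
            (genome >>> 8) & 0xFF, genome & 0xFF
        };
    }

    /** Single-point crossover: the low `point` bits come from b, the rest from a (point in 1..31). */
    static int crossover(int a, int b, int point) {
        int lowMask = -1 >>> (32 - point);
        return (a & ~lowMask) | (b & lowMask);
    }

    /** Flips each of the 32 bits independently with probability `rate`. */
    static int mutate(int genome, double rate, Random rnd) {
        for (int bit = 0; bit < 32; bit++) {
            if (rnd.nextDouble() < rate) {
                genome ^= (1 << bit);
            }
        }
        return genome;
    }

    /** Next generation: the `eliteCount` fittest genomes survive unchanged, the rest are offspring. */
    static int[] nextGeneration(int[] population, ToDoubleFunction<int[]> fitness,
                                int eliteCount, double mutationRate, Random rnd) {
        int size = population.length;
        double[] score = new double[size];
        Integer[] order = new Integer[size];
        for (int i = 0; i < size; i++) {
            score[i] = fitness.applyAsDouble(decode(population[i])); // evaluate each genome once
            order[i] = i;
        }
        Arrays.sort(order, (x, y) -> Double.compare(score[y], score[x])); // best first
        int[] next = new int[size];
        int parents = Math.max(1, size / 2); // assumption: parents come from the fitter half
        for (int i = 0; i < size; i++) {
            if (i < eliteCount) {
                next[i] = population[order[i]];
            } else {
                int a = population[order[rnd.nextInt(parents)]];
                int b = population[order[rnd.nextInt(parents)]];
                next[i] = mutate(crossover(a, b, 1 + rnd.nextInt(31)), mutationRate, rnd);
            }
        }
        return next;
    }

    /** Stopping test: true when the success rate changed by less than thresholdPct percent. */
    static boolean converged(double previousRate, double currentRate, double thresholdPct) {
        return 100.0 * Math.abs(currentRate - previousRate) / previousRate < thresholdPct;
    }
}
```

Note that crossover and mutation can produce values such as L = 0 that do not describe a usable network. A real implementation would need to constrain or repair such genomes; this sketch does not.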
The primary code is in Java, and it was recently updated (2012) to build with Maven rather than Ant. The neural network algorithms are multithreaded, and the number of threads can be tuned to suit the host hardware. There is native C code for the performance-critical sections of the neural network algorithms. The backend database is assumed to be MySQL, although access goes through Hibernate, so it can be reconfigured easily for other databases. There are test cases for all aspects of the neural network and genetic algorithms, and there are test cases for a variety of input sources. There are also some code-specific test cases for the use of XMLBeans, Hibernate, network persistence, and memory management. JUnit is used to run the tests; tests are specified by annotation.
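As a rough illustration of what a tunable thread count means in practice, the sketch below splits a batch of independent per-sample computations across a fixed-size thread pool. It is not taken from BSA: the class name `ParallelSum` and the chunking scheme are assumptions, and the `work` function stands in for whatever per-sample computation (for example, an error term) is being accumulated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntToDoubleFunction;

/** Hypothetical sketch of a configurable worker pool; not BSA's threading code. */
final class ParallelSum {

    /** Splits the index range [0, n) into at most `threads` chunks and sums work(i) over all i. */
    static double sum(int n, int threads, IntToDoubleFunction work) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> parts = new ArrayList<>();
            int chunk = (n + threads - 1) / threads; // ceiling division
            for (int start = 0; start < n; start += chunk) {
                final int from = start;
                final int to = Math.min(n, start + chunk);
                Callable<Double> task = () -> {
                    double partial = 0.0;
                    for (int i = from; i < to; i++) {
                        partial += work.applyAsDouble(i);
                    }
                    return partial;
                };
                parts.add(pool.submit(task));
            }
            double total = 0.0;
            for (Future<Double> part : parts) {
                total += part.get(); // blocks until that chunk has finished
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel evaluation failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

A reasonable starting point for `threads` is `Runtime.getRuntime().availableProcessors()`, adjusted up or down after measuring on the target machine.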
A web application, implemented in Ruby on Rails, provides a monitoring user interface for evolution in progress, with displays to list all current phenotypes, success rates, and estimated time to completion for each generation.
The native code must be built first, followed by a full build of all modules. From the project root directory:
mvn clean install -pl native
mvn clean install
The test cases will be run via surefire as part of the full build; this will take a long time, as some of the genetic evolution tests are substantial. The test cases will be logged to java/target/surefire-reports. To run the build without running the test cases:
mvn clean install -pl native
mvn clean install -Dmaven.test.skip=true
Take a look at the test cases in BsaTest.java to see how BSA can be used. The tests use a naming convention that describes the aspects of BSA that are being exercised. Many tests approximate BSA in its full operating mode. See the tests prefixed with geneticThreadedNetworkGenomeTest for more information.
A prototype that uses Hadoop for neural network training is currently being tested, with the Hadoop cluster deployed on AWS.
Comments welcomed.
-DT 4/12/2012