DiscriminantAnalysis.jl is a Julia package for multiple linear and quadratic regularized discriminant analysis (LDA & QDA respectively). LDA and QDA are distribution-based classifiers with the underlying assumption that data follows a multivariate normal distribution. LDA differs from QDA in the assumption about the class variability; LDA assumes that all classes share the same within-class covariance matrix whereas QDA relaxes that constraint and allows for distinct within-class covariance matrices. This results in LDA being a linear classifier and QDA being a quadratic classifier.
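To illustrate why a shared covariance matrix yields a linear classifier, the sketch below evaluates the standard Gaussian log-discriminant score for two classes and checks that their difference matches an explicit linear function of x. It uses only the LinearAlgebra standard library and is not the package's implementation; the helper name discriminant and all numeric values are assumptions made up for the example.

```julia
using LinearAlgebra

# Gaussian log-discriminant score for one class (hypothetical helper,
# not part of DiscriminantAnalysis.jl)
function discriminant(x, mu, Sigma, prior)
    d = x - mu
    return -0.5 * logdet(Sigma) - 0.5 * dot(d, Sigma \ d) + log(prior)
end

# Shared covariance (the LDA assumption) and two made-up class means
Sigma = [2.0 0.5; 0.5 1.0]
mu1 = [-1.0, 0.0]
mu2 = [1.0, 2.0]

# Difference of the two class scores with equal priors
score_gap(x) = discriminant(x, mu1, Sigma, 0.5) - discriminant(x, mu2, Sigma, 0.5)

# Under a shared covariance the quadratic term in x cancels, leaving w'x + b
w = Sigma \ (mu1 - mu2)
b = -0.5 * (dot(mu1, Sigma \ mu1) - dot(mu2, Sigma \ mu2))

x = [0.3, -0.7]
gap_direct = score_gap(x)
gap_linear = dot(w, x) + b
```

If each class were given its own covariance matrix instead, the quadratic terms would no longer cancel and the decision boundary would be quadratic in x, which is the QDA case.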
The package is currently a work in progress; see issue #12 for the package status.
A bare-bones implementation of LDA is currently available but is not exported. Calls to the solver must be prefixed with DiscriminantAnalysis after running using DiscriminantAnalysis. Below is a brief overview of the API:
lda(X, y; kwargs...): construct a Linear Discriminant Analysis model.
  X: the matrix of predictors (design matrix). Data may be per-column or per-row; this is specified by the dims keyword argument.
  y: the vector of class indices. For c classes, the values must range from 1 to c.
  dims=1: the dimension along which observations are stored. Use 1 for row-per-observation and 2 for column-per-observation.
  canonical=false: compute the canonical coordinates if true. For c classes, the data is mapped to a c-1 dimensional space for prediction.
  compute_covariance=false: compute the full class covariance matrix if true. Data is whitened prior to computing discriminant values, so the covariance is generally not computed unless requested.
  centroids=nothing: matrix of pre-computed class centroids. This can be used if the class centroids are known a priori. Otherwise, the centroids are estimated from the data. The centroid matrix must have the same orientation as specified by the dims argument.
  priors=nothing: vector of pre-computed class prior probabilities. This can be used if the class prior probabilities are known a priori. Otherwise, the priors are estimated from the class frequencies.
  gamma=nothing: real value between 0 and 1. Gamma is a regularization parameter that is used to shrink the covariance matrix towards an identity matrix scaled by the average eigenvalue of the covariance matrix. A value of 0.2 retains 80% of the original covariance matrix.
posteriors(LDA, Z): compute the class posterior probabilities on a new matrix of predictors Z. This matrix must have the same dims orientation as the original design matrix X.
classify(LDA, Z): compute the class label predictions on a new matrix of predictors Z. This matrix must have the same dims orientation as the original design matrix X.
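As a rough illustration of the gamma keyword described above, the sketch below shrinks a covariance matrix towards an identity matrix scaled by its average eigenvalue (the trace divided by the dimension). The convex-combination formula is an assumption that matches the description, where gamma = 0.2 keeps 80% of the original matrix; it may differ from what the package does internally. The name shrink_covariance is a hypothetical helper and not part of the package API.

```julia
using LinearAlgebra

# Hypothetical helper (not package API): blend Sigma with a scaled identity.
# ASSUMPTION: the shrinkage is (1 - gamma) * Sigma + gamma * (tr(Sigma) / p) * I.
function shrink_covariance(Sigma::AbstractMatrix, gamma::Real)
    0 <= gamma <= 1 || throw(ArgumentError("gamma must lie in [0, 1]"))
    p = size(Sigma, 1)
    avg_eig = tr(Sigma) / p   # average eigenvalue equals trace / dimension
    return (1 - gamma) * Sigma + gamma * avg_eig * I
end

# Made-up 2x2 covariance matrix for the example
Sigma = [4.0 1.0; 1.0 2.0]

# gamma = 0.2 keeps 80% of Sigma and adds 20% of the scaled identity
Sigma_reg = shrink_covariance(Sigma, 0.2)
```

Under this formula the trace of the matrix is unchanged, so the total variance is preserved while the off-diagonal entries are pulled towards zero.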
The script below demonstrates how to fit an LDA model to some synthetic data using the interface described above:
using DiscriminantAnalysis
using Random
const DA = DiscriminantAnalysis
# Generate two sets of 250 samples of a 5-dimensional random normal
# variable offset by -1/+1
X = [randn(250,5) .- 1;
     randn(250,5) .+ 1];
# Generate class labels for the two samples
# NOTE: classes must be indexed by integers from 1 to the number of
# classes (2 in this case)
y = repeat(1:2, inner=250);
# Construct the LDA model
model = DA.lda(X, y; dims=1, canonical=true, priors=[0.5; 0.5])
# Generate some new data
Z = rand(10,5) .- 0.5
# Get the posterior probabilities for new data
Z_prob = DA.posteriors(model, Z)
# Get the class predictions
Z_class = DA.classify(model, Z)
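The script stores one observation per row (dims=1). As a sketch of the other orientation described in the API overview, the same data can be transposed so that each column is one observation. The snippet uses only Base Julia; the names X_cols and Z_cols are illustrative, and the commented-out calls show how the transposed data would presumably be passed with dims=2 once the package is loaded.

```julia
# Row-per-observation data, shaped like X in the script (500 x 5)
X = [randn(250, 5) .- 1;
     randn(250, 5) .+ 1]
y = repeat(1:2, inner=250)

# Column-per-observation layout: each of the 500 columns is one sample
X_cols = permutedims(X)

# New data must use the same orientation as the training matrix
Z_cols = permutedims(rand(10, 5) .- 0.5)

# ASSUMPTION: with the package loaded as DA, fitting and prediction would
# then use dims=2, following the keyword description in the API overview:
# model = DA.lda(X_cols, y; dims=2)
# Z_class = DA.classify(model, Z_cols)
```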