exerc05b

On a 4.5-hour recording

In this exercise:

  • P = 20 (as in exerc05)
  • Only classes with at least 100 instances (i.e., at least 80 instances per class for training under the 80/20 split below)

Generating the LPC prediction vector sequences

$ ecoz2 lpc -P 20 -W 45 -O 15 -m 100 ../exerc02/data/signals

$ ls data/predictors
A  Bm C  D  E  F  G2 H  I  I2 I3 P
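
Here -P 20 is the LPC prediction order (the P above), and -m 100 presumably enforces the 100-instance minimum per class. Assuming -W 45 and -O 15 are the analysis window length and offset in milliseconds (an assumption about the flags, not verified here), a signal of duration T ms yields about floor((T - 45) / 15) + 1 prediction vectors; for a hypothetical 600 ms signal:

$ echo $(( (600 - 45) / 15 + 1 ))
38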

Let's see the actual number of instances per class (all of them meet the 100-instance threshold):

$ for c in `ls data/predictors/`; do echo "`ls -l data/predictors/$c/*.prd | wc -l` $c instances"; done | sort
       141 H instances
       171 P instances
       175 D instances
       307 G2 instances
       324 I3 instances
       340 F instances
       471 I instances
       512 A instances
       550 C instances
       608 Bm instances
       713 E instances
       714 I2 instances

Generating the TRAIN and TEST predictor lists

tt-list.csv will contain all the available predictor filenames with ~80% per class marked as "TRAIN" and ~20% as "TEST":

echo "tt,class,selection" > tt-list.csv
for class in `ls data/predictors/`; do
  ecoz2 util split --train-fraction 0.8 --file-ext .prd --files data/predictors/${class} >> tt-list.csv
done
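
As a quick sanity check (not among the original steps), the per-class split can be tallied from tt-list.csv; this only assumes that each row carries the predictor's path, which includes its class directory:

for class in `ls data/predictors/`; do
  train=`grep TRAIN tt-list.csv | grep -c "/${class}/"`
  test=`grep TEST tt-list.csv | grep -c "/${class}/"`
  echo "${class}: ${train} TRAIN / ${test} TEST"
done

The trailing slash in "/${class}/" keeps class I from also matching I2 and I3.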

The totals:

grep TRAIN tt-list.csv | wc -l
    4016
grep TEST tt-list.csv | wc -l
    1010

4016 + 1010 = 5026, which matches the sum of the per-class instance counts listed above.

VQ-based classification

VQ-based training and classification were done with the help of this script: vq-exercise.sh; a sketch of its overall shape follows.
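
The script trains one codebook per class on that class's TRAIN predictors and then runs the classifications shown below. A rough sketch of the training half, assuming a --class-name option for ecoz2 vq learn (suggested by the codebook_class_name field in the vq learn log further down); the actual script in the repo is authoritative:

for class in `ls data/predictors/`; do
  # Assumption: --class-name restricts training to this class's TRAIN instances
  # and determines the data/codebooks/${class}/ output location
  ecoz2 vq learn --prediction-order 20 --epsilon 0.0005 --class-name ${class} --predictors tt-list.csv
done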

Classification results:

CLASSIFYING TRAINING PREDICTORS
ecoz2 vq classify --codebooks data/codebooks/[A-Z]*/eps_0.0005_M_1024.cbook --predictors tt-list.csv --tt=TRAIN
number of codebooks: 12  number of predictors: 4016

     Confusion matrix:
            0   1   2   3   4   5   6   7   8   9  10  11     tests   errors

   A   0  408   0   1   0   0   0   0   0   0   0   0   0      409       1
  Bm   1    1 472   0   1   0   0   0   4   0   2   6   0      486      14
   C   2    0   0 438   0   0   0   0   0   1   1   0   0      440       2
   D   3    0   0   0 140   0   0   0   0   0   0   0   0      140       0
   E   4    0   0   0   0 570   0   0   0   0   0   0   0      570       0
   F   5    0   0   0   0   1 271   0   0   0   0   0   0      272       1
  G2   6    0   0   1   0   2   0 242   0   0   0   0   0      245       3
   H   7    0   0   0   0   0   0   0 112   0   0   0   0      112       0
   I   8    0   0   0   0   0   0   0   0 376   0   0   0      376       0
  I2   9    0   0   0   0   0   0   0   0   0 571   0   0      571       0
  I3  10    0   0   0   0   0   0   0   0   0   0 259   0      259       0
   P  11    0   0   0   0   0   0   1   0   0   0   0 135      136       1

In the "candidate order" columns, the first number counts the instances for which the correct class was the top candidate; each subsequent number counts those for which it ranked 2nd, 3rd, and so on.

     class     accuracy    tests      candidate order
   A     0       99.76%   409        408   1   0   0   0   0   0   0   0   0   0   0
  Bm     1       97.12%   486        472   7   6   0   1   0   0   0   0   0   0   0
   C     2       99.55%   440        438   2   0   0   0   0   0   0   0   0   0   0
   D     3      100.00%   140        140   0   0   0   0   0   0   0   0   0   0   0
   E     4      100.00%   570        570   0   0   0   0   0   0   0   0   0   0   0
   F     5       99.63%   272        271   1   0   0   0   0   0   0   0   0   0   0
  G2     6       98.78%   245        242   3   0   0   0   0   0   0   0   0   0   0
   H     7      100.00%   112        112   0   0   0   0   0   0   0   0   0   0   0
   I     8      100.00%   376        376   0   0   0   0   0   0   0   0   0   0   0
  I2     9      100.00%   571        571   0   0   0   0   0   0   0   0   0   0   0
  I3    10      100.00%   259        259   0   0   0   0   0   0   0   0   0   0   0
   P    11       99.26%   136        135   1   0   0   0   0   0   0   0   0   0   0

       TOTAL     99.45%   4016        3994  15   6   0   1   0   0   0   0   0   0   0
  avg_accuracy   99.51%
    error_rate    0.49%


CLASSIFYING TEST PREDICTORS
ecoz2 vq classify --codebooks data/codebooks/[A-Z]*/eps_0.0005_M_1024.cbook --predictors tt-list.csv --tt=TEST
number of codebooks: 12  number of predictors: 1010

     Confusion matrix:
            0   1   2   3   4   5   6   7   8   9  10  11     tests   errors

   A   0   99   0   0   0   1   0   1   0   0   2   0   0      103       4
  Bm   1    1  93   0   0   0   0   0   5   0   6  16   1      122      29
   C   2    0   0  90   2   4  13   0   0   0   0   0   1      110      20
   D   3    0   0   0  28   1   6   0   0   0   0   0   0       35       7
   E   4    0   0   6   5 115   0   6   1   9   0   0   1      143      28
   F   5    0   0   7   2   0  59   0   0   0   0   0   0       68       9
  G2   6    0   0   6   0   9   1  45   0   0   0   0   1       62      17
   H   7    0   0   0   0   0   0   0  28   0   1   0   0       29       1
   I   8    0   0   4   0  10   0   0   0  63  17   0   1       95      32
  I2   9    0   1   2   0   4   0   3   0   9 119   3   2      143      24
  I3  10    0   2   0   0   0   0   0   0   0   5  58   0       65       7
   P  11    0   0   0   0   7   1   0   1   0   4   0  22       35      13

     class     accuracy    tests      candidate order
   A     0       96.12%   103         99   0   1   1   1   0   1   0   0   0   0   0
  Bm     1       76.23%   122         93  15   9   4   0   1   0   0   0   0   0   0
   C     2       81.82%   110         90  17   0   0   3   0   0   0   0   0   0   0
   D     3       80.00%    35         28   7   0   0   0   0   0   0   0   0   0   0
   E     4       80.42%   143        115  22   5   0   1   0   0   0   0   0   0   0
   F     5       86.76%    68         59   7   1   1   0   0   0   0   0   0   0   0
  G2     6       72.58%    62         45   9   2   2   3   1   0   0   0   0   0   0
   H     7       96.55%    29         28   1   0   0   0   0   0   0   0   0   0   0
   I     8       66.32%    95         63  20   5   2   4   0   1   0   0   0   0   0
  I2     9       83.22%   143        119  13   4   4   2   1   0   0   0   0   0   0
  I3    10       89.23%    65         58   6   1   0   0   0   0   0   0   0   0   0
   P    11       62.86%    35         22   7   2   1   1   2   0   0   0   0   0   0

       TOTAL     81.09%   1010        819 124  30  15  15   5   2   0   0   0   0   0
  avg_accuracy   81.01%
    error_rate   18.99%

So, plain VQ classification gets avg_accuracy = 81.01% on the TEST set. Note that the report ends with two slightly different summary numbers: TOTAL (81.09%) is overall correct over overall tests, while avg_accuracy (81.01%) is the unweighted mean of the per-class accuracies, as the sketch below verifies.
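
To make the distinction concrete, here is a small recomputation from the per-class (correct, tests) pairs in the TEST report above (plain awk, nothing ecoz2-specific):

awk 'BEGIN {
  # (correct, tests) per class, copied from the TEST report above
  split("99 93 90 28 115 59 45 28 63 119 58 22", correct)
  split("103 122 110 35 143 68 62 29 95 143 65 35", tests)
  for (i = 1; i <= 12; i++) { C += correct[i]; T += tests[i]; S += correct[i] / tests[i] }
  printf "TOTAL:        %.2f%%  (all correct / all tests)\n", 100 * C / T
  printf "avg_accuracy: %.2f%%  (unweighted mean of per-class accuracies)\n", 100 * S / 12
}'

This reproduces the TOTAL = 81.09% and avg_accuracy = 81.01% figures printed by ecoz2 above.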

Regular training and classification based on quantized observation sequences

The following proceeds as in most of the exercises: first, a general codebook-generation phase using all training predictors; then, classification with various techniques.

Codebook generation

Using all TRAIN instances:

$ ecoz2 vq learn --prediction-order 20 --epsilon 0.0005 --predictors tt-list.csv
vq_learn: base_codebook_opt=None prediction_order=Some(20), epsilon=0.0005 codebook_class_name=_ predictor_filenames: 4016

Codebook generation:

prediction_order=20 class='_'  epsilon=0.0005

355436 training vectors (ε=0.0005)
Report: data/codebooks/_/eps_0.0005.rpt
data/codebooks/_/eps_0.0005_M_0002.cbook

(desired_threads=8)
...

Vector quantization

Quantize all vectors (TRAIN and TEST) using some of the various codebook sizes:

$ for M in 0032 0064 0128 0256 0512 1024 2048 4096; do 
   ecoz2 vq quantize --codebook data/codebooks/_/eps_0.0005_M_${M}.cbook data/predictors
done

HMM training and classification

./hmm-exercise.sh

ipython3 ../exerc01/summary-parallel.py hmm-summary.csv

Plots (omitted here) summarize all parameter combinations that were tried, the best combination (N=9, M=4096, I=1), and the combinations resulting in average accuracy >= 79%.

Naive Bayes training and classification

./nb-exercise.sh

ipython3 ../exerc01/summary-parallel.py nb-summary.csv

So, the larger the codebook, the better the performance with Naive Bayes.