Commands described below refer to speech wave files from the SR4X v1.2 corpus sample.
With a slight file reorganization to facilitate the use of the ECOZ2 tools,
the files corresponding to the word <className> are located (in a separate space)
under ../SR4X/speech/<className>/.
Satisfactory but basic testing complete. Minimal model tuning. Further updates unlikely. Main goal has been to capture initial tests on a revision of the VQ/HMM code I wrote many years ago.
Note: These notes are pretty terse in general, but the reported confusion matrices and ranked candidate tables can probably at least provide a good sense of the results.
sgn.endp ../SR4X/speech/*/*[0-9].wav
The files generated by this command are also written to the separate space mentioned above.
Each output file goes to the same directory as its input, with the original name as prefix
and information about the extracted interval as suffix:
<path-to-name>__S<start>_L<length>$.wav
where <start> is the start index of the detection with respect to the input signal,
and <length> is the size of the detection.
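As a rough illustration of the naming scheme described above, the following Python sketch recovers the prefix, start, and length from an endpoint-detected file name. It is not part of ECOZ2: the function name `parse_endpoint_name`, the `EndpointInfo` type, and the example path are hypothetical. It also assumes that `<start>` and `<length>` are non-negative decimal integers and that the `$` before `.wav` is a literal character in the file name.

```python
import re
from typing import NamedTuple, Optional


class EndpointInfo(NamedTuple):
    prefix: str   # original path and name, without the interval suffix
    start: int    # start index of the detection in the input signal
    length: int   # size of the detection


# Assumed pattern: <path-to-name>__S<start>_L<length>$.wav
# (the "$" is taken to be a literal character in the name)
_ENDP_RE = re.compile(
    r"^(?P<prefix>.+)__S(?P<start>\d+)_L(?P<length>\d+)\$\.wav$"
)


def parse_endpoint_name(path: str) -> Optional[EndpointInfo]:
    """Return the interval info encoded in the name, or None if it does not match."""
    m = _ENDP_RE.match(path)
    if m is None:
        return None
    return EndpointInfo(m.group("prefix"), int(m.group("start")), int(m.group("length")))


# Hypothetical example path, for illustration only
info = parse_endpoint_name("../SR4X/speech/w/w01__S1200_L8000$.wav")
```

Names that do not follow the scheme (for instance, the original input files) simply yield `None`, so the helper can double as a filter for endpoint-detected files.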
Using the endpoint-detected files above, the following starts populating the ./data/
subdirectory here with corresponding "predictor" files:
lpc -P 12 -W 45 -O 15 -m 10 -s 0.9 ../SR4X/speech/*/*$.wav
- -P 12: prediction order 12;
- -W 45: 45-ms analysis window size;
- -O 15: 15-ms window offset;
- -m 10: only consider classes with at least 10 signal files;
- -s 0.9: split the set of files into approximately 90% for a training subset and 10% for a testing subset. With this option the resulting predictor files are generated under data/predictors/TRAIN/ and data/predictors/TEST/ respectively.
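To make the effect of the -m and -s options more concrete, here is a minimal Python sketch of a class-size filter followed by a train/test split. It is only an illustration under stated assumptions, not how `lpc` is implemented: these notes do not say whether the split is done per class or over the whole set, nor whether it is randomized. The function `split_by_class` and its parameters are hypothetical, and the class name is taken to be the parent directory of each file.

```python
import random
from collections import defaultdict
from pathlib import PurePosixPath


def split_by_class(paths, min_files=10, train_fraction=0.9, seed=0):
    """Drop small classes, then split each remaining class into train/test lists."""
    # Group files by class, assuming the layout .../<className>/<file>
    by_class = defaultdict(list)
    for p in paths:
        by_class[PurePosixPath(p).parent.name].append(p)

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    train, test = [], []
    for name in sorted(by_class):
        files = sorted(by_class[name])
        if len(files) < min_files:
            continue  # analogous to -m: skip classes with too few signal files
        rng.shuffle(files)  # assumption: the split is random within each class
        n_train = round(train_fraction * len(files))
        train.extend(files[:n_train])   # analogous to the TRAIN subset
        test.extend(files[n_train:])    # analogous to the TEST subset
    return train, test
```

With `min_files=10` and `train_fraction=0.9`, a class with 10 files contributes 9 to the training list and 1 to the testing list, while a class with fewer than 10 files is ignored.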
None of the following files are committed to version control in this repository:
- The WAV files indicated above
- ECOZ2 executables (lpc, vq.learn, hmm.classify, etc.)
- Binary files generated by the commands in the exercises (only the various "report" files (.rpt) created during codebook and HMM model training are committed)
See:
- vq.md
- hmm.md