Go Wrapper for SCTK

SCTK is a toolkit made by NIST for evaluating the output of automatic speech recognition (ASR) systems. It can be used to:
- Calculate Word Error Rate (WER) and Character Error Rate (CER)
- Analyze different types of errors made by ASR systems: Substitutions, Insertions and Deletions.
- Generate alignments between multiple sets of transcripts (reference and hypotheses)
- Use statistical tests to check whether performance differences between ASR systems are significant.
This repository offers a single binary with a command line interface (CLI) that wraps all the different tools in SCTK behind a simple, easy-to-use interface.
This repo is a work-in-progress.

Usage Examples

Evaluating WER

This example evaluates the word error rate (WER) between reference transcripts and hypothesis transcripts generated by an ASR system. The example uses Bengali text, but SCTK supports most languages since it expects UTF-8 encoded text.

# Creating dummy reference transcript file in CSV format.
cat << EOF > reference.csv
utterance_id,transcript
spk01-utt01,এর মূল্য বার্ষিক দশ লক্ষ ইউরো।
spk02-utt02,খেলাটি চার টেস্ট সিরিজের চূড়ান্ত ছিল।
EOF

# Creating dummy hypothesis transcript from an ASR system, in CSV format.
cat << EOF > hypothesis.csv
utterance_id,transcript
spk01-utt01,এর মূল্য বার্ দশ লক ইউর।
spk02-utt02,খেলা ছার টেস্ট শিরিজের চূড়ান্ত ছিল।
EOF

# Getting the sctk CLI tool from this repository and giving it executable permissions.
version=v0.3.0
wget -O sctk https://github.com/shahruk10/go-sctk/releases/download/${version}/sctk
chmod +x sctk

# Using sctk CLI to evaluate WER and check errors.
#
# Setting `--ignore-first=true` to ignore header row.
# Check `sctk score --help` for documentation of each argument.
#
# To compare characters instead of words, and calculate the
# character error rate (CER) instead of WER, set --cer=true.
./sctk score \
  --ignore-first=true \
  --delimiter="," \
  --col-id=0 \
  --col-trn=1 \
  --normalize-unicode=true \
  --cer=false \
  --out=./report \
  --ref=reference.csv \
  --hyp=hypothesis.csv

Now we can check the generated reports in the ./report directory.

  report/
  ├── hyp1.trn
  ├── hyp1.trn.dtl
  ├── hyp1.trn.raw
  ├── hyp1.trn.sgml
  ├── hyp1.trn.sys
  ├── hyp1.trn.pra.html
  ├── hyp1.trn.pra.md
  ├── hyp1.trn.pra.csv
  ├── hyp1.trn.pra.json
  ├── hyp1.trn.pra
  └── ref.trn

The *.sys file contains a table showing a breakdown of the different types of errors.

The results are aggregated for each speaker. Corr, Sub, Del and Ins stand for the percentage of words (characters in the case of CER) that were correctly decoded, substituted, deleted and inserted in the hypothesis, respectively.

                   SYSTEM SUMMARY PERCENTAGES by SPEAKER                      

     ,----------------------------------------------------------------.
     |                              hyp1                              |
     |----------------------------------------------------------------|
     | SPKR   | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
     |--------+-------------+-----------------------------------------|
     | spk01  |    1      6 | 50.0   50.0    0.0    0.0   50.0  100.0 |
     |--------+-------------+-----------------------------------------|
     | spk02  |    1      6 | 50.0   50.0    0.0    0.0   50.0  100.0 |
     |================================================================|
     | Sum/Avg|    2     12 | 50.0   50.0    0.0    0.0   50.0  100.0 |
     |================================================================|
     |  Mean  |  1.0    6.0 | 50.0   50.0    0.0    0.0   50.0  100.0 |
     |  S.D.  |  0.0    0.0 |  0.0    0.0    0.0    0.0    0.0    0.0 |
     | Median |  1.0    6.0 | 50.0   50.0    0.0    0.0   50.0  100.0 |
     `----------------------------------------------------------------'
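The Err column in the table above is the sum of the Sub, Del and Ins percentages, i.e. (substitutions + deletions + insertions) divided by the number of reference words. As a quick sanity check, the figure for spk01 can be reproduced by hand: 3 substitutions out of 6 reference words. The `wer` helper below is a hypothetical illustration and is not part of the sctk CLI.

```shell
# Hypothetical helper (not part of sctk): prints the error rate in percent.
# Arguments: substitutions, deletions, insertions, number of reference words.
wer() {
  awk -v s="$1" -v d="$2" -v i="$3" -v n="$4" \
    'BEGIN { printf "%.1f\n", 100 * (s + d + i) / n }'
}

# spk01 in the example above: 3 substitutions, 0 deletions, 0 insertions, 6 words.
wer 3 0 0 6
```

Because insertions are counted against the number of reference words, the error rate can exceed 100% when the hypothesis contains many extra words.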

The *.pra.md and *.pra.html files show alignments between the reference and hypothesis text in markdown and HTML format, respectively. These alignment files make it easy to see errors in context. In the table below, taken from hyp1.trn.pra.md, S indicates a substitution; D and I would indicate a deletion and an insertion, respectively.


REF	খেলাটি	চার	টেস্ট	সিরিজের	চূড়ান্ত	ছিল।
HYP1	খেলা	ছার	টেস্ট	শিরিজের	চূড়ান্ত	ছিল।
EVAL	S	S		S

These alignments are also available in JSON format in the *.pra.json file, which can be easily loaded into other programs for analysis or for combining the results of different ASR systems.
Furthermore, multiple ASR systems can be evaluated together by passing the --hyp flag more than once to the sctk CLI, once per hypothesis file.
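As a rough sketch of evaluating several systems in one run, the snippet below builds one --hyp flag per hypothesis file and passes them all to a single `sctk score` call. The file names `hypothesis_a.csv` and `hypothesis_b.csv` are placeholders, and the remaining flags are simply copied from the WER example above; check `sctk score --help` for the authoritative list.

```shell
# Placeholder file names: one CSV per ASR system being compared.
hyp_files="hypothesis_a.csv hypothesis_b.csv"

# Build one --hyp flag per hypothesis file.
hyp_flags=""
for f in $hyp_files; do
  hyp_flags="${hyp_flags:+$hyp_flags }--hyp=$f"
done

# Run the evaluation only if the sctk binary is present in the current directory.
# $hyp_flags is left unquoted on purpose so that it splits into separate flags.
if [ -x ./sctk ]; then
  ./sctk score \
    --ignore-first=true \
    --delimiter="," \
    --col-id=0 \
    --col-trn=1 \
    --out=./report \
    --ref=reference.csv \
    $hyp_flags
fi
```

This simple loop relies on word splitting, so it assumes the file names contain no spaces.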

The *.dtl file shows further details of each type of error. This can reveal systematic errors and patterns in how the ASR system is transcribing the audio. When evaluating CER, this file will show character level information, instead of word level.

... (other useful stuff)

CONFUSION PAIRS                  Total                 (6)
                               With >=  1 occurrences (6)
 1:    1  ->  ইউরো। ==> ইউর।
 2:    1  ->  খেলাটি ==> খেলা
 3:    1  ->  চার ==> ছার
 4:    1  ->  বার্ষিক ==> বার্
 5:    1  ->  লক্ষ ==> লক
 6:    1  ->  সিরিজের ==> শিরিজের
   -------                                                                                          
       6  

... (other useful stuff)
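To analyze the confusion pairs outside of the report, they can be pulled out of the *.dtl file with standard tools. The sketch below assumes the confusion pair lines look exactly like the excerpt above (index, count, `->`, reference token, `==>`, hypothesis token); `confusion_pairs` is a hypothetical helper, not something the sctk CLI provides.

```shell
# Hypothetical helper (not part of sctk): prints one line per confusion pair as
# "count<TAB>reference<TAB>hypothesis", assuming the line layout shown above.
confusion_pairs() {
  awk '$3 == "->" && $5 == "==>" { printf "%s\t%s\t%s\n", $2, $4, $6 }' "$1"
}

# Usage: confusion_pairs report/hyp1.trn.dtl
```

The resulting tab-separated lines can be sorted or loaded into a spreadsheet to spot systematic substitutions.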

License

Apache License 2.0
