SCTK is a toolkit made by NIST for evaluating the output of automatic speech recognition (ASR) systems. It can be used to:

- Calculate Word Error Rate (WER) and Character Error Rate (CER).
- Analyze the different types of errors made by ASR systems: substitutions, insertions and deletions.
- Generate alignments between multiple sets of transcripts (reference and hypotheses).
- Use statistical tests to evaluate the significance of the performance difference between ASR systems.
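To make the WER definition concrete: WER is the minimum number of word substitutions (S), deletions (D) and insertions (I) needed to align a hypothesis with its reference, divided by the number of reference words N, i.e. (S + D + I) / N. The bash function below is a minimal sketch of that computation as a word-level Levenshtein distance, for illustration only. It is not part of SCTK, the name `wer` is hypothetical, and it skips everything SCTK adds on top (Unicode normalization, alignment output, per-speaker reports).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of SCTK): word-level WER via Levenshtein distance.
# Prints "<errors> <reference word count>", where errors = S + D + I.
wer() {
  local -a ref hyp prev cur
  read -r -a ref <<< "$1"   # split reference on whitespace
  read -r -a hyp <<< "$2"   # split hypothesis on whitespace
  local n=${#ref[@]} m=${#hyp[@]}
  local i j cost sub del ins best

  # Row 0: an empty reference prefix needs j insertions to match j hypothesis words.
  for ((j = 0; j <= m; j++)); do prev[j]=$j; done

  for ((i = 1; i <= n; i++)); do
    cur[0]=$i  # i deletions to match an empty hypothesis prefix
    for ((j = 1; j <= m; j++)); do
      if [[ "${ref[i-1]}" == "${hyp[j-1]}" ]]; then cost=0; else cost=1; fi
      sub=$((prev[j-1] + cost))  # match (cost 0) or substitution (cost 1)
      del=$((prev[j] + 1))       # reference word deleted
      ins=$((cur[j-1] + 1))      # extra word inserted in the hypothesis
      best=$sub
      if ((del < best)); then best=$del; fi
      if ((ins < best)); then best=$ins; fi
      cur[j]=$best
    done
    prev=("${cur[@]}")
  done
  echo "${prev[m]} $n"
}

# Example: first utterance of the dummy data used in this README.
read -r errs n < <(wer "এর মূল্য বার্ষিক দশ লক্ষ ইউরো।" "এর মূল্য বার্ দশ লক ইউর।")
awk -v e="$errs" -v n="$n" 'BEGIN { printf "WER = %.1f%%\n", 100 * e / n }'  # WER = 50.0%
```

The utterance has 6 reference words and 3 substituted ones, so the sketch reports `3 6`, i.e. 50% WER for that sentence.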
This repository offers a single binary with a command line interface (CLI) that wraps around all the different tools in SCTK behind a simple, easy-to-use interface.

This repo is a work-in-progress.
- This example evaluates the word error rate (WER) between reference transcripts and hypothesis transcripts generated by an ASR system. The example uses Bengali text, but SCTK supports most languages since it expects UTF-8 encoded text.
```bash
# Creating dummy reference transcript file in CSV format.
cat << EOF > reference.csv
utterance_id,transcript
spk01-utt01,এর মূল্য বার্ষিক দশ লক্ষ ইউরো।
spk02-utt02,খেলাটি চার টেস্ট সিরিজের চূড়ান্ত ছিল।
EOF

# Creating dummy hypothesis transcript from an ASR system, in CSV format.
cat << EOF > hypothesis.csv
utterance_id,transcript
spk01-utt01,এর মূল্য বার্ দশ লক ইউর।
spk02-utt02,খেলা ছার টেস্ট শিরিজের চূড়ান্ত ছিল।
EOF

# Getting the sctk CLI tool from this repository and giving it executable permissions.
version=v0.3.0
wget -O sctk https://github.com/shahruk10/go-sctk/releases/download/${version}/sctk
chmod +x sctk

# Using sctk CLI to evaluate WER and check errors.
#
# Setting `--ignore-first=true` to ignore header row.
# Check `sctk score --help` for documentation of each argument.
#
# To compare characters instead of words, and calculate the
# character error rate (CER) instead of WER, set --cer=true.
./sctk score \
  --ignore-first=true \
  --delimiter="," \
  --col-id=0 \
  --col-trn=1 \
  --normalize-unicode=true \
  --cer=false \
  --out=./report \
  --ref=reference.csv \
  --hyp=hypothesis.csv
```
- Now we can check the generated reports in the `./report` directory.
```
report/
├── hyp1.trn
├── hyp1.trn.dtl
├── hyp1.trn.raw
├── hyp1.trn.sgml
├── hyp1.trn.sys
├── hyp1.trn.pra.html
├── hyp1.trn.pra.md
├── hyp1.trn.pra.csv
├── hyp1.trn.pra.json
├── hyp1.trn.pra
└── ref.trn
```
- The `*.sys` file contains a table showing a breakdown of the different types of errors. The results are aggregated for each speaker; `Corr`, `Sub`, `Del` and `Ins` stand for the percentage of words (characters in the case of CER) that were correctly decoded, substituted, deleted and inserted in the hypothesis, respectively. `Err` is the total error rate (the sum of `Sub`, `Del` and `Ins`), and `S.Err` is the percentage of sentences containing at least one error.
```
                     SYSTEM SUMMARY PERCENTAGES by SPEAKER

      ,----------------------------------------------------------------.
      |                              hyp1                              |
      |----------------------------------------------------------------|
      | SPKR   | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err |
      |--------+-------------+-----------------------------------------|
      | spk01  |    1      6 | 50.0   50.0    0.0    0.0   50.0  100.0 |
      |--------+-------------+-----------------------------------------|
      | spk02  |    1      6 | 50.0   50.0    0.0    0.0   50.0  100.0 |
      |================================================================|
      | Sum/Avg|    2     12 | 50.0   50.0    0.0    0.0   50.0  100.0 |
      |================================================================|
      |  Mean  |  1.0    6.0 | 50.0   50.0    0.0    0.0   50.0  100.0 |
      |  S.D.  |  0.0    0.0 |  0.0    0.0    0.0    0.0    0.0    0.0 |
      | Median |  1.0    6.0 | 50.0   50.0    0.0    0.0   50.0  100.0 |
      `----------------------------------------------------------------'
```
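As a quick check of how these columns relate, the `Sum/Avg` row can be reproduced by hand from the dummy data: each utterance has 6 reference words with 3 substitutions and no deletions or insertions, so over both speakers N = 12, S = 6, D = 0 and I = 0.

$$
\mathrm{Err} = \frac{S + D + I}{N} = \frac{6 + 0 + 0}{12} = 50.0\%,
\qquad
\mathrm{Corr} = \frac{N - S - D}{N} = \frac{12 - 6 - 0}{12} = 50.0\%
$$

`S.Err` is 100.0 because both of the 2 sentences contain at least one error.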
- The `*.pra.md` and `*.pra.html` files show alignments between the reference and hypothesis text in Markdown and HTML format, respectively. These alignment files make it easy to see errors in context. In the table below, taken from `hyp1.trn.pra.md`, `S` indicates substitutions; `D` and `I` would represent deletions and insertions, respectively.

| REF  | খেলাটি | চার | টেস্ট | সিরিজের | চূড়ান্ত | ছিল। |
| ---- | --- | --- | --- | --- | --- | --- |
| HYP1 | খেলা | ছার | টেস্ট | শিরিজের | চূড়ান্ত | ছিল। |
| EVAL | S | S |  | S |  |  |
- These alignments are also available in JSON format in the `*.pra.json` file, which can easily be loaded into other programs for analysis or for combining results from different ASR systems.
- Furthermore, multiple ASR systems can be evaluated together by passing more than one hypothesis file, using the `--hyp` flag once per file when running the `sctk` CLI.
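For example, a second system's output can be scored in the same run as the first. This is a sketch that reuses the flags from the earlier `sctk score` command and assumes the `sctk` binary and `reference.csv`/`hypothesis.csv` from that example are in the current directory; the file name `hypothesis2.csv` and its contents are made up for illustration, and `sctk score --help` remains the authoritative reference for how multiple hypotheses are reported.

```bash
# Hypothetical second ASR system output, in the same CSV format as before.
cat << EOF > hypothesis2.csv
utterance_id,transcript
spk01-utt01,এর মূল্য বার্ষিক দশ লক্ষ ইউর।
spk02-utt02,খেলাটি চার টেস্ট শিরিজের চূড়ান্ত ছিল।
EOF

# Pass --hyp once per system to evaluate both against the same reference.
./sctk score \
  --ignore-first=true \
  --delimiter="," \
  --col-id=0 \
  --col-trn=1 \
  --normalize-unicode=true \
  --cer=false \
  --out=./report \
  --ref=reference.csv \
  --hyp=hypothesis.csv \
  --hyp=hypothesis2.csv
```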
- The `*.dtl` file shows further details about each type of error. This can reveal systematic errors and patterns in how the ASR system transcribes the audio. When evaluating CER, this file shows character-level information instead of word-level information.

```
... (other useful stuff)

CONFUSION PAIRS                  Total                 (6)
                                 With >=  1 occurrences (6)

   1:    1  ->  ইউরো। ==> ইউর।
   2:    1  ->  খেলাটি ==> খেলা
   3:    1  ->  চার ==> ছার
   4:    1  ->  বার্ষিক ==> বার্
   5:    1  ->  লক্ষ ==> লক
   6:    1  ->  সিরিজের ==> শিরিজের
     -------
         6

... (other useful stuff)
```