
[src] Bug-fix to conceptual bug in Minimum Bayes Risk/sausage code. Thx:@jtrmal #2056

Merged 1 commit into kaldi-asr:master on Dec 3, 2017

Conversation

@danpovey (Contributor) commented Dec 1, 2017

Fixing this bug makes the sausage stats from the MBR code significantly less sparse.

@danpovey danpovey merged commit 77ae8fe into kaldi-asr:master Dec 3, 2017
@jtrmal (Contributor) commented Dec 3, 2017

I finally got to test this briefly (on Babel Pashto).

before:

%WER 38.5 | 21825 101803 | 65.4 24.6 10.0 3.8 38.5 29.7 | -0.418 | 

after:

%WER 38.4 | 21825 101803 | 65.4 24.6 10.0 3.9 38.4 29.7 | -0.419 |

For this setup, I haven't seen more than a 0.1% (rarely 0.2%) improvement.

On my second test case (where I had originally noticed the issue), everything seems OK, but I don't have any experiment set up yet.

@jtrmal (Contributor) commented Dec 3, 2017

BTW, the key for interpreting the columns is the same as in the scored ctm files:

Err | # Snt # Wrd | Corr    Sub    Del    Ins    Err  S.Err | NCE
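To make the column key concrete, here is a small sketch (not part of Kaldi or sclite; the helper name and dict keys are invented) that splits a WER summary line like the ones quoted above into named fields, following the key given in this comment:

```python
# Hypothetical parser for a pipe-separated sclite-style WER summary line:
#   %WER <Err> | <#Snt> <#Wrd> | <Corr> <Sub> <Del> <Ins> <Err> <S.Err> | <NCE> |
def parse_wer_line(line):
    # Strip the trailing '|' and split the remaining three '|'-separated groups.
    parts = [p.strip() for p in line.strip().strip("|").split("|")]
    wer = float(parts[0].split()[1])                 # "%WER 38.5" -> 38.5
    num_snt, num_wrd = (int(x) for x in parts[1].split())
    corr, sub, dele, ins, err, s_err = (float(x) for x in parts[2].split())
    nce = float(parts[3])
    return {"wer": wer, "snt": num_snt, "wrd": num_wrd,
            "corr": corr, "sub": sub, "del": dele, "ins": ins,
            "err": err, "s_err": s_err, "nce": nce}

before = parse_wer_line("%WER 38.5 | 21825 101803 | 65.4 24.6 10.0 3.8 38.5 29.7 | -0.418 |")
after = parse_wer_line("%WER 38.4 | 21825 101803 | 65.4 24.6 10.0 3.9 38.4 29.7 | -0.419 |")
print(before["wer"], after["wer"])
```

This makes the reported 38.5 → 38.4 change easy to read off: the gain came from the Err column, with a slight shift from deletions toward insertions.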

@jtrmal (Contributor) commented Dec 3, 2017

To explain: the Babel recipe uses MBR decoding even when scoring a single system.
I don't have any intuition about whether the gain would be bigger or smaller for, say, lattice combination (via lattice-union + MBR).
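For readers unfamiliar with the sausage stats being fixed here, the following is a toy sketch of the consensus/MBR idea, not Kaldi's implementation (the data, function name, and the `"<eps>"` label are invented for illustration). A sausage (confusion network) is a sequence of bins, each mapping candidate words to posteriors; under the usual bin-independence assumption, picking the highest-posterior entry in each bin (possibly the empty word) minimizes the expected word error:

```python
# Toy sausage: each bin maps word -> posterior; "<eps>" is the empty word,
# meaning the bin may contribute nothing to the hypothesis.
sausage = [
    {"the": 0.7, "a": 0.3},
    {"cat": 0.5, "cap": 0.4, "<eps>": 0.1},
    {"<eps>": 0.6, "sat": 0.4},
]

def mbr_hypothesis(bins):
    """Pick the max-posterior word in each bin; drop empty words."""
    hyp = []
    for posteriors in bins:
        word = max(posteriors, key=posteriors.get)
        if word != "<eps>":
            hyp.append(word)
    return hyp

print(mbr_hypothesis(sausage))  # -> ['the', 'cat']
```

A bug that makes the per-bin stats overly sparse (as this PR's description suggests) would distort these posteriors and hence the chosen hypothesis, which is consistent with the small WER change reported above.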

kronos-cm added a commit to kronos-cm/kaldi that referenced this pull request Dec 18, 2017
mahsa7823 pushed a commit to mahsa7823/kaldi that referenced this pull request Feb 28, 2018
…hanks:@jtrmal (kaldi-asr#2056)

Note: this may improve results where this code is used, e.g. for lattice combination or MBR decoding.
Skaiste pushed a commit to Skaiste/idlak that referenced this pull request Sep 26, 2018