
Use an n-gram LM to rescore the lattice from fast_beam_search. #365

Closed
wants to merge 3 commits

Conversation

@csukuangfj (Collaborator)

This PR adds two more decoding methods:

  • fast_beam_search_nbest, similar to fast_beam_search, but it uses k2.random_paths() to sample n paths from the lattice instead of using k2.shortest_path()
  • fast_beam_search_with_nbest_rescoring: It uses an n-gram LM to rescore the lattice obtained from fast_beam_search. However, it does not seem to be helpful.
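
A minimal sketch of the sampling-and-deduplication step behind both methods, following icefall's usual n-best pattern (the function name and defaults here are illustrative, not necessarily this PR's exact code):

```python
import k2


def sample_unique_paths(lattice: k2.Fsa, num_paths: int = 200) -> k2.RaggedTensor:
    # Illustrative sketch, not the PR's exact code.
    # Sample paths from the lattice; `paths` is a ragged tensor of
    # arc indices into `lattice`, one sub-list per path.
    paths = k2.random_paths(lattice, use_double_scores=True, num_paths=num_paths)

    # Map arc indices to labels, then drop epsilons (0) and the
    # final-arc marker (-1).
    labels = k2.ragged.index(lattice.labels.contiguous(), paths)
    labels = labels.remove_values_leq(0)

    # Keep only distinct label sequences; duplicates would be
    # rescored identically anyway.
    unique_paths, _, _ = labels.unique(
        need_num_repeats=False, need_new2old_indexes=False
    )
    return unique_paths
```

WERs on test-clean for two sweeps of ngram_lm_scale: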
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.1  2.15    best for test-clean
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.2  2.41
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.3  2.61
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.4  2.77
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.5  2.9
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.6  2.96
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.7  3.02
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.8  3.08
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.9  3.13
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_1.0  3.17
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_1.1  3.21
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_1.2  3.24
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_1.3  3.25
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_1.4  3.27
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_1.5  3.3

beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0    1.99    best for test-clean
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.01 2.0
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.02 2.01
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_-0.02        2.02
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.05 2.04
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_-0.05        2.05
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_-0.1 2.12
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_-0.2 2.27
beam_4.0_max_contexts_32_max_states_8_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_-0.3 2.54
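
For reference, a hedged reading of how ngram_lm_scale enters the rescoring, based on the hyperparameter names above and icefall's usual n-best rescoring pattern (names are illustrative, not necessarily this PR's exact code):

```python
import torch


def combine_scores(
    tot_scores: torch.Tensor,       # transducer score per unique path
    ngram_lm_scores: torch.Tensor,  # n-gram LM score per unique path
    ngram_lm_scale: float,
) -> torch.Tensor:
    # Illustrative sketch. ngram_lm_scale = 0 disables the n-gram
    # term and reduces to plain fast_beam_search_nbest, which is
    # exactly the best-performing row above.
    return tot_scores + ngram_lm_scale * ngram_lm_scores
```

Since the best WER occurs at ngram_lm_scale = 0, and both positive and negative scales hurt, the n-gram term adds no useful information in these runs.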

@danpovey (Collaborator)

Hm, interesting. Regarding the n-best stuff, we should try to figure out why your results seem to be different from Liyong's.
I wonder whether there might be something going wrong regarding epsilons somehow? I expect we would have to add epsilon self-loops to both the lattice and the language model for composition, since the lattice naturally has epsilons.

@csukuangfj (Collaborator, Author)

I expect we would have to add epsilon self-loops to both the lattice and the language model for composition, since the lattice naturally has epsilons.

I have also added epsilon self-loops to G and removed the epsilon self-loops from the rescored word_fsas after intersecting with G. The results are the same, i.e., not improved.
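
A hedged sketch of that epsilon handling (variable names are illustrative; icefall's actual code uses device-aware intersection helpers):

```python
import k2


def rescore_with_G(word_fsas: k2.Fsa, G: k2.Fsa) -> k2.Fsa:
    # Illustrative sketch, not the PR's exact code.
    # Give both operands epsilon self-loops so an epsilon arc on one
    # side can pair with a self-loop on the other during intersection.
    word_fsas = k2.arc_sort(k2.add_epsilon_self_loops(word_fsas))
    G = k2.arc_sort(k2.add_epsilon_self_loops(G))
    rescored = k2.intersect(G, word_fsas, treat_epsilons_specially=False)
    # Strip the self-loops again so they do not affect path scores.
    return k2.remove_epsilon_self_loops(k2.connect(rescored))
```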

@danpovey (Collaborator)

why does 'fast_beam_search_with_nbest_rescoring' have nbest in it if it is lattice based?

@danpovey (Collaborator)

Anyway the decoding method is cool. I looked briefly at the code and did not see any obvious problems.
Perhaps we can merge this and then Liyong can try various comparisons vs. his KenLM setup to try to debug this.

@csukuangfj (Collaborator, Author)

why does 'fast_beam_search_with_nbest_rescoring' have nbest in it if it is lattice based?

I am using nbest rescoring, i.e., extracting n paths from the lattice, deduplicating them, and then intersecting them with the given G.

That is why the decoding name contains nbest.

I am not intersecting the generated lattice with G directly, since the generated lattice is an acceptor containing token IDs, while G contains word IDs.

I think we can intersect the generated lattice with an LG graph instead of G.
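
A hedged sketch of building such an LG with k2 (it assumes an existing lexicon FST L with token IDs on labels and word IDs on aux_labels; the file names here are hypothetical):

```python
import torch
import k2

# Hypothetical file names; L maps token IDs -> word IDs,
# G is a word-level n-gram LM acceptor.
L = k2.Fsa.from_dict(torch.load("L.pt"))
G = k2.Fsa.from_dict(torch.load("G.pt"))

# LG accepts token sequences and carries word IDs on aux_labels,
# so the token-level lattice could be intersected with it directly.
LG = k2.compose(k2.arc_sort(L), k2.arc_sort(G))
LG = k2.arc_sort(k2.connect(LG))
```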

@glynpu (Collaborator) commented May 17, 2022

Liyong can try various comparisons vs. his KenLM setup to try to debug this.

Nice implementation. I will study this.

@danpovey (Collaborator)

One possibility: Liyong might simply be using a larger LM, since KenLM is a compact format?

@glynpu (Collaborator) commented May 17, 2022

One possibility: Liyong might simply be using a larger LM, since KenLM is a compact format?

I am using this one downloaded by torchaudio. https://github.com/pytorch/audio/blob/8fd60cc89fb0973c10b1c37ef77f0f22ddd47bd0/examples/asr/librispeech_ctc_decoder/inference.py#L19

After checking the config, I confirmed I was using a larger LM: 23 GB (mine) vs. 4.1 GB (downloaded from https://www.openslr.org/11/).
/ceph-data2/ly/kenlm/train_lm/train.arpa

@danpovey (Collaborator)

That is probably converted from the 4-gram.arpa.gz downloaded from here: https://www.openslr.org/11/. But I don't know if that is what fangjun is using?

@csukuangfj (Collaborator, Author)

That is probably converted from the 4-gram.arpa.gz downloaded from here: https://www.openslr.org/11/. But I don't know if that is what fangjun is using?

I have tried the 4-gram and 3-gram that are used by the conformer_ctc setup.

"3-gram.pruned.1e-7.arpa.gz",
"4-gram.arpa.gz",

@glynpu (Collaborator) commented May 18, 2022

Here is the larger ARPA LM I am using, which I trained myself.
You can try it if you want. @csukuangfj
/ceph-data2/ly/kenlm/train_lm/train.arpa

@ezerhouni mentioned this pull request on Jun 13, 2022
@ezerhouni (Collaborator)

@csukuangfj I am testing this branch on my machine and I am not seeing the same results. Could you tell me which model you are using (epoch and average)? Thank you!

@csukuangfj (Collaborator, Author)

@ezerhouni
Could you try https://huggingface.co/csukuangfj/icefall-asr-librispeech-pruned-transducer-stateless3-2022-05-13 ?

I just uploaded two new checkpoints to the exp directory.

[Screenshot: Screen Shot 2022-07-08 at 08 07 12]

@ezerhouni (Collaborator)

@csukuangfj I am not seeing them.
[Screenshot: Screenshot 2022-07-08 at 08 44 33]

The last commit is from 21 days ago (yesterday's commit only modifies the README).

@csukuangfj (Collaborator, Author) commented Jul 8, 2022

@ezerhouni

Sorry. Please check again.

[Screenshot: Screen Shot 2022-07-08 at 2 48 51 PM]

@ezerhouni (Collaborator)

@csukuangfj Thanks! I just tested it. I got the same result as yours for test-clean (i.e., ngram_lm_scale_0 is the best); for test-other I am getting slightly better results:

For test-other, WER of different settings are:
beam_4.0_max_contexts_8_max_states_32_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.05	4.85	best for test-other
beam_4.0_max_contexts_8_max_states_32_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.1	4.85
beam_4.0_max_contexts_8_max_states_32_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.01	4.88
beam_4.0_max_contexts_8_max_states_32_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.02	4.88
beam_4.0_max_contexts_8_max_states_32_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0	4.89
beam_4.0_max_contexts_8_max_states_32_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_-0.02	4.95
beam_4.0_max_contexts_8_max_states_32_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_-0.05	5.02
beam_4.0_max_contexts_8_max_states_32_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.3	5.02
beam_4.0_max_contexts_8_max_states_32_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_-0.1	5.16
beam_4.0_max_contexts_8_max_states_32_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.5	5.3
beam_4.0_max_contexts_8_max_states_32_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.8	5.53
beam_4.0_max_contexts_8_max_states_32_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_-0.2	5.6
beam_4.0_max_contexts_8_max_states_32_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_1.0	5.61
beam_4.0_max_contexts_8_max_states_32_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_1.5	5.76
beam_4.0_max_contexts_8_max_states_32_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_2.5	5.89
beam_4.0_max_contexts_8_max_states_32_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_3	5.94
beam_4.0_max_contexts_8_max_states_32_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_-0.5	6.63

I will try to work on it a bit in the coming days (if I can find some spare time) and I will let you know if we can improve the results.

@csukuangfj (Collaborator, Author)

beam_4.0_max_contexts_8_max_states_32_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.01	4.88
beam_4.0_max_contexts_8_max_states_32_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0.02	4.88
beam_4.0_max_contexts_8_max_states_32_num_paths_200_nbest_scale_0.5_temperature_1.0_ngram_lm_scale_0	4.89

Thanks!

@ezerhouni (Collaborator)

I think this PR can be closed

@csukuangfj csukuangfj closed this Jul 18, 2022
@csukuangfj csukuangfj deleted the rnnt-lm-rescoring branch July 28, 2023 02:40