Hello,

I am trying to migrate my ASR model from the OpenSeq2Seq decoder to Flashlight. Currently, I am using NeMo Conformer-Large as the acoustic model, fine-tuned on my data. I also use KenLM as the language model, trained using this script provided by NeMo. However, the OpenSeq2Seq decoder does not support BPE tokens, so subwords are mapped to single characters and KenLM is trained on these characters.
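For context, the subword-to-character mapping I use works roughly as follows (a minimal sketch; the helper names are illustrative, not the actual NeMo script):

```python
# Hypothetical sketch: assign each BPE subword a unique stand-in character
# so that a character-level KenLM can be trained on the encoded text.
def build_subword_to_char_map(subwords, offset=256):
    # Each subword gets a unique Unicode codepoint starting at `offset`,
    # past the ASCII range to avoid collisions with real characters.
    return {sw: chr(offset + i) for i, sw in enumerate(subwords)}

def encode_subwords(subword_sequence, sw2ch):
    # A tokenized sentence becomes a string of stand-in characters,
    # one character per subword.
    return "".join(sw2ch[sw] for sw in subword_sequence)

subwords = ["▁the", "▁cat", "s"]
sw2ch = build_subword_to_char_map(subwords)
encoded = encode_subwords(["▁the", "▁cat", "s"], sw2ch)
```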
To decode log_probs from Conformer, I am using the following code:
# Imports assumed from the Flashlight text Python bindings:
import numpy as np
from flashlight.lib.text.dictionary import Dictionary
from flashlight.lib.text.decoder import (
    CriterionType,
    KenLM,
    LexiconFreeDecoder,
    LexiconFreeDecoderOptions,
)

tokenizer = load_sentencepiece_tokenizer("tokenizer.model")
# Vocab with chars (encoded subwords), 1024 chars in total;
# the blank token is not included here.
tokens_dict = Dictionary("encoded_decoder_vocabulary.txt")
sil_idx = 262    # index of the "▁" token
blank_idx = 1024
lm = KenLM("kenlm.bin", tokens_dict)
transitions = np.zeros(tokens_dict.index_size() * tokens_dict.index_size())
options = LexiconFreeDecoderOptions(
    beam_size=128,
    beam_size_token=1024,
    beam_threshold=500.0,
    lm_weight=1.0,
    sil_score=0.0,
    log_add=False,
    criterion_type=CriterionType.CTC,
)
_decoder = LexiconFreeDecoder(
    options=options,
    lm=lm,
    sil_token_idx=sil_idx,
    blank_token_idx=blank_idx,
    transitions=transitions,
)

def decoder(log_probs, log_probs_len):
    # log_probs shape: (batch, time, dictionary), dictionary = 1025 (1024 tokens + blank)
    # log_probs_len shape: (batch,)
    decode_result = []
    for i in range(log_probs.shape[0]):
        decode_beam = _decoder.decode(
            log_probs[i].numpy().ctypes.data, log_probs_len[i], log_probs.shape[-1]
        )
        # Keep only real token ids; drop the blank (index 1024).
        tokens = [t for t in decode_beam[0].tokens if t < 1024]
        decode_result.append(tokens)
    return decode_result
However, the best WER I can reach with this code is 11.02, while with the OpenSeq2Seq decoder I get 5.23. Can you please help me identify what I am doing wrong?
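For reference, the WER figures above are the standard Levenshtein word error rate, which can be computed like this (a minimal reference implementation, not the exact scorer used in either pipeline):

```python
def wer(ref, hyp):
    # Word error rate: word-level edit distance divided by reference length.
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + sub,  # substitution / match
            )
    return d[len(r)][len(h)] / len(r)
```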
Also, is it okay to use a model trained without a sil token and assign it a zero score? Moreover, according to the documentation found here, the lexicon-free decoder should use the defined wordseparator or run with --usewordpiece=true. However, I couldn't find such parameters in the Python bindings. Should I set this parameter in my case, or is there an alternative in the Python bindings?
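In case it matters, here is a sketch of how decoded pieces could be joined back into words on the Python side, treating the "▁" piece as the word boundary (the SentencePiece convention; the helper name and mapping are hypothetical):

```python
# Hypothetical post-processing: convert decoded token ids back into text,
# using "▁" as the word separator, as SentencePiece does.
def ids_to_text(token_ids, id_to_piece):
    pieces = [id_to_piece[t] for t in token_ids]
    # Concatenate pieces, then turn each "▁" marker into a space.
    return "".join(pieces).replace("▁", " ").strip()

# Toy vocabulary for illustration only.
id_to_piece = {0: "▁he", 1: "llo", 2: "▁wor", 3: "ld"}
text = ids_to_text([0, 1, 2, 3], id_to_piece)
```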
Thank you.